Print to PDF
On the web page, press Ctrl-P to print, select PDF as the target printer and save the result as a .pdf file.
The UK’s Government Digital Service (GDS) has revealed it’s working on a tool that will export its web pages as PDFs. News of the effort comes at the end of a post that spends most of its time slagging off PDFs. “Compared with HTML content, information published in a PDF is harder to find, use and maintain,” wrote GDS …
The problem with Ctrl-P is that many smart-arsed web designers implement different style sheets for print and screen because they think they know better. (Our in-house templates, for example, include the URL to links when printing web pages out.)
The most reliable way I've found to print out a web page is, unfortunately, to screen shot it.
The issue isn't the generation of the PDF file. It's that they need to format the HTML in particular ways so that the generated PDF is accessible, functional, etc.
Also, they need to convince the coloured pencil department to stop producing PDF files themselves as they're universally shit for accessibility. Produce HTML that can be converted and everyone will be happy.
"It's that they need to format the HTML in particular ways so that the generated PDF is accessible, functional, etc."
Actually Chrome's print-to-PDF is pretty good at this, frankly. The resulting PDF document is fully searchable, text can be selected, etc. I presume this means a screen reader would be effective (since clearly the original text is preserved as text). Hyperlinks in the original HTML become hyperlinks in the PDF.
If there is something missing, it would make sense to contribute it to the open-source Chromium codebase rather than invent a wheel with more corners.
Of course, if your original HTML was sh*t from an accessibility POV to begin with, print-to-PDF is unlikely to improve upon the situation.
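For anyone who wants to script the print-to-PDF step rather than press Ctrl-P, here is a minimal sketch that drives a Chromium-family browser in headless mode. The list of executable names is an assumption (it varies by platform and install), and the helper names are made up for illustration. It only produces whatever the browser's own print path produces, so everything said above about the quality of the source HTML still applies.

```python
import shutil
import subprocess
from typing import List, Optional

# Assumed executable names; adjust for your platform and install.
CANDIDATES = ["chromium", "chromium-browser", "google-chrome", "chrome"]


def find_browser() -> Optional[str]:
    """Return the path of the first Chromium-family binary found on PATH, if any."""
    for name in CANDIDATES:
        path = shutil.which(name)
        if path:
            return path
    return None


def build_print_command(browser: str, url: str, out_pdf: str) -> List[str]:
    """Build the headless print-to-PDF command line without running it."""
    return [browser, "--headless", "--disable-gpu", f"--print-to-pdf={out_pdf}", url]


def print_to_pdf(url: str, out_pdf: str) -> bool:
    """Run the browser headlessly; return True if it exited cleanly."""
    browser = find_browser()
    if browser is None:
        return False  # no suitable browser installed
    result = subprocess.run(build_print_command(browser, url, out_pdf), check=False)
    return result.returncode == 0
```

Something like `print_to_pdf("https://www.gov.uk/", "govuk.pdf")` would then write the file if a browser is found. Keeping the command builder separate from the runner makes it easy to inspect the command before executing it.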
It's not as simple as a virtual PDF driver (I use one every day to save beeb articles for kids). Trouble is, they print EVERYTHING, when all you want/need is the specific text. Not the links on the left, not a photo SPLIT across two pages (as is the norm with PDF printing). Not that they want people to print pix! :)
I find the tools at printwhatyoulike.com quite helpful for this. You can easily mark regions/objects to exclude. I have used the Chrome addin and the JS bookmarklet
>I find the tools at printwhatyoulike.com quite helpful for this.
I've used the extension/add-in Print Edit WE (Chrome/Firefox) for this job.
However, in using such tools, you do get an appreciation of just how varied HTML is and the ease, or not in some cases, with which a webpage can be reduced to its substantive content. I'm sure this variable quality of HTML plays havoc with accessibility tools.
And, as someone who's been clearing up the estate of a recently deceased relative, I can tell you PDFs have their place. That's one job that would be a LOT easier if the mountain of paperwork had been scanned and OCR'd to PDFs rather than being randomly shoved in boxes over 30 years.
Re: the Internet connection, we're not there yet. When access is 100% ubiquitous, cloud services manage to run years with zero downtime, and companies don't bail out of providing their services with almost no notice.... then offline PDFs and paper will have had their day. Today, the reality is that you just cannot rely on accessing online information when you need to.
"Aren't these people meant to use plain English?"
Whatever gave you that idea? This is GDS. Whatever they're doing it has to be buried under the most opaque mounds of gibberish to stop anyone finding out.
Are you sure this is .gov.uk and not Wikipedia? The same leisurely approach to customised HTML-to-PDF conversion is under way in both, and Wikipedia have made a .gov.uk-style ballsup of their first two stabs at it (wrt stephanh's comment, round two was a fruitless attempt to make headless Chrome fit for purpose) and in desperation have outsourced Round Three to their book publisher. It's so the same story in different clothing.
HTML5 sucks in more ways than most folks, including .gov.uk, realise >cough< offline >cough< javascript >cough< information layout >cough< and pdf, done properly, has a lot going for it in its own niche. But I have to ask, if dual-media publishing from a single source is the aim, then why fuss about accessibility of the pdf when you have to flippin' access the html edition in the first place in order to get to it?
The other advantage of a *good* HTML to PDF system is the ability to select multiple web pages, and combine them into a single PDF document, with sections in the correct order.
For example, try to print the NCSC Cloud Security Principles starting from https://www.ncsc.gov.uk/index/topic/151. Similarly try printing appropriate employment and tax pages. The next trick is to make it print double-sided.
I have - once - come across a system which would let you select the desired sections of a larger set of documents, then it would generate a single PDF of them all, in a suitable format for printing.
Expert PDF from Avanquest used to have this feature. There were times when it was really useful, for example taking a printout of a shopping basket (full list of part numbers and descriptions of content and prices) and then a printout of the final checked out order (basic item details and pricing).
A PDF is self-contained.
An HTML document has css links and scripts (and trackers)
A PDF can be reliably printed and passed around. (A lot of people are not digitally agile)
An HTML document requires a computer/device, browser and the knowledge to use it. A hard copy to get signatures on is pot-luck.
A PDF can be reliably stored as reference. I have it. I can archive it and index it.
An HTML user manual (say) is moved, deleted, or updated to reflect model 2 features but not my model 1
What they should be doing is banning Word documents.
"A PDF is out of date the minute it is made."
A fact which is extremely problematic for those in govt. who might have a shifting relationship with what they said a minute ago and very handy for those who want to hold them to account.
TL;DR? Permanence has value.
A pdf also requires a computer/device, the knowledge to install a PDF reader and the ability to use it.
If you're talking about printing then I don't care if you printed from a web page, a Word document or a PDF - it's printed and that's the end of the problem.
>A pdf also requires a computer/device, the knowledge to install a PDF reader and the ability to use it.
Because I so often read HTML over the air and straight into my brain without using any device or software.
I don't know any current consumer OS that doesn't have a PDF reader. Windows - Edge does it. Linux - KDE has Okular, Gnome has Evince. Mac OSX - Preview. Android has Google's pdf reader. Both Chrome and Firefox will have a stab at it on desktop OSes.
In practice, PDF is handy for archiving documents. HTML doesn't work as well because in most cases it requires storing resources alongside it (though yes, you can base64 encode images and stuff them in), and how browsers interpret it changes over time, while display of PDF is more stable and there is the PDF/A standard. Whether the resulting document is accessible / searchable largely depends on the source document, if it was structured text (LaTeX, markdown, office documents, XML, and yes, even HTML) with a sensible interpreter then the resulting PDF can be accessible. If it was scanned pages of an article from 1950 then no, but the HTML version isn't going to be either.
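To illustrate the "base64 encode images and stuff them in" option, here is a rough sketch that rewrites local `<img src="...">` references as data URIs so the saved HTML file carries its own images. The function names are made up for the example, and the regex only handles double-quoted `src` attributes on `<img>` tags; stylesheets, scripts, fonts and `srcset` are left alone, so treat it as a sketch rather than a complete archiver.

```python
import base64
import mimetypes
import re
from pathlib import Path

# Matches <img ... src="..."> with a double-quoted src (a simplifying assumption).
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=")([^"]+)(")', re.IGNORECASE)


def to_data_uri(path: Path) -> str:
    """Encode a file as a data: URI, guessing the MIME type from its name."""
    mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    payload = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime};base64,{payload}"


def inline_images(html: str, base_dir: Path) -> str:
    """Replace local image references with embedded data URIs."""

    def replace(match: "re.Match[str]") -> str:
        src = match.group(2)
        # Leave remote and already-inlined images untouched.
        if src.startswith(("data:", "http:", "https:", "//")):
            return match.group(0)
        candidate = base_dir / src
        if not candidate.is_file():
            return match.group(0)  # missing file: keep the original reference
        return match.group(1) + to_data_uri(candidate) + match.group(3)

    return IMG_SRC.sub(replace, html)
```

The result is one self-contained file, though it does nothing about browsers interpreting the same markup differently a few years on, which is the part PDF/A is meant to address.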
>I don't know any current consumer OS that doesn't have a PDF reader. Windows - Edge does it. Linux - KDE has Okular, Gnome has Evince. Mac OSX - Preview. Android has Google's pdf reader. Both Chrome and Firefox will have a stab at it on desktop OSes.
Half of those you list are actual web browsers. You know, software designed originally to parse HTML?
At this point we can safely say that HTML and PDF are (roughly) about as easily accessible on any electronic device as each other. Not least because most of the software for reading HTML will also display a PDF and vice-versa.
Mind you, basic HTML is at least somewhat human readable in a text viewer, which isn't something you can say about PDF.
Hardly the point really. I was pointing out that the idea it's hard to read PDF belongs back in 1995. And yes some of them are web browsers (making "about half of them" if you include firefox and chrome which I tacked on as additional examples of software you almost certainly already have).
You'll find those web browsers also display images, video, audio, plain text and will have a stab at displaying XML. Is HTML a substitute for all of those? Will the available version of those browsers display the same HTML document the same way next year? If you're displaying plain text why not just use plain text? Or markdown? It turns out different tools have different uses.
Oxbridge types (of the blue sky persuasion) do not use computers; they want a hard copy (emails, info etc) from their "girls" and still give dictation.
SPADs and other assorted climber-upers only believe in something if it's in Excel.
Managers tell their "girls" to type stuff into Word, save as pdf and slap it on their intranet page.
Only "girls" (and other data entry types) use "working class" html.
You can bet that behind this "necessity" is some crusty who wants his "girl" to send him an email with a pdf attachment so he can print it off.
To give you an idea of the arseness available... one top dog was on hols in France and was viewing a 320 page document on the UN web site, he wanted a copy so he phoned his "girl" in London and told her to print it off and fax it to him. I am not joking.
Store underlying data as XML - nice and simple content with some basic description
Run appropriate transform(s) to give HTML (the descriptive elements in the XML give appropriate HTML)
Run different transform(s) to give PDF.
Things like XSL-FO are your friend
It works nicely (did some proof of concept stuff on this ages ago, back when mobile devices had weedy screens, - same content gave desktop HTML, mobile HTML and PDF by running appropriate XSL)
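The single-source idea can be sketched in a few lines. This is not XSLT or XSL-FO: it uses Python's standard-library ElementTree as a stand-in, with one small XML document rendered two ways (HTML and plain text). The element names (`doc`, `title`, `section`, `heading`, `para`) and the sample content are invented for the example; a PDF output would need an XSL-FO processor or similar, which is outside the standard library.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical source document: content plus basic description, no presentation.
SOURCE = """<doc>
  <title>Renewing a passport</title>
  <section>
    <heading>Before you start</heading>
    <para>Check the expiry date.</para>
    <para>Fees &amp; timings vary.</para>
  </section>
</doc>"""


def render_html(root: ET.Element) -> str:
    """One 'transform': descriptive elements map onto HTML elements."""
    parts = [f"<h1>{escape(root.findtext('title', ''))}</h1>"]
    for section in root.findall("section"):
        parts.append(f"<h2>{escape(section.findtext('heading', ''))}</h2>")
        parts.extend(f"<p>{escape(p.text or '')}</p>" for p in section.findall("para"))
    return "\n".join(parts)


def render_text(root: ET.Element) -> str:
    """A second 'transform' over the same source, for a different medium."""
    title = root.findtext("title", "")
    lines = [title.upper(), "=" * len(title), ""]
    for section in root.findall("section"):
        lines.append(section.findtext("heading", ""))
        lines.extend(f"  {p.text or ''}" for p in section.findall("para"))
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


root = ET.fromstring(SOURCE)
html_page = render_html(root)
text_page = render_text(root)
```

The point is only that the content is written once and each output format is a separate, replaceable rendering step.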
So a natural fit for the civil service.
To get theological, the Civil Service are so "on the one hand, on the other hand …" the Devil would reject them and they'd end up in the Vestibule of Hell, chasing deviceless banners and being stung by hornets.
I want a Dante icon.
Both formats have their strengths and weaknesses; wise guys choose whatever suits the job at hand best.
Yes, PDF /is/ print-oriented - and that's a major advantage for publishing long texts that require attentive reading. A document set in a reader-friendly font with proper paragraph filling and hyphenation is so much easier on the eyes; it lets your mind focus on the content rather than the technicalities of a poor text rendering (which is the norm in HTML). I speak from experience, I do read a lot.
And I'm not alone. I work in an academic setting and at our lab, the computing devices most in demand ("high demand" being defined as "users scream /immediately/ when it fails") are 1. the personal laptop and 2. the workgroup printer - and that's for a reason. /Nobody/ would want to read a scientific paper as HTML on the screen, with the poor rendering constantly distracting the mind from the problem at hand. (Some folks do use rotating monitors for reading papers, but it is PDF they read on the screen in portrait format.)
And I haven't even mentioned the problem of embedded figures yet: good luck with copying the full content of a HTML page (skipping unneeded navigation code) for offline reading...
That is to say, there are use cases where HTML is simply no go.
The optimal use case for HTML (plus JS where that really makes sense) on the other hand is short, frequently changing or short-lived documents that no one would want to read offline or in print; or documents of a highly interactive nature; or reading the same document on a wide range of display sizes (making allowances for the text layout and rendering) - that's what it was designed for after all.
Bottom line: Use a hammer for nails and a screwdriver for screws. Heated ideological debates as to whether screws are outdated and should universally be replaced with nails are frankly daft.
(And yes, both formats have rather more than their fair share of warts. A text format that is versatile enough to cover both use cases would be really nice to have. Good luck with developing something of the kind *and have it widely accepted by your audience*...)
As many commentards have noted, this is a frequently-solved problem. A decent minority of historic HTML/PDF solutions take the accessibility issues seriously.
I expect what the gov.uk chap means is that they'll take some such thing - probably XML-based - and integrate it into their own publishing.
That is, unless and until such a sensible goal gets lost under a weight of empire-builders and PHBs.
I frequently print web pages to PDF for storage and offline reading. In my experience it's not the PDF that goes out of date but the online content disappears or gets modified. The "1984" experience where information is centrally controlled and modified as necessary is easier to pull off every day.
The article says that "most [PDFs] come into existence because designers want total control." Unfortunately that's very much the same for Web content, where every element is arranged down to the pixel, images are deferred-loaded so that they can track when and how far down you scroll, and random ads appear all over the place as you move around.
HTML is practical, quick, and dirty, but as Cem Ayin suggests, it sucks to read. A DVI file typeset with TeX in 1982 still reads better on a desktop monitor than any Wikipedia or gov.uk page today, 36 years later, and any serious reader would prefer a well-typeset PDF on the screen. Serious reading is not some artifact of "ingrained print culture".
Let's be honest, most of it is because the people writing those PDFs are forced to make them public by law or convention. If it was their choice they would be printed off and locked in some filing cabinet where members of the public or the media or even MPs would have to fight through a pile of bureaucracy to get to them.