Re: PDF is clunky.
I have to mention ABBYY FineReader. Seems nobody else does. It used to be bundled with Fujitsu ScanSnap.
HTML is the world's most common digital document file format. However, it's not the one everyone turns to when they want to create a precise document that looks, prints and behaves the same on any platform on any device. And it's hardly the format of choice for immediate offline reading, easy sharing or simple portability. For …
"You'll find it on page 19."
Only 2 away? Downloads of C19th books from archive.org or Google Books can be way, way more out than that. With the occasional plate that didn't have a page number. And that's only vol 1 when the page numbering continues into multiple volumes. With luck the OCR isn't too bad and you can search for the actual page number.
I once worked for the London branch of a US company. We would get long technical documents from head office in the US, and print them on A4 paper. So the page numbers got steadily out of step.
This was 1998-99, when we techies did not have the software to turn .doc into .pdf. One very useful trick of PDF is that it can print US Letter size pages on A4, or vice versa, thus preserving page numbers and layouts.
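A rough sketch of the arithmetic behind that trick, assuming the viewer simply scales each page uniformly until it fits the target sheet. The function name and the fit-to-page behaviour are assumptions for illustration, not any particular viewer's algorithm.

```python
# Page sizes in PostScript points (1 pt = 1/72 inch).
MM_PER_INCH = 25.4
PT_PER_INCH = 72.0


def mm_to_pt(mm: float) -> float:
    """Convert millimetres to points."""
    return mm / MM_PER_INCH * PT_PER_INCH


LETTER = (8.5 * PT_PER_INCH, 11.0 * PT_PER_INCH)  # 612 x 792 pt
A4 = (mm_to_pt(210), mm_to_pt(297))               # roughly 595 x 842 pt


def fit_scale(source: tuple, target: tuple) -> float:
    """Largest uniform scale at which a source page still fits on the target sheet."""
    return min(target[0] / source[0], target[1] / source[1])


letter_on_a4 = fit_scale(LETTER, A4)  # limited by width, about 0.97
a4_on_letter = fit_scale(A4, LETTER)  # limited by height, about 0.94

print(f"Letter on A4: scale by {letter_on_a4:.3f}")
print(f"A4 on Letter: scale by {a4_on_letter:.3f}")
```

Either way the page shrinks by only a few percent, so one source page stays one printed page and the numbering holds.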
I've had need (or nerd, rather) to look at the specs at some point, and there are so many bewildering things in there... base85 (because why waste characters?), a freaking filesystem, because that's really what PDF is... I want to say there's special support for barcode-ish things in there, too, but I've not found it looking at the PDF 1.7 ISO document.
I can see how they were convinced it would be a great step towards low-paper office workflows, if you go all in, all the way.
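For the curious, the "why waste characters" encoding in the spec is the ASCII base-85 filter (ASCII85Decode). A minimal sketch of the idea using Python's standard `base64.a85encode`, which implements the same base-85 alphabet; the sample data is made up, and a real PDF stream additionally ends with a `~>` end-of-data marker that this sketch leaves out.

```python
import base64

# 20 arbitrary bytes standing in for binary stream data.
data = b"0123456789abcdefghij"

encoded = base64.a85encode(data)     # every 4 input bytes become 5 ASCII characters
decoded = base64.a85decode(encoded)  # round trip back to the original bytes

hex_encoded = data.hex()             # hex, for comparison: 2 characters per byte

print(len(data), len(encoded), len(hex_encoded))
```

So base-85 costs 25% overhead where hex costs 100%, which is presumably the characters nobody wanted to waste.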
"a freaking filesystem, because that's really what PDF is"
When you want to flatten complex documents into a single file, you're probably going to end up with a compound file format of some sort. Open Document Format is a compound file format - it's just a zip archive, in fact. OOXML and XPS are compound file formats. EPUB is a compound file format.
The alternative is a single non-compound format that encompasses all the types of data you might want. That's worse: it's more cumbersome to define, document, implement, etc. With a compound file format, it's trivial to build toolchains that operate on only some parts of the entire document - the explode / filter / implode pattern.
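A minimal sketch of that explode / filter / implode pattern on a zip-based compound file, using only Python's standard `zipfile`. The toy archive, the member names and the `strip_thumbnails` filter are all hypothetical, and real formats add their own packaging rules (ODF, for instance, wants the `mimetype` member stored first and uncompressed), which this sketch ignores.

```python
import io
import zipfile


def explode(blob: bytes) -> dict:
    """Unpack a zip-based compound document into {member name: bytes}."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        return {name: zf.read(name) for name in zf.namelist()}


def implode(parts: dict) -> bytes:
    """Repack the parts into a single zip archive, keeping their order."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, payload in parts.items():
            zf.writestr(name, payload)
    return buf.getvalue()


def strip_thumbnails(parts: dict) -> dict:
    """Example filter: drop everything under a (hypothetical) Thumbnails/ folder."""
    return {n: p for n, p in parts.items() if not n.startswith("Thumbnails/")}


# Build a toy compound document, then run it through the pipeline.
original = implode({
    "mimetype": b"application/x-toy-document",
    "content.xml": b"<doc>Hello</doc>",
    "Thumbnails/thumbnail.png": b"placeholder image bytes",
})
filtered = implode(strip_thumbnails(explode(original)))
names = sorted(explode(filtered))
print(names)
```

The filter only ever sees the parts it cares about; everything else passes through untouched.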
HTML and its siblings can get away with not being compound because they present a de facto remote filesystem to the user agent. They don't try to flatten everything into a single byte-stream blob.
There is no mention here of Novell Envoy (1995), which I had, as part of PerfectOffice, and used. It had a free downloadable viewer (which no one had) and the ability to package the viewer together with the .ENV file as an .EXE, which quickly became an unacceptable attachment in several systems, as did a similar output from OmniForm.
Perhaps none of these competitors threatened -- or will ever threaten? -- the economic success of Adobe, but they must have had some influence.
PDF is the de facto document standard because it just works for the mass population. Sure, there are the 1% who have more specialized needs, but getting the level of fidelity that PDF provides for the effort (or lack of) that 'print to PDF' provides makes it a lock.
So, props to Adobe, Warnock and the author for a fun and informative article.
Now we must turn our attention to Acrobat. Oh, Adobe, you break my heart every time I open your Reader. What did you do with Reader DC? The weird-ass offering of McAfee with every download that makes you look like a skeevy mp3 ripper from a warez site. The kindergarten UI. The tool bar that is un-hideable, eating up a 2-inch wide chunk of my screen for no purpose whatsoever. Whyyyy?
Not a PDF fan at all.
About 10 years ago, when I was working in marketing/communication, it was one of the most overused and misused file formats in the business world.
People had to convert everything and anything to PDF because it was trendy and expected, although Office file formats were already open, free viewers were already available for the 0.00001% that didn't work with Office or a program able to open Office documents, and most of all, 100% of our customers and partners did work with the same software we did: Office, Adobe Photoshop/Illustrator/InDesign, QuarkXPress.
PDF caused problems to no end because of compatibility issues, file size, remarkable slowness of viewers, errors while printing on the latest PostScript printers and more.
And it cost us a lot of money in licenses for a step that basically brought zero economic value.
"most of all, 100% of our customers and partners did work with the same software we did: Office, Adobe Photoshop/Illustrator/InDesign, QuarkXPress."
And what about the consumer, the person who you hoped would read the product? Were they supposed to have a copy of QuarkXPress to read it? If all you were doing was printing it, you wouldn't have needed to go near PDF.
If you wanted it to be read online, and wanted to ensure readers saw what you saw, you'd have had little option. Your reader with Word or Word Viewer might not have had the same fonts, or even fonts with the same metrics. Your reader then sees a slightly dishevelled document and forms their view of whoever sent it accordingly: remember that what's communicated is what's received, not what's transmitted.
We don't need interactive content in PDFs. The PDF format is a great way of distributing 'paper' documents so they can be viewed and printed with the formatting the original creator intended.
Having interactive content inside a PDF sounds like a hacker's dream for spreading malware.
I really hope therefore that no one bothers with version 2 and they stick with the older versions which do what they were intended to do quite well.
> IMO, digital signature support in PDF is unmatched by any other office file format
LibreOffice does digital signatures just fine *and* since version 5-something that includes support for PGP signatures which PDF does not have.
While it's a problem of the implementation and not the format per se, PDF signatures can be a bit of a minefield. There are two major reasons. One is that the signature may refer to only part of the document, with no clear visual indication of exactly which part (might it have changed?). The other is Acrobat's (?) stupid idea of adding some sort of stamp-like image onto the document itself, which has caused a lot of people to just look for the image and remain completely unaware of the actual signing mechanism.
As I say, that's not a problem of the actual spec, just shit software out there.
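To make the "only part of the document" point concrete: a PDF signature records a /ByteRange of four numbers, two (offset, length) pairs saying which bytes of the file were signed. Below is a rough sketch, assuming a naive regex scan is good enough to find that entry, of checking whether a signed range runs to the end of the file. The helper names and the toy bytes are made up, it verifies nothing cryptographically, and a real tool would parse the file properly.

```python
import re

# Matches "/ByteRange [a b c d]" with flexible whitespace between the numbers.
BYTE_RANGE = re.compile(rb"/ByteRange\s*\[\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*\]")


def signed_spans(pdf: bytes) -> list:
    """Return every (start1, len1, start2, len2) tuple found in the file."""
    return [tuple(int(g) for g in m.groups()) for m in BYTE_RANGE.finditer(pdf)]


def covers_to_end(pdf: bytes, span: tuple) -> bool:
    """True if the span starts at byte 0 and its second range ends at end of file."""
    start1, len1, start2, len2 = span
    return start1 == 0 and start2 >= len1 and start2 + len2 == len(pdf)


# Toy 100-byte "file": bytes 0-39 and 60-99 signed, the gap holding the signature.
toy = (b"%PDF-toy " + b"/ByteRange [0 40 60 40]").ljust(100, b" ")
appended = toy + b" later revision added after signing"

whole = covers_to_end(toy, signed_spans(toy)[0])
partial = covers_to_end(appended, signed_spans(appended)[0])
print(whole, partial)
```

Content appended after signing is not necessarily malicious, since that is how later revisions and additional signatures get added, but it is exactly the part the stamp image tells you nothing about.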