How a tax form kludge gifted the world 25 joyous years of PDF

HTML is the world's most common digital document file format. However, it's not the one everyone turns to when they want to create a precise document that looks, prints and behaves the same on any platform on any device. And it's hardly the format of choice for immediate offline reading, easy sharing or simple portability. For …

    1. Anonymous Coward

      Best: Acrobat XI; Worst: Acrobat DC (v12) =shitty UI

      I still prefer Acrobat XI (v11) from 2012. The new Acrobat DC has the worst UI ever.

      I want normal File | Edit | View | ... menubar. I want a normal toolbar and sidebar. Acrobat 1 - 11 were great. Acrobat 12 aka Acrobat DC is crap.

  1. Anonymous Coward

    "by which time the name "Camelot" had been changed "

    ... Camelot - 'tis a silly name

    1. Andy The Hat Silver badge

      ... Camelot - 'tis a silly name

      but Elderberry would have been worse ... :-)

    2. Anonymous Coward

      Should have gone with cameltoe ;)

      1. JulieM Silver badge
        Coat

        Cameltoe

        And no doubt the Open Source version would have been called LunchBox .....

  2. Paul Crawford Silver badge
    Facepalm

    Content creators have long been demanding a version of PDF that supports embedded HTML5-based media, interactivity and animation

    For the love of $DIETY no, no and thrice no!

    How many vulnerabilities have been in Acrobat reader due to the ability to execute arbitrary code? Please keep a document standard as that - something for reading and printing. Even the option for forms to fill in has piss-poor support and don't get me started on the shit that is the encrypted versions that only Adobe products can open.

    1. Wellyboot Silver badge

      Wasn't that the Flash idea?

    2. Stoneshop
      Devil

      For the love of $DIETY

      Now with 30% less hellfire and damnation.

    3. Daggerchild Silver badge

      Content creators have long been demanding a version of PDF that supports embedded HTML5-based media, interactivity and animation

      I'd love to see how those print out. I've already seen "Please click on this [link] for more information" in all its brain-dead-tree irony.

    4. Gobhicks

      $DIETY ...

      ... like a $DIET ?

  3. Blockchain commentard

    Since PDF's are normally designed for printed media, why would you want animated junk in it? If people want that kind of stuff, stick to HTML.

    1. Anonymous Coward

      Joke enters stage left...

      There are some examples around of people asking "how do I print off this video on page 3 of the PDF*"....

      [edit]* By that I mean PowerPoint Slideshow, because AFAIK you cannot embed a video in PDF :P

      1. Dan 55 Silver badge
        Alert

        Re: Joke enters stage left...

        Click if you dare. Or if you've got a Mac it's probably this one.

      2. Allonymous Coward
        Facepalm

        Re: Joke enters stage left...

        You can embed video in PDF. I used to have an unrewarding job webmastering for a large public sector organisation. There was a policy of "all website video needs approval", because accessibility and generally lousy production standards of most of the stuff comms/marketing droids were buying in or shooting.

        However there was no such policy for PDFs. And all of a sudden some people were uploading suspiciously large PDFs with very few pages. Upon closer inspection these were found to contain embedded video.

        "Can" != "should". Unless you're a JavaScript interpreter perhaps.

    2. JimboSmith Silver badge

      What really annoys me is people who don't understand the potential of the format. For example, years ago I was supporting a small business who used PDF. The company received forms via email and were then printing them out to fill them in. They were being sent them by everyone from companies they were customers of through to a trade organisation they were a member of. Most if not all of these were locked down, preventing filling them in electronically. These were just forms; there wasn't anything about them that had intellectual property or anything like that. I pointed out to some of the companies concerned that they could put form fields in, but I don't think any of them had a clue what I was going on about. Some of them wanted the physical printed filled-in copy sent by post and wouldn't accept an emailed version, let alone a fax.

  4. Anonymous Coward

    An interesting article. Tried to get my screen-scraping bot to read PDF text in a browser***. I have never found an easy way to get the text words and paragraphs from the accessible page data. The exposed HTML, if accessible to Selenium, merely places letters in precise positions on a page.

    ***which for technical reasons is now Chrome.
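The problem described above, where a renderer exposes individually positioned letters rather than words and paragraphs, is usually worked around by regrouping glyphs geometrically. Below is a minimal sketch of that idea, not any particular tool's method. The `Glyph` record, the `glyphs_to_lines` function and both thresholds are hypothetical names and values chosen for illustration; it assumes screen-style coordinates (y grows downward) and that the scraper has already collected each character's position and width.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    """One positioned character (hypothetical record for this sketch)."""
    char: str
    x: float      # left edge of the glyph
    y: float      # vertical position of its line (grows downward)
    width: float  # advance width of the glyph


def glyphs_to_lines(glyphs, line_tol=2.0, gap_ratio=0.3):
    """Rebuild text lines from positioned glyphs.

    Glyphs whose y differs by at most line_tol from the first glyph of the
    current line are treated as the same line. Within a line, a space is
    inserted when the horizontal gap to the previous glyph exceeds
    gap_ratio times the previous glyph's width. Both thresholds are
    assumptions and would need tuning per document.
    """
    lines = []
    for g in sorted(glyphs, key=lambda g: (g.y, g.x)):
        if lines and abs(lines[-1][0].y - g.y) <= line_tol:
            lines[-1].append(g)
        else:
            lines.append([g])

    result = []
    for line in lines:
        line.sort(key=lambda g: g.x)  # left-to-right reading order
        text = line[0].char
        for prev, cur in zip(line, line[1:]):
            gap = cur.x - (prev.x + prev.width)
            if gap > gap_ratio * prev.width:
                text += " "
            text += cur.char
        result.append(text)
    return result
```

Paragraph detection could be layered on top in the same way, by comparing the vertical gap between consecutive lines; multi-column layouts and right-to-left text are not handled by this sketch.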

  5. Joe Harrison

    PDF has its uses I suppose

    As the story says, if you just want to print it out like it's supposed to look then PDF is fantastic. Must be great for people who still own printers.

    For everything else just no. When Amazon first started selling the Kindle I had this wonderful idea to load our existing technical documents onto it. Nope, they are all PDFs designed to look like A4 and resisted my every effort at resizing to fit the smaller screen. Even trying to edit a PDF is a series of unpleasant workarounds unless, I suppose, you bought a full copy of Acrobat.

    It's got so much more difficult recently because of Adobe trying to "cloudify" everything, plus extend PDF far past simple document representation into interactive forms and their own version of electronic signing.

    Funny that PDF originated from a way to print tax forms because as it happens the "Agencia Tributaria" (Spanish tax equivalent of HMRC) have done a thorough job of offering their users a PKI-based online tax system. Works well but its weakness is its dependence on PDFs and problems are guaranteed in those bits every time.

    1. Roland6 Silver badge

      Re: PDF has its uses I suppose

      Not just for print outs, communication.

      So you want to do away with a standard, so when I refer you to page 404 of the HTML status code manual, you get something totally different because in your rendering of the manual the relevant material is on page 418 or even 1415...

      Similar considerations apply when I try to refer to the same material across devices and formats.

      1. Allonymous Coward

        Re: PDF has its uses I suppose

        This comment deliberately left blank.

      2. Richard 12 Silver badge

        Re: PDF has its uses I suppose

        That's why standards and legal documents use paragraph numbers.

        As does the Bible and other holy books.

        That particular problem was solved over a thousand years ago - in fact, before the concept of a "page" was invented.

      3. This post has been deleted by its author

      4. Anonymous Coward

        Re: PDF has its uses I suppose

        > when I refer you to page 404 of the HTML status code manual, you get something totally different because

        Because you are not referring to a page: you are applying a physical metaphor in an inappropriate context and failing to comprehend the implications. That is an example of misuse.

      5. Michael Wojcik Silver badge

        Re: PDF has its uses I suppose

        So you want to do away with a standard, so when I refer you to page 404 of the HTML status code manual, you get something totally different because in your rendering of the manual the relevant material is on page 418 or even 1415...

        The vapidity of this example (there is no "HTML status code manual") aside, the problems with using page numbers for citation have been well known since long before there were computers. That's why, when we're using responsive-layout documents, we don't use page numbers to cite passages.

        This straw man was scattered to the winds long ago.

        1. Roland6 Silver badge

          Re: PDF has its uses I suppose

          >The vapidity of this example (there is no "HTML status code manual") aside, the problems with using page numbers for citation have been well known since long before there were computers.

          @Michael - I think you need to get out into an office or classroom and listen to real people and look at the books/source materials they are using, especially the "contents" page. Also take a look at the covers of various magazines: "Your guide to Cloud - see p18". I'm not talking about citation, although looking through various academic papers, I do note many in their references include page numbers, which can be helpful in confirming that the paper's author was referring to the "2nd edition - reprinted with corrections" and not the 2nd edition.

          BTW I know there is no "HTML status code manual"; I chose it as computing reference books/tomes are the sorts of things I felt El Reg readers could relate to. Also the page numbers were carefully chosen - lateral thinking needed for 1415 .. :)

    2. Steve the Cynic

      Re: PDF has its uses I suppose

      Even trying to edit a PDF is a series of unpleasant workarounds unless, I suppose, you bought a full copy of Acrobat.

      Thanks for reminding me that I need to migrate my copy of Acrobat 8 to my new PC. (The reasons that I have one are, today, entirely invalid, and, on reflection, probably were *always* entirely invalid, but they seemed like a good idea at the time.)

      Oh, and yes, I own a printer. And when it went so far EOL that I couldn't get ink cartridges any more, I bought a different one to replace it. So I guess I own *two* printers, at least until I get around to taking the old one to the tip.

      1. Anonymous Coward

        Re: @Steve The Cynic

        So I guess I own *two* printers, at least until I get around to taking the old one to the tip.

        If it's so far beyond use, then the correct disposal procedure is take it apart for s**ts & giggles, then take a pile of parts to the tip

        1. J. Cook Silver badge

          Re: @Steve The Cynic

          Or, depending on the type of printer, taking it apart for all the nummy parts for making other things. (servos, steppers, smooth steel rods with linear bearings that are already fit for it, etc...)

          I know of a few 3D printers that were built using scavenged dot matrix printer parts...

          1. Anonymous Coward

            Re: @Steve The Cynic

            > I know of a few 3D printers that were built using scavenged dot matrix printer parts...

            Intentionally or in the process of trying to put the original thing back together?

    3. Doctor Syntax Silver badge

      Re: PDF has its uses I suppose

      "Even trying to edit a PDF is a series of unpleasant workarounds"

      Trying to unscrew a welded joint is also tricky and for the same reason: they're both intended to be unchangeable.

      1. Anonymous Coward

        Re: PDF has its uses I suppose

        > Trying to unscrew a welded joint is also tricky and for the same reason: they're both intended to be unchangeable.

        Nope. Again, that is one of the many and most common misconceptions about PDF.

        The output was not designed to be changeable. That is an altogether different kettle of fish from "intended to be unchangeable". You are mistaking a non-goal for a requirement.

    4. Anonymous Coward

      Re: PDF has its uses I suppose

      > as it happens the "Agencia Tributaria" (Spanish tax equivalent of HMRC) have done a thorough job of offering their users a PKI-based online tax system. Works well but its weakness is its dependence on PDFs and problems are guaranteed in those bits every time.

      Well, that's the Spaniards for you: take a good idea and implement it as poorly as possible. Have you had to deal with one of their electronic IDs?

  6. Steve Graham

    Flow

    "Now with people primarily reading on screens, (over 50% of eBooks on phones) and no standard screen size or resolution, like Letter and A4 on paper, layout needs to be "Responsive" and work with user selected rescaling (sharp vs poor eyesight)."

    Most of the HTML I see these days shows every sign of the "web designer" fighting to stop users' browsers from applying their own formatting to fit the device & screen.

    1. Steve Button Silver badge

      Re: Flow

      Aye, there's the rub.

      Do you want Jobs-style beautiful exact placement on the "page", or to be able to view things on many devices?

  7. Lee D Silver badge

    PDF is a WORM format, as far as I'm concerned.

    We use it in work to say "This is it, this is the document, this is how it looks, nobody change it" and then offer that to customers knowing it will look the same no matter what they open it on and it can't be tweaked. Yes, we know you *can* edit them, but you can't edit them easily or nicely or guaranteeably.

    Draft in Word, publish in PDF.

    It's a great format for that. This is the version, no changes. Sign it if you have to. Beyond that, it's really just another format.

    I refuse to buy Acrobat, though. I paid for Nitro once when it was cheap and that serves all my needs. For years (and still currently), I used PDFCreator and other freebie Ghostscript-based things to create PDFs if I needed them.

    I don't see that the format needs much extension.

    However, I was recently asked how to "stop people stealing our pictures out of PDFs" (and also website images). My solution was "don't put them in there" because you can't beat an analogue hole (screenshot tool) and PDFs you can suck the content out any time you like. They can't restrict "reading" permissions.

    The biggest problem with Adobe is all the plug-in shite that tries to put such limitations and other DRM on you. I have one that literally interferes with EVERY PDF you print by watermarking it, whether or not it was part of the purchased PDFs that had that DRM. We stopped buying that stuff, fortunately.

    Keep it to a display format. I mean, use the forms stuff if you have to but even that's a security risk (running Javascript and talking to outside websites, etc.). Anything more is really a nonsense and won't be used and will contribute to the long-term death of the format.

    PDFs are fine. I mean anything would be fine, but XPS. But Adobe can't be making much money out of them at all.

    1. Roland6 Silver badge

      >how to "stop people stealing our pictures out of PDFs" (and also website images).

      Well this goes back to considering the intended purpose. As PDF's are really intended to be printed, there is generally no reason to have pictures/images that are of a higher resolution than is necessary to print the page. Similarly with websites, do you really need a high resolution image when most people will be viewing the content on a 1366x768 laptop display or a 3~5-inch mobile phone display.

      Doesn't stop people stealing the pictures, just ensures the copy taken isn't that good.

      1. Doctor Syntax Silver badge

        "As PDF's are really intended to be printed, there is generally no reason to have pictures/images that are of a higher resolution than is necessary to print the page."

        If all you have is a printer, that's fair enough. I view PDFs on Okular and that has a zoom control.

        1. Roland6 Silver badge

          >I view PDFs on Okular and that has a zoom control.

          This caused me to do a little research...

          The image chosen to illustrate Okular's capabilities on Wikipedia made me smile. It nicely illustrates how narrow and limited many commenters' experience is; you wouldn't use Word to write a musical score, however, PDF allows those without the relevant application to read your score.

          Which reminds me of other uses of PDF and history!

          We forget just how painful Word was before the rise of PDF; yes Word allowed you to do object embedding, only problem was anyone else wanting to read your Word document and view all those Visio diagrams, etc. that you had so carefully embedded, had to have the relevant applications installed on their system. Obviously, you could paste-as-picture/image but that made updating a pain. However, print as PDF and everyone could view your document as you intended - okay that might have made feedback harder, but generally many people simply printed off the document, annotated it by hand and handed it back.

          Similarly for MS Project plans, want to ensure everyone can read the current plan, just print to PDF (using a sensible page size).

          Interestingly, I've never received a document that used, or a request for documents to be sent in, Microsoft's "PDF killer" XPS format. [Aside: I don't understand why MS haven't killed this off yet.]

          Picking up on archival and OCR comments, one aspect of PDF not commented upon is its ability to contain document layers, so for imaging and workflow applications, PDF was an ideal format: paper could be scanned to TIFF, OCR'd and the two files combined into a single PDF file. This meant that you could search on the OCR'd text and if it didn't read well (i.e. it contained scan errors) you could view the original TIFF image.

          1. Ken Hagan Gold badge

            "We forget just how painful Word was before the rise of PDF; yes Word allowed you to do object embedding, only problem was anyone else wanting to read your Word document and view all those Visio diagrams, etc. that you had so carefully embedded, had to have the relevant applications installed on their system. Obviously, you could paste-as-picture/image but that made updating a pain."

            Interesting. The OLE rules actually said that embedded objects had to offer a rendering that did not require the relevant application and that containers had to save that rendering as part of the containing document, precisely to avoid the problem you've just described. I would say that it was almost impossible to actually program either the server or the container application without being aware of this, so it is interesting that you found yourself using a version of Office or Visio that had managed to screw this up.

            (Update: It's probably also worth mentioning that the OLE libraries provide this capability for free. It requires conscious effort on the part of the programmer to avoid caching a graphic rendering. So that's someone going the extra mile to tick the box labelled "be annoying".)

            1. Roland6 Silver badge

              The OLE rules actually said that embedded objects had to offer a rendering that did not require the relevant application

              Agree, however, from memory, the 'rendering' often didn't look exactly like the source, hence it was easier and made for a smaller Word document, to do the paste-as-picture. Also given the performance of the systems back in the 90's and early 2000's and the ease with which you could overwhelm Word, it was often easier/quicker to update the original object in the source application and just paste the results into the Word document.

              The other benefit of this approach was that reviewers/contributors had to tell you about corrections they wanted to make to such objects...

          2. Michael Wojcik Silver badge

            It nicely illustrates how narrow and limited many commenters' experience is; you wouldn't use Word to write a musical score, however, PDF allows those without the relevant application to read your score.

            1. Terrible thing X is useless for application A.

            2. Sometimes-useful thing Y is useful for application A', which is related to but distinct from A.

            3. Therefore people who do not believe Y is wonderful have limited experience.

            I think your syllogism needs work. Or, preferably, nuking from orbit. Care to try again?

            1. Roland6 Silver badge

              @Michael "I think your syllogism needs work. Or, preferably, nuking from orbit. Care to try again?"

              I think you and others understood perfectly what I was saying and implying; PDF is a universal display format for 'printed' material; in some respects it is digital paper; if whatever you wish to communicate can be represented on a piece of paper then it can be held in a PDF file. Provided the reader has a PDF reader then they can access the digital paper and view what you wrote on it.

              Whilst I agree many PDF files do have some limitations which make reading simple text less than satisfying on small-screen smartphones and tablets, I suggest this limitation is more a limitation of people's usage and the tools available. For example, I've not looked at it, but given a PDF can have multiple layers, there is no reason why a viewer couldn't pull the text layer and display that instead of the printed page.

              I suspect also if someone really wants the benefits of ePub (or another dynamic display format) whilst not also losing the benefits of PDF then perhaps the way forward is to Standardise ePub, promote an ePub printer that can be plugged in and used just like PDF printers and get the PDF Standard updated to include an ePub layer!

        2. Anonymous Coward

          > I view PDFs on Okular

          That's an automatic thumbs up.

      2. Anonymous Coward

        > As PDF's are really intended to be printed, there is generally no reason to have pictures/images that are of a higher resolution than is necessary to print the page.

        You don't know much about printing, do you?

    2. Doctor Syntax Silver badge

      Draft in LibreOffice, export as PDF.

    3. arctic_haze

      PDF is also great for presentations

      I produce my PDF with LibreOffice. It is very efficient. A presentation produced by Impress (the LibreOffice counterpart of PowerPoint) decreases its file size 2-4 fold when exported to PDF with 90% quality which is good enough even for a big screen. Plus the slides look as intended on every computer.

      The downside is that you do not have animation (with the exception of slide transitions). But most animations in presentations are a distraction anyway.

      1. Anonymous Coward

        Re: PDF is also great for presentations

        > I produce my PDF with LibreOffice. It is very efficient.

        More importantly, it treats the output as a file not as a printer which obviously it isn't. This means that you can easily and conveniently control such things as the metadata, window title, table of contents, and of course, hyperlinks.

        Few things make you look as incompetent as producing a PDF that goes like "and for this look on page 735¾" and not hyperlinking the stupid page reference (or not producing a table of contents, if appropriate).

    4. Anonymous Coward

      > I was recently asked how to "stop people stealing our pictures out of PDFs" (and also website images). My solution was "don't put them in there" because you can't beat an analogue hole

      And I had this conversation with a friend. My solution was: use a free-culture licence so people are not "stealing" them.

      For some unknown reason they actually did that and they realised that 1. the problem was not as big as they thought, it was just amplified by their anxiety and 2. their brand visibility in search rankings shot right up, perhaps as people felt more comfortable linking to them rather than lifting the content.

  8. Primus Secundus Tertius

    PDF bloat

    I am proof-reading a draft of a magazine for a small charity. Its Editor uses a DTP system that turns 40 A5 pages into an 8 MByte file. Acrobat 9 (vintage 2009) reduces that to 321 KBytes with no obvious loss of visual appearance.

  9. Androgynous Cupboard Silver badge

    Ahem

    PDF/X, the X was "X-change" I believe.

    As for versions - it's even worse than you made out:

    * Acrobat 8 was PDF 1.7, aka ISO 32000-1:2008

    * Acrobat 9 was PDF 1.7 extension level 3 (there is no PDF 1.8)

    * Acrobat X was PDF 1.7 extension level 8 (unpublished; extension level 5 was published, but as far as we know there was never an extension level 4, 6 or 7)

    * Acrobat XI was... actually I never really figured that one out either. But we got EC sigs, which is nice.

    Then you've got Acrobat DC 2015, Acrobat DC 2017, Acrobat DC 2018. Next year it will probably be Acrobat DC 1880 just to keep us guessing, or perhaps Acrobat DC 77πᵉ. With an entirely new user interface of course, with all the buttons rotating in a constant spiral around the center of the screen this time, because change!

    Fortunately the file format means PDF is largely backwards compatible so you can largely forget about version numbers.
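For anyone who does need to know which of the versions listed above a given file claims, the information is written in the file itself: a PDF starts with a `%PDF-x.y` header, and Adobe's extension levels are recorded as an `/ExtensionLevel` entry in the document catalog's extensions dictionary. The following is a rough sketch only. The `pdf_version` function is a hypothetical helper, the byte-level scan is deliberately naive, and it is an assumption here that the extension entry is stored uncompressed; a catalog `/Version` entry that overrides the header is ignored.

```python
import re

# "%PDF-1.7" style header; viewers commonly tolerate it anywhere in the
# first 1024 bytes, so this sketch searches that window rather than
# insisting on offset 0.
HEADER_RE = re.compile(rb"%PDF-(\d)\.(\d)")

# Naive scan for "/ExtensionLevel 3". It will miss the entry if the catalog
# lives inside a compressed object stream.
EXTENSION_RE = re.compile(rb"/ExtensionLevel\s+(\d+)")


def pdf_version(data: bytes):
    """Return (major, minor, extension_level) for raw PDF bytes.

    extension_level is None when no /ExtensionLevel entry is visible.
    Raises ValueError if no header is found in the first 1024 bytes.
    """
    header = HEADER_RE.search(data[:1024])
    if header is None:
        raise ValueError("no %PDF-x.y header found")
    ext = EXTENSION_RE.search(data)
    level = int(ext.group(1)) if ext else None
    return int(header.group(1)), int(header.group(2)), level
```

In use, something like `pdf_version(open("some.pdf", "rb").read())` (file name hypothetical) returns a tuple such as `(1, 7, 3)` for a file declaring PDF 1.7 extension level 3. A real parser would follow the cross-reference table to the catalog instead of pattern-matching the raw bytes.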

    1. Hi Wreck

      Re: Ahem

      Everyone knows “X” sells!
