'First ever' SHA-1 hash collision calculated. All it took were five clever brains... and 6,610 years of processor time

Google researchers and academics have today demonstrated it is possible – following years of number crunching – to produce two different documents that have the same SHA-1 hash signature. This proves what we've long suspected: that SHA-1 is weak and can't be trusted. This is bad news because the SHA-1 hashing algorithm is used …

  1. Tom 7

    9,223,372,036,854,775,808 sha1 calculations

    to get a collision. It's not ideal, but I'd hardly call it a security hole. If they could publish the two documents, I'll try not to use them as passwords.

    1. Bruce Hoult

      Re: 9,223,372,036,854,775,808 sha1 calculations

      They have published the two documents, I've downloaded them and confirmed their claims.

      They are 422435 byte PDFs, differ in 62 of those bytes, have the same sha1 hash, and show different contents (the background colour). I haven't checked if there are any other differences in the display.
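
      For anyone who wants to repeat the check, a minimal Python sketch - the filenames shattered-1.pdf and shattered-2.pdf are just assumed names for local copies of the two published PDFs:

      ```python
      import hashlib

      # Assumed local copies of the two published PDFs.
      a = open("shattered-1.pdf", "rb").read()
      b = open("shattered-2.pdf", "rb").read()

      print("sizes:", len(a), len(b))                               # both 422435
      print("bytes differing:", sum(x != y for x, y in zip(a, b)))  # 62
      print("sha1 match:", hashlib.sha1(a).digest() == hashlib.sha1(b).digest())        # True
      print("sha256 match:", hashlib.sha256(a).digest() == hashlib.sha256(b).digest())  # False
      ```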

      1. Preston Munchensonton

        Re: 9,223,372,036,854,775,808 sha1 calculations

        They are 422435 byte PDFs, differ in 62 of those bytes, have the same sha1 hash, and show different contents (the background colour).

        Still not much of an attack, IMO. Unless someone can arbitrarily set the difference to something meaningful, it only proves that it's possible to overcome SHA-1. It's still really really REALLY improbable to produce a meaningful difference.

        1. Robert Helpmann??
          Childcatcher

          Re: 9,223,372,036,854,775,808 sha1 calculations

          It's still really really REALLY improbable to produce a meaningful difference.

          Remember the bit in the article about this getting easier over time? That means that SHA-1 is broken and will only get more so until it is trivial to roll it like an old carpet. This attack is just the tip... not what you want to hear if your security depends on this algorithm.

        2. Chris Miller

          Re: 9,223,372,036,854,775,808 sha1 calculations

          You need to read up on some of the discussions that took place when similar attacks were first identified for MD5 hashes. This is a serious problem - I could get you to digitally sign a contract, and if I can produce a similar document with the word 'not' inserted (say) and the same SHA-1 hash, you're in trouble. Similarly with using SHA-1 to authenticate a valid web server: make a slight change to the URL, but keep the hash the same ...

          Right now it needs a massive amount of computing power to defeat SHA-1, but the NSA can probably do it already. And these attacks only ever get quicker - not just by Moore's Law, but because someone clever will read the paper and think of a way of doing it 10x faster.

          1. Christoph

            Re: 9,223,372,036,854,775,808 sha1 calculations

            It depends whether you can make a particular required change and still produce a hash collision.

            Yes, it's a problem if you can insert 'not'. It's less of a problem if you can only produce the collision if you insert 'KJ"BIUE_D H£(*ERNY£'

            1. Charles 9

              Re: 9,223,372,036,854,775,808 sha1 calculations

              But if you can insert "not" AND just stash away the "KJ"BIUE_D H£(*ERNY£" in a garbage area, you're sorted.

              What I want to know is whether this is more than a collision - a preimage attack (or, more severely, a SECOND-preimage attack that found a collision with a specific target).

              1. Preston Munchensonton

                Re: 9,223,372,036,854,775,808 sha1 calculations

                But if you can insert "not" AND just stash away the "KJ"BIUE_D H£(*ERNY£" in a garbage area, you're sorted.

                While true, the likelihood of doing this AND getting a collision is highly improbable. Yes, it's only a matter of time, as many have pointed out. But then again, our solar system only has a matter of time before the Sun engulfs it. What's important isn't what is eventual. It's the time scale on which it will happen. For now, that time scale is still very far out for turning this finding into something meaningful. I doubt it happens in my lifetime, even with the looming presence of quantum computing around the bend.

                1. theblackhand

                  Re: 9,223,372,036,854,775,808 sha1 calculations

                  Re: "While true, the likelihood of doing this AND getting a collision is highly improbable"

                  As long as you can pad the document in addition to making the required changes, AND making a change results in a financial advantage for you of more than US$130,000, I would be reluctant to call this highly improbable.

                  Based on previous hashes, the discovery of collisions has led to more weaknesses being found, and usually large collision spaces within a hash function that need to be avoided. What costs ~US$130,000 today will likely cost less than US$10,000 within 5 years.

                  Does it mean we have to throw away existing SHA-1 hashes? Probably not, unless the financial incentive to attack them exists.

                  Does it mean we need to stop patching SHA-1 hash functions to address discovered weaknesses, and start deploying SHA-2 and later hash functions now for verifying important documents or software versions? Definitely - mainly because if we don't, it just doesn't happen * stares at MD5 hashes *

                  Someone mentioned financial websites - all current browsers have required SHA-2 hash functions on certificates since January 2017. The only real exception I am aware of is old and cruddy Java 7 (or earlier) apps that refuse to upgrade or to die.

              2. CAPS LOCK

                "just stash away the "KJ"BIUE_D H£(*ERNY£" in a garbage area, you're sorted."

                No, you now have a document of a DIFFERENT SIZE. The SHA1 AND the size have to be identical.

                1. Adam 1

                  Re: "just stash away the "KJ"BIUE_D H£(*ERNY£" in a garbage area, you're sorted."

                  Why does the size have to be identical? As long as the artifact is a believable size you can launch an attack.

                  Imagine an executable file downloaded through some sort of update mechanism that uses SHA-1 to validate the binary before executing it. No-one will notice that the 64MB upgrade.exe is 25KB larger. But now the attacker has replaced the intended payload with their own. It would be interesting to know more about the collision algorithm. Does it require the two files to be nearly identical? Does a large file take longer to generate a collision for?
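
                  As a sketch of that update scenario - the expected hash value and function name here are hypothetical - a SHA-1-only updater looks something like this, and note that nothing in it inspects the file size:

                  ```python
                  import hashlib

                  EXPECTED_SHA1 = "0123456789abcdef0123456789abcdef01234567"  # hypothetical published value

                  def payload_ok(path):
                      h = hashlib.sha1()
                      with open(path, "rb") as f:
                          for chunk in iter(lambda: f.read(1 << 16), b""):
                              h.update(chunk)
                      # No size check here, so a payload 25KB larger that
                      # collides on SHA-1 would pass and get executed.
                      return h.hexdigest() == EXPECTED_SHA1
                  ```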

                  1. CAPS LOCK

                    "Why does the size have to be identical? "

                    In the article the pdfs are the same size. If you are checking for tampering, checking the file size is a lot easier than checking the SHA1.

                    1. Alan W. Rateliff, II
                      Paris Hilton

                      Re: "Why does the size have to be identical? "

                      Just a few thoughts.

                      PDFs are compressed containers, right? That being the case, you can change quite a lot of the payload, and all you have to do is ensure the compressed output of your manipulated file has the same size and hash - the input to the hash generator has already been manipulated in a manner which essentially obfuscates the source document.

                      Of course, that is a narrow application, but it still seems practical. Going a similar route, look at any compressed file you might download: .zip, .7z, .tgz, etc. For full confidence a check should be applied to all parts, not just the download but the individual hashes of each and every part: the hash of the archive, the hash of all parts combined, the hash of each individual file, hunk, etc. Doing so gets heavy and resource-expensive rather quickly.

                      More broadly, think of the Open Document formats which are zipped containers, among others.

                      But what of TLS sessions? Consider that a spook or nefarious agency (arguably one and the same) has a packet capture of a TLS-encrypted session signed with a SHA-1 certificate. We already know sessions signed by a certificate generated with poor entropy (the debacle from a few years back) can be undone. $130k is nothing to throw at this for such agencies, maybe even enough to get the GPU calculation requirements down to something more reasonable than a year (how much was paid for the San Bernardino iPhone hack?). Whatever was in that session is at risk, be it an email, web search, forum posting, or penis-enhancement purchase confirmation page.

                      1. Cynic_999

                        Re: "Why does the size have to be identical? "

                        "

                        PDFs are compressed containers, right?

                        "

                        No. They are the very opposite of "compressed". Open a PDF in a text editor and you will see.

                    2. dajames

                      Re: "Why does the size have to be identical? "

                      If you are checking for tampering, checking the file size is a lot easier than checking the SHA1.

                      That assumes that you know what the size should be. If you have two documents with the same apparently correct signature you know that one is a forgery, but you have no idea which.

                      Checking the size does not tell you that -- unless you have some out-of-band mechanism for obtaining the expected size of the genuine document ... and how do you check the veracity of the out-of-band data? Well, it'll be signed, using a hash ...

                      1. tom dial Silver badge

                        Re: "Why does the size have to be identical? "

                        To verify the hash meaningfully, no matter what kind is used, you need to know what the output of your hash computation should be. That is no different from knowing what the size should be. That you might need to get that out-of-band may not be a major objection.

                        1. Graham Cobb Silver badge

                          Re: "Why does the size have to be identical? "

                          @tomdial: I don't think you understand how signing works.

                          1. tom dial Silver badge

                            Re: "Why does the size have to be identical? "

                            I think I do, and I also think the article was primarily about sha-1 sums of data, not the signing of the sha-1 sums.

                        2. dajames

                          Re: "Why does the size have to be identical? "

                          To verify the hash meaningfully, no matter what kind is used, you need to know what the output of your hash computation should be.

                          We're talking about signed PDFs, here. You know exactly what the hash should be, because it is part of a signature block that's been signed with the document author's private key.

                          [The article is about weakness in the hash process, not about any shortcomings of PKI, so the validity of the hash in the signature block can be taken as a given.]

                          That is no different from knowing what the size should be.

                          It's very different, because the size of the document is not part of the signature block.

              3. Anonymous Coward
                Anonymous Coward

                Re: 9,223,372,036,854,775,808 sha1 calculations

                > What I want to know if this is more than a collision but a preimage attack (or more severely, a SECOND-preimage attack that found a collision with a specific target).

                One presumes so.

                If they were just going for birthday collisions, I would think that the chances of getting two random documents which are both valid PDFs with readable text in them are rather low.

                OTOH, the chances of it being a valid Perl script are quite high :-)

            2. Hans 1
              Boffin

              Re: 9,223,372,036,854,775,808 sha1 calculations

              >Yes, it's a problem if you can insert 'not'. It's less of a problem if you can only produce the collision if you insert 'KJ"BIUE_D H£(*ERNY£'

              Hey, you know that, when it comes to PDF, you could place 'KJ"BIUE_D H£(*ERNY£' behind the image or something ...

              Remember, this is bytes, so you want the hash of 000011100010001xxxxxxx0001000111001100[...] to equal the hash of 00001111101000100100010001000111001100[...], where x can be anything because it is located behind an image, or the font colour is white or the same as the background, etc ... you name it. This has now been proven to be feasible - expensive, 130k or such, but feasible ... and it will only get cheaper ... in 10 years' time, you and I might be able to do it over a weekend for the lulz, with better hw and a better algo to compute this ... ;-)

              You could also say the random bits you want are for an image with total transparency ... you know, the sky's the limit.

              1. MJB7

                Re: 9,223,372,036,854,775,808 sha1 calculations

                @Hans 1 You are almost right *but* what they can do is get

                000011100010001xxxxxxx0001000111001100[...] to equal the hash of 000011111010001yyyyyyy0001000111001100[...], where x and y can be anything.

                This is the difference between a collision (which we now have), and a second pre-image attack (which we don't have - yet).

              2. Tony Haines

                Re: 9,223,372,036,854,775,808 sha1 calculations

                >"But if you can insert "not" AND just stash away the "KJ"BIUE_D H£(*ERNY£" in a garbage area, you're sorted."

                > "Hey, you know that, when it comes to PDF, you could place 'KJ"BIUE_D H£(*ERNY£' behind the image or something ..."

                It seems to me this is an argument for deprecating the PDF format (and any other format which has undefined buffers or hidden data) for contracts and anything else which needs to be digitally signed, to be honest.

                If the document was in a form where you couldn't tweak it by tinkering with effectively 'wild' bits, it would be enormously harder to produce a document of the same size.

                Larger documents would still be almost as easy, of course, so you'd need to add an exact filesize to the hash.

          2. Ian Michael Gumby

            @Chris Miller Re: 9,223,372,036,854,775,808 sha1 calculations

            No, sorry, it's not that easy.

            For a very long time, it's been known that there's a potential for a SHA-1 collision to occur. It was just that no one had found one.

            While a random collision can occur, creating a specific modified document and then figuring out how to create a SHA-1 hash equivalent that is also the same size? Not that easy.

        3. Charles 9

          Re: 9,223,372,036,854,775,808 sha1 calculations

          "Still not much of an attack, IMO. Unless someone can arbitrarily set the difference to something meaningful, it only proves that its possible to overcome SHA-1. It's still really really REALLY improbable to produce a meaningful difference."

          Sometimes, just a few characters can be enough to change the whole meaning of a document, such as the inclusion or exclusion of a single "not".

          1. Anonymous Coward
            Anonymous Coward

            Re: 9,223,372,036,854,775,808 sha1 calculations

            "Sometimes, just a few characters can be enough to change the whole meaning of a document, such as the inclusion or exclusion of a single "not"."

            But again, they have to be meaningful characters you've changed. You have to include/exclude that "not" and *then* run through 9 quintillion variations of meaningless padding at the end of the document and run the sha1 calculations until you get a collision.

            Not saying it's impossible - we've always known it isn't. It's just becoming less difficult, which is why we're already migrating to SHA-256.
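
            To see the padding idea in miniature, here is a toy birthday search in Python on a digest truncated to 32 bits, so it finishes in seconds; against the full 160-bit SHA-1 the same search would need around 2^80 attempts, and the researchers' cryptanalytic shortcut still needed about 2^63. Matching one specific existing hash (a second preimage) is harder again:

            ```python
            import hashlib
            import itertools

            seen = {}
            body = b"I agree to pay one hundred pounds. "
            for counter in itertools.count():
                candidate = body + str(counter).encode()       # vary the meaningless padding
                digest = hashlib.sha1(candidate).digest()[:4]  # truncate to 32 bits
                if digest in seen:
                    print("collision:", seen[digest], "vs", candidate)
                    break
                seen[digest] = candidate
            ```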

          2. JMcL

            Re: 9,223,372,036,854,775,808 sha1 calculations

            ...or insertion of 000 for that matter!

          3. Cynic_999

            Re: 9,223,372,036,854,775,808 sha1 calculations

            "

            Sometimes, just a few characters can be enough to change the whole meaning of a document, such as the inclusion or exclusion of a single "not".

            "

            The point is that after inserting the "not" you will have to change a large amount of additional data in order to achieve the same hash. In a text document those changes will not make any sense, and so the change would be exposed. This can only work on data that can have many changes applied without being noticed - e.g. a BMP or JPG image where lots of bytes can be changed but still leave a recognisable image.

            In a certificate or other small or well-structured file with few bytes that could be changed without invalidating the format, it will not be possible to create a viable but altered file with the same hash.

        4. dajames

          Re: 9,223,372,036,854,775,808 sha1 calculations

          It's still really really REALLY improbable to produce a meaningful difference.

          Not really any more improbable than the demonstration ... if you can make changes to the background colour (say) until you get the hash you want, what's to stop you changing the content to whatever you want to say and then changing the background colour so that the hash matches that of the original document? The difficulty must be about the same as the demonstration.

          What I'd like to see is an estimate of the cost of those 9,223,372,036,854,775,808 sha1 calculations and umpty-bazillion hours of GPU time. How much value does a document have to have for it to be worth doctoring it in this way?
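
          As a back-of-the-envelope answer, assuming (very roughly) a billion SHA-1 computations per second per GPU:

          ```python
          work = 2 ** 63                     # hash computations in the attack
          gpu_rate = 1e9                     # hashes per second per GPU, assumed
          seconds_per_year = 3600 * 24 * 365

          print(f"{work / gpu_rate / seconds_per_year:,.0f} single-GPU years")  # ~292
          ```

          That order of magnitude is consistent with the ~US$130,000 rented-compute estimates quoted elsewhere in this thread.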

          There's nothing much new here apart from the fact that some folk have demonstrated something that was already known to be theoretically possible. We all know that SHA-1 is deprecated and that SHA-2 and SHA-3 are more secure. It's rather sad if it takes an actual demonstration to prove that the theoretical risk is real.

          1. oxfordmale78

            Re: 9,223,372,036,854,775,808 sha1 calculations

            It costs nothing if you use someone else's hardware... think botnets.

        5. Anonymous Coward
          Anonymous Coward

          Re: 9,223,372,036,854,775,808 sha1 calculations

          > It's still really really REALLY improbable to produce a meaningful difference.

          No it's not. You make a meaningful difference, and put some hidden padding somewhere to make the SHA1 match the desired value.

          This is the same process as when MD5 was broken - in that case they used it to create a forged SSL certificate. It had a "tumor" in it to fudge the MD5 hash, but was still a valid certificate.

          https://www.win.tue.nl/hashclash/rogue-ca/

          In short, it is now practical to forge a certificate with an SHA-1 hash. You should be worried - efforts to disable SHA-1 in browsers have not come early enough.

      2. Bruce Hoult

        Re: 9,223,372,036,854,775,808 sha1 calculations

        There are a total of 150 bits different, by the way.

      3. Anonymous Coward
        Anonymous Coward

        Re: 9,223,372,036,854,775,808 sha1 calculations

        But did they change the account number and sort code of an account to receive gazillions, rather than some random shite? As for certs, your OS already trusts many govt-controlled CAs, so that claim is pointless.

        Anyone care to share a financial institution with a sha-1 cert?

    2. Mage Silver badge
      Facepalm

      Re: 9,223,372,036,854,775,808 sha1 calculations

      SHA-1 was already deprecated because it was already known this would be possible soon.

      It will only get easier.

      That's why SHA-256 exists already.

      A special offer weekend deal on AWS just when someone wants to do this?

    3. oxfordmale78

      Re: 9,223,372,036,854,775,808 sha1 calculations

      If someone has a botnet of 150,000 nodes, it just takes a couple of days to find a SHA-1 collision.

  2. Snowy Silver badge
    FAIL

    What would have been truly clever is if the documents had differed by more than 0.0147%. All that work, and to make it work they changed next to nothing.

    Edit: beaten to it by Preston Munchensonton, who said it better than me - have an upvote.

    1. Chris Miller

      The point of a hash function is that even if you change just a single bit, it should give a completely different output. There are far more possible documents than there are hash function outputs, so collisions will always exist - but it should be computationally impossible to find them. This attack proves that it no longer is.
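
      That single-bit sensitivity (the avalanche effect) is easy to see for yourself; a minimal Python sketch:

      ```python
      import hashlib

      msg = bytearray(b"Pay the bearer one hundred pounds")
      h1 = hashlib.sha1(bytes(msg)).digest()
      msg[0] ^= 0x01                          # flip a single bit of the input
      h2 = hashlib.sha1(bytes(msg)).digest()

      changed = sum(bin(x ^ y).count("1") for x, y in zip(h1, h2))
      print(f"{changed} of 160 digest bits changed")   # typically close to 80
      ```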

      1. Deltics

        No hash function makes finding collisions computationally /impossible/; the best you can hope for is /practically useless/.

        As in, too impractical to be of any utility to a miscreant.

        As others have said, the point is not the ability to contrive a collision, but to establish a means to change a document or file in a way that has a practical result (changing the terms of a contract, introducing functional malware or a backdoor) without changing the byte count of the file (nobody relies on JUST the hash, I hope), and still contrive to create a collision.

        AND the result of the change also has to remain undetectable when subjected to an eyeball test, i.e. it's no good being able to change the terms of a contract if, to achieve that whilst preserving the file size and the hash, you have also introduced obvious junk content.

        Changing the background colour of a PDF? I'm not sure what the possible malicious application of that could be. This doesn't prove that an effective attack is possible, only that a pointless attack is. Surely?

        I can drive my car into a concrete wall at 100mph and effectively demonstrate that the collision protection offered by my vehicle is inadequate to protect the occupants in the event of such a collision. But that doesn't mean that the collision protection in the vehicle is inadequate.

        1. Trigonoceps occipitalis

          PDF BACKGROUND

          Is it possible that the change of background colour is a part of a suite of changes that hide the addition of "not"?

          1. Chris Miller

            Re: PDF BACKGROUND

            I think* that PDFs (and Word documents and almost any other document format except plain text) have a sufficient number of 'filler' characters that can be set to any value you like without significantly changing the document format. This gives scope for a clever algorithm to produce the necessary collisions.

            *I haven't read the paper, but this is how it worked for MD5

          2. Tom 7

            Re: PDF BACKGROUND

            I have produced PDF documents that produce different output when printed on different printers - and that's without 'low ink'. It should be irrelevant to this discussion, but if you want a PDF document to say what it should, print it and check it yourself.

        2. Hollerithevo

          A date, for instance?

          A delivery date in a contract? A price?

        3. Kiwi
          Boffin

          AND the result of the change also has to remain undetectable when subjected to an eyeball test, i.e. it's no good being able to change the terms of a contract if, to achieve that whilst preserving the file size and the hash, you have also introduced obvious junk content.

          Take a 1,000-or-so-word document. One you know well and, as you're in HR, have read half a dozen times today alone with new hires coming into the company. Spot the one small change in the dry legalese that is numbing to the mind of even the most dedicated HR drone.

          A decent text-comparison program should pick it up. The human eyeball probably never will, unless you spot something else off - e.g. a paragraph that should have one word on the last line has two. A drop of a character elsewhere could fix that.

          Remember that research has shown that we respond more to the shape of words than the actual letters used in the words, and we can speed-read badly spelt documents without even noticing the errors. A human eye will not spot a small change in a large document or bit of source code.

      2. Dan 55 Silver badge

        Well, all hashes are going to fail sometime, as you convert a really long string of bits into a shorter string of bits.

      3. Adam 1

        > There are far more possible documents than there are hash function outputs

        Known as the pigeonhole principle. Where the size of the input exceeds the size of the hash output, there not only "can be" but "must be" collisions. To those harping on about file size differences: Windows Explorer rounds to the nearest KB unless you specifically check the details in Properties. For most use cases, having a slightly different file size is unimportant, but it is impressive nonetheless that they could locate a collision even constraining themselves to a valid file format and the same file size.

        Designing effective hash functions is really hard. I had to write one last year and stuffed it up big time. It wasn't a cryptographic hash, but one that would see hundreds of thousands of our objects get hashed into different buckets for faster dictionary lookups. So basically you had an infinite combination of inputs that had to hash to 32 bits (about 4 billion values). I managed to create an accidental swarm to zero, which meant that whilst there was good distribution generally, a substantial proportion of real-world objects would end up in bucket 0. After fixing it, the worst I saw was in the range of 3 objects per bucket out of a million.
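
        The kind of distribution check that catches that sort of bias is cheap to run. A sketch, using Python's built-in hash() as a stand-in for a custom 32-bit object hash:

        ```python
        from collections import Counter

        NUM_BUCKETS = 1 << 20   # ~a million buckets
        loads = Counter(hash(f"object-{i}") % NUM_BUCKETS for i in range(1_000_000))

        print("max bucket load:", max(loads.values()))   # stays small for a healthy hash
        print("bucket 0 load:", loads[0])                # a spike here is the "swarm to zero"
        ```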

        1. Ken Moorhouse Silver badge

          Pigeonhole Principle

          +1 for naming the problem.

          Publishing different hash signatures for the same document, and/or partitioning the document and using the same hash method to "cover" overlapping portions of it*, have got to be the only ways of minimising the effect of the pigeonhole principle without moving to a different scheme which has a higher number of pigeonholes.

          Using this multiplicity technique effectively increases the number of pigeonholes used to "describe" a given mapping, and has a similar effect to triangulating measurements for a map using a theodolite (i.e., reduces the chance of error in positioning).

          With multiple hashes, let's be clear on this: Every single one has to match before provenance can be established. So long as the statistics show a suitably low collision probability for the number of hashes utilised I don't think it matters how secure the underlying crypto used to generate each of those hashes is. However, using different "salts" on the same method is probably inadvisable.

          *Commentators here have mentioned background colouring and formatting obfuscation techniques to render an identical hash: why not "layer" the hashing mechanism in such a way that the document is one layer and the presentation is a separate layer? This would mean each layer bearing its own hash, and only when both hashes check out are we looking at the authentic document. Of course, this method of hash generation would have to be put into the public domain for it to be useful.
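
          A minimal sketch of the "every hash must match" idea, assuming the publisher distributes one digest per algorithm alongside the document:

          ```python
          import hashlib

          ALGOS = ("sha1", "sha256", "sha3_256")

          def fingerprints(data):
              return {name: hashlib.new(name, data).hexdigest() for name in ALGOS}

          def authentic(data, published):
              # Accept only if every digest matches: forging this requires a
              # simultaneous collision in all of the functions at once.
              return all(fingerprints(data)[name] == published[name] for name in ALGOS)
          ```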

          1. Adam 1

            Re: Pigeonhole Principle

            > or partitioning the document and using the same hash method to "cover" overlapping portions of it

            Just be aware that whilst the pigeonhole principle shows it is mathematically certain that two colliding documents must exist where the input size exceeds the hash output size, it does not follow that two inputs smaller than the hash output cannot collide. In a well-designed algorithm, collisions should be both very rare and computationally infeasible to find - i.e. nothing short of brute force.

            1. Charles 9

              Re: Pigeonhole Principle

              Maybe, but because computer technology continues to improve, brute force gets easier and easier. Imagine if you have a Mirai-class botnet at your disposal and you set it to the task of trying to perform a second-preimage attack.

      4. This post has been deleted by its author

      5. DaLo

        Not just computationally impossible to find them - which may be a big ask, as you could always brute-force with enough computing power.

        It is key that you cannot premeditatedly change a document and manipulate it to produce the same hash, which is why this shows SHA-1 as broken.

        You may just find, unlikely but possible, that your picture of your cat creates the same hash as the contract for your house. However, no one would expect that your solicitor got you to buy a house based on a cat picture.

        If, however, you can work out that by changing a little bit of text you can then add extra text/bytes/graphics, and work out what they should be to force a collision, then it is broken.

        If the only other inputs that produce a document's hash are all random garbage or completely unconnected to the original document, as well as being impossible to premeditatedly work out with current resources, then it isn't broken.
