Git fscked by SHA-1 collision? Not so fast, says Linus Torvalds

About that SHA-1 collision: Linus Torvalds has taken to Google+ to emphasise that in Git, its main role is error detection, so “the sky isn't falling.” The weak hashing algorithm is used, among other things, to provide a digital signature for software, documents like PDFs, and encryption certificates. The mathematical …

  1. Electron Shepherd
    Facepalm

    That's not how hashes work

    "The mathematical operation should produce a unique result for any given input"

    That's not how hashes work. There are lots of inputs that will all produce the same hash, and producing a hash from an input is, relatively speaking, computationally trivial. The tricky bit is, for a given hash, to find an input that will generate that hash and that is meaningful in the context of the original input (for example, as Google did in producing a second valid PDF). That is computationally very difficult. That's why they are often called "one-way" functions.
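
    (For illustration, a minimal Python sketch of that asymmetry; the input string is made up and only hashlib from the standard library is assumed:)

      import hashlib

      # Forward direction: cheap, one pass over the input.
      digest = hashlib.sha1(b"any old input data").hexdigest()
      print(digest)

      # Reverse direction: given only `digest`, finding an input that produces
      # it (let alone one that is meaningful in context) has no shortcut and
      # amounts to searching an astronomically large space.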

    1. P. Lee

      Re: That's not how hashes work

      The issue is whether you can compromise the client before they realise the code is not what they think.

      If you can compromise the Git client and then pull down the proper code in the background so you don't know you've been compromised, you have a problem.

      This is one reason it's good to avoid a monoculture. It would be hard to compromise lots of different clients (while maintaining the hash), and you only need one client to notice the compromise and its user to alert the others.

    2. gnasher729 Silver badge

      Re: That's not how hashes work

      Electron, what you post is nonsense. A 160 bit hash _does_ in practice produce a unique result for any given input (unless you spend 6,600 years of CPU time to search for two given inputs with the same result).

      Someone did a calculation for backup systems that calculate hash codes to see if an identical file is already stored. It is _possible_ that you have a different file with the same hash as one that is already stored, and your file isn't going to be backed up. It is also possible that 5 seconds after the backup, a small meteorite crashes into your office and destroys your computer, and a bigger one destroys the data centre where your backup is held. The latter scenario has higher probability.

      1. Paul Smith

        Re: That's not how hashes work

        Sorry, but no it doesn't! Any hash can produce only a limited number of distinct results; the greater the number of bits in use, the lower the chance of *accidentally* encountering a collision. To go from "the chances of a random collision are vanishingly small" to "it always produces a unique result" is the sort of dangerous mistake that hackers love to exploit.

        Linus was using SHA-1 as a cheap way of calculating a hash that was *very unlikely* to collide; the hackers are using a known algorithm to produce a predetermined result.

        Imagine Linus was simply summing the bits mod 1024: there would be a 1 in 1024 chance of a collision. If a hacker's target has a hash of 512 and the code they want to use has a hash of 384, then they just have to add 128 to produce a valid fake.
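
        (Purely as an illustration of that toy example, a sketch with a deliberately weak "sum of bytes mod 1024" checksum, nothing like what Git actually uses:)

          def weak_checksum(data: bytes) -> int:
              # Toy checksum: byte values summed, mod 1024.
              return sum(data) % 1024

          genuine = b"the code the victim expects"
          fake = b"malicious replacement"

          # Pad the fake until the checksums line up, e.g. inside a comment.
          deficit = (weak_checksum(genuine) - weak_checksum(fake)) % 1024
          fake += b"\x01" * deficit

          assert weak_checksum(fake) == weak_checksum(genuine)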

      2. Doctor Syntax Silver badge

        Re: That's not how hashes work

        "A 160 bit hash _does_ in practice produce a unique result for any given input (unless you spend 6,600 years of CPU time to search for two given inputs with the same result)."

        AIUI there is now a method of constructing a colliding pair of files with rather less than the 6,600 years you suggest. That's what's set off this whole discussion.

        I think one of Linus' points is that constructing a file which gives the same hash as an existing file and having it compilable is a problem of a very different order of magnitude.

        1. TeeCee Gold badge
          Facepalm

          Re: That's not how hashes work

          That's 6,600 years of CPU time.

          If you have 6,600 CPUs, that's a year. If you're Google and have the equivalent of a medium sized country full of CPUs....

        2. Tom Melly

          Re: That's not how hashes work

          Well, this is what I wonder about. Say you modify a genuine function in the source to do something nasty - surely all you have to do is bury a nice big comment somewhere else in the source with the appropriate values inside it to create a matching hash? That's for source rather than the compiled exe of course.

          I dunno - it sounds like Linus owes Gilmore an apology, irrespective of how exploitable the flaw is. I get what Linus is saying, but it reads like self-justification rather than a best practice.

          1. grim0013

            Re: That's not how hashes work

            The approach you describe (hiding random data in a comment) would be much more difficult to hide from Git than 'merely' generating a collision of the sort being discussed. Git doesn't just hash the content--it prepends a header that includes the size of the object in question. This means that the 'doctored' content must either be the exact same size as the original, or the collision must also take the header into account. Accomplishing either of these things is significantly more difficult than simply generating a collision. Not impossible, but not--for the moment--a credible threat. Really it just means that they should move to a larger hash some time within the next few years or so.

            Oh yeah.. It also just occurred to me that this type of attack also assumes that nobody is going to bother looking at the commit diff for some reason. "Some random person just made a +1329 / -7 commit to the kernel, you say? About time those bums did something substantial. I'm sure it's fine."

            On a side note, I've only recently started digging in to how Git works under the covers, and I hadn't quite worked out what the reason was for including size information when hashing 'blob' objects. Well, now I know.
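
            (For anyone else poking at the internals, a minimal Python sketch of the blob hashing described above; it mirrors what `git hash-object` computes for a blob:)

              import hashlib

              def git_blob_sha1(content: bytes) -> str:
                  # Git hashes a "blob <size>\0" header plus the content, so a
                  # doctored blob must also account for the size field.
                  header = b"blob " + str(len(content)).encode() + b"\0"
                  return hashlib.sha1(header + content).hexdigest()

              # git_blob_sha1(b"hello\n") matches `echo hello | git hash-object --stdin`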

      3. badger31

        Re: That's not how hashes work

        "The mathematical operation should produce a unique result for any given input"

        Given that "any given input" is conceptually an infinite set, always expecting a unique 160bit hash is just ludicrous. The main thing with these hashes is that small changes in the code will give hugely differing hashes, making collisions rare. We all know that it is possible to engineer (by brute force) a collision, but it's a hard problem. Doing so in such a way as to be undetectable in Git is an even harder problem. Way harder. I look forward to seeing the attempts, though :-)

      4. juice

        Re: That's not how hashes work

        "Electron, what you post is nonsense. A 160 bit hash _does_ in practice produce a unique result for any given input (unless you spend 6,600 years of CPU time to search for two given inputs with the same result)."

        160 bits make for a big number. A really big number, with lots and lots of digits in it. And any halfway decent hash algorithm worth its salt [*] will be designed such that even a tiny change in the input will produce a significantly different output.

        However, 160 bits is not infinity. It's not even a googolplex. And if someone's deliberately trying to force collisions, you'd be foolish indeed to assume that things are safe.

        Then too, assuming they're using the "1 GFLOPS machine working for a year" definition of CPU-years [**], that figure of 6,600 years isn't as impressive as it sounds.

        An Intel i7 can run at over 350 GFLOPS in double-precision mode, while a modern GPU (e.g. Radeon RX 480) can theoretically churn out up to 5 TFLOPS in single-precision mode; the Tesla K80 used by Amazon for their cloud-computing back-end can churn out 8.74 TFLOPS in single-precision mode.

        Then too, there's always the possibility of using distributed computing or specialised hardware - this type of problem is inherently parallelisable and bitcoin mining has shown how effective ASICs can be for this type of number crunching. Also, since the people most likely to want to force a collision are nation-states or hackers, they could well have access to a supercomputer or maybe even botnets - I wouldn't be surprised to see people offering collision-detection as a service, as more traditional profit streams continue to dry up (albeit with botnets increasingly being based on IoT low-power devices, there may not be many of these).

        So, yeah. 160 bits is good. But these days, it arguably ain't good enough.

        [*] sorrynotsorry

        [**] http://www.gridrepublic.org/joomla/index.php?option=com_mambowiki&Itemid=35&compars=help_documentation&page=GFlops,_G-hours,_and_CPU_hours
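
        (Back-of-the-envelope arithmetic for that point, using the figures quoted above purely as an illustration:)

          cpu_rate = 1e9          # FLOPS assumed per "CPU" in that definition
          gpu_rate = 5e12         # FLOPS for a single modern consumer GPU
          cpu_years = 6_600       # the quoted attack cost

          gpu_years = cpu_years * cpu_rate / gpu_rate
          print(f"~{gpu_years:.1f} GPU-years")   # roughly 1.3 on these numbers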

      5. Anonymous Coward
        Anonymous Coward

        Re: That's not how hashes work

        "A 160 bit hash _does_ in practice produce a unique result for any given input "

        Wtf does this ridiculous sentence even mean?

        A 160 bit hash is 20 bytes, isn't it?

        So if one assumes that the source data being hashed is bigger than 20 bytes, it doesn't matter what hash algorithm is in use, there must by definition be more than one set of input data which will generate particular hash values. Otherwise e.g. a disk would store the hash value not the input data. Same logic applies to any sensible hash size.

        "Someone did a calculation for backup systems that calculate hash codes to see if an identical file is already stored. It is _possible_ that you have a different file with the same hash as one that is already stored, and your file isn't going to be backed up"

        Fancy that. What are the odds that the "someone" in this picture was a disk deduplication vendor? And/or that the logic in question actually was "if the hash of the new data and the already-stored data are different then we know that the old and new data ARE different, and therefore we must store the new data. IF (and only if) the hashes match, we must THEN compare the actual old data and new data, rather than the old and new hashes, and do some other extra work to ensure that the new data CAN be stored and recovered safely."
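
        (A minimal sketch of that "hash first, then verify the bytes" logic, with an in-memory dict standing in for the backend; no particular vendor's product is implied:)

          import hashlib

          store = {}   # hex digest -> stored bytes

          def dedup_store(data: bytes) -> None:
              key = hashlib.sha1(data).hexdigest()
              existing = store.get(key)
              if existing is None:
                  store[key] = data          # different hash => definitely new data
              elif existing != data:
                  # Hash collision: compare the actual bytes and keep both copies
                  # rather than silently dropping the new data.
                  store[key + ":collision"] = data
              # else: byte-identical duplicate, nothing to store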

        1. Anonymous C0ward

          Re: That's not how hashes work

          All hashes, by definition, are vulnerable to collisions (infinitely many of them, once inputs can be longer than the hash). https://en.m.wikipedia.org/wiki/Pigeonhole_principle

    3. Anonymous Coward
      Anonymous Coward

      Re: That's not how hashes work

      "The tricky bit is to, for a given hash, find an input that will generate that hash and that is meaningful in the context of the original input (for example, as Google did producing a second valid PDF). That is computationally very difficult."

      For difficult, read impossible if you don't know the size of the original hashed data. It could be smaller than the hash or it could be gigabytes in size, and you'd have to try every permutation of every size, which would take way longer than the age of the universe even with a server farm at your disposal. Obviously, if the hashing algorithm has a flaw which means it doesn't produce all possible hashes, or produces identical hashes more often than it should, and/or you have a rough idea of the original data size, this narrows it down, but even so, it's far from easy.

  2. 2460 Something
    Thumb Up

    Common sense approach

    Nice 'Calm down dear' from Linus. I enjoy a good 'risk of security compromise' as much as the next sys admin (read: not at all) but these need to be taken in context with the actual risk of implementation and the scope of what can be achieved. As he states, git is not using this to authoritatively state that the data is from said user, merely that it didn't get corrupted in transfer. You already know which source you are connecting to as this is handled by the https certificate.

    Very refreshing to have a proper, reasoned out response, as opposed to rushing out a hack of a patch which introduces further problems.

    1. Paul Crawford Silver badge

      Re: Common sense approach

      Both sides have a valid point:

      - SHA-1 is not used as a sole measure of correctness, so no immediate panic.

      - Sooner or later, someone will find a way to compromise at least some aspects of some Git-based project, if generating a hash collision becomes easy enough that the length and position of the fix-up crud become easy to manipulate.

      1. 9Rune5

        Re: Common sense approach

        I do not quite follow. What exactly would an attacker manipulate in a git repository?

        For the purpose of this discussion, git calculates the hash based on the author signature in addition to the files' contents (http://stackoverflow.com/a/17278248/1736944). So yes, there are some fields that can be exploited to hide away stuff that won't ruin the source code.

        Are you concerned someone will push a commit having an identical sha-1 hash? Well, if you push a commit having an identical hash to a previous commit, then git will assume you are trying to push something it already has. It will ignore it. (http://stackoverflow.com/a/34888325/1736944)

        Or is the concern here that somebody will break into the repository and change something directly? Well... Who would notice that, based on the hash...? Is the suggestion here that the hash is somehow protecting the git users?

        AFAICT, the weakest link is the pull request. The person(s) maintaining the repository must be trusted to not accept pull requests containing nefarious code. The hash algorithm has no bearing on this process. And naturally: users have to know which repository to fetch.

        I am new to git. We only started using it a year ago. I am obviously missing something in this discussion.

        1. Paul Crawford Silver badge

          Re: Common sense approach

          I do not quite follow. What exactly would an attacker manipulate in a git repository?

          To be honest, I'm not sure. But time after time, people find cunning ways of gaming systems that nobody had thought of before.

          Off the top of my head, the obvious thing is you could manipulate somebody's private Git repository to change code but still have it appearing to match a public trusted one. Sure, if you have that level of access there are a hell of a lot more nefarious things you might do to them, but that would be one possible way of getting a back-door into a specific company's system based on an otherwise trusted code base.

          1. Anonymous Coward
            Anonymous Coward

            Re: Common sense approach

            "if somebody inserts random odd generated crud in the middle of your source code, you will absolutely notice"

            There are plenty of binary blobs in the Linux kernel these days. Would you notice if someone tweaked, say, this file in the kernel tree?

            https://github.com/torvalds/linux/blob/master/firmware/bnx2x/bnx2x-e1-6.2.9.0.fw.ihex

            1. Anonymous Coward
              Anonymous Coward

              Re: Common sense approach

              That's a binary blob in the kernel, but the source file in git is in ASCII. It uses 18 distinct ASCII characters, by the looks of it. I don't know whether the published method for creating SHA-1 collisions could be modified to generate hexadecimal characters that could be hidden in a file like that. Does anyone know the answer?

  3. bombastic bob Silver badge
    Happy

    SHA1 still "useful" then?

    I suppose Linus is saying:

    a) Don't panic

    b) SHA1 is still useful for SOME things

    c) Vogon poetry is STILL the 2nd worst in the galaxy (I was compelled to make that reference)

    I'm safe, because I have my towel with me.

    (I suppose on an embedded system or a microcontroller, SHA1 is easier to gonkulate than SHA256 so it would have some use THERE as well, but yeah... other implications obvious)

    1. K.o.R
      Coat

      Re: SHA1 still "useful" then?

      > c) Vogon poetry is STILL the 2nd worst in the galaxy (I was compelled to make that reference)

      The Azgoths of Kira will be deeply disappointed to know they have been bumped off the #2 spot.

      1. hammarbtyp

        Re: SHA1 still "useful" then?

        Just imagine how Paul Neil Milne Johnstone of Redbridge feels...

  4. Voland's right hand Silver badge
    Holmes

    I missed this reading the original collision notification

    I actually missed the part that the attack produced by Google needs to meddle with both sides - "good" and "bad" of the collision. If this is the case, this drastically decreases the attack surface as it is an attack that cannot be applied to an arbitrary SHA1 out there. Sure, an attack that is applicable only to a limited set of SHA1 scenarios today may be modified to hit arbitrary ones in the future. The key word is "future".

    As far as any source code control system is concerned it has multiple layers of verification:

    1. SCCM integrity (git fsck in this case).

    2. Build

    3. Code style check

    4. Integration test check

    If an SCCM uses hashes for addressing, creating a hash collision may get you past 1. You still need to get past 2 - your code should build; past 3 - anything you have inserted must pass style checks; and past 4 - the test suite. That as a whole is actually a fairly tall order. This is also very different from "black boxes" like document formats, where you can ship megabytes of non-visible content which nobody will notice. It is also different from digital certificates, where you either trust the signature or you do not.

    1. Naselus

      Re: I missed this reading the original collision notification

      "I actually missed the part that the attack produced by Google needs to meddle with both sides - "good" and "bad" of the collision."

      You missed it because it doesn't 'need' to. They did so in the demonstration because it's easier to match the hashes by manipulating both, but the point was simply to demonstrate that two different (but similar) documents could be made to produce an identical hash (as opposed to two masses of random data generating the same result, which a Chinese team proved about ten years ago). But in theory it should be possible to do the same thing just by manipulating the 'bad' document; it'd just take more compute to do so - but doing so is now an affordable operation.

      1. MJB7

        Re: I missed this reading the original collision notification

        @Naselus : ""I actually missed the part that the attack produced by Google needs to meddle with both sides - "good" and "bad" of the collision." You missed it because it doesn't 'need' to. "

        That is seriously wrong. The attack absolutely *does* need to fiddle with both sides. Fiddling with only one side is not a collision attack, it is a pre-image attack - and nobody has demonstrated a pre-image attack against even MD5 yet.

        1. Naselus

          Re: I missed this reading the original collision notification

          Hence the quotes round 'need' :)

          Second preimage resistance is implied by collision resistance. If you demonstrate that finding collisions is affordable, then second preimage resistance is no longer guaranteed to be unaffordable. So there's no need to actually perform a real preimage attack once you've shown a collision is possible on real data, because the hash is already affordably insecure.
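
          (For readers following this exchange, the standard textbook definitions being argued over, with H the hash function:)

            \begin{align*}
            \text{Collision: }       &\ \text{find any pair } x \neq y \text{ with } H(x) = H(y) \\
            \text{Second preimage: } &\ \text{given a fixed } x,\ \text{find } y \neq x \text{ with } H(y) = H(x) \\
            \text{Preimage: }        &\ \text{given a digest } h,\ \text{find any } x \text{ with } H(x) = h
            \end{align*}

          A second-preimage attack immediately yields a collision, which is why collision resistance is the stronger property; the converse does not hold, so the SHAttered result weakens confidence in SHA-1 without itself providing a second-preimage attack.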

  5. Brian Scott

    Good software design

    The key here isn't whether sha-1 should be used in git in the first place.

    Good practice in designing security software should acknowledge that, after some time, all of these things become obsolete, so you need to design in a framework that allows you to easily migrate to a future algorithm when the need arises. Baking sha-1 into the design is a mistake if it is then too difficult to change.

    Other than that, there is no particular reason to be worried about sha-1. It's just another warning shot: don't use it in new products, and start looking at how to turn it off in existing software. This should be simple with well designed software.

    1. Bronek Kozicki

      Re: Good software design

      ... and it appears that, in git, transition to a different hash is not going to be difficult. At least, that's how I interpret this: "And finally, there's actually a reasonably straightforward transition to some other hash that won't break the world - or even old git repositories."

  6. Deltics

    For crying out loud...

    Google talk in dark, foreboding terms about changing a rental agreement to hoodwink some poor unsuspecting tenant into paying a higher rent than they think they signed up for, and "proving" it with the SHA-1 hash.

    But in their "proof" of the attack all they did was change the colour of a frickkin' graphic, and people lapped it up and jumped all over the "SHA-1 is broken" bandwagon.

    And since SHA-1 has been suspect for years, anyone trying to use it to sucker a mark into paying a higher rent will (or should) find themselves being asked why they are relying on a demonstrably broken digital "signature"?

    And that's even assuming that it is possible to "attack" a document in this way, rather than just fiddling with the colors.

    1. Richard 12 Silver badge

      Context is King

      The attack is important, however at present it only matters in a very limited context.

      Google have proven that it is feasible for an entity to produce two PDFs with different content and the same SHA-1 hash, by means of embedding some "junk" in the unused data sections of both documents.

      Thus a third party may replace a document they wrote with another document they wrote, and you won't detect it if SHA-1 is the only thing you use to check that it is unchanged.

      In other words, don't use SHA-1 as the only method of checking if a document is unchanged. Use other hashes as well, and for very important things (eg contracts), compare the full binary data - it's the only way to be sure.

      Heck, even MD5 is likely to be good enough. Yes, it's now simple to create a hash collision in MD5, but a collision of MD5 and SHA-1 at the same time?

      The sky isn't falling.
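
      (A minimal sketch of that belt-and-braces check, assuming both files fit in memory; the paths are placeholders:)

        import hashlib

        def unchanged(original_path: str, received_path: str) -> bool:
            with open(original_path, "rb") as f:
                a = f.read()
            with open(received_path, "rb") as f:
                b = f.read()
            # Two independent hashes make a simultaneous collision far harder...
            hashes_ok = (hashlib.sha1(a).digest() == hashlib.sha1(b).digest()
                         and hashlib.md5(a).digest() == hashlib.md5(b).digest())
            # ...but for anything that really matters, compare the bytes themselves.
            return hashes_ok and a == b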

      1. yoganmahew

        Re: Context is King

        @Richard 12

        "Google have proven that it is feasible for an entity to produce two PDFs with different content and the same SHA-1 hash, by means of embedding some "junk" in the unused data sections of both documents."

        I am not a mathematically sound person, coming from that well-known liberal arts route to low-level programming. I am, though, bemused that this is news to anyone? Particularly the smartest people in the room? Did sane people really think that the output of a hash, given unlimited input data, was always going to be unique?

        edit: I see DougS made the same/a similar point below/earlier. The hash doesn't matter, it's how you roll it...

        Or have Google just disproved unicorns?

        1. Richard 12 Silver badge

          Re: Context is King

          Everyone has always known that hashes aren't unique. Otherwise they'd be called compressed files instead of hashes (or digests).

          The point of a cryptographic hash is twofold:

          1) It should be extremely unlikely that a randomly corrupted copy of the item would result in the same hash as the perfect copy.

          2) It should be infeasible for somebody to intentionally make two different and usable items that have the same hash.

  7. Anonymous Coward
    Anonymous Coward

    Sounds a bit like he's justifying a past questionable decision...

    SHA-1 was fine for what he was using it for (when the decision was made to use it), but he was warned that sometime in the near future (now) it wouldn't really be good enough.

    Sure the sky isn't falling, but everyone should be using SHA256 going forward.

    And many things using SHA-1 should transition sooner rather than later. The question is how much time/money that is going to take, and whether it is OK to ignore it because, by the time the Sky (really) is Falling, it's going to be obsolete (not in use) anyway.

    1. Lee D Silver badge

      Not really - SHA-1 was, as you say, fine for what he was using it for.

      And if you look at why he's not panicked, it's because he didn't rely on any particular assumptions about SHA-1 lasting forever, for instance.

      Hashes and crypto algorithms last 10 years now if you're lucky. Protocols and code last much longer than that (isn't the Linux kernel over 20 years old now? And even IPv6 is that old and still not seeing full deployment).

      In git, SHA-1 is not used for security; it's used as a quick check, and an easily referenced nugget of information that can identify a particular change. As such, it can be replaced by any number of things quite easily. Sure, probably an on-disk format change would be required too, but in such an open program, that's hardly a concern.

      But if you were using SHA-1 in your SSL setup, you have had an issue for a while now. That's why it's being phased out. And we all knew that was what was going to happen.

      Sooner or later, WPA2 will be dead, just like WPA and WEP before it.

      Sooner or later, SHA-3 will be dead, just like SHA-2, SHA-1, MD5 and myriad others before it.

      Rather than design your protocol to be RELIANT on it, especially if that reliance directly affects the secrecy of data rather than, say, use as a quick reference checksum in an open repository, design your protocol such that every such advance is handled like this: "Yeah, it's not really a problem. Next version fixes it for another 10 years."

  8. Bill Gray

    Am I missing something here?

    "...There's a big difference between using a cryptographic hash for things like security signing, and using one for generating a 'content identifier' for a content-addressable system like git"

    Is Linus actually saying that the cryptographic aspect of SHA1 is really irrelevant to Git? That if, tomorrow, somebody comes up with a simple means of taking a given message and creating a message with the same SHA1 hash, it'll screw everybody using SHA1 for secure signing, but wouldn't matter as far as Git is concerned?

    I could believe this, having used utterly non-secure hashes in (for example) making hash tables; for that purpose, you want something with an even distribution and few collisions, but security is irrelevant. An example from my own code:

    https://github.com/Bill-Gray/find_orb/blob/master/pl_cache.cpp#L288

    I needed blazing speed and an even distribution without collisions, but not security.
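
    (In the same spirit, a tiny non-cryptographic hash in Python - FNV-1a, not necessarily the one in the file linked above - which is fine for bucketing and useless as a signature:)

      def fnv1a_32(data: bytes) -> int:
          # FNV-1a: fast and evenly distributed, but trivially forgeable.
          h = 0x811C9DC5
          for byte in data:
              h ^= byte
              h = (h * 0x01000193) & 0xFFFFFFFF
          return h

      bucket = fnv1a_32(b"some lookup key") % 1024   # fine for a hash table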

    1. Anonymous Coward
      Anonymous Coward

      Re: Am I missing something here?

      Yes, that's exactly what he's saying. Git is using SHA1 hashes the same way RS232 used parity, to detect corruption. SHA1 is just far, far better at doing that than parity. The fact that it is possible to deliberately engineer a SHA1 hash collision is irrelevant to its ability to be used to detect unintentional file corruption.

      If git was trying to use cryptographic techniques to prove a file was authored by Linus rather than by me, it wouldn't help if it was hashed with SHA256 instead of SHA1. It would have to be signed by Linus' private key, and your copy of git would have to have securely received Linus' public key to verify that the repository you downloaded was signed by him, and not signed by me trying to give you Linux code with a backdoor built in.

      Since the hash is generated on the machine of the person running git, if it was changed to use SHA256, you still couldn't tell the difference between a repository created by Linus and one created by me. Both our copies of git would create correct SHA256 hashes, and your copy of git would validate them both.

      1. Anonymous Coward
        Anonymous Coward

        Re: Am I missing something here?

        > Git is using SHA1 signatures the same way RS232 used parity, to detect corruption

        Ah, those were the days! When a 50% chance of problem detection was good enough. :-)

        At 300 baud, you could almost transcribe the data by ear from the modem chirps anyway (talk about silly bets).

        1. Anonymous Coward
          Anonymous Coward

          Re: Am I missing something here?

          Actually less than a 50% chance, since parity only successfully detects an odd number of bit errors, and gives false negatives for 2, 4, 6 or 8 bit errors.
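
          (A quick sketch of that failure mode, using even parity over a single byte:)

            def parity(byte: int) -> int:
                return bin(byte).count("1") % 2

            original = 0b01100001                    # 'a'
            one_bit_flip = original ^ 0b00000001     # detected: parity changes
            two_bit_flip = original ^ 0b00000011     # missed: parity unchanged

            assert parity(one_bit_flip) != parity(original)
            assert parity(two_bit_flip) == parity(original)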

  9. a_yank_lurker

    Lifetime of Hash Algorithms

    As Lee D noted, today's best practice will be tomorrow's obsolete methodology. SHA-1 has been considered weak for some time now. However, intentionally creating two documents with the same hash is not trivial, but obviously doable. The real question is how different the two documents are. If they are obviously different at a quick glance, then it is not as disastrous as if the difference is a couple of very small edits (a scenario many have postulated). If the hash is being used for cryptographic security then there are problems. But if it is just a quick check of the downloaded file, this is not a very serious concern.

    1. Richard 12 Silver badge

      Re: Lifetime of Hash Algorithms

      The difference Google demonstrated was the background colour, so basically turning #FFFFFF into #FF0000

      So yes, a small visible difference. Sufficient to turn a "site licence" into a "named user licence".

      1. John Robson Silver badge

        Re: Lifetime of Hash Algorithms

        So yes, a small visible difference. Sufficient to turn a "site licence" into a "named user licence".

        Particularly if those are check boxes on a list...

        1. Anonymous Coward
          Anonymous Coward

          Re: Lifetime of Hash Algorithms

          For contracts this is a non-issue.

          Presumably you *kept* the original copy of the contract, and you can immediately show that the one you have also has the same SHA1 hash. If you present this immediately (as soon as the 'bad' contract is shown by the evil other party) you confirm that you didn't have time to compute your own bad version.

          For certs it is an issue. If someone can buy a cert for randomdomain.com and convert it into an EV cert for bigbank.com - or worse, a sub-CA cert - that's not great.

      2. John Sanders
        Big Brother

        Re: Lifetime of Hash Algorithms

        """The difference Google demonstrated was the background colour, so basically turning #FFFFFF into #FF0000"""

        They achieved this by including a binary chunk in the PDF. You make it sound as if the only change is FFFF to 0000; the document payload was altered significantly.

        If you run a diff of the two documents, the differences in the binary chunk will be massive.

        1. Richard 12 Silver badge

          Re: Lifetime of Hash Algorithms

          Well yes, of course.

          Almost every file format has "ignored" sections that you could hide the hash-collision data in, and the appropriate viewer would still display - or execute - it fine.

          Including all known executable formats.

          And yes, a binary comparison will always spot this. But that requires that you do have the original and can do such a comparison.

  10. Will Godfrey Silver badge
    Meh

    The way it seems to me

    As I understand hashes, the smaller the change being made, the harder it is to find a usable collision. So if you want to sneak in something unnoticeable (and that will compile) you're going to have to really work for it.

  11. bolac

    Git repos should be pulled using HTTPS. HTTPS should not be used with SHA1. TLS is where the security is coming from, git itself does not have signatures.

    1. Anonymous Coward
      Anonymous Coward

      > git itself does not have signatures.

      In fact, Git's security model does incorporate cryptographic signatures (implemented externally to Git itself, and therefore adaptable, I might add). It is up to the individual team, however, to decide whether and how they integrate this into their workflow (cf. the Linux kernel).

  12. Anonymous Coward
    Anonymous Coward

    Once again - try it with .TXT files

    Someone noted on the original SHA-1 story that the hash collisions demonstrated were in fancy file formats (PDF) which had the space to meddle with data while remaining the same length.

    Until someone can demonstrate two .TXT files of exactly the same length and hash, where the change made leaves the second intelligible *and* syntactically correct, I won't lose too much sleep.

    NOTE: Source code is .TXT - or should be.

    1. Charlie Clark Silver badge

      Re: Once again - try it with .TXT files

      Until someone can demonstrate two .TXT

      No, you don't really want to wait that long once this kind of proof of concept has been produced. Some of the people who might exploit it may have access to resources considerably beyond those used in the study and they won't tell you when they can do it.

      Fortunately, replacement hash algorithms are available and should be rolled out. No new projects should rely on the older algorithm.

    2. Richard 12 Silver badge

      Re: Once again - try it with .TXT files

      Source code has comments.

      /* gdiiw7ehsyw77 */

  13. gnasher729 Silver badge

    What can actually happen? Let's say there is a file in a repo with hash X. And I can manage to create a different file with the same hash X. The first problem is that I cannot put that file into the repository - git will tell me that it's the same hash, so the file "is already there" - it doesn't let me replace a file with an identical file because that is inefficient and pointless, and it doesn't let me replace a file with a different one with the same hash because it thinks it's the same one. You just can't have two different files with the same hash in the same repository.

    The only attack vector is to replace a file on some developer's hard drive with a different one, and git won't notice. So you would need to access that developer's computer and replace a file.
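
    (A toy model of that behaviour, with a dict standing in for the object store; this is an illustration, not Git's actual code:)

      import hashlib

      objects = {}   # sha1 hex digest -> content

      def add_object(content: bytes) -> str:
          key = hashlib.sha1(content).hexdigest()
          # If the key is already present, the store assumes it already has this
          # content and does nothing, so a colliding object can't displace the
          # original through a normal add.
          objects.setdefault(key, content)
          return key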

    1. trev.rollins

      Why are you even going over this here? It's literally Linus' first reply to the mailing list post linked in the article, along with several much better informed and compelling points. SHA1 being cryptographically broken is basically a non-issue in the context of git.

  14. Red Bren
    Coat

    Linus' ego

    Was it just me that read "If you fetch a Linux kernel from Linus's ego"?

  15. Pascal Monett Silver badge

    "nasty people will teach him the threat model"

    That may indeed happen.

    I totally trust Torvalds to quickly learn from the experience and do the right thing. Actually, I'm certain he is already considering alternatives. For all his outbursts and temperamental postings, Torvalds is unquestionably intelligent and reasoned. You might fool him once, but you won't get a second chance.

  16. Colin Tree

    xmit error

    I use the hash to check the file transmission had no errors, that's all.

    You have to trust the source Luke.

    Sometimes reputable news sites report servers have been compromised,

    which has happened to trustworthy sites from time to time.

    Basically, peer reviewed security.
