Git fscked by SHA-1 collision? Not so fast, says Linus Torvalds

About that SHA-1 collision: Linus Torvalds has taken to Google+ to emphasise that in Git, its main role is error detection, so “the sky isn't falling.” The weak hashing algorithm is used, among other things, to provide a digital signature for software, documents like PDFs, and encryption certificates. The mathematical …

  1. Electron Shepherd
    Facepalm

    That's not how hashes work

    "The mathematical operation should produce a unique result for any given input"

    That's not how hashes work. There are lots of inputs that will all produce the same hash, and producing a hash from an input is, relatively speaking, computationally trivial. The tricky bit is, for a given hash, to find an input that will generate that hash and that is meaningful in the context of the original input (for example, as Google did in producing a second valid PDF). That is computationally very difficult. That's why they are often called "one-way" functions.
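
    (For illustration, a minimal Python sketch of that asymmetry; the input string is made up and only hashlib from the standard library is assumed:)

      import hashlib

      # Forward direction: cheap, one pass over the input.
      digest = hashlib.sha1(b"any old input data").hexdigest()
      print(digest)

      # Reverse direction: given only `digest`, finding an input that produces
      # it (let alone one that is meaningful in context) has no shortcut and
      # amounts to searching an astronomically large space.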

    1. P. Lee

      Re: That's not how hashes work

      The issue is whether you can compromise the client before they realise the code is not what they think.

      If you can compromise the Git client and then pull down the proper code in the background so you don't know you've been compromised, you have a problem.

      This is one reason it's good to avoid a monoculture. It would be hard to compromise lots of different clients (while maintaining the hash), and you only need one client to notice the compromise and its user to alert the others.

    2. gnasher729 Silver badge

      Re: That's not how hashes work

      Electron, what you post is nonsense. A 160 bit hash _does_ in practice produce a unique result for any given input (unless you spend 6,600 years of CPU time to search for two given inputs with the same result).

      Someone did a calculation for backup systems that calculate hash codes to see if an identical file is already stored. It is _possible_ that you have a different file with the same hash as one that is already stored, and your file isn't going to be backed up. It is also possible that 5 seconds after the backup, a small meteorite crashes into your office and destroys your computer, and a bigger one destroys the data centre where your backup is held. The latter scenario has higher probability.

      1. Paul Smith

        Re: That's not how hashes work

        Sorry, but no it doesn't! Any hash can produce only a limited number of distinct results; the greater the number of bits in use, the lower the chance of *accidentally* encountering a collision. To go from "the chances of a random collision are vanishingly small" to "it always produces a unique result" is the sort of dangerous mistake that hackers love to exploit.

        Linus was using SHA-1 as a cheap way of calculating a hash that was *very unlikely* to collide; the hackers are using a known algorithm to produce a predetermined result.

        Imagine Linus was simply summing the bits mod 1024: there would be a 1 in 1024 chance of a collision. If a hacker's target has a hash of 512 and the code they want to use has a hash of 384, then they just have to add 128 to produce a valid fake.
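
        (Purely as an illustration of that toy example, a sketch with a deliberately weak "sum of bytes mod 1024" checksum, nothing like what Git actually uses:)

          def weak_checksum(data: bytes) -> int:
              # Toy checksum: byte values summed, mod 1024.
              return sum(data) % 1024

          genuine = b"the code the victim expects"
          fake = b"malicious replacement"

          # Pad the fake until the checksums line up, e.g. inside a comment.
          deficit = (weak_checksum(genuine) - weak_checksum(fake)) % 1024
          fake += b"\x01" * deficit

          assert weak_checksum(fake) == weak_checksum(genuine)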

      2. Doctor Syntax Silver badge

        Re: That's not how hashes work

        "A 160 bit hash _does_ in practice produce a unique result for any given input (unless you spend 6,600 years of CPU time to search for two given inputs with the same result)."

        AIUI there is now a method of constructing a colliding pair of files with rather less than the 6,600 years you suggest. That's what's set off this whole discussion.

        I think one of Linus' points is that constructing a file which gives the same hash as an existing file and having it compilable is a problem of a very different order of magnitude.

        1. TeeCee Gold badge
          Facepalm

          Re: That's not how hashes work

          That's 6,600 years of CPU time.

          If you have 6,600 CPUs, that's a year. If you're Google and have the equivalent of a medium sized country full of CPUs....

        2. Tom Melly

          Re: That's not how hashes work

          Well, this is what I wonder about. Say you modify a genuine function in the source to do something nasty - surely all you have to do is bury a nice big comment somewhere else in the source with the appropriate values inside it to create a matching hash? That's for source rather than the compiled exe of course.

          I dunno - it sounds like Linus owes Gilmore an apology, irrespective of how exploitable the flaw is. I get what Linus is saying, but it reads like self-justification rather than a best practice.

          1. grim0013

            Re: That's not how hashes work

            The approach you describe (hiding random data in a comment) would be much more difficult to hide from Git than 'merely' generating a collision of the sort being discussed. Git doesn't just hash the content--it prepends a header that includes the size of the object in question. This means that the 'doctored' content must either be the exact same size as the original, or the collision must also take the header into account. Accomplishing either of these things is significantly more difficult than simply generating a collision. Not impossible, but not--for the moment--a credible threat. Really it just means that they should move to a larger hash some time within the next few years or so.

            Oh yeah.. It also just occurred to me that this type of attack also assumes that nobody is going to bother looking at the commit diff for some reason. "Some random person just made a +1329 / -7 commit to the kernel, you say? About time those bums did something substantial. I'm sure it's fine."

            On a side note, I've only recently started digging in to how Git works under the covers, and I hadn't quite worked out what the reason was for including size information when hashing 'blob' objects. Well, now I know.
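
            (For anyone else poking at the internals, a minimal Python sketch of the blob hashing described above; it mirrors what `git hash-object` computes for a blob:)

              import hashlib

              def git_blob_sha1(content: bytes) -> str:
                  # Git hashes a "blob <size>\0" header plus the content, so a
                  # doctored blob must also account for the size field.
                  header = b"blob " + str(len(content)).encode() + b"\0"
                  return hashlib.sha1(header + content).hexdigest()

              # git_blob_sha1(b"hello\n") matches `echo hello | git hash-object --stdin`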

      3. badger31

        Re: That's not how hashes work

        "The mathematical operation should produce a unique result for any given input"

        Given that "any given input" is conceptually an infinite set, always expecting a unique 160bit hash is just ludicrous. The main thing with these hashes is that small changes in the code will give hugely differing hashes, making collisions rare. We all know that it is possible to engineer (by brute force) a collision, but it's a hard problem. Doing so in such a way as to be undetectable in Git is an even harder problem. Way harder. I look forward to seeing the attempts, though :-)

      4. juice

        Re: That's not how hashes work

        "Electron, what you post is nonsense. A 160 bit hash _does_ in practice produce a unique result for any given input (unless you spend 6,600 years of CPU time to search for two given inputs with the same result)."

        160 bits make for a big number. A really big number, with lots and lots of digits in it. And any halfway decent hash algorithm worth its salt [*] will be designed such that even a tiny change in the input will produce a significantly different output.

        However, 160 bits is not infinity. It's not even a googolplex. And if someone's deliberately trying to force collisions, you'd be foolish indeed to assume that things are safe.

        Then too, assuming they're using the "1 GFLOPS machine working for a year" definition of CPU-years [**], that figure of 6,600 years isn't as impressive as it sounds.

        An Intel i7 can run at over 350 GFLOPS in double-precision mode, while a modern GPU (e.g. Radeon RX 480) can theoretically churn out up to 5 TFLOPS in single-precision mode; the Tesla K80 used by Amazon for their cloud-computing back-end can churn out 8.74 TFLOPS in single-precision mode.

        Then too, there's always the possibility of using distributed computing or specialised hardware - this type of problem is inherently parallelisable and bitcoin mining has shown how effective ASICs can be for this type of number crunching. Also, since the people most likely to want to force a collision are nation-states or hackers, they could well have access to a supercomputer or maybe even botnets - I wouldn't be surprised to see people offering collision-detection as a service, as more traditional profit streams continue to dry up (albeit with botnets increasingly being based on IoT low-power devices, there may not be many of these).

        So, yeah. 160 bits is good. But these days, it arguably ain't good enough.

        [*] sorrynotsorry

        [**] http://www.gridrepublic.org/joomla/index.php?option=com_mambowiki&Itemid=35&compars=help_documentation&page=GFlops,_G-hours,_and_CPU_hours
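
        (Back-of-the-envelope arithmetic for that point, using the figures quoted above purely as an illustration:)

          cpu_rate = 1e9          # FLOPS assumed per "CPU" in that definition
          gpu_rate = 5e12         # FLOPS for a single modern consumer GPU
          cpu_years = 6_600       # the quoted attack cost

          gpu_years = cpu_years * cpu_rate / gpu_rate
          print(f"~{gpu_years:.1f} GPU-years")   # roughly 1.3 on these numbers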

      5. Anonymous Coward
        Anonymous Coward

        Re: That's not how hashes work

        "A 160 bit hash _does_ in practice produce a unique result for any given input "

        Wtf does this ridiculous sentence even mean?

        A 160 bit hash is 20 bytes, isn't it?

        So if one assumes that the source data being hashed is bigger than 20 bytes, it doesn't matter what hash algorithm is in use, there must by definition be more than one set of input data which will generate particular hash values. Otherwise e.g. a disk would store the hash value not the input data. Same logic applies to any sensible hash size.

        "Someone did a calculation for backup systems that calculate hash codes to see if an identical file is already stored. It is _possible_ that you have a different file with the same hash as one that is already stored, and your file isn't going to be backed up"

        Fancy that. What are the odds that the "someone" in this picture was a disk deduplication vendor? And/or that the logic in question actually was "if the hash of the new data and the already-stored data are different then we know that the old and new data ARE different, and therefore we must store the new data. IF (and only if) the hashes match, we must THEN compare the actual old data and new data, rather than the old and new hashes, and do some other extra work to ensure that the new data CAN be stored and recovered safely."
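
        (A minimal sketch of that "hash first, then verify the bytes" logic, with an in-memory dict standing in for the backend; no particular vendor's product is implied:)

          import hashlib

          store = {}   # hex digest -> stored bytes

          def dedup_store(data: bytes) -> None:
              key = hashlib.sha1(data).hexdigest()
              existing = store.get(key)
              if existing is None:
                  store[key] = data          # different hash => definitely new data
              elif existing != data:
                  # Hash collision: compare the actual bytes and keep both copies
                  # rather than silently dropping the new data.
                  store[key + ":collision"] = data
              # else: byte-identical duplicate, nothing to store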

        1. Anonymous C0ward

          Re: That's not how hashes work

          All hashes, by definition, are vulnerable to collisions (infinitely many of them, once inputs can be longer than the hash). https://en.m.wikipedia.org/wiki/Pigeonhole_principle

    3. Anonymous Coward
      Anonymous Coward

      Re: That's not how hashes work

      "The tricky bit is to, for a given hash, find an input that will generate that hash and that is meaningful in the context of the original input (for example, as Google did producing a second valid PDF). That is computationally very difficult."

      For difficult, read impossible if you don't know the size of the original hashed data. It could be smaller than the hash or it could be gigabytes in size, and you'd have to try every permutation of every size, which would take way longer than the age of the universe even with a server farm at your disposal. Obviously, if the hashing algorithm has a flaw which means it doesn't produce all possible hashes, or produces identical hashes more often than it should, and/or you have a rough idea of the original data size, this narrows it down, but even so, it's far from easy.

  2. 2460 Something
    Thumb Up

    Common sense approach

    Nice 'Calm down dear' from Linus. I enjoy a good 'risk of security compromise' as much as the next sys admin (read: not at all) but these need to be taken in context with the actual risk of implementation and the scope of what can be achieved. As he states, git is not using this to authoritatively state that the data is from said user, merely that it didn't get corrupted in transfer. You already know which source you are connecting to as this is handled by the https certificate.

    Very refreshing to have a proper, reasoned out response, as opposed to rushing out a hack of a patch which introduces further problems.

    1. Paul Crawford Silver badge

      Re: Common sense approach

      Both sides have a valid point:

      - SHA-1 is not used as a sole measure of correctness, so no immediate panic.

      - Sooner or later, someone will find a way to compromise at least some aspects of some Git-based project, if generating a hash collision becomes easy enough that the length and position of the fix-up crud become easy to manipulate.

      1. 9Rune5

        Re: Common sense approach

        I do not quite follow. What exactly would an attacker manipulate in a git repository?

        For the purpose of this discussion, git calculates the hash based on the author signature in addition to the files' contents (http://stackoverflow.com/a/17278248/1736944). So yes, there are some fields that can be exploited to hide away stuff that won't ruin the source code.

        Are you concerned someone will push a commit having an identical sha-1 hash? Well, if you push a commit having an identical hash to a previous commit, then git will assume you are trying to push something it already has. It will ignore it. (http://stackoverflow.com/a/34888325/1736944)

        Or is the concern here that somebody will break into the repository and change something directly? Well... Who would notice that, based on the hash...? Is the suggestion here that the hash is somehow protecting the git users?

        AFAICT, the weakest link is the pull request. The person(s) maintaining the repository must be trusted to not accept pull requests containing nefarious code. The hash algorithm has no bearing on this process. And naturally: users have to know which repository to fetch.

        I am new to git. We only started using it a year ago. I am obviously missing something in this discussion.

        1. Paul Crawford Silver badge

          Re: Common sense approach

          I do not quite follow. What exactly would an attacker manipulate in a git repository?

          To be honest, I'm not sure. But time after time, people find cunning ways of gaming systems that nobody had thought of before.

          Off the top of my head, the obvious thing is you could manipulate somebody's private Git repository to change code but still have it appearing to match a public trusted one. Sure, if you have that level of access there are a hell of a lot more nefarious things you might do to them, but that would be one possible way of getting a back-door into a specific company's system based on an otherwise trusted code base.

          1. Anonymous Coward
            Anonymous Coward

            Re: Common sense approach

            "if somebody inserts random odd generated crud in the middle of your source code, you will absolutely notice"

            There are plenty of binary blobs in the Linux kernel these days. Would you notice if someone tweaked, say, this file in the kernel tree?

            https://github.com/torvalds/linux/blob/master/firmware/bnx2x/bnx2x-e1-6.2.9.0.fw.ihex

            1. Anonymous Coward
              Anonymous Coward

              Re: Common sense approach

              That's a binary blob in the kernel, but the source file in git is in ASCII. It uses 18 distinct ASCII characters, by the looks of it. I don't know whether the published method for creating SHA-1 collisions could be modified to generate hexadecimal characters that could be hidden in a file like that. Does anyone know the answer?

  3. bombastic bob Silver badge
    Happy

    SHA1 still "useful" then?

    I suppose Linus is saying:

    a) Don't panic

    b) SHA1 is still useful for SOME things

    c) Vogon poetry is STILL the 2nd worst in the galaxy (I was compelled to make that reference)

    I'm safe, because I have my towel with me.

    (I suppose on an embedded system or a microcontroller, SHA1 is easier to gonkulate than SHA256 so it would have some use THERE as well, but yeah... other implications obvious)

    1. K.o.R
      Coat

      Re: SHA1 still "useful" then?

      > c) Vogon poetry is STILL the 2nd worst in the galaxy (I was compelled to make that reference)

      The Azgoths of Kira will be deeply disappointed to know they have been bumped off the #2 spot.

      1. hammarbtyp

        Re: SHA1 still "useful" then?

        Just imagine how Paul Neil Milne Johnstone of Redbridge feels...

  4. Voland's right hand Silver badge
    Holmes

    I missed this reading the original collision notification

    I actually missed the part that the attack produced by Google needs to meddle with both sides - "good" and "bad" of the collision. If this is the case, this drastically decreases the attack surface as it is an attack that cannot be applied to an arbitrary SHA1 out there. Sure, an attack that is applicable only to a limited set of SHA1 scenarios today may be modified to hit arbitrary ones in the future. The key word is "future".

    As far as any source code control system is concerned it has multiple layers of verification:

    1. SCCM integrity (git fsck in this case).

    2. Build

    3. Code style check

    4. Integration test check

    If an SCCM uses hashes for addressing, creating a hash collision may get you past 1. You still need to get past 2 - your code should build; past 3 - anything you have inserted must pass style checks; and past 4 - the test suite. That as a whole is actually a fairly tall order. This is also very different from "black boxes" like document formats, where you can ship megabytes of non-visible content which nobody will notice. It is also different from digital certificates, where you either trust the signature or you do not.

    1. Naselus

      Re: I missed this reading the original collision notification

      "I actually missed the part that the attack produced by Google needs to meddle with both sides - "good" and "bad" of the collision."

      You missed it because it doesn't 'need' to. They did so in the demonstration because it's easier to match the hashes by manipulating both, but the point was simply to demonstrate that two different (but similar) documents could be made to produce an identical hash (as opposed to two masses of random data generating the same result, which a Chinese team proved about ten years ago). But in theory it should be possible to do the same thing just by manipulating the 'bad' document; it'd just take more compute to do so - but doing so is now an affordable operation.

      1. MJB7

        Re: I missed this reading the original collision notification

        @Naselus : ""I actually missed the part that the attack produced by Google needs to meddle with both sides - "good" and "bad" of the collision." You missed it because it doesn't 'need' to. "

        That is seriously wrong. The attack absolutely *does* need to fiddle with both sides. Fiddling with only one side is not a collision attack, it is a pre-image attack - and nobody has demonstrated a pre-image attack against even MD5 yet.

        1. Naselus

          Re: I missed this reading the original collision notification

          Hence the quotes round 'need' :)

          Second preimage resistance is implied by collision resistance. If you demonstrate that finding collisions is affordable, then second preimage resistance is no longer guaranteed to be unaffordable. So there's no need to actually perform a real preimage attack once you've shown a collision is possible on real data, because the hash is already affordably insecure.
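
          (For readers following this exchange, the standard textbook definitions being argued over, with H the hash function:)

            \begin{align*}
            \text{Collision: }       &\ \text{find any pair } x \neq y \text{ with } H(x) = H(y) \\
            \text{Second preimage: } &\ \text{given a fixed } x,\ \text{find } y \neq x \text{ with } H(y) = H(x) \\
            \text{Preimage: }        &\ \text{given a digest } h,\ \text{find any } x \text{ with } H(x) = h
            \end{align*}

          A second-preimage attack immediately yields a collision, which is why collision resistance is the stronger property; the converse does not hold, so the SHAttered result weakens confidence in SHA-1 without itself providing a second-preimage attack.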

  5. Brian Scott

    Good software design

    The key here isn't whether sha-1 should be used in git in the first place.

    Good practice in designing security software should acknowledge that, after some time, all of these things become obsolete, so you need to design in a framework that allows you to easily migrate to a future algorithm when the need arises. Baking sha-1 into the design is a mistake if it is then too difficult to change.

    Other than that, there is no particular reason to be worried about sha-1. It's just another warning shot: don't use it in new products, and start looking at how to turn it off in existing software. This should be simple with well designed software.

    1. Bronek Kozicki

      Re: Good software design

      ... and it appears that, in git, transition to a different hash is not going to be difficult. At least, that's how I interpret this: "And finally, there's actually a reasonably straightforward transition to some other hash that won't break the world - or even old git repositories."

  6. Deltics

    For crying out loud...

    Google talk in dark, foreboding terms about changing a rental agreement to hoodwink some poor unsuspecting tenant into paying a higher rent than they think they signed up for, and "proving" it with the SHA-1 hash.

    But in their "proof" of the attack all they did was change the colour of a frickkin' graphic, and people lapped it up and jumped all over the "SHA-1 is broken" bandwagon.

    And since SHA-1 has been suspect for years, anyone trying to use it to sucker a mark into paying a higher rent will (or should) find themselves being asked why they are relying on a demonstrably broken digital "signature"?

    And that's even assuming that it is possible to "attack" a document in this way, rather than just fiddling with the colors.

    1. Richard 12 Silver badge

      Context is King

      The attack is important, however at present it only matters in a very limited context.

      Google have proven that it is feasible for an entity to produce two PDFs with different content and the same SHA-1 hash, by means of embedding some "junk" in the unused data sections of both documents.

      Thus a third party may replace a document they wrote with another document they wrote, and you won't detect it if SHA-1 is the only thing you use to check that it is unchanged.

      In other words, don't use SHA-1 as the only method of checking if a document is unchanged. Use other hashes as well, and for very important things (eg contracts), compare the full binary data - it's the only way to be sure.

      Heck, even MD5 is likely to be good enough. Yes, it's now simple to create a hash collision in MD5, but a collision of MD5 and SHA-1 at the same time?

      The sky isn't falling.
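
      (A minimal sketch of that belt-and-braces check, assuming both files fit in memory; the paths are placeholders:)

        import hashlib

        def unchanged(original_path: str, received_path: str) -> bool:
            with open(original_path, "rb") as f:
                a = f.read()
            with open(received_path, "rb") as f:
                b = f.read()
            # Two independent hashes make a simultaneous collision far harder...
            hashes_ok = (hashlib.sha1(a).digest() == hashlib.sha1(b).digest()
                         and hashlib.md5(a).digest() == hashlib.md5(b).digest())
            # ...but for anything that really matters, compare the bytes themselves.
            return hashes_ok and a == b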

      1. yoganmahew

        Re: Context is King

        @Richard 12

        "Google have proven that it is feasible for an entity to produce two PDFs with different content and the same SHA-1 hash, by means of embedding some "junk" in the unused data sections of both documents."

        I am not a mathematically sound person, coming from that well-known liberal arts route to low-level programming. I am, though, bemused that this is news to anyone? Particularly the smartest people in the room? Did sane people really think that the output of a hash, given unlimited input data, was always going to be unique?

        edit: I see DougS made the same/a similar point below/earlier. The hash doesn't matter, it's how you roll it...

        Or have Google just disproved unicorns?

        1. Richard 12 Silver badge

          Re: Context is King

          Everyone has always known that hashes aren't unique. Otherwise they'd be called compressed files instead of hashes (or digests).

          The point of a cryptographic hash is twofold:

          1) It should be extremely unlikely that a randomly corrupted copy of the item would result in the same hash as the perfect copy.

          2) It should be infeasible for somebody to intentionally make two different and usable items that have the same hash.

  7. Anonymous Coward
    Anonymous Coward

    Sounds a bit like he's justifying a past questionable decision...

    SHA-1 was fine for what he was using it for (when the decision was made to use it), but he was warned that sometime in the near future (now) it wouldn't really be good enough.

    Sure the sky isn't falling, but everyone should be using SHA256 going forward.

    And many things using SHA-1 should transition sooner rather than later. The question is how much time/money that is going to take, and whether it is OK to ignore it because, by the time the Sky (really) is Falling, it's going to be obsolete (not in use) anyway.

    1. Lee D Silver badge

      Not really - SHA-1 was, as you say, fine for what he was using it for.

      And if you look at why he's not panicked, it's because he didn't rely on any particular assumptions about SHA-1 lasting forever, for instance.

      Hashes and crypto algorithms last 10 years now if you're lucky. Protocols and code last much longer than that (isn't the Linux kernel over 20 years old now? And even IPv6 is that old and still not seeing full deployment).

      In git, SHA-1 is not used for security; it's used as a quick check, and an easily referenced nugget of information that can identify a particular change. As such, it can be replaced by any number of things quite easily. Sure, probably an on-disk format change would be required too, but in such an open program, that's hardly a concern.

      But if you were using SHA-1 in your SSL setup, you have had an issue for a while now. That's why it's being phased out. And we all knew that was what was going to happen.

      Sooner or later, WPA2 will be dead, just like WPA and WEP before it.

      Sooner or later, SHA-3 will be dead, just like SHA-2, SHA-1, MD5 and myriad others before it.

      Rather than design your protocol to be RELIANT on it, especially if that reliance directly affects the secrecy of data rather than, say, use as a quick reference checksum in an open repository, design your protocol such that every such advance is handled like this: "Yeah, it's not really a problem. Next version fixes it for another 10 years."

  8. Bill Gray

    Am I missing something here?

    "...There's a big difference between using a cryptographic hash for things like security signing, and using one for generating a 'content identifier' for a content-addressable system like git"

    Is Linus actually saying that the cryptographic aspect of SHA1 is really irrelevant to Git? That if, tomorrow, somebody comes up with a simple means of taking a given message and creating a message with the same SHA1 hash, it'll screw everybody using SHA1 for secure signing, but wouldn't matter as far as Git is concerned?

    I could believe this, having used utterly non-secure hashes in (for example) making hash tables; for that purpose, you want something with an even distribution and few collisions, but security is irrelevant. An example from my own code:

    https://github.com/Bill-Gray/find_orb/blob/master/pl_cache.cpp#L288

    I needed blazing speed and an even distribution without collisions, but not security.
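
    (In the same spirit, a tiny non-cryptographic hash in Python - FNV-1a, not necessarily the one in the file linked above - which is fine for bucketing and useless as a signature:)

      def fnv1a_32(data: bytes) -> int:
          # FNV-1a: fast and evenly distributed, but trivially forgeable.
          h = 0x811C9DC5
          for byte in data:
              h ^= byte
              h = (h * 0x01000193) & 0xFFFFFFFF
          return h

      bucket = fnv1a_32(b"some lookup key") % 1024   # fine for a hash table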

    1. Anonymous Coward
      Anonymous Coward

      Re: Am I missing something here?

      Yes, that's exactly what he's saying. Git is using SHA1 hashes the same way RS232 used parity, to detect corruption. SHA1 is just far, far better at doing that than parity. The fact that it is possible to deliberately engineer a SHA1 hash collision is irrelevant to its ability to be used to detect unintentional file corruption.

      If git was trying to use cryptographic techniques to prove a file was authored by Linus rather than by me, it wouldn't help if it was hashed with SHA256 instead of SHA1. It would have to be signed by Linus' private key, and your copy of git would have to have securely received Linus' public key to verify that the repository you downloaded was signed by him, and not signed by me trying to give you Linux code with a backdoor built in.

      Since the hash is generated on the machine of the person running git, if it was changed to use SHA256, you still couldn't tell the difference between a repository created by Linus and one created by me. Both our copies of git would create correct SHA256 hashes, and your copy of git would validate them both.

      1. Anonymous Coward
        Anonymous Coward

        Re: Am I missing something here?

        > Git is using SHA1 signatures the same way RS232 used parity, to detect corruption

        Ah, those were the days! When a 50% chance of problem detection was good enough. :-)

        At 300 baud, you could almost transcribe the data by ear from the modem chirps anyway (talk about silly bets).

        1. Anonymous Coward
          Anonymous Coward

          Re: Am I missing something here?

          Actually less than a 50% chance, since parity only successfully detects an odd number of bit errors, and gives false negatives for 2, 4, 6 or 8 bit errors.
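
          (A quick sketch of that failure mode, using even parity over a single byte:)

            def parity(byte: int) -> int:
                return bin(byte).count("1") % 2

            original = 0b01100001                    # 'a'
            one_bit_flip = original ^ 0b00000001     # detected: parity changes
            two_bit_flip = original ^ 0b00000011     # missed: parity unchanged

            assert parity(one_bit_flip) != parity(original)
            assert parity(two_bit_flip) == parity(original)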

  9. a_yank_lurker

    Lifetime of Hash Algorithms

    As Lee D noted, today's best practice will be tomorrow's obsolete methodology. SHA-1 has been considered weak for some time now. However, intentionally creating two documents with the same hash is not trivial, but obviously doable. The real question is how different the two documents are. If they are obviously different at a quick glance, then it is not as disastrous as if the difference is a couple of very small edits (a scenario many have postulated). If the hash is being used for cryptographic security then there are problems. But if it is just a quick check of the downloaded file, this is not a very serious concern.

    1. Richard 12 Silver badge

      Re: Lifetime of Hash Algorithms

      The difference Google demonstrated was the background colour, so basically turning #FFFFFF into #FF0000

      So yes, a small visible difference. Sufficient to turn a "site licence" into a "named user licence".

      1. John Robson Silver badge

        Re: Lifetime of Hash Algorithms

        So yes, a small visible difference. Sufficient to turn a "site licence" into a "named user licence".

        Particularly if those are check boxes on a list...

        1. Anonymous Coward
          Anonymous Coward

          Re: Lifetime of Hash Algorithms

          For contracts this is a non-issue.

          Presumably you *kept* the original copy of the contract, and you can immediately show that the one you have also has the same SHA1 hash. If you present this immediately (as soon as the 'bad' contract is shown by the evil other party) you confirm that you didn't have time to compute your own bad version.

          For certs it is an issue. If someone can buy a cert for randomdomain.com and convert it into an EV cert for bigbank.com - or worse, a sub-CA cert - that's not great.

      2. John Sanders
        Big Brother

        Re: Lifetime of Hash Algorithms

        """The difference Google demonstrated was the background colour, so basically turning #FFFFFF into #FF0000"""

        They achieved this by including a binary chunk in the PDF. You make it sound as if the only change is FFFF to 0000; the document payload was altered significantly.

        If you run a diff of the two documents, the differences in the binary chunk will be massive.

        1. Richard 12 Silver badge

          Re: Lifetime of Hash Algorithms

          Well yes, of course.

          Almost every file format has "ignored" sections that you could hide the hash-collision data in, and the appropriate viewer would still display - or execute - it fine.

          Including all known executable formats.

          And yes, a binary comparison will always spot this. But that requires that you do have the original and can do such a comparison.

  10. Will Godfrey Silver badge
    Meh

    The way it seems to me

    As I understand hashes, the smaller the change being made, the harder it is to find a usable collision. So if you want to sneak in something unnoticeable (and that will compile) you're going to have to really work for it.

  11. bolac

    Git repos should be pulled using HTTPS. HTTPS should not be used with SHA1. TLS is where the security is coming from, git itself does not have signatures.

    1. Anonymous Coward
      Anonymous Coward

      > git itself does not have signatures.

      In fact, Git's security model does incorporate cryptographic signatures (implemented externally to Git itself, and therefore adaptable, I might add). It is up to the individual team, however, to decide whether and how they integrate this into their workflow (cf. the Linux kernel).

  12. Anonymous Coward
    Anonymous Coward

    Once again - try it with .TXT files

    Someone noted on the original SHA-1 story that the hash collisions demonstrated were in fancy file formats (PDF) which had the space to meddle with data while remaining the same length.

    Until someone can demonstrate two .TXT files of exactly the same length and hash, where the change made leaves the second intelligible *and* syntactically correct, I won't lose too much sleep.

    NOTE: Source code is .TXT - or should be.

    1. Charlie Clark Silver badge

      Re: Once again - try it with .TXT files

      Until someone can demonstrate two .TXT

      No, you don't really want to wait that long once this kind of proof of concept has been produced. Some of the people who might exploit it may have access to resources considerably beyond those used in the study and they won't tell you when they can do it.

      Fortunately, replacement hash algorithms are available and should be rolled out. No new projects should rely on the older algorithm.

    2. Richard 12 Silver badge

      Re: Once again - try it with .TXT files

      Source code has comments.

      /* gdiiw7ehsyw77 */

  13. gnasher729 Silver badge

    What can actually happen? Let's say there is a file in a repo with hash X. And I can manage to create a different file with the same hash X. The first problem is that I cannot put that file into the repository - git will tell me that it's the same hash, so the file "is already there" - it doesn't let me replace a file with an identical file because that is inefficient and pointless, and it doesn't let me replace a file with a different one with the same hash because it thinks it's the same one. You just can't have two different files with the same hash in the same repository.

    The only attack vector is to replace a file on some developer's hard drive with a different one, and git won't notice. So you would need to access that developer's computer and replace a file.
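
    (A toy model of that behaviour, with a dict standing in for the object store; this is an illustration, not Git's actual code:)

      import hashlib

      objects = {}   # sha1 hex digest -> content

      def add_object(content: bytes) -> str:
          key = hashlib.sha1(content).hexdigest()
          # If the key is already present, the store assumes it already has this
          # content and does nothing, so a colliding object can't displace the
          # original through a normal add.
          objects.setdefault(key, content)
          return key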

    1. trev.rollins

      Why are you even going over this here? It's literally Linus' first reply to the mailing list post linked in the article, along with several much better informed and compelling points. SHA1 being cryptographically broken is basically a non-issue in the context of git.

  14. Red Bren
    Coat

    Linus' ego

    Was it just me that read "If you fetch a Linux kernel from Linus's ego"?

  15. Pascal Monett Silver badge

    "nasty people will teach him the threat model"

    That may indeed happen.

    I totally trust Torvalds to quickly learn from the experience and do the right thing. Actually, I'm certain he is already considering alternatives. For all his outbursts and temperamental postings, Torvalds is unquestionably intelligent and reasoned. You might fool him once, but you won't get a second chance.

  16. Colin Tree

    xmit error

    I use the hash to check the file transmission had no errors, that's all.

    You have to trust the source Luke.

    Sometimes reputable news sites report servers have been compromised,

    which has happened to trustworthy sites from time to time.

    Basically, peer reviewed security.
