Child abuse image hash list shared with major web firms

The Internet Watch Foundation, Blighty's voluntary body for policing and filtering the 'net for child abuse images, has announced nearly 19,000 hashes of "category A" abuse images have already been stored in its new Hash List and distributed to major web firms. The abuse images are sorted into categories A, B, and C, with "A" …

  1. John G Imrie

    Hash list

    So you try to upload your image and it gets blocked; change the colour of one of the pixels slightly and bingo, a new hash.

    I hope there is more to this system than a hash list.
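The point about a single changed pixel is easy to check for a plain cryptographic hash. The snippet below is only a sketch: the 16-byte buffer is a made-up stand-in for real image data, and it says nothing about how the IWF list or PhotoDNA actually works.

```python
import hashlib

# Hypothetical stand-in for raw image data: 16 pixels, one byte each.
original = bytes([200] * 16)

# "Change the colour of one of the pixels slightly".
altered = bytearray(original)
altered[5] = 201
altered = bytes(altered)

md5_original = hashlib.md5(original).hexdigest()
md5_altered = hashlib.md5(altered).hexdigest()

print(md5_original)
print(md5_altered)
print("match:", md5_original == md5_altered)
```

The two digests share nothing recognisable, which is exactly why an exact-match file hash only catches byte-for-byte copies.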

    1. Jason Pugh

      Re: Hash list

      Errr, yes quite a bit more to it: You might want to read up on PhotoDNA.

    2. Anonymous Coward

      Re: Hash list

      I had the same thought. Hopefully it's more complex than that although I don't know how.

      1. Aqua Marina

        Re: Hash list

        I suggest you look at something like Tin Eye for how accurate image detection can be. As an example I photoshopped 2 generally available meme images I downloaded from the internet into one joke pic to stick on the wall at work. Tin Eye correctly identified which 2 photographs I cut parts out of to create my pic, and pointed me to where I could download them off the net. It really is quite good. From my chats with a couple of engineers who tinker with image recognition, the only thing holding them back in a facial recognition sense, is privacy legislation. They have had the means for some time, but are discouraged from applying it on Joe Public.

        1. Anonymous Coward

          Re: Hash list

          That is very interesting.

        2. Mike Taylor

          Re: Hash list

          I wonder whether the people who do this research (and I am seriously glad I am not one of them) have looked at using the 'bag of vectors' approach - I saw some work a few years ago, very impressive at identifying the same types of features - matching objects taken at different angles. Computationally expensive, of course, but that's always something that can be dealt with. I have to say that I'm not surprised to hear of the hashes (from a technical perspective), but it's in contrast with what I heard on the radio the other day about how long it takes the police in the UK to search through devices. I would be interested in looking at the accuracy of taking a cluster of approaches (whole image hashes, partial hashes, feature hashes, even key word analysis).

  2. John Robson Silver badge

    MD5 Bad....

    PhotoDNA - I hope it's somewhat better...

  3. Philip Storry

    Oh, goody! MD5!

    It's lucky they chose an up-to-date hash algorithm that's got no known weaknesses.

    What's that, Carnegie Mellon University's Software Engineering Institute? As of 2010 you consider it "cryptographically broken and unsuitable for further use"? Oh, that's unfortunate... MD5 has been known to have collision issues since 2004? My - that is poor.

    Seriously, MD5 is fine for some things. But for important things - like anything approaching censorship or criminal justice, perhaps - I don't think we should be using MD5. SHA-2 perhaps?

    1. Raumkraut

      "cryptographically broken and unsuitable for further use"

      "Cryptographically" being the operative word. In this case, it's not being used cryptographically.

      "for important things - like anything approaching censorship or criminal justice, perhaps - I don't think we should be using MD5"

      In their defence, it's entirely possible that they started using MD5 for this purpose before MD5 was so widely considered useless. And since it's a criminal offence to have possession of the images in question (exceptions notwithstanding), they may no longer have the source images from which to generate new hashes. However, they certainly shouldn't be using it for new images, and given the inclusion of PhotoDNA hashes in the programme, it's entirely possible they no longer do so.

      That said, I would certainly hope they do a more detailed check than just comparing MD5 hashes, before breaking your door down in the middle of the night.

      I'm a dreamer, I know.

      1. streaky

        MD5 is perfectly fine for this, the odds of accidentally stumbling over a collision in real world data are insanely small. They're not going to prosecute you for it, it's obviously used for flagging up images.

        That being said you could find your image deleted from facebook or something - but it's not as if you won't be able to say "well this is wrong".

        When I say md5 is fine I mean on a technical level; using file hashes at all is generally a stupid idea for this purpose for obvious reasons, but sha-512 isn't exactly going to improve the process - there are fewer theoretical collisions but..

        Remember you're talking about 128 bit key space so..
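For a rough sense of scale on the 128-bit point, the arithmetic can be sketched as below. This assumes digests behave like uniformly random 128-bit values, which is only reasonable for accidental collisions, not deliberately crafted ones; the function names and the example figures are illustrative, not anything from the IWF.

```python
import math

HASH_BITS = 128  # size of an MD5 digest


def false_match_probability(list_size: int, bits: int = HASH_BITS) -> float:
    """Chance that one unrelated file happens to hit any hash on a list."""
    return list_size / 2 ** bits


def any_collision_probability(num_items: int, bits: int = HASH_BITS) -> float:
    """Birthday-bound approximation: 1 - exp(-n(n-1) / 2^(bits+1))."""
    exponent = -num_items * (num_items - 1) / 2 ** (bits + 1)
    return -math.expm1(exponent)


# One innocent upload against a list of 19,000 hashes.
print(false_match_probability(19_000))

# Any two out of a (made-up) trillion distinct files sharing a digest.
print(any_collision_probability(10 ** 12))
```

Both figures come out vanishingly small, so for accidental matches a longer digest would change very little in practice.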

        1. Anonymous Coward

          I would be less worried about accidental collisions and more worried about the possibility of someone intentionally crafting an innocent-looking file that has the same MD5 hash as something on the watch list. (The weaknesses in MD5 allow you to do this.) Said file could be used for any number of mischievous and/or nefarious purposes, details of which are left as an exercise for the reader.

  4. Anonymous Coward

    MD5?

    Are they being serious? This is another example of behind the times thinking.

    1. Anonymous Coward

      Re: MD5?

      The IWF are not encrypting things, they are searching for things that MAY have been encrypted with MD5.

      Why are people posting like "oh my god, they should use XXX encryption" like they are talking about securing bank accounts? They want to find images that are ALREADY out there, not securely distribute new ones.

      1. Robert Grant

        Re: MD5?

        Don't talk about encryption here; it's not to find encrypted images, nor is it to encrypt them.

  5. Paul 76

    I may be being stupid here but .....

    Suppose I actually wanted to distribute this stuff (I don't !)

    All you'd have to do would be to write a script that changed one pixel on the image (depends on compression), resized it, changed an irrelevant byte in the internal format or just about anything. You could probably convert a PNG to a JPG and back again and you wouldn't get the same file because of lossy compression. Or I could zip the file. Or I could add an extra 256 bytes to the end of the file filled with random values, which another program could strip off.

    Am I wrong here ?

    1. Anonymous Coward

      Re: I may be being stupid here but .....

      https://en.wikipedia.org/wiki/PhotoDNA

      PhotoDNA ... works by computing a hash that represents an image. This hash is computed such that it is resistant to alterations in the image, including resizing and minor color alterations. It works by converting the image to black and white, re-sizing it, breaking it into a grid, and looking at intensity gradients or edges.
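The steps in that summary (greyscale, resize, grid, compare intensities) can be sketched with a generic "gradient hash". To be clear, this is not PhotoDNA, whose details are not public; it is a simple illustrative stand-in, and every function name, the 8x8 grid and the synthetic test images are assumptions made for the example.

```python
def to_grey(rgb_rows):
    """Convert rows of (r, g, b) tuples to rows of 0-255 luminance values."""
    return [[(299 * r + 587 * g + 114 * b) // 1000 for r, g, b in row]
            for row in rgb_rows]


def shrink(grey, width, height):
    """Resize by averaging the source pixels that fall into each grid cell."""
    src_h, src_w = len(grey), len(grey[0])
    out = []
    for gy in range(height):
        y0 = gy * src_h // height
        y1 = max((gy + 1) * src_h // height, y0 + 1)
        row = []
        for gx in range(width):
            x0 = gx * src_w // width
            x1 = max((gx + 1) * src_w // width, x0 + 1)
            cell = [grey[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(cell) / len(cell))
        out.append(row)
    return out


def gradient_hash(rgb_rows, grid=8):
    """One bit per neighbouring pair of cells: is it brighter to the right?"""
    small = shrink(to_grey(rgb_rows), grid + 1, grid)
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (left < right)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def make_ramp(size, reverse=False):
    """Synthetic test image: a left-to-right brightness ramp."""
    step = 256 // size
    values = [x * step for x in range(size)]
    if reverse:
        values.reverse()
    return [[(v, v, v) for v in values] for _ in range(size)]


original = make_ramp(64)
tweaked = [list(row) for row in original]
tweaked[10][10] = (43, 43, 43)           # nudge one pixel (was 40, 40, 40)
resized = make_ramp(32)                  # same picture at half the size
different = make_ramp(64, reverse=True)  # an unrelated picture

h = gradient_hash(original)
print(hamming(h, gradient_hash(tweaked)))    # 0
print(hamming(h, gradient_hash(resized)))    # 0
print(hamming(h, gradient_hash(different)))  # 64
```

Matching is then a question of "how many bits differ" rather than "are the digests identical", which is what makes this family of hashes tolerant of small edits.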

      1. Anonymous Coward

        Re: I may be being stupid here but .....

        PhotoDNA was developed by Microsoft and offered to law enforcement worldwide at no cost. Yes, that's another reason to hate MSFT!

        1. Anonymous Coward

          Re: I may be being stupid here but .....

          "PhotoDNA was developed by Microsoft and offered to law enforcement worldwide at no cost. Yes, that's another reason to hate MSFT!"

          I think the hate will start when its use will be legally mandated and ISPs find that:

          1 - it may be free for law enforcement, but not for ISPs and

          2 - it requires Windows to run it on and will be as resource hungry as anything else MS has ever produced.

          If you deduce from that that I do not trust MSFT for one cent, you're right. They've played that game many, many times and they're getting desperate now.

          1. Sven Coenye
            Coat

            Re: I may be being stupid here but .....

            3 - It only works on .bmp format

            1. Anonymous Coward

              Re: I may be being stupid here but .....

              Looks like it will also trivially fail if the photo is rotated (e.g. turned upside down) or reflected along an axis.

          2. dogged

            Re: I may be being stupid here but .....

            @AC -

            1. It is free for both law enforcement AND ISPs. Even "service providers" such as Facebook use it. For free. It used to require hardware on-site but that's now been changed to a RESTful Azure service to remove the need for hardware and admins.

            Now somebody will claim MS are collecting all the child porn because they like it, I suppose.

            2. See above regarding the Azure service. It requires absolutely nothing to run it.

      2. The Man Who Fell To Earth Silver badge

        Re: I may be being stupid here but .....

        Which means that once there's a motive to fool it, it probably won't be that hard to do so.

        If I were a betting man, I'd bet that if organized crime is involved (hence $), it will be fooled pretty quickly.

      3. Paul 76

        Re: I may be being stupid here but .....

        Yes, I saw a post about that after I'd posted.

        Problem is, the stuff they are trying to stop is not an issue of recognition, it is an issue of distribution.

        What they want is to stop the stuff being stored on ftp servers and so on, right ?

        So how does a "Photo DNA" algorithm cope if the thing is not actually a photo? Suppose the raw bytes in the file are swapped according to some key sequence and that key sequence is distributed separately?

        Anything based on pictorial recognition will not work if the thing is not a photo, surely ?

  6. Anonymous Coward

    Drop in the ocean

    19,000 images? That's a pitifully small number. Besides, this measure only eliminates existing, known images; it does absolutely nothing to prevent the production of new works. Maybe even the opposite: by destroying the existing stock, they could be stoking demand for new material. Great move.

    1. Anonymous Coward

      Re: Drop in the ocean

      It also seems that "known" means identified by one group of people but not by the courts so some of the images may be innocent and as you rightly say a lot won't be there at all.

    2. Tony S

      Re: Drop in the ocean

      "19,000 images? That's a pitifully small number."

      Beat me to it.

      I have a friend that is a professional photographer; he does weddings, portraits and such like, but he also likes to take his camera with him when he walks the dog (sadly Duster passed away a month ago). He asked me to get him some stuff to back up his picture collection; all 400,000 plus images from since he went digital.

      When I suggested that he might want to trim it down a bit just to keep the good stuff, his reply was that he doesn't have time to go through and sort them out. It's just easier to keep everything.

      1. Anonymous Coward

        Re: Drop in the ocean

        It is easier to keep everything... says the poor sod with 568,345 (at the last count) digital images stored away. Takes up nearly two NAS units. 15.6TB. 980GB so far this year.

        Well, sorting that out will give me something to do when I retire (that's my excuse and I'm sticking to it)

    3. Raumkraut

      Re: Drop in the ocean

      According to TFA, the 19,000 number is just for the "worst of the worst". The PhotoDNA wikipedia article mentions that "Project Vic" has a database in the millions of hashes.

    4. DanceMan

      Re: Drop in the ocean

      " by destroying the existing stock, they could be stoking demand for new material"

      Just as the big drug bust has been shown to create a gang war with the attendant shootings as the wannabees battle it out to become the successors to the trade. Unintended consequences.

  7. Anonymous Coward

    Windows 10 purpose revealed

    What happens when the criminal compresses the file, alone or as a bundle with other random files?

    On the other hand if Windows 10 scans the crim's drive before any action is taken....

    1. Anonymous Coward

      Re: Windows 10 purpose revealed

      They'll probably move to a Linux distro.

  8. Paul Smith

    So a hash matches, then what?

    The threat of this approach might deter the computer illiterate but then Darwin was already looking after people who try sharing kiddie porn via facebook.

    First off, how is this hash to be generated? Will google et al calculate a hash for every image before it can be uploaded and simply not accept (sight unseen) anything that produces a hash they don't like? The first time you can't upload your holiday snaps will be the last time you use their service, so that is not a runner. Any hash will have to be calculated after upload, which means the company is now in possession of the suspect image. In most jurisdictions, possession of kiddy porn (knowingly or otherwise) is a serious criminal offence and I am not sure if safe harbour rules apply if the company is aware of the content.

    What happens when a matching hash is detected? Do they send the 'suspect' image to someone else to verify? In which case they will be knowingly participating in the transport and distribution of what they believe to be kiddy porn across state and national boundaries! Try explaining that to the company lawyers.

    Perhaps they have a human verify the image before they alert the authorities? In which case they must have paid employees looking at kiddy porn on company computers, on company time, with the company's knowledge and worse, consent! I wonder how HR will fill that vacancy. "Wanted: child porn expert, equal opportunity employer"

    If you are serious about stopping child exploitation, then stop this techno bullshit and actively support genuine child protection organisations.

    1. K

      Re: So a hash matches, then what?

      More Quangos, NGOs and charities that suck up money, with huge chunks of it going on bureaucracy and administration..

    2. e^iπ+1=0

      Re: So a hash matches, then what?

      "First off, how is this hash to be generated? Will google et al calculate a hash for every image before it can be uploaded and simply not accept (sight unseen) anything that produces a hash they don't like?"

      Generate hash on client. If it matches, silently dial 999 (or whatever) whilst not doing anything to make the (suspected) criminal suspicious. Send data to law enforcement.

  9. hi_robb

    Hmm

    There's a lot of negative comments in here and I understand the logic behind some of them. But surely trying to do something about this disgustingly vile subject is better than nothing?

    D

    1. Robert Grant

      Re: Hmm

      Negative only in some senses; half of them are mistaken about MD5, so it's fine.

      And no. Only doing something effective is worth doing. Thankfully this may well be effective, so it's worth doing.

      But in general, looking at emotional tone and saying "anything is better than nothing" aren't strong bases for analysis.

    2. nethack47

      Re: Hmm

      There's a lot of reasons why doing "anything" isn't a good idea.

      I suggest you look up scope creep or feature creep and remember what fantastic success we have had with the anti-terror laws. It does little good with no transparency whatsoever, as IWF is by design set up to stop people asking questions.

      PhotoDNA is good but it should be applied to already uploaded pictures, and humans generally are required in this process to verify, and IWF has their own setup for that. Hopefully they are now beyond the original 5 people and actually process false positives.

      1. Adam 1

        Re: Hmm

        > remember what fantastic success we have had with the anti-terror laws

        In other news, the latest leak from Snowden's trove was found to match one of the md5s....

        /where's that tin foil?

    3. Paul Smith

      Re: Hmm

      Doing something is only better than doing nothing if it is actually helpful. The money, time and resources being spent on this unworkable idea are money, time and resources that are not being spent on actually helping children. At its very best, if everything works properly and all the technical and legal issues are overcome, then a small number of computer illiterates who share old kiddy porn will be stopped. Or at least slowed down.

      Not one child will be protected from being exploited.

      Not one image will be taken out of circulation.

      Not one image will be prevented from getting into circulation.

      One final technical point. If the technology actually worked as advertised, why isn't it being exploited by people who could make profit from it? Where are the hundreds of millions of legitimate and copyrighted images that are being illegally used that this technology should be able to track down? Why aren't the courts being backed up with claims for compensation for provable copyright infringements? The licence fees alone for this technology should be able to fund major child protection efforts.

      1. Aqua Marina

        Re: Hmm

        "If the technology actually worked as advertised, why isn't it being exploited by people who could make profit from it?"

        It has been for some time, check out Tin Eye. Throw some images at it, and parts of images, combine a couple of images into a new image etc.

      2. 's water music

        Re: Hmm

        > Not one child will be protected from being exploited.

        > Not one image will be taken out of circulation.

        > Not one image will be prevented from getting into circulation.

        AIUI part of the posited benefit comes from providing a means to distinguish between previously seen and new images which should allow LEOs to focus any child rescue resources onto identifying previously unknown victims

      3. Robin Bradshaw

        Re: Hmm

        "One final technical point. If the technology actually worked as advertised, why isn't it being exploited by people who could make profit from it?"

        It is

        http://www.theregister.co.uk/2015/09/09/i2600i_girds_loins_to_fight_off_copyright_troll/

    4. This post has been deleted by its author

  10. Anonymous Coward

    Back when images were going via usenet, the detection technique was to remove an outside frame, drop the middle to 64x64 and 16 shades of grey and then hash that.
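That recipe (crop the frame, shrink the middle to 64x64, quantise to 16 shades, then hash) might look roughly like the sketch below. The 10% border, the choice of MD5 and all the names are guesses made for illustration; the comment does not say what was actually used.

```python
import hashlib


def normalise(grey, border=0.1, size=64, shades=16):
    """Crop an outer frame, shrink the middle to size x size, quantise the greys."""
    h, w = len(grey), len(grey[0])
    top, left = int(h * border), int(w * border)
    middle = [row[left:w - left] for row in grey[top:h - top]]
    mh, mw = len(middle), len(middle[0])
    out = bytearray()
    for gy in range(size):
        y0 = gy * mh // size
        y1 = max((gy + 1) * mh // size, y0 + 1)
        for gx in range(size):
            x0 = gx * mw // size
            x1 = max((gx + 1) * mw // size, x0 + 1)
            cell = [middle[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            mean = sum(cell) // len(cell)     # average grey level, 0-255
            out.append(mean * shades // 256)  # reduce to 0..shades-1
    return bytes(out)


def frame_insensitive_hash(grey):
    """Exact hash of the normalised middle of the picture."""
    return hashlib.md5(normalise(grey)).hexdigest()


# Synthetic 160x160 greyscale picture (rows of 0-255 values).
picture = [[(x + y) % 256 for x in range(160)] for y in range(160)]

# Same picture with the outer frame scribbled on.
reframed = [list(row) for row in picture]
for x in range(160):
    reframed[0][x] = 255
    reframed[159][x] = 0

print(frame_insensitive_hash(picture) == frame_insensitive_hash(reframed))  # True
```

Because the final step is still an exact hash, a cell whose average sits right on a shade boundary can tip over and change the whole digest, so this approach survives reframing and mild rescaling rather than arbitrary edits.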

  11. NanoMeter

    That's OK

    I'm more into straight lez porn.

  12. Rimpel

    Legal?

    >hashes will also be created from images that its analysts have sourced from reports and by "proactively searching for criminal content".

    Are they breaking the law when they find the criminal content and view the images?

    1. Martin Summers Silver badge

      Re: Legal?

      That's like saying officers on a drugs bust are committing a crime by picking up bags of heroin to use as evidence. They aren't possessing them, not in that sense anyway.

  13. Conrad Longmore

    Circumvention

    I recently looked at an issue involving fake LinkedIn profiles. I was getting nowhere with a reverse image search of the profile images with the usual technologies until somebody suggested flipping the image.. and all of a sudden the reverse image search started working.

    That was a relatively simple circumvention technique. I'm sure there are plenty of reversible techniques to apply to a picture that would screen it from this sort of detection. But it would probably catch quite a lot of this material from being circulated.
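One cheap counter to the flipping trick is to test every mirror and quarter-turn of an image rather than just the image as uploaded. The sketch below assumes nothing about any real service: `fingerprint` is a placeholder (plain SHA-256 of the pixel values) for whatever image hash is actually in use, and the tiny grids are made up.

```python
import hashlib


def fingerprint(grid):
    """Placeholder image hash: SHA-256 over the grid shape and 0-255 pixel values."""
    header = f"{len(grid)}x{len(grid[0])}:".encode()
    return hashlib.sha256(header + bytes(v for row in grid for v in row)).hexdigest()


def mirror(grid):
    """Flip left to right."""
    return [row[::-1] for row in grid]


def rotate90(grid):
    """Rotate clockwise by a quarter turn."""
    return [list(col) for col in zip(*grid[::-1])]


def variants(grid):
    """Yield all 8 combinations of quarter-turn rotations and mirroring."""
    current = [list(row) for row in grid]
    for _ in range(4):
        yield current
        yield mirror(current)
        current = rotate90(current)


def matches_known(grid, known):
    """True if any flipped or rotated variant has a fingerprint on the list."""
    return any(fingerprint(v) in known for v in variants(grid))


original = [[1, 2, 3],
            [4, 5, 6]]
known = {fingerprint(original)}

flipped = mirror(original)
upside_down = original[::-1]

print(fingerprint(flipped) in known)      # False: a naive lookup misses it
print(matches_known(flipped, known))      # True
print(matches_known(upside_down, known))  # True
```

This costs eight hash computations per image instead of one, and it only covers exact flips and quarter turns; arbitrary rotations or crops would need a different kind of hash altogether.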

  14. Old Handle
    Big Brother

    Where's the accountability?

    I'm sure the people involved with this predominately have good intentions, but I find total secrecy around the subject disturbing. They tell us it's "child abuse images", and the worst of the worst at that. But we have absolutely no way of knowing. Obviously we can't see the images, and it doesn't sound like ordinary people can even get their hands on the PhotoDNA or the technology behind it to verify that it won't falsely match legal pictures. Nor did they tell us what exactly they intend to do if they get a match. Would it tell the uploader what happened? Would it just silently fail? Would they log the attempt and put the user on some kind of evil-list? The whole thing is too much in the dark for my liking.

  15. Andrew Jones 2

    If I were a highly cynical person - I might posit that since this prevents people from uploading ALREADY KNOWN images - there is more incentive to create new images that haven't been added to a database yet.

  16. Anonymous Coward

    Knowing the IWF, there's bound to be hashes of Japanese anime artwork on that list...

  17. allthecoolshortnamesweretaken

    "The Home Office Child Abuse Image Database"

    You just can't make up stuff like that. I only hope it's secured better than, say, TalkTalk customer data.
