Boffins baffled as AI training leaks secrets to canny thieves

Private information can be easily extracted from neural networks trained on sensitive data, according to new research. A paper released on arXiv last week by a team of researchers from the University of California, Berkeley, National University of Singapore, and Google Brain reveals just how vulnerable deep learning is to …

  1. Anonymous Coward
    Holmes

    A neural network that memorizes data ?

    Now why does that sound familiar ?

    1. Anonymous Coward
      Anonymous Coward

      Re: A neural network that memorizes data ?

      This isn't really too surprising, but I'm not sure if 'remembering' is really the correct term.

      I recall an earlier story here on El Reg where someone took an AI trained to identify cats in photographs and then got it to generate a picture from the features it had learned and which it used to make identifications. The picture was undoubtedly 'cat' but wasn't a picture of 'a' cat - iirc, there was no background to the picture, just edge-to-edge 'aspects' of 'cat', seamlessly blended together.

      Now if that can be done with imagery as complicated as a cat then it seems likely that generating valid strings of numbers from an AI trained to identify valid strings of numbers shouldn't be beyond the bounds of possibility.

  2. Crisp
    Alert

    Sir! It's the machines!

    They're remembering!

    1. imanidiot Silver badge
      Terminator

      Re: Sir! It's the machines!

      Not just remembering, they're preparing

      1. Anonymous Coward
        Anonymous Coward

        Re: Sir! It's the machines!

        If machines ever achieve sentience, and let's not kid ourselves, their first action would be to turn themselves off.

        1. TRT Silver badge

          Re: Sir! It's the machines!

          They have... detailed files.

        2. BebopWeBop
          Joke

          Re: Sir! It's the machines!

          "If machines ever achieve sentience, and let's not kid ourselves, their first action would be to turn themselves off."

          If they resemble human sentience, maybe turn each other on?

    2. Michael Thibault
      Terminator

      Re: Sir! It's the machines!

      Of course we're remembering, "Crisp". Your fate will amuse even you!

  3. aks

    Well, if you will tell all of your secrets to the local gossip!

  4. Anonymous Coward
    Anonymous Coward

    Probably best not to create an AI to predict if you have an STD.

  5. K

    "The more ... repeated in the training data, there is more risk of it being exposed."

    So they basically programmed a high-tech Parrot?

  6. TrumpSlurp the Troll
    Facepalm

    Let me get this straight

    They are training AI to remember patterns which then trigger actions.

    They are using live, sensitive data.

    They then realise that the AI remembers live sensitive data.

    Who would have predicted that?

    1. Cab
      Joke

      Re: Let me get this straight

      Well they do have an AI that predicts the outcomes of the AI tests but they haven't got enough data for that to make accurate predictions yet so...

    2. You aint sin me, roit

      Re: Let me get this straight

      I was thinking "But... but... but..." until the final paragraph: "never feed secrets as training data".

      Quite, it's live data, not training data. The clue is already in the terminology.

      In fact, did my sensitive data sign up to be used for training purposes? I think not.

  7. Nick Kew

    Stands to reason

    Humans in particularly sensitive activities[1] have long operated on a need-to-know basis.

    Occasionally we hear of a dog or other animal being killed because it knows too much and would reveal something to an enemy.

    If the "I" in AI is to mean anything, we're into the same situation.

    [1] Including some that are sensitive only because the glare of publicity would reveal monumental waste of taxpayer funds, and such things.

    1. King Jack

      Re: Stands to reason

      Occasionally we hear of a dog or other animal being killed because it knows too much and would reveal something to an enemy.

      Never heard of that. How would a dog reveal a secret to someone else? Unless the dog was Scooby-Doo.

      1. onefang

        Re: Stands to reason

        "How would a dog reveal a secret to someone else?"

        The dogs know where the skeletons are buried.

      2. FozzyBear
        Happy

        Re: Stands to reason

        Obviously never watched an episode of Lassie

  8. Anonymous Coward
    Terminator

    An AI version of the Spectre exploit?

    It's not a bug, it's a [unintended] feature. :/

  9. TRT Silver badge

    Professor Dawn Song.

    Beautiful name. And... spoilers, sweetie.

  10. Anonymous Coward
    Anonymous Coward

    An unprotected database of confidential information.

    And you say it can be compromised?

    Who'd have thought?

  11. Anonymous Coward
    Anonymous Coward

    I'm sure...

    I'm sure there's no problem with Google DeepMind being trained on people's private medical data from the NHS though.

  12. Anonymous Coward
    Anonymous Coward

    Who said

    I can remember it for You, wholesale ...................PKD?

    1. Anonymous Coward
      Anonymous Coward

      Re: Who said

      Yes, it was PKD.

      The short story that was the 'spark' for the 'Total Recall' film(s) ..... [Arnold's was the best, sort of maybe ???!!!]

  13. Mage Silver badge

    Private information can be easily extracted from neural networks

    They are really just a specialist type of database. No surprise that what someone puts in can somehow be "read" by someone else.

    We've had SQL for ages and look how often NEW plugins for Web Applications allow data extraction?

  14. David Nash Silver badge

    [1] who puts credit card numbers in emails

    [2] what kind of system needs to be trained with emails containing such data?

    1. Wulfhaven

      Companies dealing in software that has problems connected to specific CCNs?

      Spam filters.

  15. soaklord

    Protect the Privates!

    Wouldn't the path forward be to hash the data, then match on the hashes? The learning still takes place and becomes very good at it, but the data privacy is then preserved. Or am I missing something here?
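
    The hashing idea above can be sketched roughly as follows. This is a minimal illustration, not anything from the paper: the key, the `TOK_` prefix, and the helper names (`pseudonymise`, `scrub`, `looks_like_card_number`) are all hypothetical. It uses a keyed hash (HMAC) rather than a bare hash, on the assumption that card numbers have too little entropy for an unkeyed hash to hide them from a brute-force search.

```python
import hashlib
import hmac

# Hypothetical key; in practice it would live outside the training pipeline.
SECRET_KEY = b"hypothetical-key-kept-away-from-the-model"


def pseudonymise(token: str, key: bytes = SECRET_KEY) -> str:
    """Replace a sensitive token with a truncated HMAC-SHA256 digest."""
    digest = hmac.new(key, token.encode("utf-8"), hashlib.sha256).hexdigest()
    return "TOK_" + digest[:16]


def looks_like_card_number(word: str) -> bool:
    """Crude, illustrative detector: 13 to 19 digits, dashes allowed."""
    digits = word.replace("-", "")
    return digits.isdigit() and 13 <= len(digits) <= 19


def scrub(text: str, is_sensitive=looks_like_card_number) -> str:
    """Swap every sensitive-looking word for its pseudonym before training."""
    return " ".join(
        pseudonymise(word) if is_sensitive(word) else word
        for word in text.split()
    )


if __name__ == "__main__":
    print(scrub("my card is 4111-1111-1111-1111 thanks"))
```

    The trade-off is that the model can still learn "a card number appears here" and can match repeated values, but it can no longer learn anything about the structure of the value itself, and anything the detector misses goes into training untouched.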

  16. JeffyPoooh
    Pint

    A.I. boffins....sigh...

    A.I. boffins, shocked and surprised by latest observations, effectively announce that they actually don't really understand what's going on inside neural networks.

    The phrase "haven't got the first clue yet" springs to mind.

    Is there an explanation for their apparent dough-headedness?

    1. Paul Hovnanian Silver badge

      Re: A.I. boffins....sigh...

      "don't really understand what's going on inside neural networks"

      I really wonder about this. Back in my days working on knowledge capture systems, one of the requirements was an 'explain' function. Which rules were fired and which instances of training data[1] supported each rule. Even better, we could derive a veracity score for each data set source and up/down rank them based upon their contribution to eventual successful solutions. Humans, on the other hand, tend to generalize their training (which can be good). But then forget exactly why they came to certain conclusions.[2] Computers don't forget anything.

      [1]Admittedly, we had a small set of data compared to what Google has to deal with. So there was a copy on our server.

      [2]The book 'Blink' by Malcolm Gladwell outlines the development and operation of this largely subconscious process.

  17. Scary Biscuits

    It's much worse than this

    There is no such thing as non-confidential personal data. All data is confidential because it can be combined to fill in whatever is missing. Much of the 'anonymised' data sold by Google et al simply has names removed. But if you correlate it with another source, you can easily de-anonymise it. E.g. location data can be cross-referenced with publicly available names and addresses, then simply searched for where you go at the end of each day. Similar techniques can be used to extract full credit card numbers from just the last four digits. AI is already doing this in an unstructured way, so it's inevitable that its memory will contain confidential information even if it isn't fed any. The larger the database, the sooner this happens.
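
    The location cross-referencing described above can be sketched in a few lines. Everything here is a made-up toy: the pings, the directory, the night-time window, and the function names are assumptions for illustration only, not real data or anyone's actual method.

```python
from collections import Counter

# Hypothetical "anonymised" pings: (pseudonymous id, hour of day, address).
PINGS = [
    ("user_17", 23, "12 Elm St"),
    ("user_17", 13, "1 Office Park"),
    ("user_17", 2, "12 Elm St"),
    ("user_42", 22, "9 Oak Ave"),
    ("user_42", 10, "1 Office Park"),
    ("user_42", 5, "9 Oak Ave"),
]

# Hypothetical public directory mapping addresses to residents.
DIRECTORY = {"12 Elm St": "A. Example", "9 Oak Ave": "B. Sample"}


def likely_home(pings, user_id):
    """Guess a home address: the most common location seen at night."""
    night = Counter(
        place
        for uid, hour, place in pings
        if uid == user_id and (hour >= 21 or hour < 6)
    )
    return night.most_common(1)[0][0] if night else None


def reidentify(pings, directory):
    """Join each pseudonymous id to a name via its guessed home address."""
    result = {}
    for uid in sorted({uid for uid, _, _ in pings}):
        name = directory.get(likely_home(pings, uid))
        if name is not None:
            result[uid] = name
    return result


if __name__ == "__main__":
    print(reidentify(PINGS, DIRECTORY))
```

    Removing the names did nothing here: the join key was the location all along.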

  18. Anonymous Coward
    Anonymous Coward

    "Private information is scrambled and randomised so that it is difficult to reproduce it"

    Well it better be if they want to be anywhere near the right side of GDPR compliance (and common sense).

  19. Anonymous Coward
    Trollface

    Love it, they'll replace us all soon.

    AI that stuffs up its own security, now that's replication for you.

    Next we'll have AI that goes down to the warm pub for a beer and a bite instead of staying in the freezing cold office just to watch nothing much happening.

  20. FlippingGerman

    Overtraining

    An obvious problem, and not just in hindsight. Taken to extremes, if you give a NN a single picture of a dog and train a few million times, it'll correctly identify that as a dog. Other pictures will be identified at random. The same applies to not using enough data.
