A neural network that memorizes data?
Now why does that sound familiar?
Private information can be easily extracted from neural networks trained on sensitive data, according to new research. A paper released on arXiv last week by a team of researchers from the University of California, Berkeley, National University of Singapore, and Google Brain reveals just how vulnerable deep learning is to …
This isn't too surprising, but I'm not sure 'remembering' is really the correct term.
I recall an earlier story here on El Reg where someone took an AI trained to identify cats in photographs and got it to generate a picture from the features it had learned and used to make its identifications. The picture was undoubtedly 'cat' but wasn't a picture of 'a' cat - IIRC there was no background to the picture, just edge-to-edge 'aspects' of 'cat', seamlessly blended together.
Now if that can be done with imagery as complicated as a cat, then generating valid strings of numbers from an AI trained to identify valid strings of numbers shouldn't be beyond the bounds of possibility.
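To make that concrete, here's a minimal sketch of the idea - entirely my own toy construction, not the paper's method: train a tiny character-level bigram model on digit strings that happen to include a repeated 'secret', then rank candidate strings by how likely the model finds them. The leaked string floats to the top.

```python
# Toy illustration only. SECRET, the bigram model and all data here are
# invented for this sketch; the paper's actual attack is more involved.
import math
import random
from collections import defaultdict

SECRET = "418675"  # pretend this leaked into the training set

random.seed(0)
corpus = ["".join(random.choice("0123456789") for _ in range(6))
          for _ in range(500)]
corpus += [SECRET] * 25  # a secret repeated in training data, as in a leak

# Fit character-bigram counts ('^' marks the start of a string).
counts = defaultdict(lambda: defaultdict(int))
for s in corpus:
    for a, b in zip("^" + s, s):
        counts[a][b] += 1

def log_likelihood(s):
    """Add-one-smoothed log-probability of s under the bigram model."""
    total = 0.0
    for a, b in zip("^" + s, s):
        row = counts[a]
        total += math.log((row[b] + 1) / (sum(row.values()) + 10))
    return total

# The 'attack': score lots of candidate strings and look for outliers.
candidates = ["".join(random.choice("0123456789") for _ in range(6))
              for _ in range(1000)]
candidates.append(SECRET)
ranked = sorted(candidates, key=log_likelihood, reverse=True)
print("secret ranked", ranked.index(SECRET) + 1, "of", len(ranked))
# The memorized string surfaces at or near the top of the ranking.
```

The real attack is more sophisticated than a bigram table, but the ranking principle - the model assigns suspiciously high likelihood to what it memorized - is the same.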
I was thinking "But... but... but..." until the final paragraph: "never feed secrets as training data".
Quite, it's live data, not training data. The clue is already in the terminology.
In fact, did my sensitive data sign up to be used for training purposes? I think not.
Humans in particularly sensitive activities[1] have long operated on a need-to-know basis.
Occasionally we hear of a dog or other animal being killed because it knows too much and would reveal something to an enemy.
If the "I" in AI is to mean anything, we're into the same situation.
[1] Including some that are sensitive only because the glare of publicity would reveal monumental waste of taxpayer funds, and such things.
They are really just a specialist type of database. No surprise that what someone puts in can somehow be "read" by someone else.
We've had SQL for ages, and look how often new plugins for web applications still allow data extraction.
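For anyone who hasn't seen the classic version of that extraction, a quick illustrative sketch (invented table and input, using Python's bundled sqlite3):

```python
# Illustrative only: the injection the comment above alludes to. A query
# built by string concatenation lets an attacker read data they were never
# meant to see; the parameterised version does not.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (name TEXT, secret TEXT)")
db.execute("INSERT INTO users VALUES ('alice', 'hunter2')")

user_input = "nobody' OR '1'='1"

# Vulnerable: user input is spliced straight into the SQL text.
leaked = db.execute(
    "SELECT secret FROM users WHERE name = '" + user_input + "'"
).fetchall()
print(leaked)  # [('hunter2',)] - data extracted

# Safe: the driver treats the input as a value, not as SQL.
safe = db.execute(
    "SELECT secret FROM users WHERE name = ?", (user_input,)
).fetchall()
print(safe)  # [] - nothing leaks
```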
A.I. boffins, shocked and surprised by the latest observations, effectively announce that they don't really understand what's going on inside neural networks.
The phrase "haven't got the first clue yet" springs to mind.
Is there an explanation for their apparent dough-headedness?
"don't really understand what's going on inside neural networks"
I really wonder about this. Back in my days working on knowledge-capture systems, one of the requirements was an 'explain' function: which rules were fired, and which instances of training data[1] supported each rule. Even better, we could derive a veracity score for each data-set source and up/down-rank them based on their contribution to eventual successful solutions (a rough sketch of the idea is below, after the footnotes). Humans, on the other hand, tend to generalize their training (which can be good) but then forget exactly why they came to certain conclusions.[2] Computers don't forget anything.
[1] Admittedly, we had a small set of data compared to what Google has to deal with, so there was a copy on our server.
[2] The book 'Blink' by Malcolm Gladwell outlines the development and operation of this largely subconscious process.
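A rough reconstruction of that 'explain' idea, for the curious - my own invented sketch, not the actual system: each rule remembers the training instances and source that justified it, and each source's veracity score is up/down-ranked on outcomes.

```python
# Toy sketch of an 'explain' function for a rule-based system. All names,
# rules and scores are invented for illustration.
from dataclasses import dataclass

@dataclass
class Rule:
    name: str
    condition: callable      # predicate over a dict of facts
    supporting_data: list    # training instances that justified the rule
    source: str              # where those instances came from

veracity = {"vendor_feed": 1.0, "web_scrape": 1.0}

rules = [
    Rule("high_risk", lambda f: f["score"] > 0.8, ["case-17", "case-42"], "vendor_feed"),
    Rule("low_risk", lambda f: f["score"] <= 0.8, ["case-03"], "web_scrape"),
]

def infer(facts):
    """Fire matching rules and return both conclusion and explanation."""
    fired = [r for r in rules if r.condition(facts)]
    explanation = [
        f"rule '{r.name}' fired, supported by {r.supporting_data} "
        f"(source {r.source}, veracity {veracity[r.source]:.2f})"
        for r in fired
    ]
    return fired, explanation

def feedback(fired_rules, success):
    """Up/down-rank each contributing source once the outcome is known."""
    for r in fired_rules:
        veracity[r.source] *= 1.1 if success else 0.9

fired, why = infer({"score": 0.93})
print("\n".join(why))
feedback(fired, success=True)  # vendor_feed's veracity rises to 1.10
```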
There is no such thing as non-confidential personal data. All data is confidential, because it can be combined to fill in whatever is missing. Much of the 'anonymised' data sold by Google et al. simply has the names removed, but correlate it with another source and you can easily de-anonymise it. E.g. location data can be cross-referenced with publicly available names and addresses, then simply searched for where you go at the end of each day. Similar techniques can be used to extract full credit card numbers from just the last four digits. AI is already doing this in an unstructured way, so it's inevitable that its memory will contain confidential information even if it isn't fed any. The larger the database, the sooner this happens.
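The linkage trick is almost embarrassingly simple. A hedged sketch with entirely made-up data: join 'anonymised' pings to a public address list by where each device sits late at night.

```python
# Toy linkage attack: 'anonymised' location pings re-identified by joining
# against a public address list on the likely home location.
anonymised_pings = [
    {"device": "d1", "hour": 23, "location": "12 Oak St"},
    {"device": "d1", "hour": 9,  "location": "Acme HQ"},
    {"device": "d2", "hour": 23, "location": "7 Elm Rd"},
]

public_addresses = {
    "12 Oak St": "A. Smith",
    "7 Elm Rd": "B. Jones",
}

# Where does each device spend the night? That is usually home.
home = {}
for p in anonymised_pings:
    if p["hour"] >= 22:
        home[p["device"]] = p["location"]

# Cross-reference with the public record to name the 'anonymous' devices.
for device, address in home.items():
    print(device, "->", public_addresses.get(address, "unknown"))
# d1 -> A. Smith
# d2 -> B. Jones
```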