Hmmmm....
In the 1970s I worked for a bloke whose smile was upside-down.
Google's taken a small step towards addressing the persistent problem of bias in artificial intelligence, setting its boffins to work on equal-opportunity smile detection. In a paper published on arXiv on December 1, Mountain View trio Hee Jung Ryu, Margaret Mitchell and Hartwig Adam laid out the results of research designed to …
"The paper stated that Google is not seeking to classify people by race (since that's both unethical and arbitrary), and the authors noted that using AI to classify race or gender needs the individual's consent. Nonetheless, training race and gender recognition into the model is necessary if the AI is going to reliably identify a smile..."
So, I take it that Google really will be divining the race/gender of subjects, but it's okay because the information will be used only for innocuous things like detecting smiles, and then will be purged from the system. No harm no foul.
Except, Google has a considerable track record of collecting more data than was strictly ethical. Can we assume they won't 'cheat' in this case? Especially with such pertinent (and valuable) data?
I'd like to believe, but I need help here.
Except Google is a big company, and not all arms will act in the same way. Also, they tested it on publicly available samples and analysed false-positive and false-negative data from their own users; IOW, failure rates. Everyone collects those, or tries to, in order to improve the user experience.
There isn't really a solution, as even "neural nets", deep learning and machine learning are ultimately a special sort of database: a model "trained" with human-curated data.
At the end of the process it will have the same biases as an MS-SQL database and VB form loaded via human data collection and entry.
All systems will reflect the choices of the humans that commissioned them.
Argh! If your training data is biased, then your model will be biased.
If you have a heavily biased dataset and a less biased one, you don't build your model from the biased one and then use the less biased one to teach it to correct itself. You just use the less biased one....
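The point above can be sketched with a toy example. This is not the method from Google's paper; it's a deliberately crude, invented one-dimensional "classifier" that picks whatever decision threshold minimises *overall* training error. When one group dominates the training set, mistakes on the small group barely move the total, so the threshold drifts to favour the big group, and accuracy on a balanced test set splits accordingly. All numbers and the two synthetic "groups" are assumptions for illustration only.

```python
import random

random.seed(0)

# Toy training set: one numeric feature per sample, plus a group label.
# Group A is heavily over-represented (a 90/10 split, chosen arbitrarily).
group_a = [(random.gauss(0.0, 1.0), "A") for _ in range(900)]
group_b = [(random.gauss(2.0, 1.0), "B") for _ in range(100)]
train = group_a + group_b

def fit_threshold(data):
    """Pick the threshold that minimises overall training error.

    Because group B is rare, its errors barely count towards the total,
    so the chosen threshold sits well inside B's territory.
    """
    best_t, best_err = None, float("inf")
    for t in [x / 10 for x in range(-30, 50)]:
        err = sum((x >= t) != (label == "B") for x, label in data)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

t_biased = fit_threshold(train)

# Balanced evaluation set: equal numbers from each group.
test_a = [(random.gauss(0.0, 1.0), "A") for _ in range(500)]
test_b = [(random.gauss(2.0, 1.0), "B") for _ in range(500)]

def accuracy(data, t):
    return sum((x >= t) == (label == "B") for x, label in data) / len(data)

print(f"threshold chosen: {t_biased:.1f}")
print(f"accuracy on group A: {accuracy(test_a, t_biased):.2f}")
print(f"accuracy on group B: {accuracy(test_b, t_biased):.2f}")
```

Run it and the model scores far better on the over-represented group than on the rare one, even though both groups are equally easy to model; retraining on a balanced set fixes it, which is exactly the "just use the less biased one" argument.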
What seems to happen is that the carefully developed model (already sold to, and in use by, the LEOs) simply doesn't work on the less biased data. Hilarious examples abound, like black people being matched to gorillas, and models so reliant on skin tone that being too white, too black, too orange, or too heavy on the Instagram filters renders the matching pointless.
The much less funny aspect is that LEOs are already using this technology, then lying about it. It's been termed "evidence laundering": using a new (and legally untested) technology to make a match, then claiming it was done with a traditional (and accepted) method. In this case, a facial-recognition match is claimed to have in fact been a trawl through mugshots.
I personally don't object to this being used as an investigative tool to try and identify an unknown suspect, but what worries me is that it's being used as the sole identifying evidence for convictions. An undercover LEO snaps a couple of cellphone pics, software says it's you, bish bash bosh 8 years for drug dealing.
The word "bias" seems to mean more than one thing within this article. There is the desire to ensure that facial-recognition systems perform equally well on non-white faces, a problem which can presumably be solved by focusing training on such under-represented faces, as the article suggests.
The second, more interesting usage is bias in terms of subjective qualities, for example the neural network that measures women's beauty. A subjective quality like beauty is inherently "biased", being in the eye of the beholder and all that. However, that does not mean that meaningful statistical predictions cannot be made. The insurance industry is built upon such statistical modelling, after all. An insurance company will be "biased" against an 18-year-old man from a poor estate with a turbo-charged car when he attempts to get car insurance, even if he is a safe driver. So sometimes bias is accepted, but when it comes to a machine judging women's looks, it is considered a problem.
I'm not sure why this is, my only thought is that it's not the application itself, but the perceived reason behind it getting written, (geeks getting uppity, judging higher status women with their technological wizardry, doubtless cackling and rubbing their hands as they do so). I think the collision between big data +AI prediction and modelling, and "right thinking" people's beliefs is going to be highly popcorn worthy.
I'm pretty sure the "bias" in this case (and in most facial-recognition software in the West) is that the training set contains mainly white faces. Thus it's trained to differentiate between whiteys, but since it's been trained on very few black/latino/asiatic faces, it's poor at differentiating between them.
Quite a few manufacturers of said software will talk happily about how accurate it is on test data, but when given real-world data, like photos taken from angles other than straight on to camera, or people wearing more or less makeup, they can be more circumspect. When you can run tests (and they are strangely reluctant to allow this) and the software gives a 90% likelihood of *any* two black fellas being the same person, it can cause real problems.
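Why a per-pair false-match rate matters so much in practice comes down to simple arithmetic: probing one photo against a large database multiplies that rate by the database size. A minimal sketch, with the database size and the two rates invented for illustration (the 90% figure is the hypothetical from the comment above, not a measured number):

```python
def expected_false_matches(n_database, p_false_match):
    """Expected number of innocent people 'matched' when one probe photo
    is compared against every entry in a database of unrelated mugshots,
    assuming an independent per-pair false-match probability."""
    return n_database * p_false_match

# At the comment's hypothetical 90% pairwise false-match rate, a trawl
# through 10,000 mugshots "matches" about 9,000 innocent people.
print(expected_false_matches(10_000, 0.9))   # 9000.0

# Even a far better 1% per-pair rate still yields ~100 false hits per
# query, which is why a lone software match is thin evidence.
print(expected_false_matches(10_000, 0.01))  # 100.0
```

This is the base-rate problem: a figure that sounds accurate per comparison can still bury the one true match under a pile of false ones once the database is large.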
"An insurance company will be "biased" against an 18 year old man from a poor estate with a turbo charged car, when he attempts to get car insurance - even if he is a safe driver."
Um, well, no. You may have missed it, but you cannot discriminate on protected characteristics* even if they are statistically valid. The facts that he's a chap and is 18 years old are *specifically* prohibited from being allowed into your consideration, and you should be required to prove your algorithm doesn't discriminate on that basis either. You can judge on car model/make/age, the person's income, amount of driving experience, past accidents, past claims, etc. It's been suggested that if you allow the insurance company to see your driving data (as collected by your car), they can use that to judge how safe you actually are.
It's also impossible to describe anyone with fewer than 10,000 hours behind the wheel as safe, since there is simply not enough data to base that on. If said chap were a Kiwi, had been driving since he was 15 (unless that's been changed), had passed a defensive-driving course, and his job required a licence, then perhaps.
* gender, race, religion, age, sexual orientation and membership of political organisations.
"Biased models have become a contentious issue in AI over the course of the year, with study after study documenting both the extent of algorithmic bias, and the real-life impacts such as women seeing ads for low-paying jobs and African-Americans being sent more ads about being arrested."
I just received a targeted ad for a gallon of coyote urine (and no, that's not a euphemism for mass produced American beer). I wonder if this was the result of model bias or just my weird browsing habits (maybe a bit of both?).