The origin of the word ‘bias’ has never been quite certain. Linguists reckon that the antecedent of bias is the Old French word ‘biais’, which meant at an angle or oblique. It came to mean ‘a one-sided tendency of the mind’. In the old English game of bowls, the ball carried an asymmetrical weight, or bias, which made it roll in a curved line. This is how bias came to be the favoured word for having a disproportionate weight in favour of or against an idea or person. Bias is about differential and unjust treatment.
Our deep-seated biases have now spilled into the technology domain and contaminated AI algorithms, which have amplified conflict and hatred online. While algorithms learn to recognise the pixels that trace the contours of a human, they also pick up the prevailing biases about that human. The outcome can be outrageously unfair, such as the arrest of an innocent man. Biased algorithms narrow our choices in the online content and advertisements we consume. Two people living in the same house could involuntarily develop very divergent, even extreme, worldviews shaped by algorithmic bias.
Bias occurs when machine learning algorithms pick up socio-economic ideologies from their training data. If the dataset is a true representation of the real world, we are bound to get algorithmic bias and the resulting unjust decisions. Set a web crawler loose on the entire Internet to learn from its data points and it will pick up all our biases. It is almost impossible to find large sets of training data that are devoid of bias.
Criminal risk assessment algorithms that calculate a recidivism score are known to be biased. They are trained on crime data drawn from arrests and convictions, which are often heavily skewed against particular communities. Open training datasets such as ImageNet are America- and Europe-centric.
Bias can also be created by unpredictable correlations in large datasets, when real-world responses are fed back into the algorithm, or when algorithms are reused in contexts they were not built for. An algorithm can even discriminate through proxies. To settle lawsuits over discriminatory advertisements, Facebook modified the algorithm for its new ad portal so that its ads did not explicitly discriminate against protected groups. However, it still uses criteria such as pages visited or products purchased, which act as proxies for protected characteristics.
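A minimal sketch of the proxy problem, using made-up data and hypothetical feature names: even when the protected attribute is withheld from the model, a feature that correlates with it lets the model reproduce the same disparity.

```python
# Illustrative only: a proxy feature reintroduces a "removed" protected attribute.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000

# Protected attribute (never shown to the model).
group = rng.integers(0, 2, n)

# Proxy feature strongly correlated with group membership,
# e.g. which pages a user visited or what they purchased.
proxy = group + rng.normal(0, 0.3, n)

# Historical labels that encode past discrimination against group 1.
label = (rng.random(n) < np.where(group == 1, 0.3, 0.7)).astype(int)

# Train only on the proxy -- the protected attribute has been "removed".
model = LogisticRegression().fit(proxy.reshape(-1, 1), label)
pred = model.predict(proxy.reshape(-1, 1))

# The disparity survives, because the proxy stands in for the group.
print("positive rate, group 0:", pred[group == 0].mean())
print("positive rate, group 1:", pred[group == 1].mean())
```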
Can AI be made fairer? Engineers try to add more data on underrepresented geographies to remove the bias. But even the synthetic datasets that companies create artificially inherit the skewed worldview of real-world datasets. The company Mostly AI found that in US census data, the number of women with an annual income above $50,000 was 20 per cent lower than the number of men in the same bracket. It adjusted its data generator by applying a penalty that forces near parity between the genders.
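Mostly AI has not published its exact mechanism, but one common way to realise such a penalty is to add a parity term to the objective the generator is fitted against. The toy sketch below, with illustrative numbers, shrinks the income gap by trading off fidelity to the skewed data against closeness to the male rate.

```python
# Hedged sketch (not Mostly AI's actual method): penalise a one-parameter
# "generator" so the synthetic share of high-earning women approaches the male rate.
import numpy as np

p_male = 0.30          # share of men above $50k in the data (illustrative)
p_female_data = 0.24   # share of women above $50k (~20 per cent lower)
lam = 5.0              # strength of the parity penalty

def loss(p):
    # Negative log-likelihood of the observed female rate under parameter p,
    # plus a quadratic penalty for deviating from the male rate.
    nll = -(p_female_data * np.log(p) + (1 - p_female_data) * np.log(1 - p))
    return nll + lam * (p - p_male) ** 2

# Gradient descent on the single generator parameter, using a numerical gradient.
p = 0.5
for _ in range(2000):
    grad = (loss(p + 1e-5) - loss(p - 1e-5)) / 2e-5
    p -= 0.01 * grad

print(f"rate in the raw data: {p_female_data:.2f}, penalised rate: {p:.2f}")
# The penalised generator emits high-earning women at a rate much closer to
# the male rate, narrowing the gap inherited from the census data.
```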
Fixing bias could be like playing a game of whack-a-mole, yet it has to be done. Peer review of outputs could help to test the underlying data, but it may not be effective because of our implicit biases. A 2012 study revealed that US doctors were more likely to prescribe painkillers to their White patients than to their Black patients, without realising they were doing so.
There is a need for more transparency and regulatory oversight. Whistle-blowers with credible information about systemic and blatant neglect of algorithmic bias must be protected by regulatory bodies. Last year, the US Senate introduced the Algorithmic Accountability Act, a bill that would give the Federal Trade Commission the teeth to mandate that companies under its jurisdiction run impact assessments of ‘high-risk’ automated decision systems.
Organisations will become prolific producers of machine learning algorithms, which need to be brought under the ambit of governance, risk and compliance. Companies will be obligated to audit their training datasets and algorithmic outcomes based on the severity of unfairness.
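As an illustration of what such an outcome audit could compute, the sketch below checks the ratio of selection rates between groups against the ‘four-fifths’ threshold familiar from US employment-discrimination guidance; the decisions, group labels and threshold here are purely illustrative.

```python
# Illustrative audit: compare positive-decision rates across groups.
import numpy as np

def audit_outcomes(decisions, groups, threshold=0.8):
    """Per-group selection rates and whether the worst ratio clears the bar."""
    rates = {g: float(decisions[groups == g].mean()) for g in np.unique(groups)}
    ratio = min(rates.values()) / max(rates.values())
    return {"selection_rates": rates, "ratio": ratio, "passes": ratio >= threshold}

# Made-up decisions for two groups: group "b" is selected noticeably less often.
decisions = np.array([1, 1, 0, 1, 0, 1, 0, 0, 1, 0])
groups = np.array(["a", "a", "a", "a", "a", "b", "b", "b", "b", "b"])
print(audit_outcomes(decisions, groups))
```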
Can algorithms be neutral when humans have summarily failed to be? Removing bias from humans is hard enough; keeping algorithms bias-free is an entirely new challenge because biases are largely unintentional. Yet the argument that algorithms merely mirror society and so cannot be fixed is tenuous, because they have so much influence on our lives. AI algorithms must positively distort our flawed reality.
Shalini Verma is CEO of PIVOT technologies