Sexism, racism, and other forms of discrimination are being built into the machine-learning algorithms that underlie the technology behind many 'intelligent' systems that shape how we are categorized and advertised to.
Biases and blind spots exist in big data as much as they do in individual perceptions and experiences. Yet there is a problematic belief that bigger data is always better data and that correlation is as good as causation.
When dealing with data, scientists have often struggled to account for the risks and harms that using it might inflict. One primary concern has been privacy - the disclosure of sensitive data about individuals, either directly to the public or indirectly from anonymized data sets through computational processes of re-identification.
The promoters of big data would like us to believe that behind the lines of code and vast databases lie objective and universal insights into patterns of human behavior, be it consumer spending, criminal or terrorist acts, healthy habits, or employee productivity. But many big-data evangelists avoid taking a hard look at the weaknesses.
Books about technology start-ups have a pattern. First, there's the grand vision of the founders, then the heroic journey of producing new worlds from all-night coding and caffeine abuse, and finally, the grand finale: immense wealth and secular sainthood. Let's call it the Jobs Narrative.
Algorithms learn by being fed certain images, often chosen by engineers, and the system builds a model of the world based on those images. If a system is trained on photos of people who are overwhelmingly white, it will have a harder time recognizing nonwhite faces.
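For illustration, here is a minimal toy sketch in Python (assuming NumPy and scikit-learn are available; the groups, features, and sample sizes are all invented, not taken from any real system) of how a classifier trained on a data set dominated by one group can end up far less accurate for an underrepresented group.

```python
# Hypothetical sketch: a skewed training set yields skewed error rates.
# Group names, feature distributions, and sample sizes are invented for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

def make_group(n, shift):
    """Simulate a toy recognition task whose feature distribution
    (and decision boundary) differs between demographic groups."""
    X = rng.normal(loc=shift, scale=1.0, size=(n, 5))
    y = (X.sum(axis=1) + rng.normal(scale=1.0, size=n) > shift * 5).astype(int)
    return X, y

# Training data: overwhelmingly group A, very little group B.
Xa_train, ya_train = make_group(5000, shift=0.0)
Xb_train, yb_train = make_group(100, shift=1.5)
X_train = np.vstack([Xa_train, Xb_train])
y_train = np.concatenate([ya_train, yb_train])

model = LogisticRegression().fit(X_train, y_train)

# Evaluation: equal-sized held-out sets for each group.
Xa_test, ya_test = make_group(2000, shift=0.0)
Xb_test, yb_test = make_group(2000, shift=1.5)
print("accuracy on group A:", accuracy_score(ya_test, model.predict(Xa_test)))
print("accuracy on group B:", accuracy_score(yb_test, model.predict(Xb_test)))
```

On a typical run, accuracy for the overrepresented group is markedly higher than for the underrepresented one: the model has simply seen too few examples of the second group to learn its patterns.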
Data and data sets are not objective; they are creations of human design. We give numbers their voice, draw inferences from them, and define their meaning through our interpretations.
It is a failure of imagination and methodology to claim that it is necessary to experiment on millions of people without their consent in order to produce good data science.
If you're designing a system based on historical data and you're not thinking about the way systemic bias can be propagated through the criminal justice system or predictive policing, then it's very likely you're going to perpetuate those biases.
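As a hedged illustration of that feedback loop, the toy simulation below (the districts, rates, and patrol counts are hypothetical) shows how allocating resources in proportion to historical records can preserve an initial skew indefinitely, even when the underlying rates are identical.

```python
# Hypothetical simulation: patrols are allocated in proportion to past recorded
# incidents, and more patrols in a district mean more incidents get recorded there,
# so an initial skew in the records is perpetuated year after year.
import numpy as np

rng = np.random.default_rng(0)

true_crime_rate = np.array([0.5, 0.5])   # two districts with identical underlying rates
recorded = np.array([80.0, 20.0])        # historical records already skewed toward district 0

for year in range(10):
    # Allocate 100 patrols in proportion to the historical record.
    patrols = 100 * recorded / recorded.sum()
    # Recorded incidents scale with both the true rate and how much you look.
    recorded += rng.poisson(true_crime_rate * patrols)

print("share of records attributed to district 0:",
      round(recorded[0] / recorded.sum(), 2))
```

The point is not the specific numbers but that the share of records attributed to the over-policed district never corrects itself, because the allocation rule keeps looking where it has always looked.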
If we start to use social media data sets to take the pulse of a nation or understand a crisis - or actually use them to deploy resources - we are getting a skewed picture of what is happening.
People think 'big data' avoids the problem of discrimination because you are dealing with big data sets, but, in fact, big data is being used for more and more precise forms of discrimination - a form of data redlining.
Histories of discrimination can live on in digital platforms, and if they go unquestioned, they become part of the logic of everyday algorithmic systems.
The amount of money and industrial energy that has been put into accelerating AI code has meant that there hasn't been as much energy put into thinking about the social, economic, and ethical frameworks for these systems. We think there's a very urgent need for this to happen faster.
We urgently need more due process with the algorithmic systems influencing our lives. If you are given a score that jeopardizes your ability to get a job, housing, or education, you should have the right to see that data, know how it was generated, and be able to correct errors and contest the decision.