This is the third and final post in a blog series on how I think about bias in the context of Machine Learning (ML) and Artificial Intelligence (AI). In the first post, I wrote about the distinction between overt and covert bias. In the second post, I wrote about why AI models, including those powering search engines, exhibit social biases such as sexism, racism, and ableism. In this post, I identify four things at the heart of the problem of AI bias and present the “Humanist AI Manifesto.”
I’m a proud graduate of Carnegie Mellon University (Go Tartans!). However, I found the panel discussion at a recent alumni event disappointing and, to be honest, infuriating. While I genuinely enjoyed hearing from two of the panelists about their work with AI for research on clean energy and their use of predictive analytics to inform policymaking in the US, the other panelist, whose background is in ML, repeatedly made huge claims without providing evidence to back them up. There were two claims that seemed particularly silly to me:
1. In 20 years, AI will replace all the work we, the people in the audience, do.
Yet the panelist had no information about what jobs we in the audience actually perform. I wasn’t asked to fill out a survey about the type of work I do. This panelist also did not mention having any interdisciplinary research or work experience, unlike the other two panelists, so I have trouble seeing how they could even reasonably infer this.
2. Recent advances in AI are humanity’s most significant innovation.
How self-serving! How incredibly biased! Of course the panelist is going to think that: it’s that panelist’s area of research, AND that panelist is on the board of OpenAI. More importantly, though, unlike the other two panelists, this panelist backed up the claim with literally no evidence. Has AI reduced world hunger? Eliminated poverty? Ended the climate crisis? No. AI certainly has the potential to help us (the humans) achieve these things. But it’s not going to do so simply because it’s AI.
Luckily, soon after that panel, I came across two articles pushing back against the idea that AI is the best thing ever and should be used everywhere and will be used everywhere. Thank goodness! I thought. More people are pushing back against this technochauvinist (to use Meredith Broussard’s term) narrative.
Still, a lot of what is said (not all, but a lot) explains why we need to think about AI differently without providing much direction on what that different way of thinking could be. And I have ideas!!! These are the four “recalibrations” I call for in my Ph.D. thesis (see Chapter 2, Section 2, starting on page 23), which I’m laying out here as a Humanist AI Manifesto.
(Note: “AI” and “ML” are different things that have gotten conflated, which I personally think is because “AI” sounds cooler, so it serves the people working on ML to make it seem like AI is being used in more places than it actually is. For the rest of this blog post, I use “AI” in this conflated way, as a broad term that encompasses ML, though I do think this is backward.)
Quality over Quantity
The AI field’s unquestioning focus on quantity is sacrificing quality. This tunnel vision has led some AI researchers and practitioners to make claims that can’t actually be backed up with evidence. For example, a claim was made that “bigger is better”: larger quantities of data and larger numbers of parameters (the internal values a model adjusts during training, sort of like the coefficients in an equation) resulted in better AI model performance. The thing is, no one had actually tested for anything else. No one had run experiments to see whether changing anything else about the way AI models were built could lead to better performance. So really, it’s not all that surprising that researchers (like here and here) have been able to demonstrate that bigger is not, in fact, always better.
The problem I still see in the field, though, is how “better” is determined. The way AI models’ performance is evaluated relies largely on quantitative measures. For some reason, in AI, evaluating models for bias is treated as separate from evaluating models’ performance. So a model might have a really high quantitative performance score but display a shocking level of racism (that’s what these researchers found). Shouldn’t racism be an indication of poor model performance?
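To make this concrete, here’s a minimal, hypothetical sketch in Python. All the numbers and group labels are invented for illustration; the point is only that a single overall score can look great while hiding a big gap for one group:

```python
# Hypothetical example: an overall score can hide a large per-group gap.
# The labels, predictions, and groups below are invented for illustration.

def accuracy(labels, predictions):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)

# 1 = correct outcome, 0 = incorrect outcome; "A"/"B" are demographic groups
labels      = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
predictions = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1]
groups      = ["A"] * 10 + ["B"] * 4

print("Overall accuracy:", accuracy(labels, predictions))  # ~0.86, looks fine

# Disaggregate by group: the same model fails half the time for group B
for g in sorted(set(groups)):
    idx = [i for i, grp in enumerate(groups) if grp == g]
    acc = accuracy([labels[i] for i in idx], [predictions[i] for i in idx])
    print(f"Accuracy for group {g}:", acc)
```

The single overall number (roughly 86%) is the kind of score that ends up on a leaderboard; the per-group breakdown (100% for group A, 50% for group B) is what surfaces the problem, and that’s before we even get to the qualitative evaluation this kind of gap calls for.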
We need an AI field that prioritizes quality over quantity. We need to use qualitative methods to evaluate AI models and AI-powered technologies. We need to pay more attention to the quality of the training datasets that teach AI models how to behave.
Accuracy over Efficiency
The AI field also unquestioningly prioritizes efficiency. Automate everything! The faster the better! But if an AI model is low quality, automating everything and moving faster just means the model makes mistakes more rapidly…
An epidemiologist friend of mine (she does data analysis for the public health sector) told me about a meeting with a couple of people from her organization’s tech team. The techies started out by asking what my friend and her colleagues wished they could do more efficiently: What task do you wish you could speed up because it’s a huge time suck, so that if you could perform it faster, you could do your job better? My friend and her colleagues named things. But the techies were disappointed. The things being named weren’t relevant to the AI model the techies had planned to demo for my friend and her colleagues. So the techies ended that conversation and started talking about the Large Language Model (LLM) they’d built. To demonstrate the LLM’s high accuracy, the techies said it had passed an exam for epidemiologists with a higher score than humans typically achieve.
Here’s the kicker: my friend and her colleagues had never heard of the exam. The techies had found this exam online and assumed it was a good measure of what my friend’s and her colleagues’ jobs involved, so they used it to evaluate how accurate the LLM was. It was an efficient way to test an AI model, but it didn’t actually make sense for how the model was intended to be used.
Don’t get me wrong, efficiency is certainly important to consider. More efficient AI means less energy-intensive models. Efficiency in achieving certain tasks is incredibly useful. But efficiency is worthless if we’re efficiently making miscalculations. We need an AI field that thinks more about accuracy than efficiency, where accuracy does NOT mean the commonly used quantitative metric of that name, but accuracy relative to the real world. We need to think about accuracy before we think about efficiency.
Representativeness over Convenience
So I’ve talked about quality and accuracy, but what does that actually look like? Obviously there are people in the AI field who think they have been considering quality and accuracy. What’s been missing is the consideration of representativeness.
Think about introductory statistics. To be able to reasonably draw conclusions about a population, you need a representative sample of that population. In the AI field there’s no established approach to creating training datasets that are representative samples of the data the models are intended to analyze or generate. This is hugely problematic. You might think that after Amazon’s hiring algorithm was found to be penalizing job applications from women, or after automatic soap dispensers were found not to recognize dark skin tones, the AI field would have realized it needed to spend more time thinking about how to create training and test datasets that are representative of the people who would use or be impacted by the models. Unfortunately, that’s not the case. There are people in AI who recognize this (thank goodness!), but sadly not enough (yet!) to have fully redirected the field.
So, instead of using data that’s convenient to collect (a.k.a. data that’s already available online), we need data that’s representative of the context an AI model is meant to be used in. For example, I wanted to study sexism and gender bias in the descriptions of historical and cultural artifacts collected in an archive. Manifestations of sexism and definitions of gender and bias are culturally, temporally, and linguistically specific, so I used data from the archive I was studying.
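To make the contrast concrete, here’s a minimal, hypothetical Python sketch (the population shares, group labels, and sample size are all invented) comparing a convenience sample with a stratified sample drawn deliberately from each group:

```python
import random
from collections import Counter

random.seed(0)

# Hypothetical population: skin-tone categories with invented shares (70/20/10).
population = ["light"] * 700 + ["medium"] * 200 + ["dark"] * 100

# Convenience sample: take whatever data is easiest to get. Here we simulate
# that by grabbing the first 300 records, which all come from the biggest group.
convenience_sample = population[:300]

# Stratified sample: deliberately draw from each group in proportion to the
# population (rounding means the total can differ slightly from n in general).
def stratified_sample(pop, n):
    counts = Counter(pop)
    total = len(pop)
    sample = []
    for group, count in counts.items():
        members = [x for x in pop if x == group]
        k = round(n * count / total)
        sample.extend(random.sample(members, k))
    return sample

representative_sample = stratified_sample(population, 300)

print("Convenience sample:", Counter(convenience_sample))
print("Stratified sample: ", Counter(representative_sample))
```

The convenience sample here contains only the most available group, while the stratified sample matches the population’s proportions; in practice you might go further and deliberately over-sample smaller groups so the model performs equally well for everyone. Representativeness has to be designed for; it doesn’t happen by default.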
Of course, it’s so much easier to use data that is already digital and online, and it’s so much more exciting (apparently) to innovate on AI models than to produce new data to build representative datasets. If you ask me, however, that’s a poor excuse for paying so little attention to data. How many examples will people have to give of the real-world harms these models are causing before the AI field accepts that its approach is flawed? (Has there really been evidence of harm caused by AI? you may ask. Yes. Here’s a shortlist: algorithmic colonization, endangering women’s health and well-being by deprioritizing research on their bodies, punishing people because of their skin color, exacerbating poverty, firing highly regarded teachers, and stifling innovation and limiting GDP.)
Situated Thinking over Universal Thinking
Why aren’t more people in the AI field thinking about representativeness? I haven’t (yet) done the qualitative research with AI researchers and practitioners to investigate this; however, I have read many, many publications about AI models and AI-powered technologies. Based on this, my guess is that the root of the problem is a flawed underlying assumption: the assumption that universal knowledge is a thing, that an AI-powered solution developed in one place is universally applicable to all places. I talk about why this assumption is flawed in the previous post of this blog series, so I won’t repeat myself. Instead, here I want to present an alternative: situated thinking.
Imagine if we approached AI research and development bottom-up, instead of top-down. Rather than creating an AI model in one context and releasing it upon the world for use in any conceivable context, what if we created AI models with more specific use cases? This would be both less risky and more efficient. By developing an AI model for a very specific use case, say cancer detection, we could focus on making sure that people of a wide variety of genders, skin tones, body types, ages, etc. are covered by the model. At the same time, a separate area of work could focus on developing an AI model to identify obstructions in the road that might endanger drivers, to improve the safety of self-driving cars. Once that’s sorted, we could compare these two types of computer vision models to see where there are similarities and differences. Is the model for self-driving cars more efficient? If so, can the model for cancer detection be modified to have the same efficiency without sacrificing the quality of the model’s predictions? If so, great! If not, thank goodness we didn’t assume the self-driving car’s AI model was universally applicable!
TL;DR
To sum up, the Humanist AI Manifesto states that when developing AI, we should prioritize:
Quality over quantity in dataset creation and evaluation methods (qualitative methods should complement the more commonly used quantitative methods);
Accuracy over efficiency, meaning accuracy relative to real-world use cases, not accuracy as in the quantitative metric;
Representativeness over convenience in dataset creation and performance evaluations; and
Situated thinking over universal thinking in how we frame problems, how we create datasets and models, and how we evaluate datasets and models.
Acknowledgments
Thank you to Marika Cifor, Patricia Garcia, TL Cowan, Jasmine Rault, Tonia Sutherland, Anita Say Chan, Jennifer Rode, Anna Lauren Hoffmann, Niloufar Salehi, and Lisa Nakamura, who authored the Feminist Data Manifest-No; to Valerio Basile, who authored the Perspectivist Data Manifesto; and to Giorgia Lupi, who authored Data Humanism – A Visual Manifesto, for inspiring me to write this Humanist AI Manifesto.