Back in 2019, a regional court in China started using an AI called Xiao Zhi. Originally its purpose was just to announce judicial proceedings, but it eventually moved up to settling loan disputes, and it can now even record testimony and analyze case materials. Something that's particularly appealing about AI (or online) courts is that, in a sense, they standardize outputs. Since everything is handled by the same AI, the mapping from inputs to outputs, so to speak, stays the same.

Yet that very strength carries a glaring weakness. There have been cases, for instance, where an AI used to filter job applications rejected someone automatically, and simply tweaking the date of birth made that same resume suddenly worthy of an interview. An even more egregious example is an AI that was trained on the resumes of people already at the company, and one indicator it developed for who counted as a "good fit" was whether they listed basketball as a hobby. Worse still, women, who were more likely to list softball instead, were marked down by the AI. It learned this correlation because it looked at the more successful staff (who tended to be men, itself a symptom of how society tends to be skewed in men's favor), noticed that men were more likely to list basketball and women more likely to list softball, and weighted resumes accordingly.
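To make that mechanism concrete, here is a minimal Python sketch. The data and numbers are entirely made up for illustration; the point is only that a screener built from past hiring decisions ends up rewarding whatever correlates with those decisions, including a proxy like "basketball":

```python
from collections import Counter

# Toy historical data: (hobby listed, was hired). Past hires skew male,
# and hobbies correlate with gender, so the bias leaks into the labels.
history = [
    ("basketball", True), ("basketball", True), ("basketball", True),
    ("basketball", False), ("softball", True), ("softball", False),
    ("softball", False), ("softball", False),
]

hired_by_hobby = Counter()
total_by_hobby = Counter()
for hobby, hired in history:
    total_by_hobby[hobby] += 1
    hired_by_hobby[hobby] += hired

# The "model" is just the historical hire rate per hobby: a pattern, not a judgment.
for hobby in total_by_hobby:
    rate = hired_by_hobby[hobby] / total_by_hobby[hobby]
    print(f"{hobby}: historical hire rate {rate:.0%}")

# Output: basketball ~75%, softball ~25%. A screener that ranks new resumes
# by this number will prefer basketball, i.e. it has learned a proxy for
# gender, not a signal about job performance.
```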
This is all ultimately brought about by the fact that AI is a predictive tool. Its whole purpose is to analyze a dataset of inputs and outputs, look for patterns, and do its best to replicate them. But that means that if, say, a criminal justice system disproportionately imprisons people from certain minority groups (as is the case with African Americans and Native Americans), an AI can easily replicate those patterns. For example, one factor that goes into sentencing is how likely the judge believes the defendant is to commit more crimes after being released. A recent report found that an AI meant to calculate that risk was twice as likely to mislabel a Black defendant as a likely reoffender as it was a white defendant. Interestingly enough, judges are not supposed to use these scores to determine the length of a sentence, only the type of sentence (prison versus probation or a treatment program). It still happens, though. Conceptually the appeal is clear: the "objective" tool that is AI has deemed someone a risk to the community, so a harsher sentence is handed down in response. That's all predicated, of course, on the AI being correct.
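To show what "mislabel" means here, below is a small, hedged Python sketch that computes false positive rates by group: among people who did not go on to reoffend, how often were they flagged as high risk? The records are fabricated; only the metric mirrors the kind of disparity the report describes:

```python
records = [
    # (group, labeled_high_risk, actually_reoffended) -- all invented
    ("Black", True,  False), ("Black", True,  False), ("Black", False, False),
    ("Black", True,  True),  ("Black", False, False),
    ("white", True,  False), ("white", False, False), ("white", False, False),
    ("white", True,  True),  ("white", False, False),
]

def false_positive_rate(group):
    # People in this group who did NOT reoffend...
    non_reoffenders = [r for r in records if r[0] == group and not r[2]]
    # ...but were nonetheless flagged as likely reoffenders.
    flagged = [r for r in non_reoffenders if r[1]]
    return len(flagged) / len(non_reoffenders)

for group in ("Black", "white"):
    print(group, f"false positive rate: {false_positive_rate(group):.0%}")

# Here the tool wrongly flags 50% of Black non-reoffenders versus 25% of
# white non-reoffenders -- the same shape of disparity, with invented numbers.
```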
The worst part is that, beyond inferring criteria like in the basketball/softball example (which is also a great illustration of why simply "blinding" the AI to gender or race wouldn't work, since it would just lean on other indicators), AIs are black boxes. Neural networks are so unbelievably complicated that it's practically impossible to determine exactly what goes into processing an input, so functionally all anyone has is the pattern between inputs and outputs. That means that, since AI isn't actually smart, it could be picking up on completely spurious correlations that just happen to reproduce the data well enough. And when edge cases come up (read: minorities), those false relationships can cause serious harm.
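Here is a tiny, hypothetical sketch of why blinding fails. The fields and weights are made up; the point is that once a correlated proxy is in the data, deleting the protected attribute changes nothing:

```python
# Two versions of the same resume: one with a gender field, one "blinded".
resume_full    = {"gender": "female", "hobby": "softball", "years_experience": 6}
resume_blinded = {"hobby": "softball", "years_experience": 6}

# Hypothetical weights a screener might have learned from biased history.
# Note that "gender" carries no weight at all -- the hobby proxy does the work.
weights = {"hobby": {"basketball": 1.0, "softball": -1.0}, "years_experience": 0.1}

def score(resume):
    s = weights["hobby"].get(resume.get("hobby"), 0.0)
    s += weights["years_experience"] * resume.get("years_experience", 0)
    return s

print(score(resume_full))     # -0.4
print(score(resume_blinded))  # -0.4 -- identical: removing "gender" changed nothing
```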
Despite all that, AI retains great potential, which is why the technology has exploded in popularity in recent years. But it has to be deployed with great care. It is far too easy to re-entrench systemic biases within these systems and, through that, do a great deal of unintended harm. New laws and policies are of course being developed, along with frameworks for building and deploying AI, but given how quickly AI has progressed, proper regulation is bound to lag behind for at least a few years.