Mitigating bias in AI
Humans in the Loop is a firm and passionate advocate of ethical AI. As a member of the AI ecosystem and an important link in the AI supply chain, we recognize our role in ensuring that computer vision solutions are built and used in an ethical way.
As a supplier of dataset collection and annotation services, we are focused on helping build AI that is fair, transparent, explainable, and trustworthy. We understand the importance of bias-free models and are dedicated to supporting and advising our clients in this area.
As part of this effort, we have published a two-part whitepaper series to raise awareness of bias in computer vision and to provide practical examples of how to avoid it, based on our own hands-on experience.
Bias in dataset collection
The first part of the series covers dataset collection as the first key stage where bias can seep into AI models. We discuss common dataset collection practices in the computer vision community, especially those used to build large-scale image datasets such as ImageNet, which have powered the AI revolution in recent years.
This paper further delves into why bias is an issue by discussing the “coded gaze” in AI and the fact that no human-created system is fully objective. We showcase a variety of examples of what happens when things go wrong, but we also strive to understand how such bias came to appear in the first place.
Based on our hands-on experience of collecting thousands of images for computer vision datasets, we suggest best practices for ensuring fair representation and limiting the harmful prejudices that an AI model could absorb. We look at dataset diversity from a number of perspectives, including gender, race, economic status, and geography, and we suggest ways in which skewed datasets can be balanced, as in the sketch below.
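To make the balancing idea concrete, here is a minimal sketch of one possible approach, written in Python with pandas. The column names ("gender", "image_path") and the undersampling strategy are illustrative assumptions, not a prescription from the whitepaper; in practice, balancing usually means collecting more data for under-represented groups rather than discarding images.

import pandas as pd

def group_distribution(metadata, column):
    """Share of images per group; a large imbalance points to under-representation."""
    return metadata[column].value_counts(normalize=True)

def balance_by_undersampling(metadata, column, seed=0):
    """Downsample every group to the size of the smallest one."""
    smallest = metadata[column].value_counts().min()
    return (metadata.groupby(column, group_keys=False)
                    .apply(lambda g: g.sample(n=smallest, random_state=seed)))

# Hypothetical metadata for a handful of images.
df = pd.DataFrame({
    "image_path": [f"img_{i}.jpg" for i in range(6)],
    "gender": ["woman", "man", "man", "man", "woman", "man"],
})
print(group_distribution(df, "gender"))                                       # reveals the skew
print(group_distribution(balance_by_undersampling(df, "gender"), "gender"))   # roughly equal shares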
Bias in dataset annotation
The second part of the series focuses on dataset annotation and the importance of iteration in avoiding model drift and making sure that models handle real-life data correctly.
The data labeling process is a crucial step in which the creators of the AI model determine what images actually mean. By examining the “politics of labeling” critically, we show how notions can be made “visible” or “invisible” simply through the definition of a labeling class taxonomy. We take a deep dive into the process of “taxonomization” and its potential pitfalls: missing classes, overlapping classes, non-imageable classes, and so on.
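As an illustration of how some of these pitfalls can be surfaced automatically, the following sketch (in Python, with a hypothetical flat taxonomy and made-up class names) shows two simple checks: labels that annotators actually used but the taxonomy does not define, and pairs of supposedly exclusive classes that co-occur on the same image, which can hint at overlapping definitions.

from collections import defaultdict
from itertools import combinations

def missing_classes(taxonomy, image_labels):
    """Labels annotators actually used that the taxonomy does not define."""
    used = set().union(*image_labels.values()) if image_labels else set()
    return used - taxonomy

def cooccurring_classes(image_labels):
    """How often each pair of classes is assigned to the same image."""
    counts = defaultdict(int)
    for labels in image_labels.values():
        for pair in combinations(sorted(labels), 2):
            counts[pair] += 1
    return dict(counts)

# Hypothetical taxonomy and annotations.
taxonomy = {"pedestrian", "cyclist", "vehicle"}
image_labels = {
    "img_001.jpg": {"pedestrian", "person"},   # "person" is missing from the taxonomy
    "img_002.jpg": {"pedestrian", "cyclist"},  # frequent co-occurrence may signal overlapping classes
}
print(missing_classes(taxonomy, image_labels))   # {'person'}
print(cooccurring_classes(image_labels))         # pair counts for manual review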
In this whitepaper, we give practical tips on how to bias-proof your labeling procedure and how to map out and avoid potential labeler biases, whether you are using crowdsourcing or managed labeling teams. We also give suggestions on how to handle two very sensitive domains: labeling gender and race. Finally, we discuss how iterations and audits can help identify and address bias on a continuous basis in order to ensure bias-free AI; one possible audit signal is sketched below.
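As a hedged example of such an audit signal (not a method prescribed in the whitepaper), the sketch below uses scikit-learn to compute inter-annotator agreement (Cohen's kappa) per data slice, so that slices where labelers disagree systematically can be flagged for a targeted review. The slice key ("region") and the label values are hypothetical.

from collections import defaultdict
from sklearn.metrics import cohen_kappa_score

def agreement_by_slice(records):
    """records: iterable of (slice_key, label_from_annotator_a, label_from_annotator_b)."""
    by_slice = defaultdict(lambda: ([], []))
    for key, a, b in records:
        by_slice[key][0].append(a)
        by_slice[key][1].append(b)
    return {key: cohen_kappa_score(a, b) for key, (a, b) in by_slice.items()}

# Hypothetical double-annotated labels, grouped by the region the image comes from.
records = [
    ("region_1", "cyclist", "cyclist"),
    ("region_1", "pedestrian", "pedestrian"),
    ("region_2", "cyclist", "pedestrian"),
    ("region_2", "pedestrian", "cyclist"),
]
print(agreement_by_slice(records))  # low kappa for region_2 flags it for a targeted audit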
Interested in reading our whitepapers? Check out the download links below!
Bias in data collection
How to avoid bias in computer vision AI through better dataset collection
Bias in data labeling
How to avoid bias in computer vision AI through better dataset annotation