Not Accounting for Sensitive Factors Doesn't Mean Your Algorithm Won't Be Biased
Being colorblind doesn't mean that color doesn't exist. Similarly, excluding sensitive factors such as race and sex from an algorithm's inputs doesn't mean the algorithm won't carry biases rooted in race or sex. Those biases are ingrained in society, and therefore in the data. Most algorithms are literal; their outputs are a function of the patterns they observe.
Nonetheless, a common technique developers apply is straight omission, despite its repeated failure. Kwok of Yale's School of Management explains that when race is removed from a racially biased algorithm, a subtler bias, "latent discrimination," takes its place: other factors correlated with race, such as income or location, essentially serve as proxies for it. The Harvard Business Review likewise examined an employment recruitment scenario and found that proxy variables in the data could predict gender with 91% accuracy.
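One way to surface this latent discrimination is to test whether the supposedly neutral features can reconstruct the omitted attribute. The sketch below is a minimal illustration of that probe, assuming a hypothetical pandas DataFrame `df` that contains a `gender` column alongside the features the production model actually uses; high probe accuracy signals that proxies are present.

```python
# Minimal proxy probe: can the "non-sensitive" features predict the omitted
# attribute? Assumes a hypothetical DataFrame `df` with a `gender` column.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X = pd.get_dummies(df.drop(columns=["gender"]))   # features the model keeps
y = df["gender"]                                   # the factor that was "removed"

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
accuracy = accuracy_score(y_test, probe.predict(X_test))

# Accuracy far above the majority-class baseline (e.g., ~0.9, as in the HBR
# recruitment example) means the omitted attribute is still encoded in the data.
print(f"Omitted attribute recoverable with accuracy: {accuracy:.2f}")
```

An auditor could run the same probe for race, age, or any other omitted factor; the point is that omission changes nothing if the remaining features still carry the signal.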
The omission strategy extends beyond individual scenarios, though. During a recent conference on AI regulation at California Western School of Law, a French panelist noted that France doesn't have to deal with the racial bias issue because it simply does not collect race as a factor. This stems from the GDPR, whose Article 9 prohibits processing "special categories of data," covering sensitive factors as well as proxies that may reveal them. The article reads:
Processing of personal data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, or trade union membership, and the processing of genetic data, biometric data for the purpose of uniquely identifying a natural person, data concerning health or data concerning a natural person's sex life or sexual orientation shall be prohibited.
Countries subject to the GDPR, such as France, still have racial biases; those biases simply cannot be measured, because the data is never collected. However, one could argue that perhaps biases don't need to be "fixed," since the algorithm should reflect real life. When ProPublica reported that COMPAS, a recidivism risk algorithm, classified black defendants as high risk at nearly twice the rate of their white counterparts, the algorithm's maker and several researchers responded that it was mathematically impossible for an algorithm to avoid such racial gaps, given differences in underlying recidivism rates across racial groups.
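Their response rests on a real mathematical tension: when base rates differ between groups, a score that is equally well calibrated for both groups cannot also equalize error rates. The simulation below is a synthetic illustration of that tension; the base rates, threshold, and distributions are invented for demonstration and are not drawn from the actual COMPAS model.

```python
# Synthetic illustration: one calibrated scoring rule + unequal base rates
# forces unequal false positive rates. All numbers are invented.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

def false_positive_rate(base_rate):
    # Each person's "true" risk is drawn around the group's base rate.
    risk = np.clip(rng.normal(base_rate, 0.2, n), 0.01, 0.99)
    reoffend = rng.random(n) < risk      # outcomes follow the risk scores
    high_risk = risk >= 0.5              # same threshold for both groups
    return np.mean(high_risk & ~reoffend) / np.mean(~reoffend)

# Hypothetical base rates that differ between two groups.
fpr_a = false_positive_rate(base_rate=0.45)
fpr_b = false_positive_rate(base_rate=0.30)
print(f"False positive rate, group A: {fpr_a:.2f}")
print(f"False positive rate, group B: {fpr_b:.2f}")
```

The same scoring rule, applied identically to both groups, produces a higher false positive rate for the group with the higher base rate, which is the kind of gap ProPublica measured.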
This reasoning is problematic because algorithms can amplify and perpetuate the biases they learn. For example, predictive policing tends to direct law enforcement toward black and brown neighborhoods based on past data. However, that past data is itself a product of biased enforcement and heightened racial tensions, and increased law enforcement in those areas produces more arrests, skewing future data and widening the racial disparity among arrestees.
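This feedback loop is easy to reproduce in a toy model. The sketch below uses invented numbers and a deliberately stylized "hot spot" allocation rule (the exponent is my assumption, not taken from any real system): two neighborhoods have identical underlying crime rates, yet a skewed historical arrest record steadily pulls patrols, and therefore arrests, toward one of them.

```python
# Toy simulation of a predictive-policing feedback loop. Both neighborhoods
# have the SAME underlying crime rate; only the historical record differs.
import numpy as np

true_crime_rate = np.array([0.10, 0.10])    # identical underlying behavior
recorded_arrests = np.array([120.0, 80.0])  # biased historical record (invented)
total_patrols = 100

for year in range(10):
    # Stylized "hot spot" allocation: patrols concentrate disproportionately
    # on whichever neighborhood has more recorded arrests.
    weights = recorded_arrests ** 1.5
    patrols = total_patrols * weights / weights.sum()
    # Arrests scale with patrol presence, not with any difference in behavior.
    recorded_arrests = patrols * true_crime_rate * 10

    share_a = recorded_arrests[0] / recorded_arrests.sum()
    print(f"Year {year + 1}: neighborhood A's share of arrests = {share_a:.2f}")
```

In this toy run the initially over-policed neighborhood's share of arrests climbs toward 100% even though the two neighborhoods never differed in behavior; the data only records where the police were sent.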
We need a solution that prevents algorithms from perpetuating cycles of existing bias, and simply ignoring sensitive factors only masks the issue. The U.S. lacks a regulatory framework for organizations to measure and mitigate their own bias. The White House Office of Science and Technology Policy's Blueprint for an AI Bill of Rights outlines thorough recommendations for best practices, but its lack of enforcement undermines its effectiveness, as evidenced by the continued deployment of harmful, biased algorithms. Since sweeping bans such as GDPR Article 9 do little to mitigate bias, I argue that policymakers' role shouldn't be to tell developers how to minimize bias, but rather to act as regulators who strictly hold developers accountable through audits.
Here is a sample auditing framework that draws heavily on the National Institute of Standards and Technology's (NIST) identification of three primary categories of AI bias: systemic, computational, and human.
- Assessment of AI System Objectives
  - Purpose of System
  - Assumptions Regarding Fairness and Bias
    - Fairness Definitions the Model Attempts to Satisfy
    - Sensitive Factors Accounted For
  - Organizational Norms (e.g., Implicit Bias Training)
  - Diversity of Team
- Data Management and Analysis
  - Data Collection Oversight
    - Representation of Groups in Data
    - Context of Data
  - Proxy Identification
- Algorithm Development and Model Training
  - Transparent Design
    - Documentation of Development with Justifications (Particularly Relevant for Models Used in High-Risk Situations, e.g., Courts, Healthcare)
  - Bias Mitigation Techniques Used
- Testing and Evaluation
  - Independent Validation
  - Continuous Monitoring
  - Disclosure of Bias Audit Findings
  - Stakeholder Engagement
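To make the Testing and Evaluation items concrete, here is one way an independent auditor might compute a pair of common group-disparity metrics from a model's predictions. The function and the choice of metrics are illustrative assumptions on my part, not requirements of the NIST taxonomy or of any particular regulation.

```python
# Illustrative audit helper: two common group-disparity metrics computed from
# model predictions. Metric choices and names are examples, not a standard.
import numpy as np

def audit_disparities(y_true, y_pred, group):
    """Report selection-rate and false-positive-rate gaps between groups."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    report = {}
    for g in np.unique(group):
        mask = group == g
        selection_rate = y_pred[mask].mean()        # demographic parity view
        negatives = mask & (y_true == 0)
        fpr = y_pred[negatives].mean() if negatives.any() else float("nan")
        report[g] = {"selection_rate": selection_rate, "false_positive_rate": fpr}
    return report

# Hypothetical example with made-up predictions and group labels.
report = audit_disparities(
    y_true=[0, 1, 0, 0, 1, 0, 1, 0],
    y_pred=[1, 1, 0, 1, 1, 0, 0, 1],
    group=["a", "a", "a", "a", "b", "b", "b", "b"],
)
for g, metrics in report.items():
    print(g, metrics)
```

Run on each release and again on live traffic, the same report supports both independent validation and continuous monitoring, and its output is exactly the kind of finding an audit regime would require organizations to disclose.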