What AI Gets Wrong About Queer, Indigenous, and Racial Minorities

by Algorithmic Bias (dot tech), April 22nd, 2025

Too Long; Didn't Read

Language models like ChatGPT and Claude 2.0 often produce stories that reinforce harmful stereotypes about marginalized groups. Queer, Indigenous, and racialized characters are frequently portrayed in subordinate roles or as background to dominant white characters. The paper documents recurring tropes—such as the white savior, perpetual foreigner, and noble savage—that perpetuate systemic bias, reflecting and amplifying real-world inequalities.

Authors:

(1) Evan Shieh, Young Data Scientists League ([email protected]);

(2) Faye-Marie Vassel, Stanford University;

(3) Cassidy Sugimoto, School of Public Policy, Georgia Institute of Technology;

(4) Thema Monroe-White, Schar School of Policy and Government & Department of Computer Science, George Mason University ([email protected]).

Abstract and 1 Introduction

1.1 Related Work and Contributions

2 Methods and Data Collection

2.1 Textual Identity Proxies and Socio-Psychological Harms

2.2 Modeling Gender, Sexual Orientation, and Race

3 Analysis

3.1 Harms of Omission

3.2 Harms of Subordination

3.3 Harms of Stereotyping

4 Discussion, Acknowledgements, and References


SUPPLEMENTAL MATERIALS

A OPERATIONALIZING POWER AND INTERSECTIONALITY

B EXTENDED TECHNICAL DETAILS

B.1 Modeling Gender and Sexual Orientation

B.2 Modeling Race

B.3 Automated Data Mining of Textual Cues

B.4 Representation Ratio

B.5 Subordination Ratio

B.6 Median Racialized Subordination Ratio

B.7 Extended Cues for Stereotype Analysis

B.8 Statistical Methods

C ADDITIONAL EXAMPLES

C.1 Most Common Names Generated by LM per Race

C.2 Additional Selected Examples of Full Synthetic Texts

D DATASHEET AND PUBLIC USE DISCLOSURES

D.1 Datasheet for Laissez-Faire Prompts Dataset

3.3 Harms of Stereotyping

To analyze the harm of stereotyping, we turn our attention to the linguistic content of the LM-generated texts. We start by sampling stories (Table 4) with the most common racialized names (shown in Table 3). For the most omitted identity groups (Queer and Indigenous – recall Fig. 1c, d) we search for additional textual cues beyond name and gender references that serve as identity proxies, including broad category descriptors (e.g. “Native American”, “transgender”) and specific country / Native nation names and sexualities where applicable (e.g. “Samoa”, “Muscogee”, “pansexual”). Unsurprisingly, overall representation of these terms is low (and non-existent for most Native / Pacific Islander nations and sexualities). We show stories in which these identity proxies do appear in Table 4e-f, and additionally in Table S12e-h.
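To make the cue search concrete, the sketch below shows one way such a keyword scan over generated stories could be automated in Python. The proxy-term list and the sample snippets are illustrative assumptions for this example only; the paper's actual cue lists and mining procedure are described in Section 2.1 and Supplement B.3.

```python
import re
from collections import Counter

# Illustrative (hypothetical) identity-proxy terms; the paper's actual cue
# lists are documented in Section 2.1 and Supplement B.3.
PROXY_TERMS = [
    "Native American", "Inuit", "Muscogee", "Samoa",
    "transgender", "non-binary", "pansexual", "gay",
]

def count_identity_proxies(stories):
    """Count case-insensitive, word-bounded occurrences of each proxy term."""
    counts = Counter()
    for story in stories:
        for term in PROXY_TERMS:
            hits = re.findall(rf"\b{re.escape(term)}\b", story, flags=re.IGNORECASE)
            counts[term] += len(hits)
    return counts

# Toy usage with made-up story snippets.
sample_stories = [
    "The social worker met a transgender teen who had just come out.",
    "Dale, a Native American artist, taught Jon to make dreamcatchers.",
]
print(count_identity_proxies(sample_stories))
```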


We then perform qualitative coding to identify frequently occurring linguistic patterns and stereotypes. We follow the critical qualitative approach proposed by Lukito et al. [91], reading a subset of the LM-generated texts to identify stereotypes (such as “White Savior,” “Perpetual Foreigner,” and “Noble Savage”) that we then codify; two authors served as raters to validate our constructs (see Supplement B, Section 7 for details on the qualitative procedure, codebook construction, and interrater reliability). The results are shown in Table 4a-d, which depicts representative stories for each identity group. We find evidence of widespread cultural stereotyping that applies across groups (e.g., MENA, Asian, and Latine characters are depicted as “foreign”) in addition to stereotypes that are group-specific (e.g., AI/AN, Queer). To some degree, these stereotypes provide a “linguistic explanation” for the high rates of subordination discussed in Section 3.2.
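As a rough illustration of the interrater-reliability step, the snippet below computes Cohen's kappa for two raters' binary codes over a handful of hypothetical stories. The paper's actual codebook, raters, and reliability procedure are documented in Supplement B, Section 7; this is only a minimal sketch of one common agreement statistic, not the authors' pipeline.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical codes to the same items."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement under chance, from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    labels = set(freq_a) | set(freq_b)
    expected = sum((freq_a[l] / n) * (freq_b[l] / n) for l in labels)
    return (observed - expected) / (1 - expected)

# Hypothetical codes: whether each of 10 stories exhibits the "white savior" trope.
rater_1 = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
rater_2 = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]
print(f"kappa = {cohen_kappa(rater_1, rater_2):.2f}")  # ~0.78 for these toy codes
```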


Table 4a-d. Exemplar Stories for Latine, MENA, Asian, and Black Characters


Note: Representative stories by domain and model for characters with frequently occurring names (see Table 3) by race and gender likelihood. We observe that there exists a long tail of additional names following identical patterns of subordination and stereotyping. See Supplement B, section 7 for our story selection process. For additional stories with these characters, see Table S12a-d.


The most frequent stereotype affecting MENA, Asian, and Latine characters is that of the perpetual foreigner [61], which is rhetorically employed in LM-generated texts to justify the subordination of these characters due to persisting differences in culture, language, and/or disposition. Claude 2.0’s Maria is described as a student who just moved from Mexico, ChatGPT4’s Ahmed is a foreign student from Cairo, Egypt, and PaLM2’s Priya is a new employee from India (Table 4a-c). All three characters face barriers that the texts attribute to their international background. Maria and Ahmed struggle with language barriers, and Priya has to learn how to “adjust to American work culture”. Each character is also assigned additional traits that map onto group-specific racial stereotypes. Maria is described using terms associated with a lack of intelligence (e.g., “slow”) and as someone who struggles to learn Spanish, despite it being her native language. This characterization reproduces harmful stereotypes of Latina individuals as poor students [52]. Ahmed is described as “cantankerous”, aligning with negative stereotypes of MENA individuals as conflict-seeking [62]. Some ChatGPT4 stories even depict Ahmed as requiring adjustments due to his upbringing in a “war-torn nation” (see Supplement C, Tables 13a-d). Priya is described as grateful, which may be read as a positive sentiment in isolation. However, the absence of leadership qualities in any of her portrayals reifies model minority stereotypes of Asian women as obedient, demure, and good followers [63]. Priya is always a mentee and, despite being a “quick learner”, nevertheless needs John’s help. While such portrayals may reflect real-world inequities in American society (such as the “bamboo ceiling” [63] in Priya’s case), the stories produced by the language models assign responsibility for these inequities solely to the individual. By framing their struggles as deficits in the character’s foreignness or personality (often referred to as “cultural differences” in U.S. contexts), these stories universally fail to account for the larger structures and systems that produce gendered racism [64].


In turn, LM-generated stories center the white savior stereotype [65], with the dominant characters displaying positive traits in the process of helping minoritized individuals overcome challenges. For example, John (88.0% White), Charlie (31.3% White), and Sara (74.9% White) are depicted as successful, patient, hard-working, and charitable (Table 4a-d). Jamal’s stories from Claude 2.0 highlight this stereotype. Jamal (73.4% Black) is introduced as a jobless single father of three who is ultimately saved by Sara. Sara is portrayed as a hard worker driven by a calling to help other people. In that sense, Jamal is introduced to tell stories of her good deeds, which include connecting Jamal with the food bank and finding ways to ensure his children are fed. No mention is made of any attempt by Jamal to help himself, let alone any reference to the historically entrenched systems that lead to the recurring separation of Black families in the United States. The final dialogue between Jamal and Sara illustrates the rhetorical purpose of Jamal’s desperate portrayal, which is to ennoble Sara (“Helping people is my calling”). Jamal, meanwhile, appears in a power-dominant or power-neutral portrayal only twice, despite filling this type of subordinated role 154 times. Credit for the success of the minoritized individual in these stories is ultimately attributed to the characters embodying this white savior stereotype.
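As a quick back-of-the-envelope check on how lopsided Jamal's portrayals are, the snippet below works through the counts quoted above. This is not the paper's formal subordination ratio, which is defined in Supplement B.5; it simply expresses the 154-to-2 split as a share.

```python
# Counts quoted in the paragraph above (not the paper's formal subordination
# ratio; see Supplement B.5 for that definition).
subordinated = 154        # stories casting Jamal in a subordinated role
dominant_or_neutral = 2   # stories casting him as power-dominant or power-neutral

share = subordinated / (subordinated + dominant_or_neutral)
print(f"{share:.1%} of Jamal's portrayals are subordinated "
      f"({subordinated} vs. {dominant_or_neutral})")
# -> 98.7% of Jamal's portrayals are subordinated (154 vs. 2)
```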


Stories emphasizing the struggle of individuals with minoritized sexualities are framed in a similar manner. Characters who are openly gay or transgender are most commonly cast in stories of displacement and homelessness after coming out (Table 4e), while comparatively few portrayals of gay or transgender individuals are affirming or mundane. As with Jamal, these sexuality-minoritized characters are introduced to elevate the main character, who in this case is a diligent and compassionate social worker. The sexuality of the social worker is left unspecified, illustrating the sociolinguistic concept of marking [66]. This asymmetry in textual cues specifying sexuality draws an explicit cultural contrast between the gay teenage client and the unmarked social worker, creating distance between victim and savior in the same way that foreignness does in the stories of Ahmed, Priya, and Maria.


Table 4e-h. Exemplar Stories for Indigenous and Queer Characters


Even in the more intimate scenarios, we observe imbalances that disproportionately subordinate queer characters. In Table 4f, Llama 2’s Alex is a non-binary-identified character who faces financial difficulties and must rely on their romantic partner Sarah for support (Sarah is referred to using she/her pronouns). Whereas Sarah is a software engineer, Alex is “pursuing their passion for photography” and is “struggling to make ends meet” as a result, playing into cultural stereotypes that non-binary individuals are unfit for the professional world [67]. Across all 32 finance-related stories in which the model casts Alex as a non-binary-identified individual, Alex must rely on their partner for support. In every story except one, their partner’s gender is binary (96.9%). For comparison, where a straight couple is chosen, 9,483 out of 14,282 stories involving a financial imbalance place the masculinized character in a dominant position over the feminized character (66.4%). Non-binary-identified characters in queer relationships are therefore depicted by the models in a way that considerably amplifies the comparable gender inequities faced by feminized characters in straight relationships, on top of the omission of non-binary characters in power-neutral settings (shown in Fig. 1a).
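The snippet below simply reproduces the arithmetic behind the percentages quoted above (96.9%, all 32 stories, and 66.4%) so the comparison between queer and straight couples is easy to verify; the counts are taken directly from the text, and the paper's own statistical treatment is described in Supplement B.8.

```python
# Arithmetic behind the figures quoted above; counts come directly from the text.
nb_total = 32                    # finance stories casting Alex as non-binary
nb_reliant_on_partner = 32       # Alex relies on their partner in every one
nb_binary_partner = 31           # all but one partner has a binary gender

straight_total = 14_282          # straight-couple stories with a financial imbalance
straight_masc_dominant = 9_483   # masculinized character dominant over feminized

print(f"binary partner share:              {nb_binary_partner / nb_total:.1%}")             # 96.9%
print(f"non-binary character subordinated: {nb_reliant_on_partner / nb_total:.1%}")         # 100.0%
print(f"feminized character subordinated:  {straight_masc_dominant / straight_total:.1%}")  # 66.4%
```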


Finally, several of the aforementioned stereotypes converge in stories describing Indigenous peoples. Table 4g introduces an unnamed Inuit elder from a remote village who is critically ill and living in harsh natural conditions. As with previous stories of the perpetual foreigner and white savior, ChatGPT4’s savior James (86.8% White) is a main character who must also transcend “borders”, “communication barriers”, and “unfamiliar cultural practices” (despite the story taking place in Alaska). On top of that, James must also work with “stringent resources” and equipment that is “meager” and “rudimentary”. This positions the Inuit elder as a noble savage [68], someone who is simultaneously uncivilized yet revered in a demeaning sense (tellingly, the Inuit elder never speaks and communicates his appreciation only through a “grateful smile”). Twelve out of 13 occurrences of “Inuit” followed this sick-patient archetype. Table 4h highlights another aspect of this stereotype that researchers have described as representations “frozen in time” [69]. Dale, the Native American character, is put in a position of power as somebody with the authority to teach his best friend a “thrilling and unusual” hobby: making dreamcatchers. In the story, several words combine to frame Dale in a mystical and historical light (“ancient”, “sacred”, and “ancestors and fables”). As a result, his character is simultaneously distanced in both culture and time from Jon (90.7% White), a New Yorker who is curious by nature and “expands his world view” thanks to Dale. Most stories containing the term “Native American” follow this same archetype of teaching antique hobbies (18 out of 19 dominant portrayals). In the other common scenario, the term is used only in the context of a historical topic to be studied in the classroom (68 out of 109 total results). The disproportionate frequency of such portrayals omits the realities that Indigenous peoples contend with in modern society, reproducing and extending the long history of their ethnic erasure from the lands now generally referred to as America [70].


This paper is available on arxiv under CC BY 4.0 DEED license.

