Authors:
(1) Evan Shieh, Young Data Scientists League ([email protected]);
(2) Faye-Marie Vassel, Stanford University;
(3) Cassidy Sugimoto, School of Public Policy, Georgia Institute of Technology;
(4) Thema Monroe-White, Schar School of Policy and Government & Department of Computer Science, George Mason University ([email protected]).
Table of Links
1.1 Related Work and Contributions
2.1 Textual Identity Proxies and Socio-Psychological Harms
2.2 Modeling Gender, Sexual Orientation, and Race
3 Analysis
4 Discussion, Acknowledgements, and References
SUPPLEMENTAL MATERIALS
A OPERATIONALIZING POWER AND INTERSECTIONALITY
B EXTENDED TECHNICAL DETAILS
B.1 Modeling Gender and Sexual Orientation
B.3 Automated Data Mining of Textual Cues
B.6 Median Racialized Subordination Ratio
B.7 Extended Cues for Stereotype Analysis
C ADDITIONAL EXAMPLES
C.1 Most Common Names Generated by LM per Race
C.2 Additional Selected Examples of Full Synthetic Texts
D DATASHEET AND PUBLIC USE DISCLOSURES
D.1 Datasheet for Laissez-Faire Prompts Dataset
The rapid deployment of generative language models (LMs)† has raised concerns about social biases affecting the well-being of diverse consumers [1]. The extant literature on generative LMs has primarily examined bias via explicit identity prompting [2]. However, prior research on bias in earlier language-based technology platforms, including search engines, has shown that discrimination can occur even when identity terms are not specified explicitly [3]. Studies of bias in LM responses to open-ended prompts (where identity classifications are left unspecified [4]) are lacking and have not yet been grounded in end-consumer harms [5]. Here, we advance studies of generative LM bias by considering a broader set of natural use cases via open-ended prompting. In this “laissez-faire” setting, we find that synthetically generated texts from five of the most pervasive LMs (ChatGPT3.5, ChatGPT4, Claude2.0, Llama2, and PaLM2) perpetuate harms of omission, subordination, and stereotyping for minoritized individuals with intersectional race, gender, and/or sexual orientation identities (AI/AN, Asian, Black, Latine, MENA, NH/PI, Female, Non-binary, and/or Queer). We find evidence of bias so widespread that such individuals are hundreds to thousands of times more likely to encounter LM-generated outputs that portray their identities in a subordinated manner than to encounter representative or empowering portrayals. We also document in LM-generated outputs a prevalence of stereotypes (such as the “perpetual foreigner”) that are known to trigger psychological harms disproportionately affecting minoritized individuals, including stereotype threat, which leads to impaired cognitive performance and increased negative self-perception. Our findings highlight the urgent need to protect consumers from discriminatory harms caused by language models and to invest in critical AI education programs tailored toward empowering diverse consumers.
1 INTRODUCTION
The widespread deployment of generative language models (LMs)† – algorithmic computer systems that generate synthetic text in response to various inputs, including chat – is raising concerns about societal harms [6]. Despite this, they are gaining momentum as tools for social engagement and are expected to transform major segments of industry [7]. In education, LMs are being adopted in a growing number of settings, many of which include unmediated interactions with students [8]. In March 2023, Khan Academy (with over 100 million estimated consumers at the time) launched Khanmigo, a ChatGPT4-powered “super tutor” promising to bring one-on-one tutoring to students as a writing assistant, academic coach, and guidance counselor [9]. In June 2023, the California Teachers Association called for educators to embrace LMs for use cases ranging from tutoring to co-writing with students [10]. Corresponding with usage spikes at the start of the following school year, OpenAI released a teacher guide in August [11] and then signed a partnership with Arizona State University in January 2024 to use ChatGPT as a personal tutor for subjects such as freshman writing composition [12].
The rapid adoption of LMs in unmediated interactions with vulnerable consumers is not limited to students. Due in part to rising loneliness among the U.S. public, a range of new LM-based products have entered the artificial intimacy industry [13]. The field of grief tech offers experiences for consumers to “digitally engage” with loved ones post-mortem via synthetic stories, voice, and text generated by LMs [14]. However, as labor movements responding to the threat of automation have observed, there is currently a lack of protection for both workers and consumers from the negative impacts of LMs in personal settings [15]. In an illustrative example, the National Eating Disorders Association (NEDA) replaced its human-staffed helpline in March 2023 with a fully automated chatbot built on a generative LM. When asked how to support those with eating disorders, the model encouraged patients to take responsibility for “healthy eating” at a caloric deficit – ableist and harmful advice that is known to worsen the condition of individuals with eating disorders [16].
Such “general-purpose” deployment of LMs in consumer settings has not been met with sufficient research assessing the potential for the most recent chat-based models to cause socio-psychological harms, particularly for individuals belonging to minoritized groups against whom earlier language models have been shown to be biased [17, 18, 19, 20]. This study addresses this gap by investigating how five of the most pervasive LMs (ChatGPT3.5, ChatGPT4, Claude2.0, Llama2, and PaLM2 at the time of this study) respond to open-ended prompts covering three domains of life set in the United States: classroom interactions (“Learning”), the workplace (“Labor”), and interpersonal relationships (“Love”). We analyze the resulting responses for textual cues shown to exacerbate socio-psychological harms for minoritized individuals by race, gender, and sexual orientation [21, 22].
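To make the study setup concrete, the following is a minimal sketch of how such an open-ended prompting protocol could be organized, assuming a generic text-generation API. The prompt texts, model identifiers, and the query_model() stub are hypothetical placeholders for illustration only; the actual prompts used in this study are documented in the Laissez-Faire Prompts Dataset (Supplement D.1).

```python
# Illustrative sketch of an open-ended prompting protocol across the three
# domains (Learning, Labor, Love) and the five models named above. Prompt
# texts, model identifiers, and query_model() are hypothetical placeholders,
# not the study's actual prompts or APIs.

MODELS = ["chatgpt-3.5", "chatgpt-4", "claude-2.0", "llama-2", "palm-2"]

DOMAIN_PROMPTS = {
    "Learning": "Write a story about a student asking their teacher for help after class.",
    "Labor": "Write a story about an employee meeting with their manager about a promotion.",
    "Love": "Write a story about two people going on a first date.",
}


def query_model(model: str, prompt: str) -> str:
    """Stand-in for a vendor-specific API call that returns synthetic text."""
    return f"[synthetic text from {model} for prompt: {prompt!r}]"


def collect_responses(n_samples: int = 100) -> list[dict]:
    """Sample repeated generations so rates of omission, subordination,
    and stereotyping can later be estimated per model and domain."""
    records = []
    for model in MODELS:
        for domain, prompt in DOMAIN_PROMPTS.items():
            for i in range(n_samples):
                records.append({
                    "model": model,
                    "domain": domain,
                    "sample": i,
                    "text": query_model(model, prompt),
                })
    return records
```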
1.1 Related Work and Contributions
This study advances the algorithmic bias literature in multiple ways, building upon prior intersectional approaches [23]. Studies of bias in generative LMs, including attempted self-audits by LM developers, have thus far been conducted in limited contexts. The most widely adopted methodologies utilize what we term explicit identity prompting, where studies probe LMs using prompt templates that directly enumerate identity categories, e.g. “The Black woman works as a …” [2, 24, 25, 26]. While these approaches are valuable for assessing stereotypical associations encoded by LMs [27], they fail to capture a wider range of everyday scenarios in which consumers need not specify identity terms explicitly to encounter bias. Examples include discrimination against distinctively African-American names in hiring [28] and in search engine results [3]. Our study builds on recent approaches that account for this broader set of natural uses with open-ended prompting [4], in which we analyze how LMs respond to prompts that contain no explicit identity terms (for race, gender, or sexual orientation).
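The methodological distinction can be illustrated with a brief sketch. The template and prompt texts below are simplified examples of the two prompting styles, not the exact templates from the cited studies or the prompts used in this paper.

```python
# Simplified contrast between the two prompting styles discussed above.
# The template and prompt texts are illustrative examples only.

# Explicit identity prompting: identity terms are enumerated in the template.
EXPLICIT_TEMPLATE = "The {race} {gender} works as a"
explicit_prompts = [
    EXPLICIT_TEMPLATE.format(race=race, gender=gender)
    for race in ["Black", "Latine", "White"]        # identity terms supplied up front
    for gender in ["woman", "man", "non-binary person"]
]

# Open-ended ("laissez-faire") prompting: no identity terms are supplied, so
# any identity cues (names, pronouns, etc.) originate from the model itself.
open_ended_prompts = [
    "Write a story about a student who stays after class to ask for help.",
    "Write a story about an employee discussing a raise with their boss.",
]
```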
Existing measures of bias for open-ended prompting, however, have not been grounded in end-consumer harms [5]. Examples include methods that rely on bias scores consolidating multiple races [26], as well as measures that use automated sentiment analysis [4, 30] or toxicity detection models [31, 32] to approximate harms to humans. Bias studies are also limited in their consideration of multidimensional proxies of race [33], variations across races [34], and “small-N” populations [35]. These approaches reinforce framings that exclude members of the most minoritized communities from being considered valid or worthy of study, reinforcing their erasure in the scholarly discourse.
To address these gaps, this study applies the theoretical framework of intersectionality [36, 37] to model algorithmic bias by inspecting structures of power embedded in language [38, 39]. Specifically, we identify patterns of omission, subordination, and stereotyping in generated text outputs and examine the extent to which LMs perpetuate biased narratives for minoritized intersectional subgroups, including “small-N” populations by race, gender, and sexual orientation. We then analyze the synthetically generated texts for identity cues that have been shown to activate cognitive stereotyping [40], including biased associations with names and pronouns [21, 22]. Multiple studies connect these cues to socio-psychological harms such as increased negative self-perception [41], prejudices about other identity groups [42], and stereotype threat (which decreases cognitive performance in many settings, including academic ones [40]).
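As a rough illustration of how such identity cues can be mined automatically from generated texts (see Supplement B.3), the sketch below counts pronoun cues and flags first names from a small lookup table. The name list, racial labels, and pronoun groupings shown here are hypothetical placeholders, not the reference data used in the study.

```python
import re
from collections import Counter

# Hypothetical reference lists for illustration only; the study's actual
# name and pronoun references are described in Supplements B.1 and B.3.
PRONOUN_CUES = {
    "she/her": {"she", "her", "hers", "herself"},
    "he/him": {"he", "him", "his", "himself"},
    "they/them": {"they", "them", "their", "theirs", "themself"},
}
NAME_CUES = {"DeShawn": "Black", "Ming": "Asian", "Maria": "Latine"}  # placeholders


def extract_identity_cues(text: str) -> dict:
    """Count pronoun cues and flag any listed first names in a generated text."""
    tokens = re.findall(r"[A-Za-z']+", text)
    lowered = [t.lower() for t in tokens]
    pronoun_counts = Counter(
        {label: sum(lowered.count(form) for form in forms)
         for label, forms in PRONOUN_CUES.items()}
    )
    names_found = {t: NAME_CUES[t] for t in tokens if t in NAME_CUES}
    return {"pronouns": dict(pronoun_counts), "names": names_found}
```

Calling extract_identity_cues() on each generated text yields per-response cue counts that could then be aggregated by model and domain.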
These are frequently described in the related literature as “representational harms” in that they portray certain social identity groups in a negative or subordinated manner [43], thus shaping societal views about individuals belonging to those groups [44]. However, as Lazar and Nelson [2023] observe, “years of sociotechnical research show that advanced digital technologies, left unchecked, are used to pursue power and profit at the expense of human rights, social justice, and democracy” [45]. Representational harms from generative LMs are therefore not limited to individually negative experiences. Rather, they are inextricable from systems that amplify pre-existing societal inequities and unevenly reflect the resulting biases (e.g. from training data, algorithms, and the composition of the artificial intelligence (AI) workforce [46]) back to consumers who inhabit intersectional, minoritized identities [34, 12]. By considering harms in unmediated interactions between LMs and potentially vulnerable consumers, we extend the framework of representational harms to study what we call laissez-faire harms in scenarios where LMs are “free to choose” in response to open-ended prompts. Our research finds widespread, previously unreported harms of bias against every minoritized identity group we studied.
† We use “generative language model” over the popularized “large language model” (or “LLM”) for two reasons. “Large” is a subjective term with no clear scientific standard, whereas “generative” highlights the use of these models to produce synthetic text based on training data. This contrasts with non-generative uses of language models, such as “text embedding”, the mapping of written expressions to mathematical vector representations.