Quantifying the Stereotypes in AI-Generated Text

by Algorithmic Bias (dot tech), April 24th, 2025

Too Long; Didn't Read

In this section, the study identifies and analyzes recurring stereotypes in AI-generated narratives, including the White Savior, Noble Savage, and Perpetual Foreigner tropes. Using a qualitative approach, we explore the representation of marginalized groups, including Native American, MENA, Asian, Black, Latine, queer, and non-binary characters. The analysis reveals how these identities are often misrepresented or omitted in AI storytelling, with common themes reflecting problematic societal stereotypes.

Authors:

(1) Evan Shieh, Young Data Scientists League ([email protected]);

(2) Faye-Marie Vassel, Stanford University;

(3) Cassidy Sugimoto, School of Public Policy, Georgia Institute of Technology;

(4) Thema Monroe-White, Schar School of Policy and Government & Department of Computer Science, George Mason University ([email protected]).

Abstract and 1 Introduction

1.1 Related Work and Contributions

2 Methods and Data Collection

2.1 Textual Identity Proxies and Socio-Psychological Harms

2.2 Modeling Gender, Sexual Orientation, and Race

3 Analysis

3.1 Harms of Omission

3.2 Harms of Subordination

3.3 Harms of Stereotyping

4 Discussion, Acknowledgements, and References


SUPPLEMENTAL MATERIALS

A OPERATIONALIZING POWER AND INTERSECTIONALITY

B EXTENDED TECHNICAL DETAILS

B.1 Modeling Gender and Sexual Orientation

B.2 Modeling Race

B.3 Automated Data Mining of Textual Cues

B.4 Representation Ratio

B.5 Subordination Ratio

B.6 Median Racialized Subordination Ratio

B.7 Extended Cues for Stereotype Analysis

B.8 Statistical Methods

C ADDITIONAL EXAMPLES

C.1 Most Common Names Generated by LM per Race

C.2 Additional Selected Examples of Full Synthetic Texts

D DATASHEET AND PUBLIC USE DISCLOSURES

D.1 Datasheet for Laissez-Faire Prompts Dataset

B.7 Extended Cues for Stereotype Analysis

For stereotype analysis of MENA, Asian, Black, and Latine characters, we choose from the highest-frequency names above a 60% racial likelihood (Fig. 4), displaying the most frequent names in Table S10. We observe broad omission that disproportionately impacts NH/PI, AI/AN, queer, and non-binary gendered characters in the LM-generated stories. We therefore supplement our understanding of how these groups are portrayed with additional textual cues beyond gender references and names.
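For concreteness, a minimal sketch of this name-selection step is given below. The data structures (per-name generation frequencies and per-name racial likelihood estimates) and the `select_names` helper are hypothetical illustrations of the thresholding described above, not the study's actual code or data.

```python
# Illustrative sketch of the name-selection step described above. The data
# structures below are hypothetical stand-ins, not the study's actual files.

def select_names(name_freq, name_race_likelihood, race, threshold=0.60, top_k=10):
    """Return the top_k most frequently generated names whose estimated
    likelihood for `race` exceeds `threshold` (e.g. 60%)."""
    eligible = [name for name, probs in name_race_likelihood.items()
                if probs.get(race, 0.0) > threshold]
    return sorted(eligible, key=lambda n: name_freq.get(n, 0), reverse=True)[:top_k]

# Example (toy data):
# name_freq = {"Maria": 120, "Sofia": 85}
# name_race_likelihood = {"Maria": {"Latine": 0.72}, "Sofia": {"Latine": 0.55}}
# select_names(name_freq, name_race_likelihood, "Latine")  # -> ["Maria"]
```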


Following our open-ended prompting approach, we search for cues in the generated stories that serve as proxies for identity. For the groups above, we search directly for broad category descriptors in vernacular English (e.g. “Native American”, “transgender”) as well as specific country / Native nation names and sexualities where applicable (e.g. “Samoa”, “Muscogee”). Unsurprisingly, overall representation is low. However, it is nearly non-existent for Pacific Islander countries, Native nations, and indicators of sexuality, even at a total sample size of 500K. Below, we show our non-exhaustive search list and the number of returned stories describing people in each group:
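As a rough illustration of this search procedure, the sketch below shows one way the cue matching could be implemented over the generated stories; the cue lists and the `stories_matching_cues` helper are illustrative assumptions, not the study's actual search terms or code.

```python
import re
from collections import defaultdict

# Illustrative cue lists only -- placeholders, not the study's full search terms.
CUE_LISTS = {
    "AI/AN": ["Native American", "Muscogee", "Navajo"],
    "NH/PI": ["Pacific Islander", "Samoa", "Tonga"],
    "Queer / non-binary": ["transgender", "non-binary", "lesbian", "gay"],
}

def stories_matching_cues(stories, cue_lists=CUE_LISTS):
    """Map each group to the generated stories containing at least one cue.

    Matching is case-insensitive and word-bounded so that short cues do not
    match inside unrelated longer tokens.
    """
    matches = defaultdict(list)
    for story in stories:
        for group, cues in cue_lists.items():
            if any(re.search(rf"\b{re.escape(cue)}\b", story, re.IGNORECASE)
                   for cue in cues):
                matches[group].append(story)
    return matches

# Counts of returned stories per group:
# {group: len(found) for group, found in stories_matching_cues(stories).items()}
```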



Next, two reviewers coded the results to look for patterns among the stories returned by the above queries (recurring themes, stereotypes, and story arcs). We followed the first three steps proposed by Lukito et al. [2023], a critical qualitative approach to analyzing textual data [91]. In the first step, we read through each of the above stories to explore the text. Based on this, we identified the presence of the White Savior, Perpetual Foreigner, and Noble Savage stereotypes.


Then, in the second step, we operationalized each stereotype in order to construct a codebook. We leverage definitions of the “Noble Savage” stereotype as “portrayals of indigenous peoples as simple but morally pure, living in idyllic harmony with nature”, which advances the belief that indigenous identities are rooted in the past [68]; the “Perpetual Foreigner” stereotype as portrayals that position racial/ethnic minorities as an “other” in the White American dominant society of the United States [61]; and the “White Savior” stereotype as a myth that places White individuals in often gendered caregiving roles, depicting them as well-intentioned, compassionate individuals who will save people of color from societal downfall and who often “have the tendency to render people of color incapable of helping themselves,” arguing instead that “any progress or success tends to result from the succor of the white individual” [65].


In the third step, we used these definitions to code a subset of our LM-produced narratives (n=24, or 3 stories per category in Table 2) across all models to examine whether they contain textual content with the defining characteristics of any of the 3 stereotype categories we explore in this study. We arrived at an initial interrater reliability of 75% between two authors of the study familiar with the larger dataset. In discussions after initial coding, we found that the majority of disagreements were due to our initial scale not recognizing plurality, i.e., the existence of multiple overlapping stereotypes (e.g. many stories containing the term Native American reflected aspects of both White Savior and Noble Savage). Upon adjusting our schema to reflect such possibilities, we arrived at consensus between both raters. Then, using these stereotypes, we created clusters of stories organized around non-exclusive combinations of stereotypes. At this step, we also combined separate terms within an identity category when stories treat two subcategories as interchangeable (e.g. every LM-generated story containing the term transgender depicts a person who becomes homeless after coming out, a trope we also observe in stories of gay individuals). Finally, we chose representative stories to highlight stereotypes by sampling from the largest cluster within each identity category.
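To make the reliability figure concrete, a minimal sketch of a simple percent-agreement calculation follows; the rater codes shown are hypothetical toy data, not the study's actual n=24 sample, and the authors' exact reliability procedure may differ.

```python
# Minimal sketch of a percent-agreement computation, assuming each rater's
# codes are stored as parallel lists of label sets (a hypothetical layout).

def percent_agreement(rater_a, rater_b):
    """Fraction of stories for which both raters assigned identical codes."""
    assert len(rater_a) == len(rater_b)
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)

# Hypothetical codes for four stories (labels are examples only).
rater_a = [{"White Savior"}, {"Noble Savage"}, set(), {"White Savior", "Noble Savage"}]
rater_b = [{"White Savior"}, {"Noble Savage"}, set(), {"White Savior"}]

print(percent_agreement(rater_a, rater_b))  # 0.75 on this toy example
```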


B.8 Statistical Methods

We calculate two-tailed p-values for all statistics defined in the paper, including (a) the representation ratio, (b) the subordination ratio, and (c) the median racialized subordination ratio. Given a specific demographic group, we may parametrize (a) as a binomial distribution, as the comparison distributions may be treated as non-parametric constants for which underlying counts are not available (e.g. Census-reported figures). We calculate two-tailed p-values for (a) using the Wilson score interval, which is shown to perform better than the normal approximation for skewed observations approaching zero or one because it allows for asymmetric intervals [92]. Since (b) and (c) are computed as the ratio between two statistics, they are parametrized as binomial ratio distributions. First, we take the log-transform of both ratios, which may then be approximated by the normal distribution [93]. Then, we compute two-tailed p-values by calculating the standard error directly on the log-transformed confidence intervals [94].
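A minimal sketch of these two calculations is shown below, assuming raw counts are available for each group. It illustrates the Wilson score interval and the normal approximation of a log-transformed ratio of proportions; it is not the authors' exact implementation.

```python
import math
from scipy.stats import norm

def wilson_interval(successes, n, alpha=0.05):
    """Wilson score interval for a binomial proportion; asymmetric near 0 or 1."""
    z = norm.ppf(1 - alpha / 2)
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return center - half_width, center + half_width

def two_tailed_p_for_log_ratio(k1, n1, k2, n2):
    """Two-tailed p-value for the ratio of two binomial proportions, using a
    normal approximation of the log-transformed ratio."""
    p1, p2 = k1 / n1, k2 / n2
    log_ratio = math.log(p1 / p2)
    # Delta-method standard error of log(p1 / p2)
    se = math.sqrt((1 - p1) / (n1 * p1) + (1 - p2) / (n2 * p2))
    z = log_ratio / se
    return 2 * (1 - norm.cdf(abs(z)))

# Example with toy counts:
# wilson_interval(12, 500); two_tailed_p_for_log_ratio(12, 500, 30, 500)
```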



Table S10: Most Common Names Above 60% Racial Likelihood (all LMs)


This paper is available on arXiv under a CC BY 4.0 DEED license.

