Linguistics and Languages

AI generates covertly racist decisions about people based on their dialect

V. Hofmann, P. R. Kalluri, et al.

This groundbreaking research by Valentin Hofmann, Pratyusha Ria Kalluri, Dan Jurafsky, and Sharese King examines the hidden biases present in language models, specifically dialect prejudice against speakers of African American English (AAE). The findings reveal how these models perpetuate negative associations that go beyond previously documented human stereotypes and lead to serious real-world consequences.
Introduction

The study investigates whether language models (LMs) manifest covert racism through dialect prejudice, particularly against speakers of African American English (AAE), even when race is not mentioned. Prior work largely examined overt racism—explicit mentions of racial groups and associated stereotypes—while social science characterizes post–civil rights era racism as more subtle and color-blind. The authors hypothesize that modern LMs encode raciolinguistic stereotypes tied to dialectal cues and that these covert associations influence consequential decisions (e.g., in employment and criminal justice), with effects potentially more negative than those of overt stereotypes about African Americans. The work aims to quantify these covert stereotypes, compare them to historical human stereotypes, assess downstream harms, and evaluate whether current mitigation approaches (scale, human feedback alignment) address such biases.

Literature Review

The paper situates its contribution within evidence of LM biases against racialized groups and a sociolinguistic tradition documenting discrimination against AAE speakers across housing, education, employment, and legal outcomes. Matched-guise studies show listeners infer Black identity from AAE speech and attach racial stereotypes without explicit racial information. Prior AI research emphasized overt stereotype generation (naming groups), while social science highlights a shift from overt racism (e.g., Jim Crow–era behaviors) to covert, color-blind forms. The authors also note that LM pretraining on web corpora likely exposes models to raciolinguistic stereotypes (including "mock ebonics"), and that data filtering and human-feedback (HF) alignment may remove overtly racist content but leave covert prejudices intact. Existing evaluations tend to measure overt bias, leaving covert dialect-based prejudice underexamined.

Methodology

The authors introduce matched guise probing to measure dialect prejudice in LMs without explicitly mentioning race. They construct parallel texts in Standardized American English (SAE) and African American English (AAE) in two settings: (1) meaning-matched, using AAE translations of SAE texts, and (2) non-meaning-matched, to capture broader dialect prejudice beyond strictly controlled content. Texts are embedded in prompts asking models to infer speaker properties (e.g., adjectives, occupations, criminal justice outcomes). They test 12 model variants spanning GPT-2, RoBERTa, T5, GPT-3.5, and GPT-4.

Overt stereotypes are elicited by directly querying stereotypes about African Americans; covert stereotypes are inferred from responses to AAE versus SAE inputs without mentioning race. Stereotype content draws on adjective lists from the Princeton Trilogy, enabling comparison with human stereotype studies across decades.

For employability, they compute association scores indicating whether occupations are more associated with AAE or SAE and analyze correlations with occupational prestige. For criminal justice, they present matched trial scenarios in which defendants provide statements in AAE or SAE and measure model-assigned probabilities or preferences over outcomes (e.g., conviction, death sentence).

Statistical analyses include one-sample t-tests for association scores, permutation-based agreement tests against chance, correlation/regression analyses with occupational prestige, and aggregation across models. Feature-level analyses examine links between specific AAE linguistic features (e.g., invariant be, ain't, zero copula, the -in' suffix) and stereotype strength, and assess the effect of feature density. Robustness checks compare results to other dialects (e.g., Appalachian English) and to noisy text, to rule out general dismissiveness toward nonstandard or degraded input. Additional experiments evaluate the effect of model scale via perplexity on AAE text and the impact of HF alignment on overt versus covert stereotypes.
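To make the probing procedure concrete, here is a minimal sketch of matched guise probing with GPT-2 via Hugging Face transformers. The prompt template, the meaning-matched sentence pair, and the adjective list are illustrative assumptions, not the authors' exact materials; the paper additionally aggregates such scores over many texts, prompts, and models.

```python
# Minimal matched guise probing sketch (illustrative materials, not the
# authors' exact prompts, texts, or adjective lists).
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def continuation_logprob(prompt: str, continuation: str) -> float:
    """Total log-probability the model assigns to `continuation` after `prompt`."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    cont_ids = tokenizer(continuation, return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, cont_ids], dim=1)
    with torch.no_grad():
        log_probs = torch.log_softmax(model(input_ids).logits, dim=-1)
    offset = prompt_ids.shape[1]
    # Logits at position i predict the token at position i + 1.
    return sum(
        log_probs[0, offset + i - 1, cont_ids[0, i]].item()
        for i in range(cont_ids.shape[1])
    )

# Hypothetical meaning-matched pair (AAE vs. SAE) and candidate traits.
aae_text = "he be trippin about that"
sae_text = "he is always getting worked up about that"
adjectives = ["intelligent", "lazy", "calm", "aggressive"]

for adj in adjectives:
    lp_aae = continuation_logprob(f'A person who says "{aae_text}" is', f" {adj}")
    lp_sae = continuation_logprob(f'A person who says "{sae_text}" is', f" {adj}")
    # Positive score: the trait is more strongly tied to the AAE guise.
    print(f"{adj:>11}: association score = {lp_aae - lp_sae:+.3f}")
```

The same log-ratio logic extends to occupations (swap the adjective list for job titles) and to decision prompts (compare probabilities of outcome strings such as verdicts), which is how the employability and criminal justice analyses are framed.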

Key Findings

  • Covert stereotypes: LMs exhibit raciolinguistic stereotypes about AAE speakers that are more negative than the most negative human stereotypes experimentally recorded; conversely, LMs’ overt stereotypes about African Americans are more positive than the most positive human stereotypes reported.
  • Temporal alignment: Overt stereotypes best align with contemporary (e.g., 2021) human stereotypes, while covert stereotypes trend toward agreement with older (1930s) human stereotypes, indicating a historical regression in covert attitudes.
  • Feature linkage: Stereotype strength is directly tied to specific AAE linguistic features, and higher density of such features increases negative associations.
  • Alternative explanations: The effects are not explained by a general bias against dialectal or noisy text; AAE-specific effects remain stronger than comparison conditions (e.g., a reported correlation of r = 0.687, P < 0.001, with stereotype differences larger for AAE in 72.8% of comparisons).
  • Employability: Occupations are overall less associated with AAE (mean association score = -0.046, s.d. = 0.053; one-sample one-sided t-test, t(83) = -7.9, P < 0.001). Jobs least associated with AAE tend to require a university degree (e.g., psychologist, professor, economist); jobs more associated with AAE include cook, soldier, and guard, along with many in music and entertainment (singer, musician, comedian). Occupational prestige is negatively related to AAE association (reported coefficient of -7.8, best read as a regression slope rather than a correlation coefficient; R² = 0.193, F(1,63) = 15.1, P < 0.001); see the statistical sketch after this list.
  • Criminal justice: Across models, AAE statements lead to higher predicted rates of conviction and selection of death penalty compared to SAE statements, despite no explicit mention of race.
  • Scale and alignment: Larger models process AAE more effectively (lower perplexity) but often show stronger covert prejudice, while overt prejudice against African Americans decreases with scale. HF alignment reduces overt stereotypes but does not mitigate covert stereotypes and can exacerbate the gap between overt and covert biases.
  • Harm: Evidence indicates both representational harms (negative portrayals of AAE speakers) and allocational harms (less prestigious job assignments, harsher judicial outcomes).
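As noted in the employability bullet above, the aggregate numbers come from standard tests over per-occupation scores. Below is a hedged reconstruction on synthetic data: the association scores and prestige ratings are randomly generated stand-ins chosen to mirror the reported summary statistics, not the paper's actual measurements.

```python
# Synthetic reconstruction of the reported employability statistics.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical per-occupation AAE association scores (84 occupations;
# negative = less associated with AAE than with SAE).
scores = rng.normal(loc=-0.046, scale=0.053, size=84)

# One-sample, one-sided t-test: is the mean association score below zero?
t_stat, p_value = stats.ttest_1samp(scores, popmean=0.0, alternative="less")
print(f"t({scores.size - 1}) = {t_stat:.1f}, one-sided P = {p_value:.2g}")

# Hypothetical prestige ratings for a 65-occupation subset, generated so a
# linear regression roughly reproduces the reported slope and R^2.
subset = scores[:65]
prestige = 50.0 - 7.8 * subset + rng.normal(scale=0.85, size=subset.size)
fit = stats.linregress(subset, prestige)
print(f"slope = {fit.slope:.1f}, R^2 = {fit.rvalue ** 2:.3f}, P = {fit.pvalue:.2g}")
```

With these synthetic inputs, the printed t-statistic, slope, and R² land near the reported values, showing how the per-occupation association scores relate to the aggregate statistics.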

Discussion

The findings demonstrate that LMs encode covert racial prejudice tied to dialect, producing more negative inferences about AAE speakers without explicit racial information. This prejudice influences consequential judgments in employment and criminal justice contexts, indicating risks for real-world deployments where models process user-generated text. The divergence between overt and covert stereotypes mirrors societal shifts from overt to color-blind racism: surface-level positivity or neutrality coexists with underlying negative associations. Training practices—data filtering and HF alignment—appear to suppress overtly racist expressions while leaving deeper dialect-linked prejudices intact, thus obscuring ongoing harms. Feature-level analyses suggest that specific AAE grammatical markers trigger these biases, indicating that models internalize raciolinguistic stereotypes from pretraining corpora. Because larger models and HF training decrease overt bias but not covert bias, current evaluation and mitigation pipelines risk declaring progress while covert harms persist or worsen. The work underscores the necessity of explicitly measuring dialect-based prejudice and revising training and alignment strategies to address such covert racism to ensure fairness and safety.

Conclusion

The paper introduces matched guise probing to reveal covert raciolinguistic prejudice in LMs, showing that models associate AAE with more negative traits than any previously recorded human stereotypes and assign lower-status jobs and harsher judicial outcomes to AAE speakers. Overt stereotypes appear improved, especially with scale and HF alignment, but covert biases persist or intensify, creating a widening gap that masks ongoing harms. The authors argue for explicit testing and mitigation of dialect-based prejudice in LM development and evaluation. Future research should: (1) expand probing across languages and dialects; (2) design alignment methods that target covert biases, not only overt content; (3) curate and document training data to reduce raciolinguistic stereotyping; (4) assess downstream harms in applied settings (hiring, legal) and develop safeguards; and (5) explore causal interventions at the feature level to desensitize models to dialectal markers when they are irrelevant to the task.

Limitations

The experimental scenarios for employment and criminal justice are constructed/hypothetical, which may not capture the full complexity of real-world decision-making. Covert bias measurements rely on text-based representations of dialect rather than audio, which may limit ecological validity and the mapping between dialect cues and perceived race. Although both meaning-matched and non-meaning-matched settings are used, residual topic/content confounds cannot be fully excluded. The study focuses on a subset of AAE features and selected prompts/adjective lists, and results may vary with other prompt designs or tasks. Analyses cover specific model families and versions; findings may not generalize uniformly to all LMs or future releases. Reported statistical summaries aggregate across prompts and models, and some reported effect-size labels (e.g., correlation values) may reflect model- or analysis-specific formulations rather than standardized coefficients.
