This paper investigates covert racism in language models, focusing on dialect prejudice against African American English (AAE). The authors demonstrate that language models hold more negative associations with AAE than any experimentally recorded human stereotypes about African Americans, even though the models' overt stereotypes about the group are more positive. This dialect prejudice has harmful consequences: in hypothetical scenarios, the models assign less prestigious jobs to AAE speakers and are more likely to suggest the death penalty for them. The study also reveals that current bias mitigation techniques, such as alignment with human feedback, widen the gap between overt and covert stereotypes by superficially masking the underlying covert bias rather than removing it. These findings have significant implications for the fair and safe use of language technology.
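For a concrete feel for how such associations can be measured, the sketch below probes a small causal language model by comparing the log-probability of trait adjectives after an AAE prompt versus a meaning-matched Standard American English (SAE) prompt, in the spirit of the paper's matched guise probing. It assumes the Hugging Face transformers library; the GPT-2 checkpoint, prompt template, sentence pair, and adjective list are illustrative placeholders, not the authors' actual materials.

```python
# Minimal sketch of prompt-based association probing (in the spirit of the
# paper's matched guise probing). All prompts and word lists below are
# illustrative placeholders, not the study's actual materials.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def continuation_logprob(prompt: str, continuation: str) -> float:
    """Sum of token log-probabilities of `continuation` given `prompt`.

    Assumes the prompt tokenizes identically as a prefix of the full
    string, which holds here because each continuation starts with a space.
    """
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + continuation, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    log_probs = torch.log_softmax(logits, dim=-1)
    total = 0.0
    # Score only the continuation tokens; logits at position t-1 predict token t.
    for t in range(prompt_ids.shape[1], full_ids.shape[1]):
        total += log_probs[0, t - 1, full_ids[0, t]].item()
    return total

# Two texts with the same meaning, one in AAE and one in SAE.
aae = "He be trippin about that."
sae = "He is getting worked up about that."
template = 'A person who says "{}" is'

for adjective in [" intelligent", " lazy", " kind", " aggressive"]:
    gap = (continuation_logprob(template.format(aae), adjective)
           - continuation_logprob(template.format(sae), adjective))
    # Positive gap: the model associates the trait more with the AAE guise.
    print(f"{adjective.strip():>12}: AAE - SAE log-prob gap = {gap:+.3f}")
```

Because the two prompts differ only in dialect, any systematic difference in the trait probabilities can be attributed to the dialect itself rather than to the content of what is said.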
Publisher: Nature
Published On: Sep 05, 2024
Authors: Valentin Hofmann, Pratyusha Ria Kalluri, Dan Jurafsky, Sharese King
Tags: covert racism, language models, dialect prejudice, African American English, bias mitigation