Why can't Siri sing? Cultural narratives that constrain female singing voices in AI

F. R. S. Lawson

Discover how the singing limitations of female AI voices like Siri, Alexa, and Sophia echo entrenched cultural narratives that both captivate and disturb. This research by Francesca R. Sborgi Lawson examines the misogynistic stereotypes linked to AI vocality, shedding light on the urgent need to address these biases for a more equitable future in artificial intelligence.

Introduction

The paper examines why spoken female-gendered AI voices are common while sung voices remain rare, proposing that beyond technical hurdles, hidden cultural stereotypes surrounding the female singing voice play a major role. It outlines rapid advances in speech emotion recognition and the greater complexity of modeling singing versus speech. Building on James W. Carey’s view of technology as a cultural blueprint and Safiya Noble’s critique of embedded bias in platforms, the author frames the study as an exploration of how long-standing Western narratives about women’s voices shape contemporary responses to female-sounding AI, especially on social media. The paper argues that the female singing voice, more than speech, amplifies cultural narratives of otherness, allure, and threat, and introduces a preliminary analysis using examples and user comments from YouTube to illuminate these dynamics.

Literature Review

The review traces enduring Western narratives about female vocality. The mythic siren serves as an emblem of seductive auditory power. Early modern restrictions on women’s voices in sacred and public spaces led to substitutions such as boy sopranos and, in Italy, the castrato tradition, which underscores the lengths taken to exclude women’s voices. In the 19th century, the operatic diva was both an empowering figure and a threat to patriarchal norms; despite her vocal dominance on stage, libretti often enacted the death or domestication of heroines (per Clément and McClary), with singers’ lives sometimes mirroring their tragic roles (Smart). In the age of recording, theorists like Adorno (via Engh) problematized disembodied female recordings, reinforcing the idea of the female voice as marked and in need of control. Contemporary pop culture continues patterns of sexualization and vilification of female singers (Casano), as artists like Madonna have noted. The review then introduces Kate Manne’s distinction between sexism (ideology) and misogyny (enforcement), arguing that enforcement mechanisms (punishment, containment, ridicule) apply to female vocalists and, by extension, to feminized AI voices. Complementary media studies perspectives address acousmatic and cinematic constructions of the computer’s voice (Faber), user preferences and gendered design of assistants, and the coupling of feminine voicing with servility (Guzman; Rothschild), as well as ridicule as a social control tactic (Waller; Janes and Olson).

Methodology

The study is a qualitative, interpretive cultural analysis of a small corpus of widely viewed YouTube videos (2018–2020) featuring feminized AI singing: (1) Siri performing a segment of Bohemian Rhapsody; (2) a staged Siri vs. Alexa rap battle; (3) Sophia the robot’s televised duet with Jimmy Fallon. The author examines the content and tone of the performances and selects representative user comments to surface affective responses (e.g., laughter, mockery, chills, fear) and policing mechanisms such as ridicule. The theoretical framework integrates Carey’s technology-as-cultural-blueprint, Noble’s account of embedded bias, Manne’s misogyny as enforcement, Faber’s analysis of acousmatic computer voices, Guzman’s account of assistant servility, and scholarship on ridicule as social control. The study is explicitly preliminary and not based on field or experimental research; it does not employ formal coding schemes or statistical analysis.

Key Findings
  • The expressive, culturally charged female singing voice triggers intense audience reactions when rendered by AI, including anxiety, pleasure, and chills, revealing persistent narratives that female vocality is both alluring and threatening.
  • Three intertwined enforcement mechanisms are visible in how feminized AI voices are created and received: perpetuation of stereotypes, programming of servility, and audience ridicule that polices boundaries of acceptable female vocal display.
  • Siri’s Bohemian Rhapsody: a monotone, prosody-poor rendering prompted presenters and commenters to mock and belittle the performance; the ephemeral nature of such Easter eggs suggests ambivalence about enabling a compelling female-sounding sung voice in mainstream assistants.
  • Siri vs. Alexa rap battle: frames the assistants in a catfight trope, encouraging spectators to enjoy mutual denigration and humiliation, normalizing misogynistic mockery and division between feminized agents.
  • Sophia’s duet with Jimmy Fallon: elicited mixed affect—musical chills, fascination, and fear—highlighting how a more lifelike female-coded robotic singer intensifies ambivalence; commenters frequently reported chills and unease, linking affect to both aesthetic response and anxiety about agency and power.
  • Overall, user ridicule operates as a pre-emptive enforcement strategy that neutralizes perceived threat, aligning with long historical efforts to contain female singing voices.

Discussion

Findings support the thesis that feminized AI singing voices inherit and amplify entrenched Western narratives of female vocal otherness and power. When AI systems are gendered female, designers often encode servility, which preemptively constrains agency, while audiences deploy ridicule to enforce compliance and minimize perceived threat. The Siri examples show how poor prosody invites disparagement that symbolically reinscribes limits on female vocal display. The Alexa–Siri battle reproduces divisive catfight scripts that undermine collective empowerment. Sophia’s more human-like performance evokes peak-aesthetic responses and fear, echoing siren lore and opera’s ambivalent diva archetype. These dynamics indicate that cultural narratives, not technical hurdles alone, constrain the development and acceptance of female-sounding AI singing. Addressing such narratives is essential if emotionally capable AI vocality is to progress without reproducing misogynistic enforcement.

Conclusion

The study concludes that AI vocal technologies are not culturally neutral: they function as blueprints that can reinforce or reshape long-standing narratives about women’s voices. From antiquity’s sirens to modern pop culture, female singing has been desired yet feared, with recurring strategies of removal, containment, and undoing. On social media, responses to feminized AI singing make these ambivalences starkly visible through ridicule, enforced servility, and polarized affect. Future work in emotion-rich AI singing should critically examine and reconfigure inherited representational blueprints, making explicit, equitable design choices about gender and agency. By becoming conscious of cultural ideals embedded in voice design, creators can avoid replicating insidious stereotypes and open pathways for more diverse, respectful, and compelling AI vocalities.

Limitations
  • Preliminary, qualitative study with a small, purposive sample of YouTube videos and comments; lacks systematic sampling, coding, or quantitative analysis.
  • Focused on Western European and North American cultural narratives; findings may not generalize across cultures.
  • Centers on feminized, largely white, middle-class coded personas; intersectional dimensions (e.g., race, class) are acknowledged but not analyzed.
  • Dependent on platform-visible comments, which may be unrepresentative and subject to moderation biases.
  • Does not engage neurophysiological measurement of chills or affect; mechanisms behind reported chills are speculative and drawn from prior literature.