Why can't Siri sing? Cultural narratives that constrain female singing voices in AI

Computer Science

F. R. S. Lawson

Discover how the singing limitations of female AI voices like Siri, Alexa, and Sophia echo entrenched cultural narratives that both captivate and disturb. This research by Francesca R. Sborgi Lawson examines the misogynistic stereotypes linked to AI vocality, shedding light on the urgent need to address these biases for a more equitable future in artificial intelligence.

Introduction
The paper begins by noting the significant advancements in AI speech technology, particularly the pursuit of emotionally nuanced and expressive voices. While spoken AI voices, especially female-gendered ones, are relatively common, singing AI voices are less prevalent. The author posits that this discrepancy may stem from unexamined cultural stereotypes surrounding female singing voices. Drawing on James W. Carey's concept of technology as a cultural performance and Safiya Noble's work on algorithmic bias, the paper explores how these hidden narratives manifest in the creation and reception of AI singing voices, particularly those of Siri, Alexa, and Sophia. It highlights the long-standing ambivalence towards women's singing voices, rooted in historical representations from sirens to opera divas, and suggests that this ambivalence continues to shape both the development of female-voiced AI and the way users interact with it.
Literature Review
The paper reviews existing literature on the cultural history of female voices, including their portrayal in mythology (sirens), Elizabethan literature and theatre, opera (particularly the rise and fall of the castrati and the portrayal of female characters), and the recording industry. It examines how female singers have been perceived as simultaneously alluring and dangerous, leading to their suppression or control. The author uses this historical context to illuminate the ongoing debate about gender representation in AI and the hidden misogyny behind the technological choices that influence how virtual assistants are designed and perceived. Kate Manne’s concept of misogyny as an enforcement strategy is introduced to understand the mechanisms employed to regulate female agency, both in the historical context and in the realm of AI.
Methodology
This study is not based on empirical field research but rather on a qualitative analysis of examples and existing data. The paper analyzes historical representations of female singing voices across various cultural contexts to highlight the persistence of ambivalent narratives. It further analyzes YouTube videos featuring Siri, Alexa, and Sophia singing, focusing on the comments and responses from viewers as evidence of the cultural narratives at play. By examining these responses, the author aims to demonstrate how ridicule and other forms of mockery are used to control and contain the perceived power of female AI voices, echoing patterns seen throughout history.
Key Findings
The analysis of YouTube comments on AI singing performances reveals several key findings. First, there is a prevalent use of ridicule and mockery directed toward the singing abilities of Siri, Alexa, and Sophia. This serves as a form of preemptive misogyny, policing the acceptable boundaries for female-voiced AI and reinforcing cultural expectations of female subservience. Second, the programming of these virtual assistants as servile entities reinforces existing gender stereotypes, contributing to a narrative of female subordination. Third, the temporary deactivation of Siri's singing attempts and the catfight-style rap battle between Siri and Alexa demonstrate the ways in which the technology is used to manage and ultimately prevent the expression of powerful, potentially unsettling female voices. Finally, the responses to Sophia's duet with Jimmy Fallon highlight the coexistence of fascination and fear, with listeners reporting both chills of pleasure and anxiety, underscoring the complex and ambivalent nature of the cultural narrative. The study highlights the inherent tension between the desirable qualities often associated with the female singing voice and the ingrained fear of its potential power.
Discussion
The findings demonstrate that technological advancements in AI are not value-neutral but are deeply embedded in existing cultural narratives. The design of female-gendered AI voices, and users' interactions with them, perpetuate and reinforce misogynistic stereotypes. The paper's analysis illuminates the ways in which these narratives are played out through the programming of AI personalities, user responses on social media, and the technology’s intended and unintended consequences. This reinforces the need for critical engagement with the cultural implications of AI development, moving beyond purely technological considerations to include a thorough examination of underlying social and cultural biases.
Conclusion
The study concludes that addressing the cultural narratives surrounding female singing voices is crucial for the ethical and responsible development of AI. Future research in AI voice technology should move beyond simply improving algorithms and consider the deeper cultural context that shapes our interactions with technology. By understanding and addressing these embedded biases, we can strive to create AI systems that are both technologically advanced and culturally sensitive, promoting more equitable and just representations of gender.
Limitations
The paper acknowledges that its analysis relies on a limited sample of YouTube videos and comments, which may not fully represent the diversity of user responses to AI singing. Furthermore, the study’s focus on English-speaking audiences means its findings may not generalize to other cultures with different historical and cultural understandings of female vocality. Finally, the paper recognizes the complex nature of “chills” responses and notes that further research is needed to understand the neurophysiological basis of varying emotional responses to AI singing.