Hidden musicality in Chinese Xiangsheng: a response to the call for interdisciplinary research in studying speech and song

F. R. S. Lawson

Discover how music cognition research can embrace interdisciplinarity through a fascinating exploration of Chinese Xiangsheng, a unique form of musical comedy. Conducted by Francesca R. Sborgi Lawson, this study reveals the intricate relationship between speech and song through innovative methodological approaches using a compelling case study performance.

Introduction
The paper addresses how musicality—the foundational capacity for temporal, pitch-structured, and affectively coordinated communication—links speech and song, and how this relationship manifests in Chinese Xiangsheng. Motivated by a 2018 call in music cognition for cross-cultural and interdisciplinary work, the study situates Xiangsheng outside Western traditions to broaden perspectives on language–music relations. The research question asks whether and how musical features (rhythm/pulse and pitch) associated with communicative alignment and engagement emerge in Xiangsheng interactions between performers and with audiences, and how these features inform presentational vs participatory dynamics in performance. The purpose is to integrate empirical tools and ethnomusicological perspectives to illuminate intersubjective communication in performance, advancing dialogue across disciplines.
Literature Review
The paper surveys work on the musical aspects of expressive behavior and intersubjective communication, including conversation analysis and synchronized bodily/linguistic rhythms (Allport & Vernon; Goffman; Ambady; Schegloff; Sidnell). It highlights research on mother–infant interaction, introducing communicative musicality (Malloch & Trevarthen), characterized by rhythmic, melodic, and kinetic patterns supporting social bonding, with ethological grounding in human evolution (Dissanayake). It distinguishes musicality (a biological capacity/predisposition) from culturally specific music (Honing), suggesting commonalities between musicality in speech and song. Ethnomusicological studies of performer–audience reciprocity and performative mutuality (Turino; Berliner; Becker; Small; Frith; Hesmondhalgh & Negus; Tsioulakis & Hytonen-Ng) demonstrate a continuum between presentational and participatory modes, with audience engagement ranging from overt contribution to silent absorption (Herbert; Leante; Davitt/Heaney). The Cambridge Study (Hawkins, Cross, Ogden, Robledo) found that highly engaged, attitudinally aligned conversations show periodic pulse alignment and pitch-interval relations across turns, supporting the idea that speech and song share communicative substrates. These strands frame Xiangsheng as a fertile site to examine musicality across speech and song with active audience participation.
Methodology
The study adapts methods from the Cambridge Study to a contemporary Xiangsheng performance, "Xiaoao Jianghu" (A Carefree Life) by Guo Degang and Yu Qian (YouTube; originally published Feb 15, 2016). Xiangsheng’s structure includes speaking (shuo), mimicry (xue), singing (chang), and provoking laughter (dou), and styles range from "heavy on one end" (dominant main actor with straight man) to "two sides of a snap" (more equal roles). Data processing employed ELAN for multimodal video annotation, Praat for acoustic analysis (pitch in Hz), and R for statistical modeling (Lawson et al., 2020). Two segments ("bouts") were defined: Bout 1 (1:00–2:14) involved spoken comedy without singing; Bout 2 (12:50–14:57) encompassed the climactic sung exchange. For rhythmicity, analysts annotated pulses and turn onsets to examine whether the first syllable of a response aligned with a pulse established by the prior speaker, following the Cambridge approach. For pitch, each turn transition was represented as an ordered pair: (pitch of last syllable of antecedent utterance, starting pitch of subsequent response), where agents could be Guo, Yu, or the audience (cheering/applause treated as a unified audio agent when present). Statistical analysis computed linear correlation coefficients between turn-pair pitches and fit linear models predicting the response starting pitch (dependent variable) from the antecedent last-syllable pitch, an indicator for antecedent agent (audience vs actor), and their interaction. The study also qualitatively tracked shifts between presentational and participatory modes and the role of audience responses as a third performing agent.
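The pitch model described above can be sketched in a few lines. This is a minimal illustration in Python, not the authors' R code: the function names and the pitch values are hypothetical, and the data are made up. It relies on a standard property of ordinary least squares: a linear model with a binary indicator and its full interaction gives the same point estimates as fitting one straight line per group, so the actor line supplies the intercept and slope, and the audience line's differences from it supply the indicator and interaction terms.

```python
import math


def simple_fit(xs, ys):
    """Least-squares line through (xs, ys); returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def pearson_r(xs, ys):
    """Linear correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def fit_pitch_model(antecedent_hz, response_hz, antecedent_is_audience):
    """Estimate response = b0 + b1*x + b2*a + b3*(x*a).

    x: pitch (Hz) of the last syllable of the antecedent utterance
    a: 1 if the antecedent agent is the audience, 0 if it is an actor
    Each group needs at least two distinct antecedent pitches.
    """
    rows = list(zip(antecedent_hz, response_hz, antecedent_is_audience))
    actor = [(x, y) for x, y, a in rows if not a]
    crowd = [(x, y) for x, y, a in rows if a]
    b0, b1 = simple_fit(*zip(*actor))       # actor-antecedent line
    c0, c1 = simple_fit(*zip(*crowd))       # audience-antecedent line
    return {
        "pearson_r": pearson_r(antecedent_hz, response_hz),
        "intercept": b0,
        "pitch_slope": b1,
        "audience_shift": c0 - b0,          # coefficient on the indicator
        "interaction": c1 - b1,             # coefficient on pitch x indicator
    }


# Made-up pitch pairs for illustration only; not data from the study.
antecedent = [180.0, 220.0, 250.0, 300.0, 210.0, 260.0, 320.0, 280.0]
audience = [0, 0, 0, 0, 1, 1, 1, 1]
response = [176.0, 226.0, 243.0, 305.0, 232.0, 270.0, 331.0, 289.0]

result = fit_pitch_model(antecedent, response, audience)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

A pitch slope near 1 with a high correlation would indicate that responses tend to start close to where the previous turn ended. Significance testing, which the study reports, is left out of this sketch.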
Key Findings
- Bout 1 (spoken, 1:00–2:14): The interaction exemplified the "heavy-on-one-end" (presentational) style with Guo dominant and Yu as foil. Audience laughter increased mid-bout, with response durations eventually exceeding Yu’s utterances. Rhythmicity was present in portions but not throughout; aggregated analysis across Bout 1 showed no statistically significant pitch matching across turns.
- Across both bouts: Rhythmicity (regular rhythmic cycles across turns) was observed for approximately 50% of the analyzed material.
- Bout 2 (sung climactic exchange, 12:50–14:57): Interaction shifted toward "two-sides-of-a-snap" (more participatory) with stronger audience involvement. Quantitatively, there was a highly significant linear correlation between the antecedent last-syllable pitch and the response starting pitch across turn transitions, in both spoken and sung phrases, indicating reciprocal pitch approximation among Guo, Yu, and the audience. Audience cheering/applause formed extended, rhythmic, and pitched responses at climactic moments.
- The performance exhibited a gradual transition from presentational toward participatory dynamics as emotional engagement increased, culminating in strong audience participation at the climax.
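One way to make the pulse-alignment check from the Methodology concrete is sketched below. It is an illustration under stated assumptions, not the study's procedure: the pulse period is taken as the mean interval between the antecedent speaker's annotated pulses, and a response counts as aligned when its onset falls within a tolerance of an extrapolated pulse. The function names, the tolerance of 10% of the period, and the example times are all hypothetical.

```python
def onset_is_aligned(pulse_times, onset, tolerance=0.1):
    """Check whether a response onset lands on the prior speaker's pulse.

    pulse_times: annotated pulse times (seconds) in the antecedent turn
    onset: time (seconds) of the first syllable of the response
    tolerance: allowed deviation as a fraction of the pulse period
               (an assumed value, not taken from the study)
    """
    if len(pulse_times) < 2:
        raise ValueError("need at least two pulses to estimate a period")
    intervals = [b - a for a, b in zip(pulse_times, pulse_times[1:])]
    period = sum(intervals) / len(intervals)

    # Position of the onset within the extrapolated pulse cycle, in [0, 1).
    phase = ((onset - pulse_times[-1]) / period) % 1.0
    # Distance to the nearest extrapolated pulse, as a fraction of the period.
    deviation = min(phase, 1.0 - phase)
    return deviation <= tolerance


def aligned_fraction(transitions, tolerance=0.1):
    """Share of (pulse_times, onset) turn transitions that are aligned."""
    flags = [onset_is_aligned(p, o, tolerance) for p, o in transitions]
    return sum(flags) / len(flags)


# Made-up annotations for illustration only; not data from the study.
transitions = [
    ([0.0, 0.5, 1.0], 1.5),    # onset exactly one period after the last pulse
    ([0.0, 0.5, 1.0], 1.75),   # onset halfway between two extrapolated pulses
]
print(aligned_fraction(transitions))
```

The reported figure of roughly 50% describes the share of analyzed material showing regular rhythmic cycles, which is not necessarily the same quantity as the per-transition fraction computed here.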
Discussion
Findings support the hypothesis that heightened affiliative engagement (attitudinal alignment) in performance fosters musicality in speech and song through rhythmic entrainment and pitch approximation. In Xiangsheng, rhythmic alignment is intermittent in the early presentational segments, where semantic content dominates. As emotional stakes rise and singing is introduced, performer–performer and performer–audience interactions exhibit stronger pitch coupling and extended audience responses, consistent with communicative musicality. This demonstrates a fluid continuum between presentational and participatory modes rather than a strict dichotomy, with audiences functioning as co-performers at climactic points. The results complement the Cambridge findings by extending them to a culturally distinct, scripted-yet-interactive genre with explicit audience participation, reinforcing the view that speech and music are intertwined facets of a human communicative toolkit. The study illustrates how combining ethnomusicological insight with empirical tools can reveal hidden musical structures in spoken performance and clarify mechanisms of performative mutuality.
Conclusion
The paper contributes a cross-cultural, interdisciplinary case study showing that Chinese Xiangsheng embodies a close relationship between speech and song, with measurable musicality emerging in performer and audience interactions. Empirically, rhythmic entrainment occurs for about half of the analyzed performance, and significant pitch approximation across turns appears at the emotionally charged climax, particularly in the sung bout. Conceptually, presentational and participatory modes operate on a continuum that shifts dynamically with engagement, challenging rigid dichotomies. Methodologically, the work demonstrates the value of using ELAN, Praat, and statistical modeling in tandem with ethnomusicological frameworks, answering calls for interdisciplinary research beyond Western musical contexts. Future research should replicate and extend these analyses across additional performances and genres, refine metrics for audience reception, and leverage evolving wearable/physiological technologies to capture pre-applause engagement in ecologically valid settings, fostering scholarly mutuality between empirical and humanistic approaches.
Limitations
- Single-case, preliminary analysis of one recorded Xiangsheng performance limits generalizability.
- The scripted, presentational nature of Xiangsheng constrains continuous rhythmicity compared with spontaneous conversations; intelligibility demands may disrupt periodicity.
- Audience reception was analyzed via aggregate audio signals (laughter/applause) without individual-level, real-time physiological measures; silent absorption and heterogeneous responses could not be fully captured.
- Differences from the Cambridge paradigm (e.g., scripting, presence of audience) complicate direct comparisons.
- Technical constraints: invasive/expensive lab-based measurement tools are not readily deployable in field settings; ecological validity vs experimental control trade-offs remain.