Individual differences in attention and working memory modulate the process of tone merging: evidence from Macau Cantonese

H. Wang, F. Gao, et al.

This study by Han Wang, Fei Gao, and Jingwei Zhang examines how the domain-general cognitive functions of attention and working memory relate to tone merging in Macau Cantonese, and how their influence on tone perception and production differs across merging stages.

Introduction
The study investigates how domain-general cognitive functions—especially attention and working memory—modulate tone merging in Macau Cantonese. Cantonese has six lexical tones (T1–T6), with several confusable pairs (T2–T5 rising, T3–T6 level, T4–T6 low-fall vs low-level) known to undergo or have completed mergers. Prior work focused largely on Hong Kong/Guangzhou populations and produced mixed findings on the roles of attention, working memory, and executive function in tone processing. Macau presents a unique window because the three tone pairs are at different merger stages: T2–T5 largely completed among younger speakers; T3–T6 widely merging; T4–T6 at an early, slower stage. The study asks: (1) What is the current status of tone merging in Macau? (2) Are effects of cognitive functions consistent across tone pairs at different merger stages? (3) Are cognitive effects similar for perception and production? To avoid visual-auditory confounds, tasks were conducted exclusively in the auditory modality.
Literature Review
Research on language change highlights both social and cognitive factors. In Cantonese, confusable tone pairs (T2–T5, T3–T6, T4–T6) show perceptual and production confusion. Neurocognitive studies suggest attentional mechanisms impact acoustic cue representation and speech processing (e.g., MMN/P3a findings, attentional switching, working memory). Prior Cantonese studies (Law et al., 2013; Ou et al., 2015; Ou & Law, 2017) implicated attention and working memory but often focused on single pairs (e.g., T2–T5) or mixed criteria across pairs, potentially confounding results. Multisensory research shows modality-specific attentional dynamics, with auditory information harder to filter than visual, motivating an auditory-only design. Theoretical accounts of perception-production links differ: Motor Theory posits strong production-perception coupling, while DIVA posits perceptual templates guiding production. Empirically, perception and production mergers are not always synchronous across Cantonese tone pairs (e.g., T2–T5 vs T4–T6 patterns). These gaps motivate a holistic, modality-controlled investigation across multiple tone pairs at different merger stages.
Methodology
Participants: 44 right-handed native Macau Cantonese speakers (24 females), aged 17–28 (M=20.09, SD=2.38), with Cantonese as the exclusive family language; at least one parent born in Macau; no extended absence from Macau within 3 years; no reported hearing/vocal disorders or mental/neurodegenerative illness. Ethics approval and informed consent were obtained, and participants were compensated.
Materials (production): Four CV roots (/si/, /ji/, /fu/, /se/) combined with six tones yielded 24 test words (4 syllables × 6 tones), selected for single-use/common-use and non-homography based on prior work; 24 additional filler words were interspersed. Three native judges validated the items.
Materials (cognitive tasks): Attention was assessed with Test of Everyday Attention (TEA) subtests focusing on auditory components: Elevator Counting, Elevator Counting with Distraction, Elevator Counting with Reversal, Telephone Searching, Telephone Searching while Counting, and Lottery (Telephone Searching was used only to derive the dual-task score and was excluded from the final analysis because of its visual nature). Working memory was assessed with WAIS-IV Digit Span Forward, Backward, and Sequencing (16 strings, 2–9 digits each). Executive function (EF) was assessed with an auditory Stroop test adapted into Cantonese (Stroop-tones: pure tones at 220/250 Hz; Stroop-words: Cantonese words "高"/"低" spoken at 220/250 Hz; stimuli 500 ms, 70 dB); the EF score was computed as per Kestens et al. (2021).
Materials (perception): The same 24 words as in production were paired into AX discrimination stimuli with AA and AB pairings: 32 test pairs (20 AA, 12 AB) involving potentially merging tones, plus 16 filler pairs (4 AA, 12 AB) with non-merging tones to prevent strategic responding. All stimuli were normalized to 500 ms duration and 70 dB.
Procedure: In a soundproof lab, the production task presented 48 words (24 test + 24 fillers) in randomized order; each word was read aloud three times; slides were played twice. Recordings were made with an OLYMPUS LS-100 and an AKG C-420 at 44.1 kHz/24-bit.
Cognitive tasks (TEA, Digit Span, auditory Stroop) followed immediately. After two weeks, participants performed the AX perception task (E-Prime 3.0): fixation 300 ms, blank 300 ms, first token, 500 ms ISI, second token; response window up to 3 s (F/J keys counterbalanced) with 800–1000 ms ITI; each pair repeated 10 times; total 480 trials (4 syllables × 12 pairs × 10 repetitions).
Data analysis (production): F0 was extracted in Praat; for each of two repetitions per token (first repetition used if three collected), pitch-carrying segments were identified, time-normalized into 10 equal parts, 10 F0 points extracted, and converted to semitones (ST) relative to each speaker's mean F0. Growth Curve Analysis (GCA) modeled contours with orthogonal polynomials, focusing on mean, slope, and curvature. Merger criteria: for the T2–T5 rising pair, both mean and slope must differ to count as distinct; for T3–T6 (level) and T4–T6 (low-level vs low-falling), a difference in at least one parameter indicates non-merger.
Data analysis (cognitive tasks): Scoring followed the test manuals; EF RTs beyond ±3 SD were excluded before computing the EF index.
Data analysis (perception): Fillers and AA pairs were excluded from the analysis of discrimination and RT; trials beyond ±3 SD in RT were removed; discrimination rate and mean RT were then computed. Perceptual merger classification used an AX discrimination rate <95% as "merged", following Ou & Law (2016).
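The production analysis described above can be illustrated with a short Python sketch. It is not the authors' script: the function names are hypothetical, sampling at the midpoint of each of the 10 equal intervals is an assumption (the summary only says that 10 points were extracted), and the merger rule takes already-computed significance flags as input rather than fitting the GCA itself. The semitone conversion uses the standard formula ST = 12 · log2(F0 / reference).

```python
import math

def time_normalize(f0_track, n_points=10):
    """Sample an evenly spaced F0 track (Hz) at the midpoints of n equal intervals.

    Midpoint sampling with linear interpolation is an assumption of this sketch.
    """
    if len(f0_track) < 2:
        raise ValueError("need at least two F0 samples")
    last = len(f0_track) - 1
    points = []
    for k in range(n_points):
        pos = (k + 0.5) / n_points * last   # fractional index of the k-th midpoint
        lo = int(math.floor(pos))
        hi = min(lo + 1, last)
        frac = pos - lo
        points.append(f0_track[lo] * (1 - frac) + f0_track[hi] * frac)
    return points

def to_semitones(f0_hz, ref_hz):
    """Convert F0 values in Hz to semitones relative to a reference (e.g. speaker mean F0)."""
    return [12 * math.log2(f / ref_hz) for f in f0_hz]

def is_merged_in_production(pair, differs):
    """Apply the merger criteria from the summary.

    `differs` maps 'mean', 'slope', 'curvature' to True when that GCA term
    differs significantly between the two tones (computed elsewhere).
    """
    if pair == "T2-T5":
        # Rising pair: distinct only if both mean and slope differ.
        distinct = differs["mean"] and differs["slope"]
    else:
        # T3-T6 and T4-T6: a difference in any one parameter counts as distinct
        # (treating curvature as one of those parameters is an assumption).
        distinct = differs["mean"] or differs["slope"] or differs["curvature"]
    return not distinct

# Hypothetical usage: a made-up rising contour and a made-up speaker mean of 190 Hz.
track_hz = [180.0, 185.0, 192.0, 201.0, 212.0]
contour_st = to_semitones(time_normalize(track_hz), ref_hz=190.0)
```

Expressing contours in semitones relative to each speaker's own mean removes overall pitch-range differences between speakers, so that contours can be compared on a common scale.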
Key Findings
Group-level production (GCA): Among younger Macau speakers, T2–T5 is fully merged (no differences in mean, slope, curvature; ps>0.05). T3–T6 is not fully merged (significant mean difference, p=0.01). T4–T6 is not merged (significant differences in mean, slope, curvature; ps<0.05).
Individual-level merger counts: In production, T2–T5 was merged in 44/44 speakers, T3–T6 in 22/44, and T4–T6 in 8/44. In perception (AX <95% criterion), T2–T5 was merged in 44/44, T3–T6 in 18/44, and T4–T6 in 15/44.
Validation (Mann–Whitney U): For T3–T6 perception, unmerged vs merged discrimination rate was M=0.98 (SD=0.02) vs 0.83 (SD=0.13), p<0.001; RT was 957.53 ms (SD=204.22) vs 1064.39 ms (SD=186.25), p=0.045. The T3–T6 production ST difference was M=1.16 ST (SD=0.63) for unmerged vs M=0.54 ST (SD=0.64) for merged, p=0.001. For T4–T6 perception, unmerged vs merged discrimination was M=0.98 (SD=0.02) vs 0.87 (SD=0.07), p<0.001. The T4–T6 production slope difference was M=0.36 (SD=0.17) for unmerged vs M=0.20 (SD=0.18) for merged, p<0.01.
Perception–production linkage: For T3–T6, those unmerged in perception show larger production ST differences than those merged (M=1.04 ST, SD=0.69 vs 0.58 ST, SD=0.64), t(42)=-2.25, p=0.03. For T4–T6, production slope did not differ by perceptual merger status (p=0.51). For both T3–T6 and T4–T6, perceptual discrimination and RT did not differ by production merger status (ps>0.05).
Cognitive function group differences: For T3–T6 perception, the unmerged group showed higher attention (M=52.42, SD=4.97 vs 47.61, SD=7.53) and working memory (M=37.96, SD=5.83 vs 32.67, SD=6.14), ps<0.05; no EF differences. For T4–T6 perception, the unmerged group showed higher attention (M=52.68, SD=4.67 vs 46.53, SD=7.92), p<0.01; higher working memory (M=37.28, SD=5.89 vs 32.93, SD=6.72), t(42)=-2.21, p=0.03; no EF differences (p=0.81). For production, there were no significant cognitive differences for either pair (ps>0.05).
Correlations (Pearson): For T2–T5, there were no significant correlations with attention, working memory, or EF for discrimination or RT (ps>0.05). For T3–T6, discrimination correlated with attention (r=0.55, p<0.001) and working memory (r=0.53, p<0.001); RT correlated negatively with working memory (r=-0.33, p=0.03); the average production ST difference correlated with attention (r=0.28, p=0.03). For T4–T6, discrimination correlated with attention (r=0.60, p<0.001) and working memory (r=0.48, p<0.01); RT correlated negatively with working memory (r=-0.30, p=0.048); there were no significant production correlations. Executive function showed no significant correlations with perception or production measures across pairs.
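The perceptual merger counts above depend on a per-listener scoring step: trim extreme RTs, compute the discrimination rate on AB trials, and label the pair "merged" when the rate falls below 95%. A minimal Python sketch of that step follows. It is an illustration under stated assumptions, not the authors' code: the trial dictionaries with `rt` and `correct` keys are a hypothetical data layout, and trimming around each listener's own mean RT for a given tone pair is an assumption (the summary does not say at which level the ±3 SD cutoff was applied).

```python
from statistics import mean, stdev

def trim_rts(trials, n_sd=3.0):
    """Drop trials whose RT lies more than n_sd sample SDs from the mean RT.

    Assumption: trimming is done within one listener and one tone pair.
    """
    rts = [t["rt"] for t in trials]
    if len(rts) < 2:
        return list(trials)
    m, s = mean(rts), stdev(rts)
    return [t for t in trials if abs(t["rt"] - m) <= n_sd * s]

def score_pair(ab_trials, threshold=0.95):
    """Score one listener's AB ("different") trials for one tone pair.

    Fillers and AA trials are assumed to have been excluded already.
    """
    kept = trim_rts(ab_trials)
    rate = mean(1.0 if t["correct"] else 0.0 for t in kept)
    mean_rt = mean(t["rt"] for t in kept)
    # Following the <95% criterion described in the summary.
    return {"rate": rate, "mean_rt": mean_rt, "merged": rate < threshold}

# Hypothetical usage: 40 AB trials, 4 of them answered incorrectly.
example_trials = [{"rt": 900.0 + i, "correct": i >= 4} for i in range(40)]
example_score = score_pair(example_trials)
```

With a fixed 95% cutoff, a listener's classification can change on a small number of errors, which is one reason the summary also reports group-level validation of the merged and unmerged groups.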
Discussion
Findings show cognitive functions modulate tone merging differently across tone pairs and modalities, aligning with the idea that perception and production interact but are not uniformly coupled. T2–T5 appears fully merged in Macau's younger speakers, with no cognitive correlates, consistent with a stabilized single rising-tone category. T3–T6, at an accelerated merger stage, demands more cognitive resources for discrimination: level tones rely mainly on average F0 with overlapping perceptual spaces, increasing difficulty. Accordingly, attention and working memory positively relate to T3–T6 perceptual discrimination and speed, with attention also weakly linked to production distinctiveness. T4–T6, earlier in the merger and offering multiple distinguishing cues (contour, duration, creaky voice), shows cognitive associations only in perception, not production, suggesting lower production demands and earlier perceptual sensitivity. The absence of executive function effects may reflect the young, healthy sample and potential limitations in the adapted auditory Stroop design. Comparisons with Hong Kong and Zhuhai suggest that perception–production coupling strengthens during active merger stages (e.g., T3–T6 in Macau, T2–T5 in Hong Kong), weakens at early stages (T4–T6), and disappears after completion (T2–T5 in Macau). This pattern supports models like DIVA, where accurate perception underpins production calibration. The authors propose a "sliding window" account: cognitive functions first impact the most unstable pair in perception, then production, and later shift as the system stabilizes. Differences in tone contours, cue availability, child acquisition patterns for rising tones, lexical frequency, and phonemic load (favoring T5→T2) further explain differential merger trajectories and cognitive demands.
Conclusion
The study demonstrates that individual differences in attention and working memory modulate Cantonese tone merging in Macau, with effects contingent on merger stage and modality. T2–T5 (completed merger) shows no cognitive association; T3–T6 (intermediate/accelerated) shows associations in both perception and production; T4–T6 (early/slow) shows associations only in perception. Results provide a new perspective on the cognitive origins of tonal variation and refine understanding of perception–production dynamics across merger stages, consistent with perception-guided production models. Future work should use more balanced samples across merger statuses, refine executive function measures, and investigate neurobiological mechanisms linking perceptual and articulatory representations over time.
Limitations
Key limitations include imbalance in merged vs unmerged samples within tone pairs, especially few production-merged cases for T4–T6, potentially reducing power to detect production–cognition links. The adapted auditory Stroop may be suboptimal for executive function in Cantonese (e.g., potential congruency issues with 高/低 and artificially adjusted pitch), and the young, healthy sample may have limited EF variance. Task difficulty may have been insufficient to engage executive control robustly.