Neural control of lexical tone production in human laryngeal motor cortex
J. Lu, Y. Li, et al.
This study by Junfeng Lu, Yuanning Li, Zehao Zhao, Yan Liu, Yanming Zhu, Ying Mao, Jinsong Wu, and Edward F. Chang examines the neural mechanisms of Mandarin tone production. Using high-density cortical recordings and direct cortical stimulation, the authors show that laryngeal motor cortex encodes pitch dynamics rather than categorical tone identity, refining our understanding of vocal control in language.
Introduction
Vocal pitch is a crucial acoustic cue in speech. In non-tonal languages (e.g., English), pitch conveys prosody and intonation, while in tonal languages (e.g., Mandarin), pitch contours (lexical tones) differentiate word meanings. Mandarin has four tones characterized by initial pitch height and contour direction: high-level (Tone 1), rising (Tone 2), dipping (Tone 3), and falling (Tone 4). Producing these tones requires precise laryngeal control of vocal fold tension by intrinsic muscles (cricothyroid for pitch raising; thyroarytenoid for pitch lowering). Prior neuroimaging and neurophysiological work identified bilateral dorsal and ventral laryngeal motor cortex (LMC) within ventral sensorimotor cortex and showed monotonic pitch encoding and distinct voicing representations. However, how humans dynamically control laryngeal muscles to generate rapid, bidirectional pitch trajectories for lexical tones at the syllable timescale remained unknown. This study investigates the neural basis of lexical tone production in native Mandarin speakers using high-density intracranial recordings, computational modeling, decoding analyses, and direct cortical stimulation.
Literature Review
Previous studies identified two LMC regions in human ventral sensorimotor cortex associated with laryngeal movements. Dorsal LMC showed monotonic encoding of pitch rising during emphasis tasks in English, and distinct neural populations in dorsal and ventral LMC encode voicing. Imaging studies of pitch control have been limited by temporal resolution, obscuring fine-scale, rapid pitch dynamics occurring within ~100 ms. Intracranial electrophysiology in non-tonal languages linked dorsal LMC to vocal pitch control for intonation, but non-tonal intonation largely involves pitch rising above neutral pitch and does not encompass frequent pitch lowering below neutral seen in tonal languages. The Fujisaki model provides a physiological framework for decomposing Mandarin F0 contours into phrase and tone commands, enabling analysis of positive (rising) and negative (lowering) control components. This work builds on these foundations to test whether LMC encodes tone categories or articulatory command-like features governing pitch dynamics in tonal speech.
Methodology
Participants: Eight patients with brain tumors in eloquent regions of the left or right hemisphere (age 29–51; 5 males) participated with informed consent while undergoing awake language mapping at Huashan Hospital (IRB: HIRB KY2017-437). High-density ECoG grids (one or two 8×16 grids, 4 mm spacing, 128 channels each) were placed over sensorimotor cortex intraoperatively.
Data acquisition and preprocessing: Local field potentials were recorded at 3052 Hz (TDT system), notch-filtered at 50, 100, and 150 Hz, and cleaned of artifacts. High-gamma (70–150 Hz) envelopes were extracted with the Hilbert transform in 8 log-spaced bands, averaged across bands, downsampled to 100 Hz, and z-scored within each block.
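The band layout and per-block normalization described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the study's pipeline: the filtering and Hilbert steps are omitted, and the function names, the choice to place the first and last band centers exactly at 70 and 150 Hz, and the list-of-lists data layout are all assumptions made for the example.

```python
import math
import statistics


def log_spaced_centers(f_lo=70.0, f_hi=150.0, n_bands=8):
    """Center frequencies spaced evenly in log-frequency (endpoints included by assumption)."""
    step = (math.log(f_hi) - math.log(f_lo)) / (n_bands - 1)
    return [math.exp(math.log(f_lo) + i * step) for i in range(n_bands)]


def average_bands(envelopes):
    """Average equally long band envelopes sample by sample."""
    return [sum(vals) / len(vals) for vals in zip(*envelopes)]


def zscore(block):
    """Z-score one block of samples using the block's own mean and SD."""
    mu = statistics.fmean(block)
    sd = statistics.pstdev(block)
    return [(v - mu) / sd for v in block]


centers = log_spaced_centers()            # 8 band centers between 70 and 150 Hz
mean_env = average_bands([[1.0, 2.0], [3.0, 4.0]])
normalized = zscore([1.0, 2.0, 3.0, 4.0])
```

Log spacing keeps the ratio between neighboring band centers constant, so each band covers a similar proportion of its center frequency.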
Tasks: Two tone production paradigms were used. Version 1 (S1–S2) was visually cued naming of 60 words (15 syllables, each in four tones), with 3 repetitions per trial; S1 completed one block (45 reps/tone) and S2 two blocks (90 reps/tone). Version 2 (S3–S8) was auditory repetition of /ma/ and /mi/ in four tones each, with 15 reps per item over four blocks (120 reps/tone). A sentence reading task used 20 seven-character Mandarin sentences (Fu corpus), each repeated twice per block, with 2–5 blocks per participant (80–200 sentences; 560–1400 tones in total).
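The trial counts quoted above follow from simple multiplication. The short check below reproduces them, assuming that "3 repetitions per trial" means each word was produced three times per block; the variable names are illustrative.

```python
# Version 1: 15 syllables per tone, each produced 3 times per block (assumed reading).
v1_reps_per_block = 15 * 3
s1_reps_per_tone = v1_reps_per_block * 1   # S1 completed one block
s2_reps_per_tone = v1_reps_per_block * 2   # S2 completed two blocks

# Version 2: 2 syllables (/ma/, /mi/) x 15 repetitions x 4 blocks.
v2_reps_per_tone = 2 * 15 * 4

# Sentence reading: 20 sentences x 2 repetitions per block, 2 to 5 blocks,
# 7 characters (one tone each) per sentence.
sentences_per_block = 20 * 2
sentence_range = (sentences_per_block * 2, sentences_per_block * 5)
tone_range = tuple(n * 7 for n in sentence_range)
```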
Electrode localization: Corner electrode positions were recorded with neuronavigation, aligned to preoperative MRI using intraoperative photos, and remaining electrodes localized via interpolation/extrapolation (img_pipe).
Selection of electrodes: Speech-responsive electrodes were identified by aligning high-gamma activity to syllable onsets (−300 to +100 ms) and comparing mean activity against a baseline distribution from 1000 permutations; electrodes whose activity deviated from the baseline mean by more than 5 SD for at least 100 ms continuously were marked speech-responsive. Tone-discriminative electrodes were identified from one-way ANOVA F-statistics across the four tones at each time point (−300 to +200 ms).
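As a rough illustration of the tone-discriminability measure, the sketch below computes a one-way ANOVA F-statistic at each time point for a single electrode. It uses the textbook definition of F (between-group mean square over within-group mean square); the nested-list layout `trials_by_tone[tone][trial][time]` and the function names are assumptions for the example, not the authors' code.

```python
def one_way_f(groups):
    """One-way ANOVA F: between-group mean square / within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def f_timecourse(trials_by_tone):
    """Return one F value per time point for a single electrode.

    trials_by_tone[tone][trial][time] holds high-gamma samples (assumed layout).
    """
    n_time = len(trials_by_tone[0][0])
    return [
        one_way_f([[trial[t] for trial in tone] for tone in trials_by_tone])
        for t in range(n_time)
    ]
```

An electrode whose F time course rises well above its pre-onset level is one whose activity separates the tones at those time points.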
Acoustic processing: Pitch contours (F0) were extracted with Praat and proportionally time-warped to 400 ms (40 points at 100 Hz). PCA on the resulting 2891×40 matrix of pitch trajectories identified the principal components capturing >95% of the variance.
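Proportional time-warping amounts to resampling every contour onto a common number of points regardless of its original duration. A minimal sketch using linear interpolation is shown below; the interpolation method and the function name `time_warp` are assumptions, since the summary does not say how the warping was implemented.

```python
def time_warp(f0, n_out=40):
    """Linearly resample an F0 contour to n_out evenly spaced points."""
    if len(f0) < 2 or n_out < 2:
        raise ValueError("need at least two input samples and two output points")
    last = len(f0) - 1
    warped = []
    for i in range(n_out):
        pos = i * last / (n_out - 1)      # fractional index into the original contour
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        warped.append(f0[lo] * (1.0 - frac) + f0[hi] * frac)
    return warped


# A 57-sample contour and a 23-sample contour both become 40-point trajectories.
long_contour = time_warp([100.0 + i for i in range(57)])
short_contour = time_warp([200.0 - i for i in range(23)])
```

Once every syllable has the same length, the contours can be stacked into a trials-by-40 matrix of the kind the PCA step operates on.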
Encoding models: Single-electrode encoding models quantified the unique variance (R²) explained by pitch features (pitch height and pitch change), with intensity and syllable onset as controls. A computational tone production model (the Fujisaki model) was adapted to decompose F0 into phrase and tone commands; the rapid tone commands (positive and negative) were extracted and used as regressors to predict high-gamma activity. Electrode tuning curves, computed as mean high-gamma activity as a function of tone command value, were used to identify positive and negative tuning patterns.
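To make the role of positive and negative tone commands concrete, the sketch below implements the forward direction of the Fujisaki model in its commonly published form: log F0 is a baseline plus impulse-driven phrase components plus stepwise tone components. The study's adapted version and its fitting procedure are not described in this summary, so the constants `ALPHA`, `BETA`, and `GAMMA` (typical values from the general literature), the function names, and the example command timings are assumptions.

```python
import math

# Typical constants from the Fujisaki-model literature; assumed, not taken from the study.
ALPHA, BETA, GAMMA = 3.0, 20.0, 0.9


def phrase_response(t, alpha=ALPHA):
    """Response of the phrase control mechanism to an impulse at t = 0."""
    return alpha ** 2 * t * math.exp(-alpha * t) if t >= 0 else 0.0


def tone_response(t, beta=BETA, gamma=GAMMA):
    """Response of the tone control mechanism to a unit step at t = 0, capped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def f0_at(t, base_hz, phrase_cmds, tone_cmds):
    """F0 in Hz at time t (seconds).

    phrase_cmds: list of (onset, amplitude)
    tone_cmds:   list of (onset, offset, amplitude); amplitude may be negative
    """
    log_f0 = math.log(base_hz)
    log_f0 += sum(a * phrase_response(t - t0) for t0, a in phrase_cmds)
    log_f0 += sum(
        a * (tone_response(t - t1) - tone_response(t - t2)) for t1, t2, a in tone_cmds
    )
    return math.exp(log_f0)


# A positive command raises F0 above baseline; a negative command lowers it.
raised = f0_at(0.1, 120.0, [], [(0.0, 0.2, 0.5)])
lowered = f0_at(0.1, 120.0, [], [(0.0, 0.2, -0.5)])
```

In an encoding analysis of the kind described above, the command amplitudes over time, rather than the resulting F0 values, would serve as the regressors.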
Decoding analysis: Multivariate pattern classification was used for pair-wise tone decoding from the population activity of speech-responsive LMC electrodes in sliding 100 ms windows. Significance was tested with two-sided t-tests (Bonferroni corrected), and classifier weights were inspected at the time of peak accuracy.
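The sliding-window logic can be illustrated with a deliberately simple stand-in classifier. The summary does not name the classifier used, so the sketch below substitutes a leave-one-out nearest-centroid rule; the data layout `trial[electrode][time]`, the function names, and the 10-sample window (100 ms at 100 Hz) are assumptions for the example.

```python
def window_features(trial, start, width):
    """Mean activity per electrode over samples [start, start + width)."""
    return [sum(ch[start:start + width]) / width for ch in trial]


def centroid(vectors):
    """Component-wise mean of a list of feature vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def loo_accuracy(class_a, class_b):
    """Leave-one-out accuracy of a nearest-centroid rule for two tone classes."""
    correct = 0
    for own, other in ((class_a, class_b), (class_b, class_a)):
        for i, x in enumerate(own):
            held_out_centroid = centroid(own[:i] + own[i + 1:])
            if sq_dist(x, held_out_centroid) < sq_dist(x, centroid(other)):
                correct += 1
    return correct / (len(class_a) + len(class_b))


def sliding_accuracy(trials_a, trials_b, width=10, step=1):
    """Pair-wise decoding accuracy in each sliding window.

    trials_x[trial][electrode][time] holds high-gamma samples (assumed layout).
    """
    n_time = len(trials_a[0][0])
    accuracies = []
    for start in range(0, n_time - width + 1, step):
        feats_a = [window_features(tr, start, width) for tr in trials_a]
        feats_b = [window_features(tr, start, width) for tr in trials_b]
        accuracies.append(loo_accuracy(feats_a, feats_b))
    return accuracies
```

With one accuracy value per window, a Bonferroni correction would compare each window's p-value against the chosen alpha divided by the number of windows tested.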
Direct electrical stimulation (DES): In five awake surgical patients (2 left, 3 right), bipolar DES was applied to dLMC sites during production of /mā/ (to probe pitch rising) or /má/ (to probe pitch lowering). For each site, at least five stimulated trials were compared to ≥5 non-stimulated control repetitions; pitch changes were evaluated over time with paired t-tests (FDR correction). Sites evoking pitch rising, pitch lowering, speech arrest, anomia, and other motor responses were mapped bilaterally.
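The summary states that the time-resolved t-tests were FDR corrected but does not name the procedure. A minimal sketch of the widely used Benjamini-Hochberg step-up procedure is given below as one plausible reading; the function name and the default level `q=0.05` are assumptions.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans, True where the null is rejected at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose sorted p-value satisfies p_(k) <= (k / m) * q.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            reject[idx] = True
    return reject


# One p-value per time point, in time order (made-up numbers for illustration).
significant = benjamini_hochberg([0.039, 0.001, 0.216, 0.008])
```

Because the procedure is step-up, a time point can be declared significant even if its own p-value misses its rank threshold, as long as a larger-ranked p-value meets its threshold.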
Key Findings
- Neural representation: Tone-discriminating electrodes in bilateral LMC primarily encoded pitch dynamics (pitch height and pitch change) rather than categorical tone labels.
- Correlation analyses across speech-selective electrodes (n=448):
• Unique variance by pitch features vs. tone discriminability: r=0.75, P=4.3×10⁻³⁶.
• Intensity vs. tone discriminability: r=0.05, P=0.37.
• Syllable onset vs. tone discriminability: r=0.05, P=0.40.
• Binary pitch feature vs. tone discriminability: r=0.03, P=0.67.
• Tone category (R²) vs. tone discriminability: r=0.07, P=0.25.
• Pitch height vs. tone discriminability: r=0.45, P=5.4×10⁻¹¹.
• Pitch change vs. tone discriminability: r=0.61, P=4.2×10⁻²³.
• Pitch change and pitch height contributed complementary unique variance.
- Fujisaki model: Extracted positive and negative tone commands reconstructed dynamic pitch contours with high fidelity. Differential neural activity across tones was well explained by tone commands (Pearson r=0.72, P=7.9×10⁻⁴¹).
- Electrode tuning: Two distinct tuning patterns were identified—positive tuning to tone commands (22/54 electrodes) and negative tuning (32/54). Tuning electrodes were distributed bilaterally in LMC: left dLMC (16/54), right dLMC (21/54), and left vLMC (14/54), indicating distributed control of pitch rising and lowering.
- Decoding: Pair-wise tone classification from population LMC activity exceeded chance with a 100 ms sliding window; peak accuracy occurred around vowel onset; classifier weights were distributed across electrodes, indicating a distributed code.
- Causal evidence via DES: Stimulating dLMC evoked both pitch rising and pitch lowering during tone production. Example: during /mā/, stimulation increased F0 (stimulation onset ~452±49 ms); during /má/, stimulation decreased F0 (onset ~203±19 ms), with significant time points after FDR correction. Pitch rising sites were found bilaterally in dLMC; pitch lowering sites were elicited in left dLMC of two patients. Speech arrest sites were also observed near pitch modulation sites in bilateral dLMC and in left vLMC.
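Most of the statistics listed above are Pearson correlations computed across electrodes, for example between each electrode's unique variance explained by pitch features and its tone discriminability. For readers who want to reproduce that kind of comparison on their own data, a minimal standard-library sketch follows; the function name and the toy input values are illustrative and do not come from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy per-electrode values: unique R^2 for pitch features vs. a discriminability score.
unique_r2 = [0.01, 0.05, 0.02, 0.10, 0.07]
discriminability = [1.2, 3.9, 2.0, 8.1, 5.5]
r = pearson_r(unique_r2, discriminability)
```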
Discussion
The study addresses how precise, rapid pitch dynamics for lexical tones are neurally controlled. Results show that local populations within bilateral LMC encode articulatory command-like features (specifically, pitch rising and pitch lowering) rather than categorical tone identity or simple monotonic pitch values. This answers the research question by linking cortical activity to the kinematic control underpinning dynamic F0 trajectories in tonal speech. The combination of high temporal resolution ECoG, computational decomposition of F0 into positive and negative tone commands, and causal DES demonstrates that distinct LMC populations independently command rising and lowering pitch movements. These findings extend prior work in non-tonal languages by encompassing the full dynamic range of pitch, including lowering below neutral. They also clarify the level of sensorimotor representation as coordinated laryngeal control commands rather than single-muscle activations. For speech BCIs, the finding that distributed neural patterns in LMC can decode lexical tone suggests integrating articulator kinematics with pitch control channels in tonal language prostheses.
Conclusion
High-density intracranial recordings, computational modeling, decoding, and direct cortical stimulation reveal that bilateral LMC contains distributed neural populations that independently control pitch rising and pitch lowering to generate lexical tone dynamics in Mandarin. The work advances understanding of laryngeal motor cortex coding from monotonic pitch representations to bidirectional command-like control and provides a cortical map relevant for tonal language speech BCIs. Future research should increase participant numbers, map bilateral LMC simultaneously within individuals, and combine single-neuron recordings with laryngeal EMG and direct visualization of vocal folds to precisely link cortical activity to specific laryngeal muscle coordination patterns.
Limitations
- Interindividual variability in tone production cortex and the inability to record from both left and right LMC simultaneously in the same patient limited observation of bidirectional tuning across all participants.
- Lack of concurrent monitoring of laryngeal muscle activity and direct visualization of vocal fold movements prevents definitive attribution of speech arrest and detailed muscle-level mapping; future studies should integrate EMG and fiberoptic assessments.
- Sample comprised patients undergoing awake tumor surgery, which may limit generalizability to healthy populations.