Introduction
The study addresses the long-standing debate over shared versus distinct mechanisms in speech and music processing. Speech and music, both uniquely human behaviors, share rhythmic and hierarchical structure, yet they exhibit distinct rhythmic profiles and rate-specific processing differences. The research question is whether speech and music recruit distinct cortical motor timing mechanisms tied to the motor effectors typically used to produce them. The study uses auditory perception and perception-production synchronization tasks to probe rate-specific processing as modulated by different motor effectors (whispering, engaging speech effectors, and finger-tapping, engaging music-associated effectors) for both speech stimuli (syllable sequences) and music stimuli (piano tone sequences) at slow (~2 Hz) and fast (~4.5 Hz) rates. The hypothesis is that specific motor effectors recruit distinct cortical rhythmic motor timing circuits with different optimal processing rates, which in turn shape auditory-motor coupling. Specifically, speech-associated motor effectors are predicted to synchronize better at fast rates, while music-associated effectors should perform better at slow rates. Auditory perception is likewise hypothesized to mirror the synchronization results, with higher performance at the rates that enhance processing in each domain, and with synchronization predicting perception performance at the corresponding time scales. Alternatively, the study considers the possibility that generally optimal rhythmic timing, facilitated by the motor system, occurs at slower time scales regardless of domain.
Literature Review
Existing research highlights the rhythmic structure of speech and music, suggesting overlapping processing mechanisms. Endogenous brain rhythms in the same frequency range as speech and music signals may support predictive processing and event segmentation. Speech research emphasizes the role of theta-range (~4.5 Hz) auditory cortex rhythms, while the motor system's contribution to rhythmic prediction is also discussed; its involvement is evident in the activation of production-related regions during listening to speech and music. Slow delta rhythms (~2 Hz) in the supplementary motor area have been proposed to carry the motor system's temporal predictions, aligning with musical beats but remaining less understood in speech. Despite this overlap, speech and music differ in their rhythmic characteristics: music exhibits dominant modulations around 1–2 Hz and speech around 4–8 Hz. These differences are reflected in perceptual performance, with beat-deviance detection maximal around 1.4 Hz in music and speech comprehension highest around 4.5 Hz. Speech and music production also employ different motor effectors: speech uses the mouth and vocal cords, while music often uses the hands and arms. Previous research indicates different rate sensitivities across effectors, with the mouth superior at fast rates; spontaneous production rates differ accordingly, with finger-tapping around 2 Hz and syllable production around 4–8 Hz. In speech, auditory-motor region connectivity is associated with perception-production synchronization at syllabic rates, strongest around 4.5 Hz. The study aims to clarify whether music perception-production synchronization shows similar rate restrictions and whether optimal synchronization rates differ between speech and music.
Methodology
The study, pre-registered on AsPredicted.org, initially recruited 66 participants; after exclusions, 62 and 57 were included in the synchronization and perception tasks, respectively. Both tasks used syllable and piano tone sequences as stimuli, generated with the MBROLA speech synthesizer and MIDIUtil, respectively, and both were performed at slow (~2 Hz) and fast (~4.5 Hz) rates. The synchronization task used adapted versions of the accelerated Spontaneous Speech Synchronization (SSS) test, in which participants whispered "TEH" or finger-tapped in synchrony with the sequences. Stimuli were presented binaurally through ER3c in-ear headphones. Synchronization strength was measured as the phase-locking value (PLV) between the acoustic and motor output envelopes, computed with the NSL Auditory Model toolbox. PLVs were normalized against a permutation distribution to correct for acoustic differences between tapping and whispering and for differences in sequence length. The auditory perception task required identifying small rhythmic deviations in otherwise isochronous sequences. Data were analyzed with a linear mixed model (LMM) for the synchronization task and a generalized linear mixed model (GLMM) for the perception task. The LMM included rate, motor effector (tapping vs. whispering), stimulus type (tones vs. syllables), and the rate-by-effector interaction as predictors, along with characteristics of the acoustic and motor envelopes (peak amplitude and width). The GLMM included rate, stimulus category, and their interaction, together with characteristics of the acoustic envelope and principal components from a PCA performed on the synchronization data to summarize relationships between synchronization conditions. Post-hoc comparisons were conducted for both models, and control analyses addressed task-order effects and the influence of musical sophistication. Statistical analyses were performed in MATLAB and R.
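The PLV computation can be made concrete with a short sketch. The study extracted envelopes with the NSL Auditory Model toolbox in MATLAB; the Python code below is a minimal approximation, assuming Hilbert-based phase extraction and a circular-shift permutation null. The function names, band edges, and permutation scheme are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def bandpass(x, lo, hi, fs, order=4):
    """Zero-phase Butterworth band-pass around the stimulation rate."""
    b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return filtfilt(b, a, x)

def plv(env_a, env_b, lo, hi, fs):
    """Phase-locking value between two amplitude envelopes.

    Both envelopes are band-passed around the target rate, their
    instantaneous phases extracted with the Hilbert transform, and the
    mean resultant length of the phase difference returned
    (0 = no coupling, 1 = perfect phase locking).
    """
    ph_a = np.angle(hilbert(bandpass(env_a, lo, hi, fs)))
    ph_b = np.angle(hilbert(bandpass(env_b, lo, hi, fs)))
    return np.abs(np.mean(np.exp(1j * (ph_a - ph_b))))

def normalized_plv(acoustic_env, motor_env, lo, hi, fs, n_perm=1000, seed=None):
    """z-score the observed PLV against a permutation null built by
    circularly shifting the motor envelope, so that baseline acoustic
    differences between tapping and whispering (and differences in
    sequence length) do not inflate the coupling estimate."""
    rng = np.random.default_rng(seed)
    observed = plv(acoustic_env, motor_env, lo, hi, fs)
    null = np.empty(n_perm)
    for k in range(n_perm):
        # shift by at least one second to avoid near-identity permutations
        shift = rng.integers(fs, len(motor_env) - fs)
        null[k] = plv(acoustic_env, np.roll(motor_env, shift), lo, hi, fs)
    return (observed - null.mean()) / null.std()
```

With envelopes sampled at, say, 100 Hz, the slow condition might use a band of roughly 1–3 Hz around the ~2 Hz stimulation rate and the fast condition a band around ~4.5 Hz; the exact bands here are assumptions.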
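The mixed-model structure described above can be sketched in the same spirit. The study ran its models in R and MATLAB; the following uses Python's statsmodels as a stand-in, with a hypothetical long-format table and a random intercept per participant (the file name, column names, and random-effects structure are assumptions).

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format table: one normalized PLV per participant and
# condition, with rate (slow/fast), effector (tapping/whispering),
# stimulus (tones/syllables), and envelope peak covariates.
df = pd.read_csv("synchronization_long.csv")

# LMM mirroring the predictors described above: rate, motor effector,
# stimulus type, the rate-by-effector interaction, and envelope
# characteristics, with a random intercept per participant.
lmm = smf.mixedlm(
    "plv_z ~ rate * effector + stimulus + peak_amplitude + peak_width",
    data=df,
    groups=df["participant"],
).fit()
print(lmm.summary())
```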
Key Findings
The linear mixed model analysis of the synchronization task revealed significant main effects of rate and stimulus type and a significant interaction between rate and motor effector. Synchronization was generally better at slow rates. At slow rates, finger-tapping synchronization was superior to whispering; at fast rates, no difference was found, and a Bayesian paired-samples t-test provided moderate evidence for the null hypothesis of no difference between tapping and whispering (BF01 = 9.41). The principal component analysis (PCA) of the synchronization data yielded three components: a fast component capturing variance across all fast conditions, a slow whispering component, and a slow tapping component. This pattern suggests independent processes at slow rates but a shared process at fast rates. The generalized linear mixed model analysis of the perception task showed a significant interaction between rate and stimulus: syllable perception was better at fast rates, while tone perception was better at slow rates. The width of the acoustic envelope peaks also affected perception performance. Importantly, both the fast and the slow tapping synchronization components significantly predicted perception performance, indicating a link between motor and perceptual performance. Control analyses showed no significant order effects, but musical sophistication influenced synchronization performance, primarily through the fast component.
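The PCA step can be illustrated with a minimal sketch, assuming a wide participant-by-condition table of normalized PLVs (the file name and column labels are hypothetical, and the component names reflect the study's post-hoc interpretation rather than anything PCA itself produces):

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical wide table: one row per participant, one column per
# synchronization condition (2 rates x 2 effectors x 2 stimulus types).
wide = pd.read_csv("synchronization_wide.csv", index_col="participant")

# Standardize the condition columns, then extract components; in the
# study, three components emerged (fast, slow whispering, slow tapping).
scores = PCA(n_components=3).fit_transform(StandardScaler().fit_transform(wide))
components = pd.DataFrame(
    scores, index=wide.index,
    columns=["fast", "slow_whisper", "slow_tap"],  # post-hoc labels
)
```

The resulting per-participant component scores can then enter the perception GLMM as predictors, which is how the synchronization-perception link reported above was assessed.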
Discussion
The study's findings indicate that duration discrimination (the detection of small rhythmic deviations) shows rate-specific effects in both speech and music perception, consistent with the dominant acoustic rhythms of produced speech and music. The synchronization task showed an overall advantage at slow rates, aligning with spontaneous production rates and with neural findings suggesting that delta rhythms constrain rhythmic motor timing. The interaction between rate and motor effector indicates that the music-associated effector (finger-tapping) holds an advantage at slow rates. The PCA results further support domain-specific processes at slow rates, with independent patterns for the different motor effectors, whereas domain-general influences appear to operate at fast rates. The perception results underscore the importance of rate-specific processes in speech and music perception that match the dominant rates in the motor domain. The predictive relationship between synchronization and perception performance supports the link between motor and perceptual processes. Finally, musical sophistication appears particularly linked to the common influences driving synchronization ability at fast rates, independent of the motor effector.
Conclusion
This study demonstrates rate-specific processing in speech and music perception and in auditory-motor synchronization. While overall synchronization was better at slow rates, distinct mechanisms appear to operate at slow rates for the different motor effectors associated with speech and music, whereas a common mechanism appears to be involved at fast rates. Future research could explore more complex stimuli, address limitations of whispering relative to natural speaking, and investigate vocal music synchronization to further clarify the interaction between motor effectors, rhythm perception, and production across speech and music.
Limitations
The study's limitations include the use of simplified stimuli (syllable and piano tone sequences) and motor tasks (whispering and finger-tapping) rather than natural speech and music production. While this allowed controlled acoustic matching, it may not fully capture the complexity of real-world speech and music processing, and whispering in particular differs from natural speaking. Finally, the study may have lacked the power to detect small effect sizes in some comparisons, particularly the difference between tapping and whispering at fast rates.