Using a forced aligner for prosody research

Linguistics and Languages


H. Wu, J. Yun, et al.

This study by Hongchen Wu, Jiwon Yun, Xiang Li, Huiyi Huang, and Chuandong Liu evaluates the Montreal Forced Aligner (MFA) as a tool for prosody research in Mandarin Chinese. The authors assess how accurately MFA aligns syllables and phrases. They show that MFA-assisted annotation is much faster than traditional manual annotation, and they examine how audio quality affects alignment accuracy.
Abstract
Forced alignment is a speech technique that can automatically align audio files with transcripts. With the help of forced alignment tools, annotating audio files and creating annotated speech databases have become much more accessible and efficient. Researchers have recently started to evaluate the benefits and accuracy of forced aligners in speech research and have provided insightful suggestions for improvement. However, current work has so far paid little attention to evaluating forced aligners in prosody research, which focuses on suprasegmental features. In this paper, we use ambiguous sentence-level audio input in Mandarin Chinese, which can be disambiguated prosodically, to evaluate the alignment accuracy of the Montreal Forced Aligner (MFA). Having obtained a satisfactory result for syllable-by-syllable alignment, we further explore the possibility and benefits of using the forced alignment tool to generate phrase-by-phrase alignment, a topic that has barely been studied in previous research on forced alignment. Our paper demonstrates that the forced alignment tool can effectively generate accurate alignment at both the syllable and phrase levels for tonal languages such as Mandarin. We found that the average differences between human annotators and MFA were smaller than the gold-standard benchmark, indicating a satisfactory level of performance by the tool. Moreover, the MFA-assisted annotation rate of human transcribers was at least 20 times faster than previously reported manual annotation rates, providing significant time and resource savings for prosody researchers. Our results also suggest that the phrase-level alignment accuracy of MFA can be affected by recording quality, calling prosody researchers' attention to controlling audio quality during recording. The finding that de-stressed words and phrases pose challenges for MFA also provides a reference for improving forced aligners.
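The accuracy comparison described in the abstract can be illustrated with a minimal sketch. It assumes that human and MFA boundary times (in seconds) have already been paired one-to-one, and it compares their mean absolute difference against a benchmark value. The function names and the example values are hypothetical, and the benchmark is a placeholder parameter rather than the paper's actual gold-standard figure.

```python
from statistics import mean


def mean_boundary_difference(human, aligner):
    """Mean absolute difference (seconds) between paired boundary times.

    Hypothetical helper: assumes both lists hold the same boundaries in the same order.
    """
    if len(human) != len(aligner):
        raise ValueError("boundary lists must have the same length")
    return mean(abs(h - a) for h, a in zip(human, aligner))


def within_benchmark(human, aligner, benchmark):
    """True if the mean boundary difference is below a placeholder benchmark (seconds)."""
    return mean_boundary_difference(human, aligner) < benchmark


if __name__ == "__main__":
    # Illustrative, made-up boundary times for three syllables.
    human_bounds = [0.10, 0.35, 0.62]
    mfa_bounds = [0.11, 0.33, 0.62]
    print(round(mean_boundary_difference(human_bounds, mfa_bounds), 3))
    print(within_benchmark(human_bounds, mfa_bounds, benchmark=0.02))
```

Averaging absolute rather than signed differences keeps early and late boundaries from cancelling each other out.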
Publisher
Humanities and Social Sciences Communications
Published On
Jul 19, 2023
Authors
Hongchen Wu, Jiwon Yun, Xiang Li, Huiyi Huang, Chuandong Liu
Tags
Montreal Forced Aligner
Mandarin Chinese
prosody research
alignment accuracy
annotation efficiency
audio quality
de-stressed words