
Engineering and Technology

A neural speech decoding framework leveraging deep learning and speech synthesis

X. Chen, R. Wang, et al.

This study introduces a deep-learning-based neural speech-decoding framework that pairs an ECoG decoder with a differentiable speech synthesizer to produce interpretable speech parameters and natural-sounding speech. A companion speech-to-speech auto-encoder, built from a speech encoder and the same synthesizer, generates the reference speech parameters used to train the ECoG decoder. The approach is highly reproducible across 48 participants, decodes speech with high correlation even when restricted to causal operations suitable for real-time prostheses, and works with either left or right hemisphere electrode coverage. This research was conducted by Xupeng Chen, Ran Wang, Amirhossein Khalilian-Gourtani, Leyao Yu, Patricia Dugan, Daniel Friedman, Werner Doyle, Orrin Devinsky, Yao Wang, and Adeen Flinker.
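
The page describes the framework only at a high level. As an illustration of the two-stage design (first train the speech-to-speech auto-encoder to obtain reference speech parameters, then train the ECoG decoder against them through the shared synthesizer), here is a minimal PyTorch-style sketch. All module definitions, layer sizes, and the choice of speech parameters are assumptions for illustration, not the authors' implementation:

```python
import torch
import torch.nn as nn

N_PARAMS = 18  # assumed size of the interpretable speech-parameter vector
               # (e.g. pitch, loudness, formants); the paper's exact set may differ

class SpeechEncoder(nn.Module):
    """Maps a speech spectrogram to frame-wise speech parameters (hypothetical)."""
    def __init__(self, n_mels=80):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_mels, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(128, N_PARAMS, kernel_size=3, padding=1), nn.Sigmoid(),
        )
    def forward(self, spec):   # spec: (batch, n_mels, time)
        return self.net(spec)  # (batch, N_PARAMS, time)

class ECoGDecoder(nn.Module):
    """Maps ECoG signals to the same speech parameters (hypothetical)."""
    def __init__(self, n_electrodes=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_electrodes, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(128, N_PARAMS, kernel_size=3, padding=1), nn.Sigmoid(),
        )
    def forward(self, ecog):   # ecog: (batch, n_electrodes, time)
        return self.net(ecog)

class DifferentiableSynthesizer(nn.Module):
    """Stand-in for the differentiable synthesizer: any differentiable map
    from speech parameters back to a spectrogram works for this sketch."""
    def __init__(self, n_mels=80):
        super().__init__()
        self.net = nn.Conv1d(N_PARAMS, n_mels, kernel_size=3, padding=1)
    def forward(self, params):
        return self.net(params)

encoder, synth, ecog_decoder = SpeechEncoder(), DifferentiableSynthesizer(), ECoGDecoder()
spec = torch.randn(4, 80, 100)  # dummy spectrogram batch
ecog = torch.randn(4, 64, 100)  # dummy ECoG batch, time-aligned with the speech

# Stage 1: train the speech-to-speech auto-encoder (encoder + synthesizer)
# so that its bottleneck yields reference speech parameters.
ref_params = encoder(spec)
ae_loss = nn.functional.mse_loss(synth(ref_params), spec)

# Stage 2: train the ECoG decoder so its output (a) matches the reference
# parameters and (b) synthesizes a spectrogram close to the actual speech.
dec_params = ecog_decoder(ecog)
decoder_loss = (nn.functional.mse_loss(dec_params, ref_params.detach())
                + nn.functional.mse_loss(synth(dec_params), spec))
```

Because the synthesizer is differentiable, the spectrogram loss in stage 2 back-propagates through it into the ECoG decoder, so the decoder is supervised in both parameter space and spectrogram space.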

Abstract
Decoding human speech from neural signals is essential for brain-computer interface (BCI) technologies that aim to restore speech in populations with neurological deficits. However, it remains a highly challenging task, compounded by the scarcity of neural signals with corresponding speech recordings and by the complexity and high dimensionality of the data. Here we present a novel deep learning-based neural speech decoding framework that includes an electrocorticography (ECoG) decoder, which translates ECoG signals from the cortex into interpretable speech parameters, and a novel differentiable speech synthesizer, which maps speech parameters to spectrograms. We have developed a companion speech-to-speech auto-encoder, consisting of a speech encoder and the same speech synthesizer, that generates reference speech parameters to facilitate training of the ECoG decoder. This framework generates natural-sounding speech and is highly reproducible across a cohort of 48 participants. Our experimental results show that our models can decode speech with high correlation, even when limited to only causal operations, which is necessary for adoption by real-time neural prostheses. Finally, we successfully decode speech in participants with either left or right hemisphere coverage, which could lead to speech prostheses for patients with deficits resulting from left-hemisphere damage.
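
One technical point in the abstract worth unpacking is the restriction to causal operations: a real-time prosthesis can only use neural samples up to the current moment. A standard way to enforce this in temporal convolutions is left-only padding, sketched below as a generic illustration of the technique (not the authors' architecture):

```python
import torch
import torch.nn as nn

class CausalConv1d(nn.Module):
    """1-D convolution that sees only current and past time steps:
    pad (kernel_size - 1) frames on the left, none on the right."""
    def __init__(self, in_ch, out_ch, kernel_size):
        super().__init__()
        self.pad = kernel_size - 1
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size)
    def forward(self, x):  # x: (batch, channels, time)
        x = nn.functional.pad(x, (self.pad, 0))  # left padding only
        return self.conv(x)

# The output at time t depends only on inputs up to time t, so the layer
# can run incrementally as new ECoG frames arrive; this property is what
# makes a decoder usable in a real-time neural prosthesis.
layer = CausalConv1d(64, 32, kernel_size=5)
y = layer(torch.randn(1, 64, 100))  # y: (1, 32, 100)
```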
Publisher
Nature Machine Intelligence
Published On
Apr 08, 2024
Authors
Xupeng Chen, Ran Wang, Amirhossein Khalilian-Gourtani, Leyao Yu, Patricia Dugan, Daniel Friedman, Werner Doyle, Orrin Devinsky, Yao Wang, Adeen Flinker
Tags
Electrocorticography (ECoG)
Neural speech decoding
Differentiable speech synthesizer
Speech-to-speech auto-encoder
Brain-computer interface
Real-time neural prostheses