Deepfake audio as a data augmentation technique for training automatic speech to text transcription models

Computer Science

A. R. Ferreira and C. E. C. Campelo

This paper presents a data augmentation framework that uses deepfake audio to train automatic speech-to-text models, aimed particularly at less-popular languages. The authors, Alexandre R Ferreira and Cláudio E C Campelo, show that the quality of the deepfake audio strongly affects the performance of the resulting ASR model.

Abstract
Training robust automatic speech recognition (ASR) models, which transcribe speech to text, requires large, diverse, labeled datasets. Such data is hard to obtain, especially for less-popular languages. This paper proposes a data augmentation framework that uses deepfake audio to address the problem. Experiments with a voice cloner and an Indian English dataset showed that the framework is functional, but the quality of the deepfake audio significantly affected the ASR model's performance: the Word Error Rate (WER) increased.
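For readers unfamiliar with the metric, WER is conventionally the number of word substitutions, deletions, and insertions needed to turn the model's transcript into the reference transcript, divided by the number of words in the reference. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative helper (hypothetical name); tokenizes on whitespace only,
    with no case folding or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)
```

A lower value is better, and because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.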
Publisher
arXiv
Published On
Sep 22, 2023
Authors
Alexandre R Ferreira, Cláudio E C Campelo
Tags
speech-to-text
ASR models
data augmentation
deepfake audio
Word Error Rate
Indian English
audio processing