Deepfake audio as a data augmentation technique for training automatic speech to text transcription models

Computer Science

A. R. Ferreira and C. E. C. Campelo

A data augmentation framework that uses deepfake audio to improve automatic speech-to-text models, particularly for languages less popular than English. This research, conducted by Alexandre R Ferreira and Cláudio E C Campelo, examines how deepfake audio quality affects model performance, with the aim of improving ASR systems.
Abstract
Training transcription models that produce robust results requires a large and diverse labeled dataset. Finding data with the necessary characteristics is challenging, especially for languages less popular than English, and producing it requires significant effort and often considerable expense. Data augmentation techniques are one strategy for mitigating this problem. In this work, we propose a framework for data augmentation based on deepfake audio. To validate the framework, experiments were conducted using existing deepfake and transcription models. A voice cloner and an English-language dataset recorded by Indian speakers were selected, so that the dataset contains a single accent. The augmented data was then used to train speech-to-text models in various scenarios.
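The augmentation step described in the abstract can be sketched roughly as follows. This is an illustrative outline only, not the authors' implementation: the `Utterance` type, the `augment` function, and the `VoiceCloner` signature are hypothetical names introduced here, and the voice cloner is passed in as a plain callable rather than tied to any specific model. The `word_error_rate` helper shows the standard word-level edit-distance definition of the Word Error Rate metric listed in the tags.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Utterance:
    """Hypothetical container for one labeled audio sample."""
    audio: Sequence[float]   # raw waveform samples
    transcript: str          # ground-truth text label
    synthetic: bool = False  # True when produced by the voice cloner


# Assumed signature: (reference audio, text to speak) -> cloned audio.
VoiceCloner = Callable[[Sequence[float], str], Sequence[float]]


def augment(dataset: List[Utterance], new_texts: List[str],
            clone_voice: VoiceCloner) -> List[Utterance]:
    """Return the original data plus one cloned utterance per new text."""
    if not dataset:
        raise ValueError("need at least one real utterance as a voice reference")
    augmented = list(dataset)
    for i, text in enumerate(new_texts):
        reference = dataset[i % len(dataset)]  # cycle through the real samples
        audio = clone_voice(reference.audio, text)
        augmented.append(Utterance(audio, text, synthetic=True))
    return augmented


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))  # distances against an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)
```

In practice, `clone_voice` would wrap whichever voice-cloning model is chosen, and the transcription model would be trained on the list returned by `augment`, then scored with `word_error_rate` against held-out real recordings.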
Publisher
arXiv
Published On
Sep 22, 2023
Authors
Alexandre R Ferreira, Cláudio E C Campelo
Tags
speech-to-text
ASR models
data augmentation
deepfake audio
Word Error Rate
Indian English
audio processing