Computer Science, arXiv

Deepfake audio as a data augmentation technique for training automatic speech to text transcription models

A. R. Ferreira and C. E. C. Campelo

A data augmentation framework that uses deepfake audio to improve automatic speech-to-text models, particularly for languages less popular than English. The research, conducted by Alexandre R Ferreira and Cláudio E C Campelo, examines the impact of deepfake audio quality on model performance, with the aim of improving ASR systems.

Abstract

Training transcription models that produce robust results requires a large and diverse labeled dataset. Finding data with the necessary characteristics is challenging, especially for languages less popular than English, and producing it takes significant effort and often money. Data augmentation is one strategy for mitigating this problem. In this work, we propose a data augmentation framework based on deepfake audio. To validate the framework, experiments were conducted using existing deepfake and transcription models. A voice cloner and a dataset of English speech recorded by Indian speakers were selected, ensuring that a single accent is present in the dataset. The augmented data was then used to train speech-to-text models in various scenarios.
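The augmentation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Utterance` type, the `augment_with_deepfakes` function, and the cloner signature (a reference recording plus a target sentence, returning a synthetic waveform) are all assumptions made for illustration, and `dummy_cloner` is a placeholder that only stands in for a real voice-cloning model.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Utterance:
    """One labeled training example: a waveform and its transcript."""
    audio: List[float]
    transcript: str
    synthetic: bool = False  # True when the audio came from the cloner


# Assumed cloner interface: (reference waveform, target text) -> new waveform.
VoiceCloner = Callable[[Sequence[float], str], List[float]]


def augment_with_deepfakes(
    dataset: Sequence[Utterance],
    cloner: VoiceCloner,
    new_texts: Sequence[str],
) -> List[Utterance]:
    """Return the real utterances followed by one cloned utterance per text."""
    if not dataset:
        raise ValueError("at least one real utterance is needed as a voice reference")
    augmented = list(dataset)
    for i, text in enumerate(new_texts):
        # Cycle through the real recordings so each serves as a voice reference.
        reference = dataset[i % len(dataset)]
        fake_audio = cloner(reference.audio, text)
        augmented.append(Utterance(audio=fake_audio, transcript=text, synthetic=True))
    return augmented


def dummy_cloner(reference_audio: Sequence[float], text: str) -> List[float]:
    """Placeholder only: returns silence whose length grows with the text."""
    return [0.0] * (len(text) * 10)


real = [Utterance(audio=[0.1, 0.2], transcript="hello world")]
combined = augment_with_deepfakes(real, dummy_cloner, ["good morning", "thank you"])
print(len(combined), sum(u.synthetic for u in combined))
```

Keeping a `synthetic` flag on each example makes it straightforward to vary the proportion of real and cloned audio across training scenarios, or to exclude cloned audio from evaluation sets.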

Publisher

arXiv

Published On

Sep 22, 2023

Authors

Alexandre R Ferreira, Cláudio E C Campelo

DOI

https://doi.org/10.48550/arxiv.2309.12802

Explore these studies to deepen your understanding

Adjacent work that informs or extends this paper's methodology and findings.

Economics

Heterogeneity in financing for development strategies as a hindering factor to achieve a global agreement on the 2030 Agenda

A. Sianes, L. A. Fernández-Portillo, et al.

Medicine and Health

Usability Comparison Among Healthy Participants of an Anthropomorphic Digital Human and a Text-Based Chatbot as a Responder to Questions on Mental Health: Randomized Controlled Trial

A. O. Thunström, H. K. Carlsen, et al.

Computer Science

SpokenWOZ: A Large-Scale Speech-Text Benchmark for Spoken Task-Oriented Dialogue in Multiple Domains

S. Si, W. Ma, et al.
