Computer Science

SpokenWOZ: A Large-Scale Speech-Text Benchmark for Spoken Task-Oriented Dialogue in Multiple Domains

S. Si, W. Ma, et al.

Discover the SpokenWOZ dataset, a large-scale speech-text resource for task-oriented dialogue featuring 203k turns and 249 hours of real human-to-human interactions. This research, conducted by a team from Alibaba Group and the University of Michigan, tackles the complexities of spoken language that traditional text datasets often overlook.

Abstract

Task-oriented dialogue (TOD) models have made significant progress on written datasets, but a gap remains between these datasets and realistic spoken conversations. Prior spoken TOD datasets are small-scale and often lack human-to-human audio, or focus mainly on ASR robustness while overlooking challenges unique to spoken language. The authors introduce SpokenWOZ, a large-scale speech-text dataset for spoken TOD with 8 domains, 203k turns, 5.7k dialogues, and 249 hours of human-to-human audio. SpokenWOZ captures spoken characteristics such as incremental word-by-word processing and commonsense reasoning, and defines two new challenges: cross-turn slot detection and reasoning slot detection. Comprehensive experiments with text-only baselines, newly proposed dual-modal models, and LLMs show substantial headroom for improvement in spoken settings, for both fine-tuned models and LLMs such as ChatGPT.
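To make the two new challenges concrete, here is a minimal toy sketch under stated assumptions. It is not the SpokenWOZ annotation format and not any of the models evaluated in the paper (those are learned text-only, dual-modal, and LLM systems); the helper names `track_cross_turn_digits` and `resolve_relative_day`, the digit-word table, and the example utterances are all hypothetical. A cross-turn slot is modeled as a number the user dictates a few digits per turn, so the value only exists once the turns are combined. A reasoning slot is modeled as a relative date that must be mapped to a weekday because the value is never spoken literally.

```python
from datetime import date, timedelta

# Toy illustration only: the helper names, digit-word table and example
# utterances are hypothetical, not the SpokenWOZ schema or the paper's models.

DIGIT_WORDS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]

# Relative-day phrases and their offsets in days from "today".
RELATIVE_OFFSETS = {"today": 0, "tomorrow": 1, "the day after tomorrow": 2}


def track_cross_turn_digits(user_turns):
    """Cross-turn slot: collect digits the user dictates over several turns
    (e.g. a phone number) and join them into one slot value."""
    digits = []
    for turn in user_turns:
        for token in turn.lower().replace(",", " ").split():
            token = token.strip(".?!")  # drop trailing punctuation
            if token in DIGIT_WORDS:
                digits.append(DIGIT_WORDS[token])
            elif token.isdigit():
                digits.append(token)
    return "".join(digits)


def resolve_relative_day(utterance, today):
    """Reasoning slot: map a relative expression such as
    'the day after tomorrow' to a weekday name, or return None."""
    text = utterance.lower()
    # Try longer phrases first so "the day after tomorrow" beats "tomorrow".
    for phrase in sorted(RELATIVE_OFFSETS, key=len, reverse=True):
        if phrase in text:
            target = today + timedelta(days=RELATIVE_OFFSETS[phrase])
            return WEEKDAYS[target.weekday()]
    return None


if __name__ == "__main__":
    turns = ["My number is zero one two", "three four five,", "six seven eight nine."]
    print(track_cross_turn_digits(turns))  # 0123456789
    # 2 January 2023 was a Monday, so two days later is a Wednesday.
    print(resolve_relative_day("Can I book it for the day after tomorrow?",
                               date(2023, 1, 2)))  # wednesday
```

Rule-based matching like this would break easily on ASR errors, self-corrections, and paraphrases, which hints at why the spoken setting leaves so much headroom even for fine-tuned models and LLMs.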

Published On

Jan 01, 2023

Authors

Shuzheng Si, Wentao Ma, Yuchuan Wu, Yinpei Dai, Haoyu Gao, Ting-En Lin, Hangyu Li, Rui Yan, Fei Huang, Yongbin Li
