SpokenWOZ: A Large-Scale Speech-Text Benchmark for Spoken Task-Oriented Dialogue in Multiple Domains

Computer Science

S. Si, W. Ma, et al.

SpokenWOZ is a large-scale speech-text dataset for task-oriented dialogue, with over 203k turns and 249 hours of real human-to-human interactions. The work, by researchers from Alibaba Group and the University of Michigan, addresses the complexities of spoken language that text-only datasets often overlook.

~3 min • Beginner • English
Abstract
Task-oriented dialogue (TOD) models have made significant progress on written datasets, but a gap remains between these benchmarks and realistic spoken conversations. Prior spoken TOD datasets are small, and they often lack human-to-human audio or focus mainly on ASR robustness, overlooking challenges unique to speech. The authors introduce SpokenWOZ, a large-scale speech-text dataset for spoken TOD covering 8 domains, 203k turns, 5.7k dialogues, and 249 hours of human-to-human audio. SpokenWOZ captures characteristics of spoken language such as incremental word-by-word processing and commonsense reasoning, and it defines two new challenges: cross-turn slot detection and reasoning slot detection. Experiments with text-only baselines, newly proposed dual-modal models, and LLMs show substantial room for improvement in spoken settings, both for fine-tuned models and for LLMs such as ChatGPT.
Publisher
Not specified in the provided text
Published On
Jan 01, 2023
Authors
Shuzheng Si, Wentao Ma, Yuchuan Wu, Yinpei Dai, Haoyu Gao, Ting-En Lin, Hangyu Li, Rui Yan, Fei Huang, Yongbin Li
Tags
task-oriented dialogue
SpokenWOZ dataset
speech-text
commonsense reasoning
spoken characteristics
cross-turn slot detection
reasoning slot detection