Abstract
Existing task-oriented dialogue (TOD) datasets focus primarily on written text, which leaves a gap between research and realistic spoken conversation. To bridge this gap, the authors introduce SpokenWOZ, a large-scale speech-text dataset for spoken TOD. SpokenWOZ comprises 8 domains, 203k turns, 5.7k dialogues, and 249 hours of audio from human-to-human spoken conversations. It captures characteristics common in speech, such as word-by-word processing and commonsense reasoning, and introduces two new challenges: cross-turn slot detection and reasoning slot detection. Experiments on text-modal baselines, dual-modal baselines, and LLMs such as ChatGPT show that current models still have substantial room for improvement on spoken conversation.
Published On
Jan 01, 2023
Authors
Shuzheng Si, Wentao Ma, Yuchuan Wu, Yinpei Dai, Haoyu Gao, Ting-En Lin, Hangyu Li, Rui Yan, Fei Huang, Yongbin Li
Tags
task-oriented dialogue
SpokenWOZ dataset
speech-text
commonsense reasoning
spoken characteristics
cross-turn slot detection
reasoning slot detection