Text Is Not All You Need: Multimodal Prompting Helps LLMs Understand Humor

Computer Science

A. Baluja

LLMs excel at text but can miss the punchline, because humor often depends on pronunciation and intonation. This study presents a simple multimodal prompting method that supplies an LLM with both the joke text and a TTS-generated spoken form, improving humor explanations across all tested datasets. Research conducted by Ashwin Baluja (Northwestern University).
Abstract
While Large Language Models (LLMs) have demonstrated impressive natural language understanding capabilities across various text-based tasks, understanding humor has remained a persistent challenge. Humor is frequently multimodal, relying not only on the meaning of the words but also on their pronunciation and even the speaker's intonation. In this study, we explore a simple multimodal prompting approach to humor understanding and explanation. We present an LLM with both the text and the spoken form of a joke, generated using an off-the-shelf text-to-speech (TTS) system. Using multimodal cues improves the explanations of humor compared with text-only prompts across all tested datasets.
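As a rough illustration of the prompting setup the abstract describes, the sketch below pairs a joke's text with an audio rendering in a single prompt. It is not the paper's implementation. The function names, the message layout, the instruction wording, and the sample joke are all hypothetical, and the "TTS" step is a placeholder that writes silent audio using Python's standard library. A real pipeline would call an actual TTS system and send the result to an audio-capable LLM.

```python
import base64
import io
import wave
from typing import Dict, List, Optional


def synthesize_placeholder_wav(text: str, sample_rate: int = 16000) -> bytes:
    """Hypothetical stand-in for a TTS call: returns a silent mono WAV.

    A real system would return synthesized speech for `text`; here the
    duration merely scales with the word count (about 0.25 s per word).
    """
    n_frames = sample_rate * max(1, len(text.split())) // 4
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)        # mono
        wav.setsampwidth(2)        # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * n_frames)
    return buffer.getvalue()


def build_prompt(joke: str, audio: Optional[bytes] = None) -> List[Dict[str, str]]:
    """Build a list of prompt parts: text only, or text plus spoken form.

    The part schema is an assumption made for this sketch, not the format
    of any particular LLM API.
    """
    parts = [{"type": "text", "text": "Explain why this joke is funny:\n" + joke}]
    if audio is not None:
        parts.append({
            "type": "audio",
            "format": "wav",
            "data": base64.b64encode(audio).decode("ascii"),
        })
    return parts


# Illustrative joke only; it is not taken from the paper's datasets.
joke = "Why did the scarecrow win an award? He was outstanding in his field."

text_only = build_prompt(joke)                                     # baseline prompt
multimodal = build_prompt(joke, synthesize_placeholder_wav(joke))  # text + audio
```

The two prompts differ only in the extra audio part, which mirrors the comparison the abstract reports between text-only and multimodal prompting.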
Publisher
Proceedings of the 1st Workshop on Computational Humor (CHum)
Published On
Jan 19, 2025
Authors
Ashwin Baluja
Tags
Large Language Models
humor understanding
multimodal prompting
text-to-speech (TTS)
spoken-form cues
explainability
multimodal cues