YesBut: A High-Quality Annotated Multimodal Dataset for evaluating Satire Comprehension capability of Vision-Language Models

A. Nandy, Y. Agarwal, et al.

Can AI spot a joke in a picture? This paper introduces three tasks (satirical image detection, satirical image understanding, and satirical image completion) and releases YesBut, a 2,547-image dataset (plus 119 real satirical photos), showing that current vision-language models struggle in zero-shot settings. This research was conducted by Abhilash Nandy, Yash Agarwal, Ashish Patwa, Millon Madhur Das, Aman Bansal, Ankit Raj, Pawan Goyal, and Niloy Ganguly.
Abstract
Understanding satire and humor is challenging for current Vision-Language models. This paper introduces three tasks—Satirical Image Detection (classify whether an image is satirical), Satirical Image Understanding (generate the reason the image is satirical), and Satirical Image Completion (given half of an image, select the correct other half from two options so the whole image is satirical)—and releases YesBut, a high-quality dataset of 2,547 images (1,084 satirical and 1,463 non-satirical) spanning multiple artistic styles to evaluate these tasks. Each satirical image depicts a normal scenario alongside a conflicting, funny, or ironic scenario. Despite VL models’ success on multimodal tasks like VQA and captioning, benchmarking shows they perform poorly on the proposed YesBut tasks in zero-shot settings by both automated and human evaluation. An additional set of 119 real satirical photographs is also released for further research.
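For context on what "perform poorly" means here, the trivial baselines implied by the abstract's numbers can be worked out directly. The sketch below is illustrative only: it assumes detection is scored as plain accuracy over all 2,547 images and that the two completion options are equally likely to be correct; the paper's actual evaluation splits and metrics may differ. The function names are hypothetical.

```python
# Counts taken from the abstract.
SATIRICAL = 1084
NON_SATIRICAL = 1463
TOTAL = SATIRICAL + NON_SATIRICAL  # should match the stated 2,547 images


def majority_class_accuracy(counts):
    """Accuracy of always predicting the most frequent class."""
    return max(counts) / sum(counts)


def uniform_chance_accuracy(num_options):
    """Expected accuracy of guessing uniformly among num_options choices."""
    return 1 / num_options


# Assumption: detection is evaluated on the full dataset with these proportions.
detection_baseline = majority_class_accuracy([SATIRICAL, NON_SATIRICAL])

# Satirical Image Completion offers two candidate halves per item.
completion_baseline = uniform_chance_accuracy(2)

print(f"Total images: {TOTAL}")
print(f"Detection, always 'non-satirical': {detection_baseline:.1%}")
print(f"Completion, random guess: {completion_baseline:.1%}")
```

Under these assumptions, a detection model needs to beat roughly 57% accuracy, and a completion model 50%, before it shows any satire comprehension at all.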
Publisher
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Published On
Nov 12, 2024
Authors
Abhilash Nandy, Yash Agarwal, Ashish Patwa, Millon Madhur Das, Aman Bansal, Ankit Raj, Pawan Goyal, Niloy Ganguly
Tags
satirical image detection
satirical image understanding
satirical image completion
vision-language models
YesBut dataset
multimodal benchmarking
zero-shot evaluation