Computer Science

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

YesBut: A High-Quality Annotated Multimodal Dataset for evaluating Satire Comprehension capability of Vision-Language Models

A. Nandy, Y. Agarwal, et al.

Can AI spot a joke in a picture? This paper introduces three tasks: satirical image detection, satirical image understanding, and satirical image completion. It also releases YesBut, a 2,547-image dataset (plus 119 real satirical photos), and shows that current vision-language models struggle with these tasks in zero-shot settings. The authors are Abhilash Nandy, Yash Agarwal, Ashish Patwa, Millon Madhur Das, Aman Bansal, Ankit Raj, Pawan Goyal, and Niloy Ganguly.

Abstract

Understanding satire and humor is challenging for current Vision-Language models. This paper introduces three tasks—Satirical Image Detection (classify whether an image is satirical), Satirical Image Understanding (generate the reason the image is satirical), and Satirical Image Completion (given half of an image, select the correct other half from two options so the whole image is satirical)—and releases YesBut, a high-quality dataset of 2,547 images (1,084 satirical and 1,463 non-satirical) spanning multiple artistic styles to evaluate these tasks. Each satirical image depicts a normal scenario alongside a conflicting, funny, or ironic scenario. Despite VL models’ success on multimodal tasks like VQA and captioning, benchmarking shows they perform poorly on the proposed YesBut tasks in zero-shot settings by both automated and human evaluation. An additional set of 119 real satirical photographs is also released for further research.
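To make the task definitions concrete, here is a minimal sketch of how the detection and completion tasks could be scored. Only the image counts come from the abstract. The record layout, the field names, and the choice of plain accuracy as the metric are assumptions made for illustration and may not match the paper's actual evaluation setup.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

# Image counts as stated in the abstract.
N_SATIRICAL = 1084
N_NON_SATIRICAL = 1463


@dataclass(frozen=True)
class CompletionItem:
    """Hypothetical record for one Satirical Image Completion question."""
    given_half: str            # ID of the half shown to the model (assumed field)
    options: Tuple[str, str]   # the two candidate other halves (assumed field)
    answer: int                # index (0 or 1) of the half that makes the image satirical


def accuracy(gold: Sequence, pred: Sequence) -> float:
    """Fraction of predictions that match the gold labels."""
    if len(gold) == 0 or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and equally long")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def majority_class_accuracy(n_pos: int, n_neg: int) -> float:
    """Accuracy of a trivial detector that always predicts the larger class."""
    return max(n_pos, n_neg) / (n_pos + n_neg)


if __name__ == "__main__":
    # Detection: a trivial reference point derived from the class counts.
    baseline = majority_class_accuracy(N_SATIRICAL, N_NON_SATIRICAL)
    print(f"majority-class detection accuracy: {baseline:.4f}")

    # Completion: score made-up predictions against made-up items.
    items = [
        CompletionItem("left_001", ("right_a", "right_b"), answer=1),
        CompletionItem("left_002", ("right_c", "right_d"), answer=0),
    ]
    predictions = [1, 1]  # illustrative model outputs, not real results
    print(f"completion accuracy: {accuracy([i.answer for i in items], predictions):.2f}")
```

With the stated class split, a detector that always answers "non-satirical" would reach roughly 57.4% accuracy on the full set, and guessing at random between the two completion options would be right half the time in expectation. These figures are only reference points for reading model scores on the two tasks.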

Publisher

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Published On

Nov 12, 2024

Authors

Abhilash Nandy, Yash Agarwal, Ashish Patwa, Millon Madhur Das, Aman Bansal, Ankit Raj, Pawan Goyal, Niloy Ganguly

DOI

https://doi.org/10.48550/arXiv.2409.13592
