Findings of the Association for Computational Linguistics: ACL 2025

BESSTIE: A Benchmark for Sentiment and Sarcasm Classification for Varieties of English

D. Srirag, A. Joshi, et al.

BESSTIE introduces the first labelled benchmark for sentiment and sarcasm across three English varieties (en-AU, en-IN, en-UK), built from Google Places reviews and Reddit comments with manual and automatic validation. Nine large language models were fine-tuned and evaluated, revealing consistent advantages on inner-circle varieties (en-AU, en-UK) and challenges in cross-variety generalisation, especially for sarcasm. Research conducted by Dipankar Srirag, Aditya Joshi, Jordan Painter, and Diptesh Kanojia. Dataset available on Hugging Face.


Abstract

Despite large language models (LLMs) being known to exhibit bias against non-standard language varieties, there are no known labelled datasets for sentiment analysis of varieties of English. To address this gap, we introduce BESSTIE, a benchmark for sentiment and sarcasm classification for three varieties of English: Australian (en-AU), Indian (en-IN), and British (en-UK). We collect datasets for these language varieties using two methods: location-based filtering for Google Places reviews and topic-based filtering for Reddit comments. To assess whether the dataset accurately represents these varieties, we conduct two validation steps: (a) manual annotation of language varieties and (b) automatic language variety prediction. Native speakers of the language varieties manually annotate the datasets with sentiment and sarcasm labels. We perform an additional annotation exercise to validate the reliability of the annotated labels. Subsequently, we fine-tune nine LLMs (representing a range of encoder/decoder and mono/multilingual models) on these datasets and evaluate their performance on the two tasks. Our results show that the models consistently perform better on inner-circle varieties (i.e., en-AU and en-UK) than on en-IN, particularly for sarcasm classification. We also report challenges in cross-variety generalisation, highlighting the need for language variety-specific datasets such as ours. BESSTIE promises to be a useful evaluative benchmark for future research in equitable LLMs, specifically in terms of language varieties. The BESSTIE dataset is publicly available at: https://huggingface.co/datasets/unswn1porg/BESSTIE.
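Cross-variety generalisation, as described in the abstract, is commonly measured as a grid: train a classifier on one variety, then test it on every variety, including its own. The following is a minimal sketch of that grid, not the authors' evaluation code. It assumes binary labels (e.g. sarcastic vs. not sarcastic) and macro-averaged F1 as the metric; the abstract does not name the metric, so that is an assumption. `majority_baseline` is a hypothetical stand-in for fine-tuning one of the nine LLMs, and the toy splits are illustrative, not BESSTIE data.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# (text, label) pair; labels assumed binary, e.g. 1 = sarcastic, 0 = not sarcastic
Example = Tuple[str, int]
Predictor = Callable[[str], int]
# Hypothetical trainer signature: takes training examples, returns a predictor
Trainer = Callable[[List[Example]], Predictor]


def macro_f1(gold: List[int], pred: List[int]) -> float:
    """Unweighted mean of per-class F1 over classes seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for c in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def majority_baseline(train: List[Example]) -> Predictor:
    """Hypothetical stand-in for fine-tuning: always predicts the most frequent training label."""
    majority = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda text: majority


def cross_variety_grid(
    splits: Dict[str, Dict[str, List[Example]]], train_fn: Trainer
) -> Dict[Tuple[str, str], float]:
    """Train on each source variety, evaluate on every target variety."""
    grid = {}
    for src, src_splits in splits.items():
        model = train_fn(src_splits["train"])
        for tgt, tgt_splits in splits.items():
            test = tgt_splits["test"]
            gold = [label for _, label in test]
            pred = [model(text) for text, _ in test]
            grid[(src, tgt)] = macro_f1(gold, pred)
    return grid


if __name__ == "__main__":
    # Toy, made-up splits keyed by variety code; real data would come from the benchmark
    toy = {
        "en-AU": {"train": [("a", 1), ("b", 1), ("c", 0)], "test": [("d", 1), ("e", 0)]},
        "en-IN": {"train": [("f", 0), ("g", 0), ("h", 1)], "test": [("i", 1), ("j", 0)]},
    }
    for (src, tgt), score in cross_variety_grid(toy, majority_baseline).items():
        print(f"train={src} test={tgt} macro-F1={score:.3f}")
```

Diagonal cells (source equals target) correspond to in-variety performance; off-diagonal cells measure cross-variety generalisation, so the gap between the two is one way to quantify the transfer difficulty the abstract reports. Swapping `majority_baseline` for a function that fine-tunes a real model leaves the grid logic unchanged.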

Publisher

Findings of the Association for Computational Linguistics: ACL 2025

Published On

Jul 27, 2025

Authors

Dipankar Srirag, Aditya Joshi, Jordan Painter, Diptesh Kanojia

DOI

https://doi.org/10.48550/arXiv.2412.04726
