Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference

Computer Science

W. Chiang, L. Zheng, et al.

Abstract
Large Language Models (LLMs) have unlocked new capabilities and applications; however, evaluating their alignment with human preferences still poses significant challenges. To address this issue, we introduce Chatbot Arena, an open platform for evaluating LLMs based on human preferences. Our methodology employs a pairwise comparison approach and leverages input from a diverse user base through crowdsourcing. The platform has been operational for several months, amassing over 240K votes. This paper describes the platform, analyzes the data we have collected so far, and explains the tried-and-true statistical methods we are using for efficient and accurate evaluation and ranking of models. We confirm that the crowdsourced questions are sufficiently diverse and discriminating and that the crowdsourced human votes are in good agreement with those of expert raters. These analyses collectively establish a robust foundation for the credibility of Chatbot Arena. Because of its unique value and openness, Chatbot Arena has emerged as one of the most referenced LLM leaderboards, widely cited by leading LLM developers and companies. Our demo is publicly available at https://chat.lmsys.org.
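To illustrate the ranking step the abstract refers to, the sketch below fits a Bradley-Terry-style model to pairwise votes: each model gets a latent strength score, and the probability that model i beats model j is a logistic function of the score difference. This is a minimal, illustrative example only; the model names and vote counts are hypothetical, and the platform's actual pipeline (tie handling, confidence intervals, vote sampling) is more involved than this sketch.

# Minimal sketch: ranking models from pairwise human votes with a
# Bradley-Terry-style logistic model. Model names and vote counts are
# hypothetical, not taken from the Chatbot Arena dataset.
import numpy as np
from scipy.optimize import minimize

models = ["model_a", "model_b", "model_c"]
idx = {m: i for i, m in enumerate(models)}

# Each vote is (winner, loser) from one pairwise comparison (ties omitted here).
votes = (
    [("model_a", "model_b")] * 60 + [("model_b", "model_a")] * 40
    + [("model_a", "model_c")] * 70 + [("model_c", "model_a")] * 30
    + [("model_b", "model_c")] * 55 + [("model_c", "model_b")] * 45
)

def neg_log_likelihood(scores):
    # Bradley-Terry: P(i beats j) = sigmoid(score_i - score_j).
    nll = 0.0
    for winner, loser in votes:
        diff = scores[idx[winner]] - scores[idx[loser]]
        nll += np.log1p(np.exp(-diff))  # equals -log sigmoid(diff)
    return nll

# A tiny L2 penalty pins down the overall score offset (scores are otherwise
# only identified up to an additive constant).
result = minimize(lambda s: neg_log_likelihood(s) + 1e-6 * np.sum(s**2),
                  x0=np.zeros(len(models)), method="BFGS")

# Print models from strongest to weakest estimated strength.
for m, s in sorted(zip(models, result.x), key=lambda t: -t[1]):
    print(f"{m}: strength {s:+.3f}")

Running the sketch prints the three hypothetical models ordered by estimated strength; in a real deployment these strengths would be accompanied by uncertainty estimates before being published as a leaderboard.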
Publisher: arXiv
Published On: Mar 07, 2024
Authors: Wei-Lin Chiang, Lianmin Zheng, Ying Sheng, Anastasios N. Angelopoulos, Tianle Li, Dacheng Li, Banghua Zhu, Hao Zhang, Michael I. Jordan, Joseph E. Gonzalez, Ion Stoica
Tags: Large Language Models, human preference evaluation, crowdsourcing, pairwise comparisons, statistical ranking, LLM leaderboard, benchmarking