Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation

J. Liu, C. S. Xia, et al.

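Researchers Jiawei Liu, Chunqiu Steven Xia, Yuyao Wang, and Lingming Zhang present EvalPlus, a framework for rigorously evaluating whether LLM-generated code is functionally correct. EvalPlus augments existing code-generation benchmarks such as HumanEval with large numbers of automatically generated test inputs, produced by both LLM-based and mutation-based strategies. Their findings show that the limited tests in existing benchmarks let a substantial amount of incorrect LLM-generated code go undetected, which argues for more thorough automated testing when evaluating code generation.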
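The core problem can be illustrated with a small sketch. The task, the functions `reference_is_prime`, `candidate_is_prime`, and `failing_inputs`, and the test values are all hypothetical examples, not taken from the paper. A buggy candidate passes a sparse hand-written suite, but running extra inputs and comparing its outputs against a reference solution exposes the bug. For simplicity, the sketch uses seeded random inputs plus a few edge values; EvalPlus itself generates inputs with LLM-based and mutation-based strategies.

```python
import random
from typing import Callable, Iterable, List


def reference_is_prime(n: int) -> bool:
    """Ground-truth solution for a hypothetical example task."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def candidate_is_prime(n: int) -> bool:
    """Hypothetical generated answer with a bug: any n < 2 is reported as prime."""
    for i in range(2, n):
        if n % i == 0:
            return False
    return True


def failing_inputs(candidate: Callable[[int], bool],
                   reference: Callable[[int], bool],
                   inputs: Iterable[int]) -> List[int]:
    """Return every input on which the candidate disagrees with the reference."""
    return [x for x in inputs if candidate(x) != reference(x)]


# A small hand-written suite, similar to a sparse benchmark test set.
base_tests = [2, 3, 4, 9, 11, 15]

# Augmented suite: edge values plus seeded random inputs (a simplification,
# not EvalPlus's actual input generator).
rng = random.Random(0)
augmented_tests = base_tests + [-7, 0, 1] + [rng.randint(-50, 500) for _ in range(200)]

print("base failures:", failing_inputs(candidate_is_prime, reference_is_prime, base_tests))
print("augmented failures:",
      sorted(set(failing_inputs(candidate_is_prime, reference_is_prime, augmented_tests))))
```

With the base suite alone, the candidate looks correct: the first line prints an empty list. The augmented suite reveals that every input below 2, including -7, 0, and 1, is misclassified.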
