Abstract
This study investigates the use of large language models (LLMs) for automated essay scoring (AES) of non-native Japanese writing. Using a dataset of 1400 essays from learners with diverse first languages, it compares the performance of GPT-4, BERT, a local Japanese LLM (the Open-Calm large model), and two conventional machine-learning-based methods (Jess and JWriter). GPT-4 significantly outperforms the other models in both annotation accuracy and proficiency-level prediction. The study also highlights the importance of prompt engineering in achieving reliable LLM-based AES.
Publisher
Humanities and Social Sciences Communications
Published On
Jun 03, 2024
Authors
Wenchao Li, Haitao Liu
Tags
large language models
automated essay scoring
non-native Japanese writing
GPT-4
machine learning
prompt engineering