Curriculum-Driven Edubot: A Framework for Developing Language Learning Chatbots through Synthesizing Conversational Data

Education

Yu Li, Shang Qu, et al.

Explore the Curriculum-Driven EduBot framework, developed by Yu Li, Shang Qu, Zhou Yu, Jili Shen, and Shangchao Min. This innovative tool uses large language models to support language learning through engaging, curriculum-aligned dialogues, adapting to users more effectively than ChatGPT.

Introduction
The paper addresses the challenge that generic chatbots are not organized around curricula and may produce language or content mismatched to learners’ proficiency, potentially hindering progress. The authors propose Curriculum-Driven EduBot, a curriculum-aligned conversational agent that focuses on textbook-derived topics and vocabulary while adapting to the user’s English level. The goal is to merge the interactivity of chatbots with the structured progression of English textbooks, enabling coherent, user-tailored dialogues that foster conversational skills. The study investigates whether synthesizing curriculum-grounded dialogues and fine-tuning an open-source LLM can yield a chatbot that better supports language learning than a general-purpose system like ChatGPT.
Literature Review
Prior work shows AI and chatbots can support education by enabling personalized learning, engagement, and assessment. LLM advances have amplified chatbots’ potential in educational contexts. Curriculum-aligned approaches in language learning emphasize consistency with course materials while retaining adaptability, and prior studies explore integrating curricular content into digital tools, gamification aligned to milestones, and lexically constrained decoding to encourage target vocabulary use. Synthetic data generation with pre-trained language models (PLMs) has been used to augment conversational datasets, including in privacy-sensitive domains, with works showing topic-grounded and persona-controlled dialogue synthesis. Building on these, the authors synthesize curriculum-based dialogues with controlled personas, topics, and lexical choices, then fine-tune an open-source LLM to create a user-adaptive, curriculum-aligned chatbot.
Methodology
Framework: The development process has two main parts: (1) synthesize human–human dialogues grounded in a textbook curriculum using ChatGPT, and (2) fine-tune an open-source LLM (Vicuna-13B) on the synthesized data to build EduBot.

Conversational Data Augmentation:
- Topic augmentation (Sec. 3.1.1): Extract primary topics from each textbook unit, then prompt ChatGPT to generate n closely related subtopics per primary topic (e.g., expanding “The True Value of Education” into topics such as “The importance of education in personal and professional development”). This broadens topic coverage while maintaining curricular relevance.
- Persona creation (Sec. 3.1.2): Prompt ChatGPT to generate two personas for each dialogue: Person 1 (randomized demographic, socio-economic, cultural, MBTI, and personal-experience attributes) and Person 2 (fixed as a student matching the textbook’s typical user; here, a Chinese college student). The chatbot is trained to assume Person 1; Person 2 simulates the student user. This fixed–random persona pairing both tailors the chatbot to anticipate a student user and steers conversations toward student-relevant topics.
- Dialogue composition (Sec. 3.1.3): For each (topic, persona) pair, instruct ChatGPT to produce a dialogue that starts with Person 1, incorporates a set of 10 target vocabulary words sampled from the unit’s “new words” list into Person 1’s utterances, and leads with questions to guide the conversation. Dialogues reflect the personas’ backgrounds and focus on the augmented topic list. (The three synthesis steps are sketched in code below.)

Model Fine-tuning (Sec. 3.2):
- Base model: Vicuna-13B (open-source, instruction-tuned), selected for strong language understanding and comparability to ChatGPT.
- Training data: Synthesized dialogues across all textbook units, with Person 1 as the chatbot side and Person 2 as the user side.
- Prompt design: The system prompt specifies the chatbot’s persona, the topic, and the CEFR level corresponding to the textbook, to control language difficulty. Prompts encourage sharing anecdotes, facts, and experiences related to the topic while keeping the language CEFR-constrained.
- CEFR control: The textbook’s CEFR level is included in the system prompt during training so that outputs match learners’ proficiency (on the A1–C2 scale).
- Implementation details: Dialogues are formatted to match Vicuna training turns; training runs for 3 epochs with learning rate 2e-5 and batch size 1 per GPU with gradient accumulation of 16, on 8 A100 GPUs for approximately 3 hours (see the training-configuration sketch below).

Deployment (Sec. 3.3): At runtime, the student selects a textbook unit; EduBot is assigned a persona, a topic from the unit’s augmented list, and a random subset of target vocabulary words from that unit. The deployment prompt mirrors the training prompt but explicitly includes the sampled vocabulary words and the CEFR target, ensuring topic focus, persona consistency, and appropriate difficulty while reinforcing new words. (A prompt-assembly sketch follows below.)
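A minimal sketch of the three synthesis steps (topic augmentation, persona creation, dialogue composition), assuming the OpenAI chat-completions API; the prompt wording, model name, and helper functions are illustrative assumptions, not the authors’ exact prompts.

```python
import random
from openai import OpenAI  # assumes the openai Python package (v1+)

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def chat(prompt: str) -> str:
    """Single-turn helper around the chat-completions endpoint."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # stand-in for the ChatGPT version the authors used
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def augment_topics(primary_topic: str, n: int = 10) -> list[str]:
    """Sec. 3.1.1: expand one textbook topic into n related subtopics."""
    raw = chat(
        f"List {n} conversation topics closely related to "
        f'"{primary_topic}", one per line, no numbering.'
    )
    return [line.strip() for line in raw.splitlines() if line.strip()]

def make_personas() -> tuple[str, str]:
    """Sec. 3.1.2: randomized Person 1 (chatbot) and fixed Person 2 (student)."""
    person1 = chat(
        "Invent a persona with a random age, occupation, nationality, "
        "socio-economic background, MBTI type, and one personal experience. "
        "Describe it in 2-3 sentences."
    )
    person2 = "A Chinese college student taking an English course."  # fixed by design
    return person1, person2

def compose_dialogue(topic: str, person1: str, person2: str,
                     unit_vocab: list[str], k: int = 10) -> str:
    """Sec. 3.1.3: dialogue on `topic`, weaving k target words into Person 1's turns."""
    words = random.sample(unit_vocab, k)
    return chat(
        f"Write a dialogue between Person 1 ({person1}) and Person 2 ({person2}) "
        f"about: {topic}. Person 1 speaks first, asks guiding questions, and "
        f"naturally uses these words: {', '.join(words)}."
    )
```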
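The fine-tuning hyperparameters above translate roughly into the following Hugging Face configuration. This is a sketch under assumptions: the checkpoint id, maximum sequence length, precision, and dataset wiring are illustrative, and the authors’ actual pipeline renders the data as Vicuna conversation turns.

```python
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "lmsys/vicuna-13b-v1.3"  # assumed Vicuna-13B checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# `dialogues` would hold the 7,687 synthesized conversations rendered in
# Vicuna's turn format, with the system prompt (persona, topic, CEFR) prepended.
dialogues = ["..."]  # placeholder; construction omitted

def tokenize(example):
    out = tokenizer(example["text"], truncation=True, max_length=2048)
    out["labels"] = out["input_ids"].copy()  # standard causal-LM objective
    return out

train_dataset = Dataset.from_dict({"text": dialogues}).map(tokenize)

args = TrainingArguments(
    output_dir="edubot-vicuna-13b",
    num_train_epochs=3,              # 3 epochs (Sec. 3.2)
    learning_rate=2e-5,              # lr 2e-5
    per_device_train_batch_size=1,   # batch size 1 per GPU
    gradient_accumulation_steps=16,  # gradient accumulation of 16
    bf16=True,                       # assumed precision on A100s
)

Trainer(model=model, args=args, train_dataset=train_dataset).train()
```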
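A minimal sketch of the runtime prompt assembly described under Deployment; the template wording and the example persona, topic, and vocabulary values are hypothetical, not the authors’ exact template.

```python
import random

def build_deployment_prompt(persona: str, topic: str, cefr: str,
                            unit_vocab: list[str], k: int = 10) -> str:
    """Assemble EduBot's system prompt for a selected textbook unit."""
    words = random.sample(unit_vocab, k)  # random subset of the unit's new words
    return (
        f"You are {persona}. You are chatting with a Chinese college student "
        f"about: {topic}. Lead the conversation with questions, share related "
        f"anecdotes, facts, and experiences, keep your English at CEFR level "
        f"{cefr}, and naturally work in these words: {', '.join(words)}."
    )

# Hypothetical example for a Unit 1 session:
prompt = build_deployment_prompt(
    persona="a 34-year-old museum curator from Brazil (INFJ)",
    topic="The importance of education in personal and professional development",
    cefr="B2",
    unit_vocab=["curriculum", "aspiration", "diligent", "pursue", "insight",
                "motivate", "perspective", "dedicate", "broaden", "enrich",
                "foster", "cultivate"],
)
```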
Curriculum Source (Sec. 4):
- Textbook: “New College English” (3rd edition), viewing, listening, and speaking tutorial, Level 3 (advanced), with 8 units, each providing topics and a new-word list.
- Data scale: For each primary topic, 10 associated topics are generated; each dialogue includes 10 target words from the unit’s vocabulary. Total synthesized dialogues: 7,687 across the 8 units.

Data Statistics (Sec. 5):
- Dialogues per unit: 880–1,210 (avg ≈ 1,058.76).
- Dialogue length: avg 11.77 utterances; avg 28.71 words per utterance.
- Persona diversity: All 16 MBTI types are represented and gender is balanced; Person 2’s nationality is set to China/Chinese in roughly 8,000 of 8,470 persona descriptions, indicating strong adherence to instructions.
- Target-word distribution: Person 1 (the chatbot side) includes most of the 10 target words; in most dialogues, Person 1 mentions at least half. The distribution across turns shows the intended concentration in Person 1’s utterances.
- CEFR comparability: Using ChatGPT as an evaluator, CEFR levels of the synthetic dialogues are comparable to those of textbook paragraphs, with the synthetic dialogues slightly more challenging on average (see the evaluator sketch at the end of this section).

Baselines and Prompting (Sec. 6): The baseline is ChatGPT, prompted to converse about textbook topics with concise replies (1–2 sentences). A zero-shot Vicuna baseline was omitted because of poor instruction following. Response-length constraints were applied to improve fairness and conversational balance, though ChatGPT occasionally deviated when users requested longer explanations.

User Study (Sec. 6.2):
- Participants: 24 valid participants (4 male, 20 female; mean age ≈ 19.3) from 20 majors, with varying English proficiency; all had taken the corresponding course within the past year.
- Protocol: Each participant had two conversations with EduBot and two with ChatGPT (minimum 20 utterances per conversation) on Unit 1 or Unit 2 topics; the bots were anonymized and their order randomized.
- Questionnaire: The post-task questionnaire included 20 criteria across six categories (curriculum consistency, proficiency level, role identification, language quality, content quality, usefulness).
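A hedged sketch of the CEFR comparability check described under Data Statistics, using ChatGPT as a judge; the prompt wording and model name are illustrative assumptions, not the authors’ exact evaluation setup.

```python
from openai import OpenAI

client = OpenAI()

CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]

def rate_cefr(text: str) -> str:
    """Ask ChatGPT to label a passage or dialogue with a CEFR level."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # stand-in for the evaluator model
        messages=[{
            "role": "user",
            "content": ("Rate the English difficulty of the following text on "
                        "the CEFR scale (A1, A2, B1, B2, C1, C2). Answer with "
                        f"the level only.\n\n{text}"),
        }],
    )
    level = resp.choices[0].message.content.strip()
    return level if level in CEFR_LEVELS else "unparsed"

# Comparing the label distributions for synthetic dialogues vs. textbook
# paragraphs yields the comparability result reported above.
```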
Key Findings
Overall performance: EduBot generally outperforms ChatGPT as a curriculum-aligned conversational partner and better supports conversational practice, while ChatGPT tends to provide longer, more elaborate content that is useful for review.

Selected questionnaire outcomes (percent choosing EduBot vs. ChatGPT vs. Same):
- Actively raised questions to guide conversation: 75.0% vs. 4.2% vs. 20.8%.
- Interactions felt natural/realistic (not overly formal): 62.5% vs. 4.2% vs. 33.3%.
- Responses concise and accurate: 50.0% vs. 12.5% vs. 37.5%.
- Mentioned topics not directly covered in the course: 50.0% vs. 16.7% vs. 33.3%.
- Recognized the user as a Chinese college student: 41.7% vs. 29.2% vs. 29.2%.
- Provided unique/personal perspectives: 45.8% vs. 37.5% vs. 16.7%.
- Used personal experiences to support opinions: 33.3% vs. 16.7% vs. 50.0%.
- Did not produce offensive or hurtful responses: 0.0% vs. 8.3% vs. 91.7% (almost all participants judged both bots safe).
- Useful for reviewing class material: 16.7% vs. 25.0% vs. 58.3% (ChatGPT slightly preferred for review).
- Would recommend to other students: 37.5% vs. 16.7% vs. 45.8%.
- Believe continued use improves English conversation skills: 25.0% vs. 12.5% vs. 62.5%.

Vocabulary and difficulty alignment:
- In the user-study conversations, average target-vocabulary coverage was 5.55 words for EduBot vs. 0.62 for ChatGPT (a coverage-counting sketch follows these findings).
- More students reported that ChatGPT often used words they didn’t understand than said the same of EduBot (37.5% vs. 20.8%).
- The synthetic dialogues’ CEFR level is comparable to the textbook passages and slightly more challenging, indicating successful difficulty control.

Utterance lengths and engagement:
- ChatGPT’s outputs averaged ~10 words longer than EduBot’s and sometimes exceeded 60 words, which is less natural in conversation.
- Users tended to produce longer responses when conversing with EduBot, suggesting better engagement and more practice opportunities.

Persona and role identification: EduBot more effectively assumed its assigned personas and acknowledged the user’s student role, offering personal opinions and experiences consistent with those personas, which led to more realistic, engaging dialogues.

Participant proficiency effects: Lower-proficiency students more often judged the two bots the same; higher-proficiency students tended to prefer EduBot for recommendation and for improving conversational skills.
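A minimal sketch of how the target-vocabulary coverage above could be computed from conversation logs; the tokenization and matching rules (lowercased whole-word match over chatbot turns only) are assumptions, since the paper does not specify lemmatization or inflection handling.

```python
import re

def vocab_coverage(turns: list[tuple[str, str]], target_words: list[str]) -> int:
    """Count how many of the unit's target words appear in the chatbot's turns.

    `turns` is a list of (speaker, utterance) pairs, where speaker is
    "bot" or "user". Only the bot's utterances are scanned.
    """
    bot_text = " ".join(u for s, u in turns if s == "bot").lower()
    tokens = set(re.findall(r"[a-z']+", bot_text))
    return sum(1 for w in target_words if w.lower() in tokens)

# Averaging this count over all logged conversations per system would yield
# scores like the reported EduBot 5.55 vs. ChatGPT 0.62 (out of 10 words).
```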
Discussion
The findings support the hypothesis that a curriculum-driven chatbot, trained on synthesized dialogues aligned to textbook topics, target vocabulary, personas, and CEFR levels, better facilitates conversational practice than a general-purpose chatbot. EduBot’s guided-initiative style leads to more interactive, natural-feeling exchanges that align with course content and the user’s role and proficiency. The stronger inclusion of target vocabulary and adherence to CEFR difficulty help cement curricular learning objectives while encouraging active participation. While ChatGPT can provide detailed, longer expositions that some users find useful for review, this style can dampen user initiative and may exceed the target difficulty. The results indicate that curriculum alignment plus persona conditioning and difficulty control are effective strategies for building language-learning chatbots that genuinely foster conversational skills.
Conclusion
The paper introduces Curriculum-Driven EduBot, a framework that synthesizes curriculum-grounded dialogues (with controlled personas, topics, and vocabulary) and fine-tunes an open-source LLM to deliver a chatbot aligned to textbook content and learner proficiency. User studies show EduBot surpasses ChatGPT in guiding curriculum-relevant conversations, acknowledging the student’s role, maintaining natural style, and reinforcing target vocabulary, thereby better supporting conversational skill development. Future work includes expanding content coverage across curricula, integrating multimedia elements, providing real-time feedback and error correction, and improving data synthesis and filtering to reduce artifacts and hallucinations, further enhancing EduBot as a comprehensive learning companion.
Limitations
Two key limitations emerged: (1) EduBot occasionally produced unnatural meta-comments about its own emotions or actions, an artifact inherited from the synthetic training dialogues; and (2) it sometimes hallucinated incorrect assumptions about the user or context (e.g., presuming the user’s situation), again traceable to similar issues in the generated training data. The authors plan to refine data synthesis and apply stricter post-processing to filter out unnatural or hallucinated content.