Curriculum-Driven Edubot: A Framework for Developing Language Learning Chatbots through Synthesizing Conversational Data

Education

Yu Li, Shang Qu, et al.

Explore the Curriculum-Driven EduBot framework, developed by Yu Li, Shang Qu, Zhou Yu, Jili Shen, and Shangchao Min. This innovative tool uses large language models to support language learning through engaging, curriculum-aligned dialogues, adapting to users more effectively than ChatGPT.

Introduction
The paper addresses the challenge that generic chatbots are not organized around curricula and may produce language or content mismatched to learners’ proficiency, potentially hindering progress. The authors propose Curriculum-Driven EduBot, a curriculum-aligned conversational agent that focuses on textbook-derived topics and vocabulary while adapting to the user’s English level. The goal is to merge the interactivity of chatbots with the structured progression of English textbooks, enabling coherent, user-tailored dialogues that foster conversational skills. The study investigates whether synthesizing curriculum-grounded dialogues and fine-tuning an open-source LLM can yield a chatbot that better supports language learning than a general-purpose system like ChatGPT.
Literature Review
Prior work shows AI and chatbots can support education by enabling personalized learning, engagement, and assessment. LLM advances have amplified chatbots’ potential in educational contexts. Curriculum-aligned approaches in language learning emphasize consistency with course materials while retaining adaptability, and prior studies explore integrating curricular content into digital tools, gamification aligned to milestones, and lexically constrained decoding to encourage target vocabulary use. Synthetic data generation with pre-trained language models (PLMs) has been used to augment conversational datasets, including in privacy-sensitive domains, with works showing topic-grounded and persona-controlled dialogue synthesis. Building on these, the authors synthesize curriculum-based dialogues with controlled personas, topics, and lexical choices, then fine-tune an open-source LLM to create a user-adaptive, curriculum-aligned chatbot.
Methodology
Framework: The development process has two main parts: (1) synthesize human–human dialogues grounded in a textbook curriculum using ChatGPT, and (2) fine-tune an open-source LLM (Vicuna-13B) on the synthesized data to build EduBot.

Conversational Data Augmentation:
- Topic augmentation (Sec. 3.1.1): Extract primary topics from each textbook unit, then prompt ChatGPT to generate n closely related subtopics per primary topic (e.g., expanding “The True Value of Education” into topics such as “The importance of education in personal and professional development”). This broadens topic coverage while maintaining curricular relevance.
- Persona creation (Sec. 3.1.2): Prompt ChatGPT to generate two personas for each dialogue: Person 1 (randomized demographic, socio-economic, cultural, MBTI, and personal-experience attributes) and Person 2 (fixed as a student matching the textbook’s typical user; here, a Chinese college student). The chatbot is trained to assume Person 1; Person 2 simulates the student user. This fixed–random persona pairing both tailors the chatbot to anticipate a student user and steers conversations toward student-relevant topics.
- Dialogue composition (Sec. 3.1.3): For each (topic, persona) pair, instruct ChatGPT to produce a dialogue that starts with Person 1, incorporates a set of 10 target vocabulary words sampled from the unit’s “new words” list into Person 1’s utterances, and leads with questions to guide the conversation. Dialogues reflect the personas’ backgrounds and focus on the augmented topic list. (The three synthesis steps are sketched in code below.)

Model Fine-tuning (Sec. 3.2):
- Base model: Vicuna-13B (open-source, instruction-tuned), selected for strong language understanding and comparability to ChatGPT.
- Training data: Synthesized dialogues across all textbook units, with Person 1 as the chatbot side and Person 2 as the user side.
- Prompt design: The system prompt specifies the chatbot’s persona, the topic, and the CEFR level corresponding to the textbook, to control language difficulty. Prompts encourage sharing anecdotes, facts, and experiences related to the topic while keeping the language CEFR-constrained.
- CEFR control: The textbook’s CEFR level is included in the system prompt during training so that outputs match learners’ proficiency (on the A1–C2 scale).
- Implementation details: Dialogues are formatted to match Vicuna training turns; training runs for 3 epochs with learning rate 2e-5 and batch size 1 per GPU with gradient accumulation of 16, on 8 A100 GPUs for approximately 3 hours (see the training-configuration sketch below).

Deployment (Sec. 3.3): At runtime, the student selects a textbook unit; EduBot is assigned a persona, a topic from the unit’s augmented list, and a random subset of target vocabulary words from that unit. The deployment prompt mirrors the training prompt but explicitly includes the sampled vocabulary words and the CEFR target, ensuring topic focus, persona consistency, and appropriate difficulty while reinforcing new words. (A prompt-assembly sketch follows below.)
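A minimal sketch of the three synthesis steps (topic augmentation, persona creation, dialogue composition), assuming the OpenAI chat-completions API; the prompt wording, model name, and helper functions are illustrative assumptions, not the authors’ exact prompts.

```python
import random
from openai import OpenAI  # assumes the openai Python package (v1+)

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def chat(prompt: str) -> str:
    """Single-turn helper around the chat-completions endpoint."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # stand-in for the ChatGPT version the authors used
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def augment_topics(primary_topic: str, n: int = 10) -> list[str]:
    """Sec. 3.1.1: expand one textbook topic into n related subtopics."""
    raw = chat(
        f"List {n} conversation topics closely related to "
        f'"{primary_topic}", one per line, no numbering.'
    )
    return [line.strip() for line in raw.splitlines() if line.strip()]

def make_personas() -> tuple[str, str]:
    """Sec. 3.1.2: randomized Person 1 (chatbot) and fixed Person 2 (student)."""
    person1 = chat(
        "Invent a persona with a random age, occupation, nationality, "
        "socio-economic background, MBTI type, and one personal experience. "
        "Describe it in 2-3 sentences."
    )
    person2 = "A Chinese college student taking an English course."  # fixed by design
    return person1, person2

def compose_dialogue(topic: str, person1: str, person2: str,
                     unit_vocab: list[str], k: int = 10) -> str:
    """Sec. 3.1.3: dialogue on `topic`, weaving k target words into Person 1's turns."""
    words = random.sample(unit_vocab, k)
    return chat(
        f"Write a dialogue between Person 1 ({person1}) and Person 2 ({person2}) "
        f"about: {topic}. Person 1 speaks first, asks guiding questions, and "
        f"naturally uses these words: {', '.join(words)}."
    )
```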
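The fine-tuning hyperparameters above translate roughly into the following Hugging Face configuration. This is a sketch under assumptions: the checkpoint id, maximum sequence length, precision, and dataset wiring are illustrative, and the authors’ actual pipeline renders the data as Vicuna conversation turns.

```python
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "lmsys/vicuna-13b-v1.3"  # assumed Vicuna-13B checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# `dialogues` would hold the 7,687 synthesized conversations rendered in
# Vicuna's turn format, with the system prompt (persona, topic, CEFR) prepended.
dialogues = ["..."]  # placeholder; construction omitted

def tokenize(example):
    out = tokenizer(example["text"], truncation=True, max_length=2048)
    out["labels"] = out["input_ids"].copy()  # standard causal-LM objective
    return out

train_dataset = Dataset.from_dict({"text": dialogues}).map(tokenize)

args = TrainingArguments(
    output_dir="edubot-vicuna-13b",
    num_train_epochs=3,              # 3 epochs (Sec. 3.2)
    learning_rate=2e-5,              # lr 2e-5
    per_device_train_batch_size=1,   # batch size 1 per GPU
    gradient_accumulation_steps=16,  # gradient accumulation of 16
    bf16=True,                       # assumed precision on A100s
)

Trainer(model=model, args=args, train_dataset=train_dataset).train()
```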
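A minimal sketch of the runtime prompt assembly described under Deployment; the template wording and the example persona, topic, and vocabulary values are hypothetical, not the authors’ exact template.

```python
import random

def build_deployment_prompt(persona: str, topic: str, cefr: str,
                            unit_vocab: list[str], k: int = 10) -> str:
    """Assemble EduBot's system prompt for a selected textbook unit."""
    words = random.sample(unit_vocab, k)  # random subset of the unit's new words
    return (
        f"You are {persona}. You are chatting with a Chinese college student "
        f"about: {topic}. Lead the conversation with questions, share related "
        f"anecdotes, facts, and experiences, keep your English at CEFR level "
        f"{cefr}, and naturally work in these words: {', '.join(words)}."
    )

# Hypothetical example for a Unit 1 session:
prompt = build_deployment_prompt(
    persona="a 34-year-old museum curator from Brazil (INFJ)",
    topic="The importance of education in personal and professional development",
    cefr="B2",
    unit_vocab=["curriculum", "aspiration", "diligent", "pursue", "insight",
                "motivate", "perspective", "dedicate", "broaden", "enrich",
                "foster", "cultivate"],
)
```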
Curriculum Source (Sec. 4):
- Textbook: “New College English” (3rd edition), viewing, listening, and speaking tutorial, Level 3 (advanced), with 8 units, each providing topics and a new-word list.
- Data scale: For each primary topic, 10 associated topics are generated; each dialogue includes 10 target words from the unit’s vocabulary. Total synthesized dialogues: 7,687 across the 8 units.

Data Statistics (Sec. 5):
- Dialogues per unit: 880–1,210 (avg ≈ 1,058.76).
- Dialogue length: avg 11.77 utterances; avg 28.71 words per utterance.
- Persona diversity: All 16 MBTI types are represented and gender is balanced; Person 2’s nationality is set to China/Chinese in roughly 8,000 of 8,470 persona descriptions, indicating strong adherence to instructions.
- Target-word distribution: Person 1 (the chatbot side) includes most of the 10 target words; in most dialogues, Person 1 mentions at least half. The distribution across turns shows the intended concentration in Person 1’s utterances.
- CEFR comparability: Using ChatGPT as an evaluator, CEFR levels of the synthetic dialogues are comparable to those of textbook paragraphs, with the synthetic dialogues slightly more challenging on average (see the evaluator sketch at the end of this section).

Baselines and Prompting (Sec. 6): The baseline is ChatGPT, prompted to converse about textbook topics with concise replies (1–2 sentences). A zero-shot Vicuna baseline was omitted because of poor instruction following. Response-length constraints were applied to improve fairness and conversational balance, though ChatGPT occasionally deviated when users requested longer explanations.

User Study (Sec. 6.2):
- Participants: 24 valid participants (4 male, 20 female; mean age ≈ 19.3) from 20 majors, with varying English proficiency; all had taken the corresponding course within the past year.
- Protocol: Each participant had two conversations with EduBot and two with ChatGPT (minimum 20 utterances per conversation) on Unit 1 or Unit 2 topics; the bots were anonymized and their order randomized.
- Questionnaire: The post-task questionnaire included 20 criteria across six categories (curriculum consistency, proficiency level, role identification, language quality, content quality, usefulness).
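A hedged sketch of the CEFR comparability check described under Data Statistics, using ChatGPT as a judge; the prompt wording and model name are illustrative assumptions, not the authors’ exact evaluation setup.

```python
from openai import OpenAI

client = OpenAI()

CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]

def rate_cefr(text: str) -> str:
    """Ask ChatGPT to label a passage or dialogue with a CEFR level."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # stand-in for the evaluator model
        messages=[{
            "role": "user",
            "content": ("Rate the English difficulty of the following text on "
                        "the CEFR scale (A1, A2, B1, B2, C1, C2). Answer with "
                        f"the level only.\n\n{text}"),
        }],
    )
    level = resp.choices[0].message.content.strip()
    return level if level in CEFR_LEVELS else "unparsed"

# Comparing the label distributions for synthetic dialogues vs. textbook
# paragraphs yields the comparability result reported above.
```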
Key Findings
Overall performance: EduBot generally outperforms ChatGPT as a curriculum-aligned conversational partner and better supports conversational practice, while ChatGPT tends to provide longer, more elaborate content that is useful for review.

Selected questionnaire outcomes (percent choosing EduBot vs. ChatGPT vs. Same):
- Actively raised questions to guide conversation: 75.0% vs. 4.2% vs. 20.8%.
- Interactions felt natural/realistic (not overly formal): 62.5% vs. 4.2% vs. 33.3%.
- Responses concise and accurate: 50.0% vs. 12.5% vs. 37.5%.
- Mentioned topics not directly covered in the course: 50.0% vs. 16.7% vs. 33.3%.
- Recognized the user as a Chinese college student: 41.7% vs. 29.2% vs. 29.2%.
- Provided unique/personal perspectives: 45.8% vs. 37.5% vs. 16.7%.
- Used personal experiences to support opinions: 33.3% vs. 16.7% vs. 50.0%.
- Did not produce offensive or hurtful responses: 0.0% vs. 8.3% vs. 91.7% (almost all participants judged both bots safe).
- Useful for reviewing class material: 16.7% vs. 25.0% vs. 58.3% (ChatGPT slightly preferred for review).
- Would recommend to other students: 37.5% vs. 16.7% vs. 45.8%.
- Believe continued use improves English conversation skills: 25.0% vs. 12.5% vs. 62.5%.

Vocabulary and difficulty alignment:
- In the user-study conversations, average target-vocabulary coverage was 5.55 words for EduBot vs. 0.62 for ChatGPT (a coverage-counting sketch follows these findings).
- More students reported that ChatGPT often used words they didn’t understand than said the same of EduBot (37.5% vs. 20.8%).
- The synthetic dialogues’ CEFR level is comparable to the textbook passages and slightly more challenging, indicating successful difficulty control.

Utterance lengths and engagement:
- ChatGPT’s outputs averaged ~10 words longer than EduBot’s and sometimes exceeded 60 words, which is less natural in conversation.
- Users tended to produce longer responses when conversing with EduBot, suggesting better engagement and more practice opportunities.

Persona and role identification: EduBot more effectively assumed its assigned personas and acknowledged the user’s student role, offering personal opinions and experiences consistent with those personas, which led to more realistic, engaging dialogues.

Participant proficiency effects: Lower-proficiency students more often judged the two bots the same; higher-proficiency students tended to prefer EduBot for recommendation and for improving conversational skills.
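A minimal sketch of how the target-vocabulary coverage above could be computed from conversation logs; the tokenization and matching rules (lowercased whole-word match over chatbot turns only) are assumptions, since the paper does not specify lemmatization or inflection handling.

```python
import re

def vocab_coverage(turns: list[tuple[str, str]], target_words: list[str]) -> int:
    """Count how many of the unit's target words appear in the chatbot's turns.

    `turns` is a list of (speaker, utterance) pairs, where speaker is
    "bot" or "user". Only the bot's utterances are scanned.
    """
    bot_text = " ".join(u for s, u in turns if s == "bot").lower()
    tokens = set(re.findall(r"[a-z']+", bot_text))
    return sum(1 for w in target_words if w.lower() in tokens)

# Averaging this count over all logged conversations per system would yield
# scores like the reported EduBot 5.55 vs. ChatGPT 0.62 (out of 10 words).
```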
Discussion
The findings support the hypothesis that a curriculum-driven chatbot, trained on synthesized dialogues aligned to textbook topics, target vocabulary, personas, and CEFR levels, better facilitates conversational practice than a general-purpose chatbot. EduBot’s guided-initiative style leads to more interactive, natural-feeling exchanges that align with course content and the user’s role and proficiency. The stronger inclusion of target vocabulary and adherence to CEFR difficulty help cement curricular learning objectives while encouraging active participation. While ChatGPT can provide detailed, longer expositions that some users find useful for review, this style can dampen user initiative and may exceed the target difficulty. The results indicate that curriculum alignment plus persona conditioning and difficulty control are effective strategies for building language-learning chatbots that genuinely foster conversational skills.
Conclusion
The paper introduces Curriculum-Driven EduBot, a framework that synthesizes curriculum-grounded dialogues (with controlled personas, topics, and vocabulary) and fine-tunes an open-source LLM to deliver a chatbot aligned to textbook content and learner proficiency. User studies show EduBot surpasses ChatGPT in guiding curriculum-relevant conversations, acknowledging the student’s role, maintaining natural style, and reinforcing target vocabulary, thereby better supporting conversational skill development. Future work includes expanding content coverage across curricula, integrating multimedia elements, providing real-time feedback and error correction, and improving data synthesis and filtering to reduce artifacts and hallucinations, further enhancing EduBot as a comprehensive learning companion.
Limitations
Two key limitations emerged: (1) EduBot occasionally produced unnatural meta-comments about its own emotions or actions, an artifact inherited from the synthetic training dialogues; and (2) it sometimes hallucinated incorrect assumptions about the user or context (e.g., presuming the user’s situation), again traceable to similar issues in the generated training data. The authors plan to refine data synthesis and apply stricter post-processing to filter out unnatural or hallucinated content.