Traditionally, language learners have relied on dictionaries as lexical tools. Research in the late 20th and early 21st centuries debated the optimal dictionary type (bilingual, monolingual, or bilingualized) and medium (print vs. digital). Lew (2004) found that bilingual dictionaries were crucial for receptive tasks across proficiency levels, although this finding has not always been replicated, owing to variation in dictionary scope. Even limited bilingual dictionaries, however, often outperformed monolingual ones in production tasks. Learners tend to prefer bilingual dictionaries, seeking native-language equivalents even when paraphrases are offered in the classroom. While the digital medium has largely replaced print, the emergence of large language models (LLMs) and AI-powered chatbots such as ChatGPT presents a potential paradigm shift. This study investigates ChatGPT's effectiveness as a lexical tool compared with traditional bilingual and monolingual dictionaries in both receptive and productive tasks for English as an additional language.
Literature Review
Existing research reflects an ongoing debate about the optimal type and format of dictionaries for language learners. Studies comparing bilingual and monolingual dictionaries show mixed results, often depending on factors such as dictionary scope and task type. The transition from print to digital dictionaries has also been studied, with inconclusive evidence for the superiority of either format. The rise of LLMs and their potential application in lexicography, by contrast, is a relatively new area of research. The current study builds on this foundation by directly comparing a widely used chatbot (ChatGPT) with established dictionary resources.
Methodology
This study used a quantitative approach involving 166 Polish-speaking university students (Years 1-3, B2-C1 English proficiency). Participants were randomly assigned to one of three groups, each using a different tool: ChatGPT, the online Longman Dictionary of Contemporary English (LDOCE, monolingual), or Diki.pl (an online bilingual Polish-English dictionary). They completed two paper-based tests: a production test (translating 20 Polish sentences into English using specific phrasal verbs) and a reception test (understanding 20 English sentences containing target phrasal verbs and translating them into Polish). Each test had two versions to counterbalance item order. Participants received 15 minutes of instruction before a 90-minute testing period. Responses were scored on two parameters: accurate use of the target phrasal verb (detected with regular expressions) and successful conveyance of meaning (regardless of whether the target verb was used). The latter was scored using ChatGPT-4 and validated against two human judges, with substantial agreement. Mixed binary logistic regression models were used to analyze the dichotomous success/failure data, with predictors including year of study and tool used. Model selection relied on likelihood-ratio tests, and the models were checked for dispersion and convergence.
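The regular-expression scoring of target phrasal verbs described above can be sketched roughly as follows. This is a minimal illustration, not the study's actual scoring script: the function names, the example verb ("put off"), the list of inflected forms, and the `max_gap` allowance for separable phrasal verbs are all assumptions introduced here.

```python
import re

def phrasal_verb_pattern(verb_forms, particle, max_gap=3):
    """Compile a regex matching any listed verb form followed by the particle.

    Up to max_gap words may intervene, so separable uses such as
    'put the meeting off' are also accepted (an assumption of this sketch).
    """
    forms = "|".join(re.escape(form) for form in verb_forms)
    gap = r"(?:\s+\S+){0,%d}?" % max_gap
    return re.compile(
        r"\b(?:%s)%s\s+%s\b" % (forms, gap, re.escape(particle)),
        re.IGNORECASE,
    )

def uses_target(answer, pattern):
    """Return 1 if the answer contains the target phrasal verb, else 0."""
    return int(pattern.search(answer) is not None)

# Hypothetical target item and hypothetical learner answers.
PUT_OFF = phrasal_verb_pattern(["put", "puts", "putting"], "off")
answers = [
    "They put off the meeting until Friday.",
    "She keeps putting it off.",
    "They postponed the meeting.",
]
scores = [uses_target(answer, PUT_OFF) for answer in answers]
print(scores)
```

A pattern of this kind yields the dichotomous success/failure values that a binary logistic model expects. It is deliberately permissive: the word gap can span clause boundaries, so a real scoring scheme would likely need per-item patterns and a manual check of borderline matches.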
Key Findings
The results revealed significant differences in performance across the three tools. In production tasks focusing on accurate use of target phrasal verbs, ChatGPT significantly outperformed both the bilingual and monolingual dictionaries. The bilingual dictionary, Diki.pl, performed better than the monolingual LDOCE. When considering successful meaning conveyance, even without using the exact phrasal verb, ChatGPT again showed a substantial advantage over both dictionaries. In reception tasks, the bilingual dictionary, Diki.pl, showed the highest success rate in understanding the target phrasal verbs, followed by ChatGPT and then LDOCE. The difference between Diki.pl and LDOCE was statistically significant. Interestingly, Year 3 students (most advanced) showed improved performance with the monolingual dictionary compared to lower-year students, especially for the production task where specific phrasal verbs were needed. However, even the most advanced students using LDOCE significantly underperformed compared to students using ChatGPT or the bilingual dictionary for both production and meaning conveyance. The overall effect of the tool used was highly significant in both production and reception tasks.
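One common way to test an overall effect such as that of the tool is a likelihood-ratio test comparing a model with the tool factor against one without it. The sketch below assumes the tool enters the model as a single three-level fixed factor, which adds two parameters and gives the test 2 degrees of freedom; for that case the chi-square p-value has a simple closed form. The function name and the log-likelihood values are placeholders for illustration only and are not results from the study.

```python
import math

def likelihood_ratio_test_df2(loglik_reduced, loglik_full):
    """Likelihood-ratio test for two nested models differing by 2 parameters.

    Returns (statistic, p_value). For 2 degrees of freedom the chi-square
    survival function is exactly exp(-x / 2), so no statistics library
    is needed.
    """
    statistic = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    p_value = math.exp(-statistic / 2.0)
    return statistic, p_value

# Placeholder log-likelihoods (NOT from the study): a model without the
# tool factor versus a model with it.
statistic, p_value = likelihood_ratio_test_df2(-1200.0, -1180.0)
print(f"LR statistic = {statistic:.1f}, p = {p_value:.3g}")
```

In practice the two log-likelihoods would come from fitting the mixed logistic models in a statistics package; only the comparison step is shown here.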
Discussion
The findings indicate that ChatGPT is a strong alternative to traditional dictionaries, particularly for productive tasks such as writing accurate English sentences that incorporate specific phrasal verbs. Its superior performance in production is consistent with its strength in generating idiomatic, natural English. In reception tasks, however, ChatGPT did not outperform the bilingual dictionary, which achieved the highest success rate; this may reflect ChatGPT's relative weakness in processing Polish, the participants' native language. The study confirms the advantage of bilingual dictionaries, especially in reception tasks, supporting Lew's (2004) findings. The results highlight the importance of considering both the native language and the target language when assessing lexical tools for learners. Future research should explore the impact of ChatGPT on long-term language learning and retention, as well as the potential benefits of multilingual language models.
Conclusion
This study shows that ChatGPT is a viable alternative to traditional dictionaries and that it excels in productive tasks. The bilingual dictionary retains an advantage in reception, highlighting the role of the native language in comprehension. Future research should investigate the impact of ChatGPT use on long-term language learning and the potential of multilingual language models.
Limitations
The study's focus on upper-intermediate to advanced learners (B2-C1) limits the generalizability of the findings to other proficiency levels. The use of Polish as the native language may have influenced the reception task results, given ChatGPT's limitations in processing Polish; further research with different native languages is needed. The specific phrasal verbs chosen as targets could also have influenced the outcomes.