Education

Investigating lexical categorization in reading based on joint diagnostic and training approaches for language learners

B. Gagl and K. Gregorová

This study by Benjamin Gagl and Klara Gregorová examines how individualized diagnostics and training can improve visual word recognition in adult language learners. Combining a training derived from the Lexical Categorization Model with machine learning diagnostics, the authors found that lexical categorization training increased reading speed by 23% on average. Among learners whom the diagnostic model selected as likely to benefit, the increase was 43%.

Introduction

The study addresses how training lexical categorization, a process posited by the Lexical Categorization Model (LCM) to operate in the left ventral occipito-temporal cortex (visual word form area), can improve reading efficiency in slow readers, specifically adult non-native learners of German. Efficient visual word recognition supports reading comprehension and broader academic and societal outcomes. The research aims to (i) translate a transparent neuro-cognitive computational model (LCM) into a targeted training procedure, (ii) quantify individual lexical categorization abilities from task performance, and (iii) develop individualized machine learning diagnostics to predict who will benefit from training. The overarching hypothesis is that improving lexical categorization will increase reading speed and that LCM-derived features will be key predictors of training gains.

Literature Review

Prior work links visual word recognition efficiency with overall reading performance in typical readers, dyslexic readers, L2 learners, and beginners after grapheme–phoneme training. Neuroimaging highlights the visual word form area (VWFA) as critical for word recognition, with diminished activation in slow readers and illiterates. Existing models of reading include dual-route cognitive models and several neuro-cognitive proposals for VWFA function; however, many were descriptive without explicit computational implementations. Explicit model comparisons identified the LCM as best accounting for left ventral occipito-temporal activation patterns. The LCM proposes a lexical categorization process using word-likeness to distinguish words from non-words, with uncertainty peaking where word and non-word distributions overlap, paralleling behavioral difficulty. Training inspired by cognitive models (e.g., phonics) shows small-to-medium effects in dyslexia; this study adopts a similar model-to-training approach using LCM to target pre-lexical orthographic processing. The study also situates findings within predictive coding accounts and orthographic prediction error work suggesting adaptable visual-orthographic representations.
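
The LCM's central idea is that categorization uncertainty peaks where the word and non-word word-likeness distributions overlap. A two-class sketch can illustrate it. This is not the LCM's published implementation. The Gaussian class densities, the entropy measure of uncertainty, the word-likeness scale, and all function names and parameter values below are assumptions chosen for illustration. The default prior of 0.5 matches the training set described in the Methodology, which has 800 words out of 1600 stimuli.

```python
import math


def gaussian_pdf(x: float, mean: float, sd: float) -> float:
    """Normal density, used here as an ASSUMED shape for each class's word-likeness."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def categorization_uncertainty(x, word_mean, word_sd, nonword_mean, nonword_sd, prior_word=0.5):
    """Binary entropy (bits) of P(word | word-likeness x).

    0 = the item is unambiguously a word or a non-word; 1 = maximal uncertainty,
    reached where the prior-weighted class densities are equal.
    """
    p_word = prior_word * gaussian_pdf(x, word_mean, word_sd)
    p_nonword = (1 - prior_word) * gaussian_pdf(x, nonword_mean, nonword_sd)
    total = p_word + p_nonword
    if total == 0.0:
        return 0.0  # both densities underflowed; treated as certain (arbitrary choice)
    post = p_word / total
    if post in (0.0, 1.0):
        return 0.0
    return -(post * math.log2(post) + (1 - post) * math.log2(1 - post))


# Hypothetical word-likeness scale (higher = more word-like); values are illustrative only.
for x in (1.0, 1.5, 1.8, 2.5):
    u = categorization_uncertainty(x, word_mean=2.0, word_sd=0.3, nonword_mean=1.0, nonword_sd=0.3)
    print(f"word-likeness {x:.1f}: uncertainty {u:.3f} bits")
```

With equal spreads and priors, uncertainty is exactly 1 bit at 1.5, where the two densities cross. It falls toward 0 near either class mean, which mirrors the "harder decisions at intermediate word-likeness" pattern the study reports for RTs.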

Methodology

Design: Three experiments with adult non-native learners of German assessed the effects of lexical categorization training on reading speed. They also evaluated machine learning predictions of individual training gains.

Participants: 76 were recruited, and the diagnostic analyses report a final sample of N=75. Ages ranged from 17 to 74 years (M=24.41, SD=6.89). Participants had diverse L1 backgrounds and no linguistic or neurological disorders. Inclusion criteria differed by experiment: Exp.1 required a baseline SLS reading speed at or below the 16th percentile, Exp.2 had no such restriction, and Exp.3 required residence in Germany of at most two years. Ethics approval was obtained from Goethe University Frankfurt (Nr.: 2019-65).

Tasks and stimuli: The core training was a lexical decision task with feedback (word vs. non-word), which under the LCM is assumed to train lexical categorization. Each training approach used 1600 five-letter stimuli:
- 800 words from SUBTLEX-DE.
- 400 pseudowords created by vowel substitution.
- 400 consonant strings created by replacing vowels with consonants.

Word-likeness was measured with OLD20, and word frequency was taken from SUBTLEX-DE. LCM lexical categorization uncertainty was computed per item, and orthographic prediction error was also available. Stimuli were presented in Courier New in randomized order, and responses were allowed within 10 s. Participants completed three training sessions per week, each lasting about 45–60 min.

Control conditions: Exp.2 included a phonics training (phoneme detection in a simultaneously presented letter string) using the same stimuli. Exp.3 included a font-change variant of lexical categorization training, intended to encourage adaptation of visual-orthographic representations. It used 50 blocks of 32 items, each block in a single font, with the font changing between blocks.

Assessment: Reading speed was measured before and after each training week with the adult Salzburger Lesescreening (SLS), a sentence-level test with two versions. In Exp.2–3 these were split into shorter versions (1 min 30 s instead of the original 3 min), and scores were adjusted for time. The outcome was the percent increase in correctly processed sentences from pre- to post-test. Pre-training SLS accuracy and errors were also included as features.
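
The stimulus construction and the OLD20 word-likeness measure can be sketched in a few lines. OLD20 is the mean Levenshtein distance from a string to its 20 closest words in a lexicon, with lower values meaning more word-like. The substitution rules below are hypothetical stand-ins, because the study's exact generation procedure is not given here: one vowel is swapped for another vowel to make a pseudoword, and every vowel is replaced with a consonant to make a consonant string. The letter inventories, function names, and the toy lexicon are also assumptions.

```python
import random

# ASSUMED letter inventories (German-style vowels incl. umlauts); not taken from the study.
VOWELS = "aeiouäöü"
CONSONANTS = "bcdfghjklmnpqrstvwxz"


def levenshtein(a: str, b: str) -> int:
    """Edit distance where insertions, deletions, and substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def old_n(item: str, lexicon, n: int = 20) -> float:
    """Mean edit distance to the n closest lexicon entries (OLD20 when n = 20).
    Averages over fewer entries if the lexicon is smaller than n."""
    distances = sorted(levenshtein(item, w) for w in lexicon if w != item)
    nearest = distances[:n]
    return sum(nearest) / len(nearest)


def make_pseudoword(word: str, lexicon: set, rng: random.Random, max_tries: int = 100) -> str:
    """HYPOTHETICAL rule: swap one vowel for a different vowel, rejecting real words."""
    positions = [i for i, c in enumerate(word) if c in VOWELS]
    if not positions:
        raise ValueError(f"{word!r} has no vowel to substitute")
    for _ in range(max_tries):
        i = rng.choice(positions)
        new_vowel = rng.choice([v for v in VOWELS if v != word[i]])
        candidate = word[:i] + new_vowel + word[i + 1:]
        if candidate not in lexicon:
            return candidate
    raise ValueError(f"no pseudoword found for {word!r}")


def make_consonant_string(word: str, rng: random.Random) -> str:
    """HYPOTHETICAL rule: replace every vowel with a random consonant."""
    return "".join(rng.choice(CONSONANTS) if c in VOWELS else c for c in word)


lexicon = ["stein", "stern", "stirn", "stuhl", "stahl", "stall", "still", "stolz"]
rng = random.Random(0)
print(old_n("stein", lexicon, n=2))  # 1.5: distance 1 to "stern" and 2 to "stirn"
print(make_pseudoword("stein", set(lexicon), rng), make_consonant_string("stein", rng))
```

With a real SUBTLEX-DE word list, `old_n(item, lexicon)` with the default `n=20` gives the usual OLD20 score for both words and generated non-words.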

Post-test timing: In Exp.1 the post-assessment took place immediately after training. In Exp.2–3 it took place on day 4, one day after the final session. Training week was coded as a variable.

Experimental structure: Exp.1 piloted lexical categorization training. Exp.2 and Exp.3 used within-participant randomized controlled designs over two weeks. In one randomly assigned week, participants completed lexical categorization training. In the other week, they completed phonics training (Exp.2) or font-change lexical categorization training (Exp.3). The two weeks were at least 14 days apart.

Statistical analyses of training effects: Linear mixed models (LMMs) on log RTs included fixed effects for:
- LCM lexical categorization uncertainty
- OLD20
- training session(s)
- word frequency
- lexicality
- error status
- training week

Random effects were included for participant and stimulus, and for font where applicable. Training benefit was tested with one-sample t-tests on the SLS percent change.

Machine learning diagnostics: A three-level pipeline used leave-one-out cross-validation (LOOCV), with consensus nested LOOCV for hyperparameter selection.
- Feature extraction (Level 1): LMMs were fitted to the RTs and accuracies from the first training session. From these models, participant-specific random slopes were estimated for LCM uncertainty, OLD20, lexicality, word frequency, sequence index (within-session learning), errors, and their interactions. Metadata (training week) and pre-training SLS metrics (score, errors) were added.
- Feature selection (Level 2): Stepwise regression both selected features and generated higher-order interaction features from the Level-1 features. Multicollinearity was tolerated because the goal was prediction rather than inference.
- Prediction models (Level 3): Four model types were evaluated: multiple regression, support vector regression with linear and with radial kernels, and random forest regression. Performance was assessed by the correlation between predicted and observed SLS percent change, its t value, and the mean squared error (MSE). Each LOOCV fold trained on N−1 participants and tested on the held-out participant. An inner LOOCV selected stable features and hyperparameters by consensus.
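
The outer LOOCV loop and its evaluation metrics can be sketched as follows. This is a simplified illustration, not the authors' pipeline. It fits only an ordinary least-squares multiple regression and omits the stepwise feature selection and the nested consensus LOOCV. The data are synthetic, and the function names are hypothetical.

```python
import numpy as np


def loocv_predictions(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Leave-one-out predictions from an ordinary least-squares multiple regression."""
    n = len(y)
    X1 = np.column_stack([np.ones(n), X])  # prepend an intercept column
    preds = np.empty(n)
    for i in range(n):
        train = np.arange(n) != i  # every participant except the held-out one
        beta, *_ = np.linalg.lstsq(X1[train], y[train], rcond=None)
        preds[i] = X1[i] @ beta  # predict only the held-out participant
    return preds


def evaluate(pred: np.ndarray, obs: np.ndarray) -> tuple[float, float, float]:
    """Pearson r between predicted and observed gains, its t (df = n - 2), and MSE."""
    r = float(np.corrcoef(pred, obs)[0, 1])
    n = len(obs)
    t = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)
    mse = float(np.mean((pred - obs) ** 2))
    return r, float(t), mse


# Synthetic example: 75 hypothetical participants with 3 hypothetical features.
rng = np.random.default_rng(1)
X = rng.normal(size=(75, 3))
y = 5 + X @ np.array([4.0, -2.0, 0.0]) + rng.normal(scale=3.0, size=75)
r, t, mse = evaluate(loocv_predictions(X, y), y)
print(f"r={r:.2f}, t(73)={t:.2f}, MSE={mse:.1f}")
```

In a full pipeline, any feature selection or hyperparameter tuning must happen inside each fold, using only the N−1 training participants. Otherwise the held-out participant leaks into model building. The inner consensus LOOCV described above serves that purpose.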

Responder categorization: A threshold on the predicted continuous gains determined who was selected for training. The threshold was optimized for sensitivity and specificity and set at a predicted gain of ≥13.5%.
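
Choosing a cut-off that balances sensitivity and specificity can be sketched as a search that maximizes Youden's J (sensitivity + specificity − 1). The exact optimization criterion is not stated here, so using Youden's J is an assumption. So is defining a responder as a participant whose observed gain is greater than 0. Function names and the six-participant data are hypothetical.

```python
def confusion_counts(pred, obs, threshold):
    """Count TP/FP/TN/FN, where 'selected' means predicted gain >= threshold
    and 'responder' means observed gain > 0 (ASSUMED responder definition)."""
    tp = fp = tn = fn = 0
    for p, o in zip(pred, obs):
        selected, responder = p >= threshold, o > 0
        if selected and responder:
            tp += 1
        elif selected:
            fp += 1
        elif responder:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def best_threshold(pred, obs):
    """Return (threshold, J) maximizing Youden's J = sensitivity + specificity - 1,
    using each observed predicted value as a candidate cut-off."""
    best = None
    for cut in sorted(set(pred)):
        tp, fp, tn, fn = confusion_counts(pred, obs, cut)
        sens = tp / (tp + fn) if tp + fn else 0.0
        spec = tn / (tn + fp) if tn + fp else 0.0
        j = sens + spec - 1
        if best is None or j > best[1]:
            best = (cut, j)
    return best


# Hypothetical predicted and observed percent gains for six participants.
predicted = [5, 10, 14, 20, 25, 30]
observed = [-3, 2, -1, 4, 8, 12]
print(best_threshold(predicted, observed))  # (20, 0.75)
```

This search returns one of the observed predictions as the cut-off. A finer grid or a midpoint between neighboring predictions would yield values like the reported 13.5%. To keep the threshold honest, it should also be chosen within cross-validation.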

Key Findings

- Training efficacy:
  • Across the three studies, lexical categorization (LC) training increased reading speed; a prior analysis reported a mean improvement of 23% after three sessions. Experiment-wise analyses showed significant LC training gains and robust effects of lexical categorization uncertainty on RTs, with harder decisions at intermediate word-likeness.
  • The Exp.2 phonics control did not significantly increase reading speed (t(26)=1.73, p=0.096) and showed no LCM uncertainty effect on RTs; RTs decreased across sessions.
  • The Exp.3 font-change LC variant significantly increased reading speed. Its gain was not significantly larger than that of standard LC training (t(31)=-0.61, p=0.54), although it was descriptively 6–11% larger.
  • In all LC trainings, RTs increased with LCM-derived categorization uncertainty (Exp.1/2/3 FE≈0.16/0.12/0.12; all ts≈9.8–11.3) and decreased across sessions (FE≈-0.03 to -0.07). The uncertainty-by-session interaction showed a significant reduction of the uncertainty effect across sessions in Exp.1 (FE=-0.21). It was also significant in the font-change training (FE=0.02).
  • 30.26% of participants showed no improvement (training effect ≤ 0), indicating substantial interindividual variability.
- Machine learning diagnostics:
  • Across all pipelines, correlations between predicted and observed gains ranged from -0.10 to 0.69 (mean≈0.42; mode just below 0.5). Taking each participant's median prediction across pipelines yielded r=0.58 (t(73)=6.02, 95% CI 0.40–0.71, p<.001).
  • The best-performing pipeline used multiple regression with a feature selection cutoff of 10. Its key predictor structure modeled LCM uncertainty × OLD20 × sequence index within the first session. It achieved r=0.69 (t(73)=8.16, 95% CI 0.55–0.79, p<.001), R²=0.476.
  • Feature importance: Training week and incoming SLS score were consistently selected. Among word-level features, LCM lexical categorization uncertainty and related interactions (often with OLD20) were selected most frequently and had high relevance. Other influential features included within-session learning (sequence index), errors, lexicality, and their interactions; word frequency was selected less often.
- Responder selection:
  • With a 13.5% threshold: sensitivity=0.73, specificity=0.74, accuracy=0.73, precision=0.86. Of 75 participants, 44 were selected (true positives=38, false positives=6, true negatives=17, false negatives=14).
  • Selecting predicted responders raised the mean reading speed improvement from 23% (unselected group average) to 43% (machine-selected group; FE=0.199, SE=0.091, t≈2.19).
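
The responder statistics above can be recomputed from the reported confusion counts as a consistency check. The t values and 95% CIs for the correlations follow from standard formulas: t = r√(n−2)/√(1−r²), and a Fisher z-transform for the interval. This is not the authors' analysis code. The Fisher-z interval is an assumption about how the CIs were obtained, although it reproduces the reported bounds. Function names are hypothetical.

```python
import math


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard confusion-matrix summaries."""
    return {
        "sensitivity": tp / (tp + fn),  # share of true responders who were selected
        "specificity": tn / (tn + fp),  # share of non-responders who were not selected
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),    # share of selected participants who responded
    }


def correlation_t_and_ci(r: float, n: int, z_crit: float = 1.96):
    """t statistic (df = n - 2) and Fisher-z 95% confidence interval for Pearson r."""
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    z, se = math.atanh(r), 1 / math.sqrt(n - 3)
    return t, (math.tanh(z - z_crit * se), math.tanh(z + z_crit * se))


metrics = classification_metrics(tp=38, fp=6, tn=17, fn=14)
print({name: round(value, 2) for name, value in metrics.items()})
# {'sensitivity': 0.73, 'specificity': 0.74, 'accuracy': 0.73, 'precision': 0.86}
t, (lo, hi) = correlation_t_and_ci(r=0.69, n=75)
print(f"t(73) = {t:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

For r=0.69 and n=75 this gives t≈8.14 and a CI of about 0.55 to 0.79. The small gap to the reported t=8.16 is consistent with r having been rounded to two decimals.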

Discussion

Findings support the hypothesis that lexical categorization is a trainable mechanism with behavioral consequences for efficient reading in adult L2 learners. LC training consistently improved reading speed and reduced decision difficulty effects, linking the LCM's proposed VWFA function to measurable gains in sentence-level reading. The null result for phonics training in this sample suggests task specificity: the same stimuli did not improve reading when the targeted categorization process was not engaged. This is likely because participants were L2 learners rather than readers with developmental impairments. The font-change LC variant suggests that adapting visual-orthographic predictions may contribute additional gains, although its superiority over standard LC training was not statistically confirmed. The individualized diagnostics achieved medium-to-high predictive accuracy. LCM-derived features were central to forecasting training gains, especially lexical categorization uncertainty and its interaction with word-likeness. Including within-session learning dynamics, error rates, training week, and baseline SLS further improved prediction. The selection model effectively enriched the trained group for responders, nearly doubling the average training benefit at the group level (from 23% to 43%). Together, these results validate a transparent model-to-training-to-diagnostics framework. They show that computationally grounded, explainable features can guide targeted interventions for slow readers.

Conclusion

The study presents a proof-of-concept framework that (i) translates a neuro-cognitive computational model (LCM) into an effective lexical categorization training, (ii) quantifies individual visual word recognition parameters from task data, and (iii) uses explainable machine learning to predict who benefits from training. LC training reliably improved reading speed in L2 learners, and LCM-based features were key to accurate individualized prediction, enabling selection of responders and improved average outcomes. Future work should validate predictions on independent samples, expand interpretable feature sets (visual, orthographic, phonological, lexical, semantic), enhance engagement (e.g., gamification), incorporate alternative outcome measures (computerized SLS, eye tracking), and extend the approach to multiple training options (e.g., phonics, font-adaptation) and populations (e.g., dyslexia, illiteracy).

Limitations

- Model validation relied on cross-validation without an independent hold-out sample, leaving some risk of overfitting despite the consensus nested LOOCV.
- The sample size is modest for machine learning, which may limit the performance and generalizability of more complex models.
- Training week influenced outcomes, possibly reflecting motivation or fatigue effects. Procedural variations, such as the timing of the post-test, may also have affected measurements.
- Irregularities in SLS administration (timing deviations, page-order issues) required corrections. The paper-and-pencil SLS outcome could be complemented by more granular measures.
- The control trainings had fewer participants than LC training, too few for robust comparative diagnostics.
- Generalization to other reader populations (e.g., dyslexic or illiterate adults) remains to be established.
