Dysgraphia detection through machine learning

P. Drotár and M. Dobeš

This paper by Peter Drotár and Marek Dobeš investigates the use of machine learning to detect dysgraphia, a writing disorder. The authors collected a new handwriting dataset, extracted a range of features from it, and found that AdaBoost gave the highest prediction accuracy of the classifiers tested (about 80%).
Introduction

The paper addresses the problem of automatically identifying developmental dysgraphia—difficulties in written expression that can harm academic performance, development, and self-esteem. Early, objective, and scalable diagnosis is important for timely intervention and classroom accommodations. Dysgraphia can co-occur with dyslexia and developmental coordination disorder, manifests differently across ages, and is heterogeneous in its handwriting features. The study’s goal is to develop and evaluate a machine learning approach that, using digitized handwriting signals, distinguishes dysgraphic from typically developing handwriting across a heterogeneous cohort (ages 8–15, mixed sex and handedness). The work proposes a new data acquisition template, extracts a comprehensive set of spatiotemporal and kinematic features (including newly proposed ones), and evaluates multiple classifiers to test whether reliable discrimination is achievable and which tasks/features contribute most.

Literature Review

Prior work has leveraged digitizing tablets to capture dynamic handwriting features beyond static pen-and-paper assessments, including speed, acceleration, pressure, and in-air movement. Asselborn et al. categorized features as static, kinematic, pressure, and tilt; Mekyska et al. included kinematic and non-linear dynamic features; Rosenblum et al. considered temporal and product quality features. Common ML approaches include random forests (Asselborn; Mekyska for 8–9-year-olds), linear SVMs (Rosenblum & Dror; Sihwi et al., with smartphone input), and neural networks (simple NNs with few hidden neurons, Samodro & Sihwi; deep learning screening, Kariyawasam et al.). Some studies explored unsupervised methods (e.g., K-means with PCA) and advanced signal features (fractional-order derivatives; tunable Q-factor wavelet transform). Related ML applications include dyslexia detection using EEG-derived features. Methodological critiques in the field emphasize consistent hardware across participants and confirmed clinical diagnoses. This study contributes by using Slovak orthography, a consistent device, clinically assessed labels, and a broader, more heterogeneous cohort.

Methodology

Participants and Data Collection: The cohort comprised 120 schoolchildren aged 8–15 (80 boys, 40 girls; 57 dysgraphic, 63 typically developing), 16 of whom were left-handed. Age was approximately normally distributed, with no significant difference in mean age between groups. Inclusion in the dysgraphic group required a clinical diagnosis of dysgraphia; comorbid developmental disorders were not exclusionary. Three professionals independently assessed the presence of dysgraphia. Data from dysgraphic children were collected in 2018–2019 at a Centre for Special-Needs Education; control data were collected at elementary schools. Ethical approval was obtained (APVV-16-0211; University of Pavol Jozef Šafárik, Košice), as was informed consent from parents or guardians.

Acquisition Protocol: Children wrote on paper placed over a WACOM Intuos Pro Large tablet. The tasks were: the letter “l” at normal and fast speeds; the syllable “le” at normal and fast speeds; the word “leto”; the pseudoword “lamoken”; the complex word “hračkárstvo”; and the sentence “V lete bude teplo a sucho” (The weather in summer is hot and dry). The tablet captured five signals (x-position, y-position, pressure, azimuth, and altitude) and recorded whether each sample was on-surface or in-air movement.

Feature Extraction: The analysis focused on spatiotemporal and kinematic features, the gold standard in handwriting evaluation. For vector signals (velocity, acceleration, and jerk, including their vertical and horizontal components, plus pressure, altitude, and azimuth for on-surface movement only), summary statistics were computed: mean, median, standard deviation, min, max, and the 5th and 95th percentiles. Segment-level features, where a segment is a continuous on-surface movement between in-air transitions, covered duration, vertical/horizontal length, height, and width, each summarized by mean, median, SD, min, and max. Scalar features comprised the number of pen lifts, the number of local extrema in velocity/acceleration, total duration, total path length, and vertical/horizontal length.
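The feature extraction described above can be sketched as follows. This is a minimal illustration, not the authors' code: the finite-difference derivatives via `np.gradient`, the use of speed magnitude as the basis for acceleration and jerk, the population standard deviation, and the rule "pen lifts = segments minus one" are all assumptions, and the function names are hypothetical.

```python
import numpy as np


def summary_stats(values):
    """Return the seven summary statistics named in the text for a 1-D signal."""
    v = np.asarray(values, dtype=float)
    return {
        "mean": float(v.mean()),
        "median": float(np.median(v)),
        "std": float(v.std()),  # population SD; the paper's convention is not stated
        "min": float(v.min()),
        "max": float(v.max()),
        "p05": float(np.percentile(v, 5)),
        "p95": float(np.percentile(v, 95)),
    }


def kinematic_signals(x, y, t):
    """Derive velocity, acceleration, and jerk by finite differences (assumed method)."""
    x, y, t = (np.asarray(a, dtype=float) for a in (x, y, t))
    vx, vy = np.gradient(x, t), np.gradient(y, t)  # horizontal / vertical velocity
    speed = np.hypot(vx, vy)                       # velocity magnitude
    accel = np.gradient(speed, t)
    jerk = np.gradient(accel, t)
    return {"velocity": speed, "velocity_h": vx, "velocity_v": vy,
            "acceleration": accel, "jerk": jerk}


def split_segments(on_surface):
    """Return half-open (start, stop) index pairs for runs of on-surface samples."""
    flags = np.asarray(on_surface, dtype=bool)
    padded = np.concatenate(([False], flags, [False])).astype(int)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)
    stops = np.flatnonzero(edges == -1)
    return list(zip(starts.tolist(), stops.tolist()))


def count_pen_lifts(on_surface):
    """Count in-air gaps between consecutive segments (assumed definition)."""
    return max(len(split_segments(on_surface)) - 1, 0)


# Toy recording: the pen moves right at a constant 2 units per second.
t = np.linspace(0.0, 1.0, 11)
signals = kinematic_signals(x=2.0 * t, y=np.zeros_like(t), t=t)
velocity_features = summary_stats(signals["velocity"])
segments = split_segments([1, 1, 0, 0, 1, 1, 1, 0])
```

Applying `summary_stats` to each signal returned by `kinematic_signals`, and to per-segment quantities computed over the index pairs from `split_segments`, would yield a feature vector of the general kind described, though the exact 133-feature layout is not specified here.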
Novel features captured handwriting line inclination and vertical drift: differences between y-position statistics of early and late segments (first vs. last; second vs. penultimate), and the variance of segment y-position statistics (min/max/median/mean). For the sentence task, analogous features were also extracted from in-air movement (excluding pressure, altitude, and azimuth). In total, 133 on-surface features per task and 112 additional in-air features (sentence task only) were extracted, giving 8 × 133 + 112 = 1176 merged features across the eight tasks. All features were standardized (z-scored) per feature.

Preliminary Visualization: PCA and t-SNE were used to visualize data structure. Both showed substantial overlap between groups, suggesting no simple linear separation and motivating non-linear classifiers.

Feature Relevance Analysis: Weighted k-nearest neighbors feature selection (WkNN-FS) identified 150 of the 1176 merged features as relevant (non-zero weight). Top-weighted features included: number of pen lifts, vertical length, maximum segment vertical length, minimum segment height, difference between maximum y-positions of second and penultimate segments, 5th percentile of acceleration, maximum segment length, total movement length, SD of segment height, altitude mean/median, difference between median y-positions of first and last segments, mean/SD of segment vertical length, and minimum pressure. Together these accounted for ~50% of total feature weight. For in-air (sentence) features, six were selected: acceleration median, 95th percentile of horizontal jerk, 5th percentile of acceleration, 5th percentile of jerk, 95th percentile of jerk, and 5th percentile of horizontal acceleration.

Classification: Multiple binary classifiers were evaluated using scikit-learn; TPOT was also tested for pipeline optimization. The evaluation focused on non-linear classifiers: AdaBoost, Random Forest (RF), and Support Vector Machine (SVM with RBF kernel). Data were normalized within each train/test split. Performance was estimated with stratified 10-fold cross-validation repeated 10 times and averaged over all runs.
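The evaluation protocol (normalization inside each split, stratified 10-fold cross-validation repeated 10 times) might look roughly like this in scikit-learn. It is a sketch under stated assumptions: the data are synthetic (`make_classification`), the feature count and `n_estimators=20` are reduced so the demo runs quickly, and `repeated_cv_accuracy` is a hypothetical helper, not a function from the paper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Feature count reported in the text: 8 tasks x 133 on-surface + 112 in-air.
N_TASKS = 8
N_MERGED_FEATURES = N_TASKS * 133 + 112  # 1176


def repeated_cv_accuracy(model, X, y, seed=0):
    """Accuracy over stratified 10-fold CV repeated 10 times (100 scores).

    The scaler sits inside the pipeline, so it is fitted on the training
    folds only and then applied to the held-out fold.
    """
    pipeline = make_pipeline(StandardScaler(), model)
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=seed)
    return cross_val_score(pipeline, X, y, cv=cv, scoring="accuracy")


# Synthetic stand-in for the handwriting data: 120 samples, 40 features
# (far fewer than the 1176 in the study, purely to keep the demo fast).
X, y = make_classification(n_samples=120, n_features=40,
                           n_informative=10, random_state=0)
scores = repeated_cv_accuracy(
    AdaBoostClassifier(n_estimators=20, random_state=0), X, y
)
print(f"accuracy: {scores.mean():.3f} +/- {scores.std():.3f} "
      f"over {scores.size} folds")
```

Keeping the scaler inside the pipeline is the design choice that matters here: scaling the whole dataset before splitting would leak held-out statistics into training.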
Hyperparameter tuning:
  • AdaBoost: number of estimators searched over 20–500 (step 20); optimal 340.
  • RF: grid over n_estimators ∈ {20,…,500}, min_samples_split ∈ {2,4,6,8}, max_features ∈ {5,10,20,30,40}; optimal (60, 4, 5).
  • SVM: grid over C ∈ [2e−9,…,2e9], gamma ∈ [2e−9,…,2e9]; optimal C=4, gamma=2e−9.
Applying feature selection for classification slightly reduced accuracy, so the full feature set was used.
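A search over the AdaBoost grid stated above (20 to 500 estimators in steps of 20) could be sketched with `GridSearchCV`. Everything beyond the grid itself is an assumption: the data are synthetic, `tune_adaboost` is a hypothetical helper, and the demo runs only the first three grid values with 5 folds so it finishes quickly. The RF and SVM grids are left out because their step sizes are not fully spelled out in the text.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# AdaBoost grid as stated in the text: 20, 40, ..., 500.
FULL_GRID = list(range(20, 501, 20))


def tune_adaboost(X, y, n_estimators_grid, n_splits=10, seed=0):
    """Grid-search the number of AdaBoost estimators with stratified CV."""
    pipeline = make_pipeline(StandardScaler(),
                             AdaBoostClassifier(random_state=seed))
    search = GridSearchCV(
        pipeline,
        # make_pipeline names each step after its lowercased class name.
        param_grid={"adaboostclassifier__n_estimators": list(n_estimators_grid)},
        cv=StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed),
        scoring="accuracy",
    )
    return search.fit(X, y)


# Synthetic stand-in data; the trimmed grid keeps the demo fast.
X, y = make_classification(n_samples=120, n_features=40,
                           n_informative=10, random_state=0)
demo_grid = FULL_GRID[:3]  # [20, 40, 60]
search = tune_adaboost(X, y, demo_grid, n_splits=5)
best_n = search.best_params_["adaboostclassifier__n_estimators"]
print(f"best n_estimators in demo grid: {best_n}")
```

Passing `FULL_GRID` instead of `demo_grid` would cover all 25 candidate values, including the reported optimum of 340, at a correspondingly higher runtime.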

Key Findings
  • Overall performance (all tasks merged): AdaBoost achieved 79.5% ± 3 accuracy; SVM 78.8% ± 2; RF 77.6% ± 1. For AdaBoost, specificity/sensitivity were 76.7% ± 2 / 79.7% ± 5; for SVM 82.4% ± 4 / 74.5% ± 4; for RF 83.3% ± 2 / 71.4% ± 3.
  • Best single-task performance: The difficult word “hračkárstvo” yielded 76.2% accuracy with AdaBoost (SVM 72.5%, RF 72.3%), approaching the all-tasks result. Simpler or maximal-speed tasks (e.g., letter l, syllable le at max speed) had lower accuracy. Tasks “leto” and “lamoken” produced relatively higher accuracies for SVM and RF (~66–78%), while AdaBoost on those tasks was ~61–66%.
  • Feature relevance: WkNN-FS selected 150/1176 features as relevant; top ~15 features (including pen lifts, vertical length, segment vertical-length statistics, altitude mean/median, minimum pressure, and novel y-position difference/variance features) explained nearly 50% of total weight. Newly proposed segment-geometry and vertical-drift features were as influential as classic kinematic features.
  • In-air features: In-air features from the sentence improved interpretability; only six were deemed relevant by WkNN-FS. Adding in-air features to sentence on-surface features did not consistently improve accuracy across models.
  • Visualization: PCA and t-SNE revealed substantial class overlap, supporting the need for non-linear classifiers.
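For readers less familiar with the metrics in the findings above, accuracy, sensitivity, and specificity follow directly from confusion-matrix counts. The sketch below treats the dysgraphic class as positive (an assumption) and uses hypothetical counts chosen only to match the cohort sizes (57 dysgraphic, 63 typically developing); they are not results from the paper.

```python
def screening_metrics(tp, fn, tn, fp):
    """Compute sensitivity, specificity, and accuracy from confusion counts.

    tp/fn: dysgraphic children classified correctly / incorrectly.
    tn/fp: typically developing children classified correctly / incorrectly.
    """
    total = tp + fn + tn + fp
    return {
        "sensitivity": tp / (tp + fn),   # share of dysgraphic cases detected
        "specificity": tn / (tn + fp),   # share of controls correctly cleared
        "accuracy": (tp + tn) / total,
    }


# Hypothetical counts for illustration only: 45 + 12 = 57 dysgraphic,
# 48 + 15 = 63 typically developing.
metrics = screening_metrics(tp=45, fn=12, tn=48, fp=15)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
# accuracy here is (45 + 48) / 120 = 0.775
```

The trade-off visible in the reported results (AdaBoost with higher sensitivity, SVM and RF with higher specificity) corresponds to shifting errors between the `fn` and `fp` cells.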
Discussion

The study demonstrates that data-driven, tablet-based assessment can objectively aid in distinguishing dysgraphic from typically developing handwriting. Ensemble (AdaBoost) and kernel (SVM) methods effectively model the non-linear patterns present, achieving nearly 80% accuracy on a heterogeneous cohort spanning ages 8–15, mixed sex and handedness. Feature analysis indicates that no single feature dominates; rather, clusters of intercorrelated static, kinematic/dynamic, and geometric/positional features contribute, consistent with prior studies. Newly introduced segment geometry and vertical drift features are highly informative and interpretable, linking model decisions to observable handwriting alterations. While a reduced relevant feature subset is useful for interpretability, maximal classification performance was achieved with the full feature set. Task selection matters: more demanding lexical items (e.g., “hračkárstvo”, “lamoken”) elicited clearer dysgraphic signatures than simple or maximal-speed primitives, suggesting future protocols can optimize task design for machine evaluation. The findings expand evidence across orthographies (Slovak) and address methodological critiques by using consistent hardware and clinically assessed labels, supporting the generalizability and clinical utility of ML-based dysgraphia screening pending larger, diverse datasets.

Conclusion

The paper contributes a new Slovak-orthography dataset, a comprehensive and partly novel feature set, and an evaluation showing AdaBoost-based classification of dysgraphia with ~80% accuracy in a heterogeneous sample. Several newly proposed, interpretable features (e.g., maximum segment vertical length, minimum segment height, inter-segment y-position differences) were among the most predictive. The approach is suitable for integration into tablet-based decision support tools to enable scalable, low-cost school screening and to assist clinicians with objective assessments. Future work should broaden cross-orthography validation, increase sample sizes, refine age-specific analyses to account for developmental changes in handwriting, and design optimized handwriting tasks to enhance machine discriminability. Further neuroscientific integration may clarify subtype-specific feature patterns and improve personalized interventions.

Limitations
  • Limited sample size (N=120), which may constrain model generalizability and contribute to variability across algorithms and tasks.
  • Single orthography (Slovak) limits cross-linguistic generalizability; applicability to other languages/orthographies remains to be validated.
  • Broad age range (8–15 years), during which handwriting is still developing; fewer cases in each age subgroup precluded detailed age-specific analysis.
  • Inclusion of comorbid developmental disorders may introduce heterogeneity; subtype-specific effects were not disentangled.
  • Feature importance patterns may be task- and cohort-sensitive; reduced feature subsets slightly decreased accuracy, indicating possible dependence on redundant or correlated features.