
A wearable sensor and machine learning estimate step length in older adults and patients with neurological disorders
A. Zadka, N. Rabin, et al.
This study developed machine-learning models that estimate step length from a single lower-back inertial measurement unit (IMU). The authors, including Assaf Zadka, Neta Rabin, and Jeffrey M. Hausdorff, report a single-step RMSE of about 6 cm against a gait-mat reference, including in participants with neurological disorders.
Introduction
Step length, a key spatial-temporal gait parameter linked to pace and closely correlated with stride length and gait speed, typically declines with aging and neurological disorders. Alterations in step length predict clinically important outcomes including falls, cognitive decline, dementia, morbidity, mortality, and treatment response, making precise quantification valuable for diagnosis, prognosis, and monitoring disease progression. Conventional lab-based tools (camera systems and gait mats) are accurate but provide only snapshot assessments that may be biased by day-to-day factors and do not reflect real-world function. Wearable inertial measurement units (IMUs) can enable continuous assessment, but IMUs do not directly measure spatial parameters and require accurate estimation models. Existing IMU-based approaches for step length estimation (double integration, kinematic models, regressions) face drift, calibration, or generalizability challenges. Prior machine learning studies often used small datasets of young healthy participants or required anthropometrics, limiting broad applicability. Given minimal clinically important differences (MCID) around 5 cm for step length (and 3.6 cm in Parkinson’s disease), the present study aims to develop a generalized, calibration-free regression model using a single lower-back IMU to accurately estimate step length during straight-line walking in a diverse cohort of older adults and individuals with neurological disorders, and to examine the trade-off between single-step accuracy and averaging over multiple steps.
Literature Review
Three principal IMU-based strategies for step length estimation have been used: (1) double integration of accelerometer signals with zero-velocity updates (effective mainly for foot-mounted sensors), (2) kinematic human gait models (often requiring calibration), and (3) regression and machine learning methods. Hybrid signal processing approaches (e.g., Kalman filtering with bidirectional integration) and models using multiple IMUs or unconventional placements have shown promise but can reduce user compliance. Early ML efforts using smartwatch data tested various algorithms (LR, GPR, SVM, RT, CNN, LSTM) but were limited to small, young, healthy cohorts, restricting generalizability. Byun et al. improved gait speed estimation in older adults (RMSE 6.81 cm/s) using a lower-back IMU plus anthropometric and demographic inputs; however, reliance on manual measurements reduces practicality. Hannink et al. estimated stride length accurately with a CNN (RMSE 6.09 cm) using ankle-mounted IMUs, but sensor location was unconventional for daily use. The Mobilise-D consortium validated stride length estimators during daily activities in mixed patient groups and older adults (absolute errors 15–33 cm), highlighting feasibility in real-world settings while leaving room to reduce errors. MCID for gait speed in various conditions (10–20 cm/s) implies a step length MCID ≈5 cm for typical step times; for PD, MCID ≈3.6 cm. These findings motivate a single-sensor, generalizable model with accuracy below MCID thresholds and robust across populations.
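The link between a gait-speed MCID and a step-length MCID follows from step length = gait speed × step time. The sketch below works through that arithmetic; the 0.5 s step time is an illustrative assumption, not a value reported in the study, and the function name is hypothetical.

```python
def step_length_mcid_cm(gait_speed_mcid_cm_s: float, step_time_s: float) -> float:
    """Convert a gait-speed MCID (cm/s) into a step-length MCID (cm).

    Uses step length = gait speed * step time.
    """
    return gait_speed_mcid_cm_s * step_time_s


# Assumed step time of 0.5 s (illustrative only).
print(step_length_mcid_cm(10.0, 0.5))  # 5.0
```

Under this assumption, the lower end of the 10–20 cm/s range corresponds to the roughly 5 cm step length MCID quoted above.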
Methodology
Data sources and participants: A de-identified database was assembled from three projects. The primary dataset (V-TIME) included 257 participants with a history of ≥2 falls: 149 with Parkinson’s disease (PD), 27 with mild cognitive impairment (MCI), and 81 older adults (OA). Participants completed three 1-minute gait tests: (1) comfortable speed, (2) fast speed, and (3) dual-task (serial 3 subtraction while walking). Testing occurred at four time points (pre, post, 1 month, 6 months). Two independent validation datasets were used: ONPAR (n=113; 75 PD, 38 healthy adults; similar ages to V-TIME) and MS-Watch (n=102; 61 multiple sclerosis [MS], 41 healthy adults; younger cohort). In total, 83,569 steps were evaluated in V-TIME.
Instrumentation and reference: Participants wore a single lower-back Opal IMU (3D accelerometer and gyroscope; 128 Hz). The Zeno Walkway (7.92 m) provided gold-standard step length and gait speed.
Preprocessing and segmentation: Signals were low-pass filtered with an FIR filter (20 Hz cutoff). Steps were segmented using a vertical acceleration-based algorithm. Segmented IMU steps were synchronized to Zeno steps by minimizing timing differences.
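The low-pass filtering step can be sketched as follows. The summary gives only the filter type (FIR), the cutoff (20 Hz), and the sampling rate (128 Hz); the Hamming-windowed sinc design, the 31-tap length, and all function names are assumptions made for illustration, not the authors' implementation.

```python
import cmath
import math


def design_lowpass_fir(cutoff_hz, fs_hz, num_taps=31):
    """Hamming-windowed sinc low-pass FIR taps, normalised to unit DC gain.

    num_taps should be odd so the filter has an integer-sample centre.
    """
    fc = cutoff_hz / fs_hz          # cutoff in cycles per sample
    mid = (num_taps - 1) // 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        # Ideal (sinc) low-pass impulse response, centred at k = 0.
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def fir_filter(signal, taps):
    """Centred FIR convolution; output has the input length, edges zero-padded."""
    half = len(taps) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, t in enumerate(taps):
            idx = i + half - j
            if 0 <= idx < len(signal):
                acc += t * signal[idx]
        out.append(acc)
    return out


def gain_at(taps, freq_hz, fs_hz):
    """Magnitude of the filter's frequency response at freq_hz."""
    w = 2 * math.pi * freq_hz / fs_hz
    return abs(sum(t * cmath.exp(-1j * w * n) for n, t in enumerate(taps)))


taps = design_lowpass_fir(cutoff_hz=20.0, fs_hz=128.0)
print(gain_at(taps, 2.0, 128.0), gain_at(taps, 50.0, 128.0))
```

Because the taps are symmetric and the convolution is centred, this sketch introduces no phase delay, which keeps filtered step events aligned with the raw signal. In practice a library routine for FIR design and filtering would normally replace the hand-written loops.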
Feature extraction/selection: From each step, features (detailed in Supplementary Information) were extracted. Stepwise feature selection retained the 34 most informative features (e.g., FFT coefficients of acceleration, acceleration magnitude energy, and second integration of X and Y acceleration).
Modeling and validation: Traditional ML models (linear regression, regression tree, SVM, KNN) and XGBoost (gradient-boosted trees) were evaluated. An inverted pendulum biomechanical model served as a comparator. Fivefold cross-validation was performed on V-TIME with subject-wise splits (each participant assigned to training or validation within a fold). Model hyperparameters were tuned per fold (ranges in Supplementary Table 2). Gait speed was derived from estimated step length and step duration.
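A subject-wise split keeps all steps from one participant on the same side of each fold, so the model is never validated on a person it was trained on. The following is a minimal sketch of that idea; the round-robin assignment, the seed, and the function name are assumptions, since the summary does not describe how participants were allocated to folds.

```python
import random


def subject_wise_folds(subject_ids, n_folds=5, seed=0):
    """Return a list of (train_indices, val_indices), one pair per fold.

    subject_ids[i] is the participant who produced step i. Every
    participant's steps land in the validation set of exactly one fold.
    """
    subjects = sorted(set(subject_ids))
    random.Random(seed).shuffle(subjects)
    # Round-robin assignment of shuffled participants to folds.
    fold_of = {s: i % n_folds for i, s in enumerate(subjects)}
    folds = []
    for k in range(n_folds):
        val = [i for i, s in enumerate(subject_ids) if fold_of[s] == k]
        train = [i for i, s in enumerate(subject_ids) if fold_of[s] != k]
        folds.append((train, val))
    return folds


# Hypothetical data: 10 participants with 3 steps each.
ids = [f"S{i:02d}" for i in range(10) for _ in range(3)]
for train_idx, val_idx in subject_wise_folds(ids):
    print(len(train_idx), len(val_idx))
```

Libraries such as scikit-learn offer the same behaviour through `GroupKFold`; the hand-written version is shown only to make the grouping explicit.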
Modifications for deployment: (1) Averaging technique: post hoc averaging of estimated and reference step lengths over n consecutive steps (n=3,5,10) to reduce noise and improve accuracy. (2) Non-segmented model: training on fixed-length time windows (1 s and 5 s) to estimate distance/speed without explicit step segmentation, enabling real-time estimation in free-living settings.
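Both modifications reduce to simple grouping operations, sketched below. The summary does not say whether the averaged steps or the time windows overlap, so non-overlapping blocks are assumed here, and both function names are hypothetical.

```python
def block_average(values, n):
    """Average non-overlapping blocks of n consecutive values.

    A trailing block with fewer than n values is dropped.
    """
    return [sum(values[i:i + n]) / n for i in range(0, len(values) - n + 1, n)]


def fixed_windows(samples, fs_hz, window_s):
    """Cut a signal into non-overlapping windows of window_s seconds."""
    size = int(round(fs_hz * window_s))
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]


# Averaging: apply the same grouping to estimated and reference step lengths.
print(block_average([1, 2, 3, 4, 5, 6, 7], 3))  # [2.0, 5.0]

# Windowing: at 128 Hz, a 1 s window holds 128 samples and a 5 s window holds 640.
signal = list(range(1280))  # 10 s of placeholder samples
print(len(fixed_windows(signal, 128, 1.0)), len(fixed_windows(signal, 128, 5.0)))  # 10 2
```

Averaging must be applied identically to the estimates and the reference values before computing errors, otherwise the two series no longer describe the same steps.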
Statistical analysis: Accuracy was assessed via RMSE (step length, gait speed), relative error (RA), and ICC(2,1). Agreement was examined with Bland–Altman analysis (limits of agreement, LOA). Goodness-of-fit was quantified with Pearson’s r and R². ANOVA tested the effect of averaging on RMSE (alpha=0.05).
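Three of these metrics can be computed directly from paired estimates and reference values, as in the sketch below. It assumes differences are taken as estimate minus reference and that the limits of agreement use the conventional bias ± 1.96 × SD with the sample standard deviation; the summary does not state either convention. Relative error and ICC(2,1) are omitted because their exact definitions are not given here.

```python
import math
from statistics import mean, stdev


def rmse(est, ref):
    """Root-mean-square error between paired estimates and references."""
    return math.sqrt(mean([(e - r) ** 2 for e, r in zip(est, ref)]))


def bland_altman(est, ref):
    """Return (bias, lower LOA, upper LOA) for estimate-minus-reference differences."""
    diffs = [e - r for e, r in zip(est, ref)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation (n - 1 in the denominator)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical step lengths in cm, for illustration only.
estimated = [60.0, 62.0, 58.0, 64.0]
reference = [61.0, 60.0, 59.0, 62.0]
print(rmse(estimated, reference))
print(bland_altman(estimated, reference))
print(pearson_r(estimated, reference))
```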
Key Findings
Model selection: XGBoost achieved the best accuracy on the test set. Compared to an inverted pendulum model, XGBoost reduced step length RMSE from 20.60±0.77 cm (ICC 0.54±0.24) to 6.08±0.15 cm (ICC up to 0.91±0.003 across folds). Correlation and agreement: Pearson r=0.86; R²=0.71; Bland–Altman LOA for single steps: −10.84 to 13.20 cm, with a bias toward underestimating large steps and overestimating small steps.
Averaging improvement: Averaging consecutive steps improved accuracy: RMSE decreased to 5.21 cm (n=3), 4.98 cm (n=5), and 4.79 cm (n=10); ANOVA F=23.0, p=4.8×10⁻⁶. For 10-step averaging, LOA narrowed to −8.15 to 10.51 cm.
Group-wise performance (single step, test set): PD RMSE 6.64±0.25 cm (RA 9.59±0.48%, ICC 0.89±0.01); MCI RMSE 5.27±0.93 cm (RA 8.22±2.16%, ICC 0.77±0.15); OA RMSE 6.39±0.52 cm (RA 9.20±0.61%, ICC 0.90±0.02). The PD vs OA RMSE difference was not statistically significant (t=1.76, p=0.12). MCI showed the lowest mean RMSE but larger variability (SD) and a lower ICC.
Condition-wise performance (single step, test set): Usual speed RMSE 5.70±0.25 cm (RA 8.50±0.37%); Fast speed RMSE 6.72±0.35 cm (RA 8.80±0.42%); Dual-task RMSE 6.26±0.25 cm (RA 10.65±0.45%). Averaging (n=3,5,10) further reduced RMSE and RA across conditions.
Non-segmented windows: Gait speed RMSE for models trained on fixed windows: 12.4 cm/s (1 s) and 11.8 cm/s (5 s), comparable to the step-based model’s derived gait speed RMSE (11.4 cm/s), suggesting feasibility without explicit step segmentation.
Generalizability: On the independent validation datasets (ONPAR, MS-Watch), RMSEs were modestly higher than on the test set, with RA relatively consistent across datasets. The second validation dataset, MS-Watch (younger cohort, longer step lengths), exhibited larger RMSE but comparable RA, indicating that error scales with step length magnitude. Overall, performance supported robustness across populations and datasets.
Discussion
Using a single lower-back IMU and XGBoost, the study achieved accurate step length estimation in older adults and individuals with neurological conditions. The model markedly outperformed a biomechanical inverted pendulum baseline and demonstrated strong correlation with the reference standard. A consistent bias was observed in Bland–Altman analyses: underestimation of long steps and overestimation of short steps. While this may limit precision for extreme values, it may be less problematic for progression biomarkers that rely on within-subject changes, particularly since very large steps are less common in neurological cohorts.
Averaging over multiple steps reduced RMSE below the 5 cm MCID for n≥5, reflecting noise reduction when predicting averaged targets. However, averaging sacrifices access to step-to-step variability, which can be clinically informative. Group-wise and condition-wise analyses revealed challenges in PD and at extreme walking speeds: PD showed the highest RMSE (though still modest), and fast walking (with longer steps) increased RMSE; dual-tasking increased RA, possibly due to irregular gait patterns. These findings suggest performance partly depends on speed and gait regularity, indicating potential benefits of speed-aware or two-stage models.
Across independent validation datasets, accuracy degraded modestly but RA remained comparable, supporting generalization; differences likely reflect cohort characteristics (age, step length distributions). Compared to recent literature, the present approach shows competitive or superior accuracy in controlled straight-line walking and uses a practical single-sensor placement conducive to compliance. Nonetheless, prior state-of-the-art studies have addressed more complex real-world trajectories; thus, future work should validate and extend the current model to free-living conditions and incorporate turning detection and more variable walking patterns.
The non-segmented approach achieved gait speed errors similar to the step-segmented method, indicating that step segmentation may be avoidable for some applications, simplifying real-time deployment.
Conclusion
A generalized, calibration-free step length estimator using a single lower-back IMU and XGBoost achieved high accuracy across diverse cohorts, including PD, MCI, MS, and healthy adults. Single-step RMSE was about 6 cm, and averaging over 5–10 steps reduced RMSE below the general 5 cm MCID, supporting potential clinical and research use. The method improves practicality by relying on one wearable in a convenient location and, with non-segmented windowing, can be adapted for real-time applications. Future work should enhance accuracy for extreme step lengths, incorporate speed- or context-aware modeling, preserve step-to-step variability where needed, and rigorously validate performance in free-living, non-straight-line walking with turning detection, potentially enabling robust digital gait biomarkers in real-world settings.
Limitations
Data were collected in controlled, straight-line laboratory settings; performance in free-living, complex trajectories (including turning) remains to be validated. The model exhibits systematic bias for very short or long steps. Single-step RMSE exceeded the 3.6 cm PD-specific MCID, though averaging mitigated this. Group sample imbalances (e.g., smaller MCI cohort) contributed to variability and lower ICC in MCI. Some performance dependency on walking speed and regularity was observed, potentially constraining generalizability to extreme conditions. Validation datasets differed demographically (e.g., younger, longer step lengths), which may influence RMSE. Real-world deployment will require robust turning detection and evaluation in uncontrolled environments.