Medicine and Health

Enabling precision rehabilitation interventions using wearable sensors and machine learning to track motor recovery

C. Adans-Dester, N. Hankov, et al.

This research by Catherine Adans-Dester and colleagues presents a machine learning approach that uses wearable sensors to assess upper-limb motor impairment and movement quality in survivors of stroke and traumatic brain injury (TBI). Sensor-derived estimates agreed closely with clinician assessments (r² of 0.86 for impairment and 0.79 for movement quality), which suggests the method could support personalized rehabilitation strategies.

Introduction

The study addresses the challenge of high inter-individual variability in motor recovery after neurological injuries such as stroke and traumatic brain injury, motivating the need for precision rehabilitation. The research question is whether wearable sensors combined with machine learning can accurately estimate standard clinical outcome scores—capturing impairment (Fugl-Meyer Assessment, FMA) and movement quality (Functional Ability Scale, FAS)—to enable continuous tracking of motor recovery with minimal burden. Context includes the growing prevalence of disability in aging populations, the importance of monitoring recovery across ICF domains, and the impracticality of frequent clinician-administered assessments. The purpose is to develop and validate algorithms that estimate clinical scores from sensor data collected during functional tasks and to demonstrate their potential for tracking and predicting recovery trajectories to guide individualized interventions.

Literature Review

Prior work shows rehabilitation reduces disability across neurological conditions, but selecting effective interventions is challenging due to heterogeneous responses. Precision rehabilitation efforts have explored genotype- and motor phenotype-informed approaches. The International Classification of Functioning, Disability and Health (ICF) framework guides outcome assessment, but clinical scales are time-consuming and infrequently administered, limiting longitudinal monitoring. Wearable sensors offer continuous, real-world data collection in home and community settings and have been used to assess upper-limb function and movement quality. Previous work by the authors demonstrated accurate estimation of FAS (movement quality) from wearable data, but earlier attempts to estimate FMA (impairment) were inadequate. Random forest methods are robust to overfitting in small datasets and have been applied to similar problems, motivating their use here. The study builds on these findings to improve FMA estimation accuracy from wearable data collected during Wolf Motor Function Test tasks.

Methodology

Design: Prospective longitudinal study with two visits (baseline and 3 months).

Participants: 37 adults (16 stroke, 21 TBI) aged 18–80 undergoing inpatient or outpatient upper-limb rehabilitation, with moderate-to-severe impairment (upper-limb FMA 15–55/66). Exclusions: Mini-Mental State Examination score <24 or inability to follow a three-step command. IRB approval was obtained and informed consent was provided.

Clinical measures: Upper-limb Fugl-Meyer Assessment (FMA, 0–66) for impairment and Functional Ability Scale (FAS, 0–5 per item; total derived) for movement quality during Wolf Motor Function Test (WMFT) items. Sessions were standardized and video-recorded.

Wearable sensing: Six Shimmer2 units with accelerometers were placed bilaterally on the upper limbs: desk (sternum height), mid-biceps (frontal), and dorsal wrists; plus two-axis accelerometers on thumb and index finger of the affected limb. Sensors were synchronized and programmed via dedicated software.

Tasks: Eight WMFT tasks sampling gross and fine motor control were performed up to three times each; examples include forearm-to-table (WMFT-1), extending elbow at side (WMFT-4), lifting a pencil (WMFT-10), flipping cards (WMFT-13), and turning a key in a lock (WMFT-15). Each occurrence was timed and scored (FAS) by therapists.

Statistical analyses: Group comparisons (stroke vs TBI) used chi-square tests for categorical variables and independent t-tests for continuous variables (SPSS v23). Normality checks were conducted; alpha was set at 0.05 with Holm correction for multiple comparisons.

FAS total scoring: Average FAS per task across trials was computed for the eight tasks; an equation (as specified in the manuscript) was used to derive a total FAS score in the 0–5 range.

Sensor data processing: Accelerometer data were segmented to task intervals using digital markers. Signals were low-pass filtered at 8 Hz and high-pass filtered at 0.25 Hz (sixth-order Butterworth). Magnitude time series for displacement, velocity, acceleration, and jerk were computed by combining axes. Features extracted included: min/max/mean, RMS, dominant frequency-to-energy ratio, jerk metrics, skewness, entropy, kurtosis, cross-axis correlation coefficients (to capture compensatory movements), and task duration. Feature selection used a correlation-based algorithm. Modeling used leave-one-subject-out cross-validation.

Estimation pipelines:
- FAS estimation: Previously developed algorithm from the authors estimated FAS from sensor features.
- FMA estimation (four methods):
  1) Linear regression (FAS): Fit linear model relating clinician-rated FAS to FMA (r²=0.75), then input sensor-estimated FAS to predict FMA.
  2) Random forest (RF): For each of the eight tasks, RF regression with 100 trees predicted FMA from selected features; a second-stage RF (50 trees) aggregated task-level estimates to yield total FMA.
  3) Balanced RF: As in method 2 but with balanced training across FMA score classes when building trees to mitigate nonuniform score distribution.
  4) Proposed technique: Balanced RF augmented with sensor-derived FAS estimates as an additional input feature to further improve FMA prediction.

Reliability benchmarks: Inter-rater reliability and typical clinical change magnitudes for FMA and FAS were considered to contextualize estimation accuracy. The potential to reduce variance via repeated measures averaging was discussed.
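The two evaluation ideas above, leave-one-subject-out cross-validation and balanced training across FMA score classes, can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the summary does not state the class boundaries or the resampling scheme, so the 10-point bins in `fma_class`, the oversampling rule in `balanced_bootstrap`, and the toy subjects and scores are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def leave_one_subject_out(subject_ids):
    """Yield (held_out, train_rows, test_rows), holding out one subject per fold."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


def fma_class(score, width=10):
    """Hypothetical binning of an FMA score (0-66) into classes `width` points wide."""
    return int(score // width)


def balanced_bootstrap(rows, scores, rng, width=10):
    """Sample rows with replacement so that every score class is equally represented.

    Assumed scheme: each class is oversampled up to the size of the largest class.
    A sample like this could be drawn once per tree in a forest.
    """
    by_class = defaultdict(list)
    for i in rows:
        by_class[fma_class(scores[i], width)].append(i)
    per_class = max(len(members) for members in by_class.values())
    sample = []
    for cls in sorted(by_class):
        sample.extend(rng.choices(by_class[cls], k=per_class))
    return sample


# Toy data (invented): two recordings per subject, each with that subject's FMA score.
subjects = ["s1", "s1", "s2", "s2", "s3", "s3", "s4", "s4", "s5", "s5"]
fma = [18, 18, 19, 19, 24, 24, 41, 41, 52, 52]

rng = random.Random(0)
folds = list(leave_one_subject_out(subjects))
for held_out, train, test in folds:
    sample = balanced_bootstrap(train, fma, rng)
    print(held_out, "train rows:", len(train), "balanced sample size:", len(sample))
```

Splitting by subject rather than by row keeps all recordings from the held-out person out of training, which is the point of leave-one-subject-out validation. Oversampling is only one way to balance classes; undersampling or per-class weighting would be equally plausible readings of the description.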

Key Findings

Participants: 37 total (16 stroke, 21 TBI) with residual upper-limb impairments. Groups were similar in clinical measures at baseline and post-treatment; age differed significantly (t(35)=3.365, p<0.01). Datasets were combined for modeling.

FMA estimation accuracy (Table 2):
- Method 1 (Linear regression using sensor-estimated FAS): RMSE 7.79 points; r² 0.47; bias −0.10 points.
- Method 2 (Random forest, aggregated across 8 tasks): RMSE 5.05 points; correlation reported r≈0.77; task-level RMSEs ranged 6.17–10.21 using LOSO CV.
- Method 3 (Balanced random forest): RMSE 4.17 points; r² 0.84; low bias (−0.15 points). Reduced dependence of error variability on FMA score classes.
- Method 4 (Proposed technique: balanced RF + FAS as input): RMSE 3.99 points; r² 0.86; bias −0.21 points. Error distributions across FMA classes improved significantly in several classes (paired t-tests).

FAS estimation accuracy: RMSE 0.38 points; r² 0.79; bias −0.15 points using the team’s prior algorithm.

Overall, sensor-derived estimates closely matched clinician-generated scores, with the proposed FMA model achieving strong agreement (r²=0.86) and FAS estimates showing high accuracy (r²=0.79). Error analyses indicated slight overestimation for less impaired subjects and slight underestimation for the most impaired, with improvements after balancing and inclusion of FAS as a feature.
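For readers less familiar with the three agreement statistics reported above, the following sketch shows one common way to compute them. The summary does not give the exact definitions used in the paper, so two conventions are assumed here: bias is the mean of (estimate minus clinician score), and r² is the squared Pearson correlation between estimates and clinician scores. The score values are invented for the example.

```python
import math


def rmse(estimates, reference):
    """Root-mean-square error between estimated and clinician scores."""
    n = len(reference)
    return math.sqrt(sum((e - r) ** 2 for e, r in zip(estimates, reference)) / n)


def bias(estimates, reference):
    """Mean signed error; negative values mean the estimates run low on average."""
    n = len(reference)
    return sum(e - r for e, r in zip(estimates, reference)) / n


def r_squared(estimates, reference):
    """Squared Pearson correlation (assumed definition of the reported r²)."""
    n = len(reference)
    mean_e = sum(estimates) / n
    mean_r = sum(reference) / n
    cov = sum((e - mean_e) * (r - mean_r) for e, r in zip(estimates, reference))
    var_e = sum((e - mean_e) ** 2 for e in estimates)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    return cov ** 2 / (var_e * var_r)


# Invented FMA-like scores: clinician ratings and sensor-based estimates.
clinician = [20, 30, 40, 50]
estimated = [22, 29, 43, 48]

print("RMSE:", round(rmse(estimated, clinician), 3))
print("bias:", bias(estimated, clinician))
print("r^2 :", round(r_squared(estimated, clinician), 3))
```

The three numbers answer different questions: RMSE measures typical error size in score points, bias shows whether errors lean in one direction, and r² describes how well the estimates track the ranking and spread of clinician scores. A model can have a near-zero bias and still a large RMSE, which is why the findings report all three.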

Discussion

The findings demonstrate that wearable sensor data analyzed with machine learning can accurately estimate clinical measures of upper-limb impairment (FMA) and movement quality (FAS). This addresses the need for low-burden, longitudinal assessment tools in rehabilitation, enabling continuous tracking of recovery trajectories across ICF domains. The improved FMA estimation, which moves from inadequate prior approaches to a balanced RF that uses sensor-derived FAS as an input, provides clinically meaningful accuracy relative to known inter-rater reliability and typical change magnitudes. The approach supports precision rehabilitation by allowing clinicians to monitor responsiveness, adjust interventions in near real time, and potentially predict future outcomes using time-series modeling (e.g., Gaussian processes) that incorporates patient phenotypes. Robustness to small, imbalanced datasets was enhanced via RF and training-set balancing; adding FAS as an input further improved the mapping from sensor features to impairment scores. The results suggest feasibility for home and community deployments to capture functional performance where it matters most.

Conclusion

This study presents a wearable sensor and machine learning framework that estimates key clinical scores for upper-limb function. The proposed balanced random forest model augmented with sensor-derived FAS achieved high accuracy for FMA (RMSE 3.99; r² 0.86), while FAS estimation achieved RMSE 0.38 and r² 0.79. These results indicate that sensor-based, low-burden assessments can closely approximate clinician ratings and can be used to track motor recovery trajectories and inform precision rehabilitation. Future work should: expand datasets to improve class balance and generalizability; reduce estimation bias across impairment levels; leverage repeated measures to lower variance; automate segmentation and data processing; and implement time-series predictive models (e.g., Gaussian processes) to forecast patient-specific responses and guide adaptive interventions in real-world settings.
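The suggestion to use repeated measures to lower variance can be made concrete with a standard result for the mean of n equally correlated errors. The sketch below is a back-of-the-envelope illustration, not an analysis from the paper: it assumes zero-mean errors with equal variance and a common pairwise correlation `rho`, and the function name `rmse_of_average` and the example values of `rho` are hypothetical.

```python
import math


def rmse_of_average(single_rmse, n_repeats, rho=0.0):
    """Expected RMSE after averaging `n_repeats` estimates of the same score.

    Assumes zero-mean errors with equal variance and pairwise correlation `rho`:
    variance of the mean = sigma^2 * (1 + (n - 1) * rho) / n.
    """
    if n_repeats < 1:
        raise ValueError("n_repeats must be at least 1")
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    return single_rmse * math.sqrt((1.0 + (n_repeats - 1) * rho) / n_repeats)


# Starting from the reported single-session FMA RMSE of 3.99 points.
for rho in (0.0, 0.5, 1.0):
    print("rho =", rho, "->", round(rmse_of_average(3.99, 4, rho), 3))
```

With fully independent errors (`rho=0`) the error shrinks with the square root of the number of repeats, while fully correlated errors (`rho=1`) gain nothing from averaging. Errors from the same patient are likely to be partly correlated, and averaging does not remove systematic bias, so the independent case is best read as an optimistic bound.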

Limitations

Key limitations include a relatively small sample size (n=37) and a nonuniform distribution of FMA scores across classes, which can introduce bias and limit generalizability. The dataset combined stroke and TBI populations, which may follow distinct recovery patterns even though the groups had similar clinical measures at baseline and post-treatment; pooling them could mask condition-specific nuances. Estimation errors showed slight class-dependent bias (overestimation in less impaired subjects, underestimation in the most impaired). The approach was validated on selected WMFT tasks in controlled sessions; performance in fully free-living environments and across broader task repertoires requires further validation. Some reporting inconsistencies (e.g., r vs r² in intermediate results) and reliance on leave-one-subject-out CV warrant confirmation in larger, independent cohorts. Mapping sensor-derived features to clinical constructs depends on accurate task segmentation and feature selection, and these processing steps would benefit from full automation.
