Medicine and Health
Intelligent wearable allows out-of-the-lab tracking of developing motor abilities in infants
M. Airaksinen, A. Gallen, et al.
The paper addresses the lack of objective, scalable tools for early neurological assessment of infants. Traditional milestone-based assessments and clinician-administered tests are useful but can miss the richness and variability of spontaneous motor behavior and often rely on subjective judgment in controlled settings, limiting ecological validity. Recent sensor technology makes extended out-of-hospital monitoring feasible with accuracy that can rival human observers. The study’s goal is to construct and validate a generalizable and scalable method to quantify infants’ spontaneous body activity across the full developmental sequence (from supine to independent walking), using a wearable system and an interpretable motor ability description scheme, and to derive an overall motor maturity index (BIMS) for individual neurodevelopmental tracking.
The authors note widespread use of milestone assessments for developmental screening but emphasize that these do not capture spontaneous motor behavior and are not sensitive to natural variability. Standard developmental assessments (e.g., clinician-observed tasks) can be subjective and conducted in controlled environments, limiting generalizability. Prior sensor-based work often used one or two accelerometers and constrained settings, limiting recognition of diverse motor abilities. The paper situates its contribution within advances in wearable sensing and machine learning, arguing for high temporal resolution, ecologically valid, interpretable measures that align with observable motor phenomena and can support clinical benchmarks like AIMS.
Study design: Observational methodological development study with a primarily cross-sectional cohort. Wearable recordings (MAIJU jumpsuit) of infants' spontaneous play were performed in home and lab environments; subsets were video-synchronized for human annotation and validation. Ethical approvals were obtained and informed consent was collected.

Participants and recordings: N = 59 infants (approx. age range 4.5–19.5 months; the abstract states 5–19 months). Total recordings N = 64, conducted at home (N = 40) or at a research facility (N = 24). Recording durations ranged 18–199 minutes (mean ≈ 67 min). A subset of recordings (N ≈ 41; total ≈ 1758 min noted; 2–3 h of combined annotated data reported as 91,449 frames) had synchronized video for second-level human annotations of posture and movement.

Wearable device (MAIJU): A snug jumpsuit with four waterproof inertial sensors (Movesense, Suunto, Finland) placed proximally on each limb in laminated pockets, sampling a tri-axial accelerometer (m/s²) and gyroscope (deg/s) at 52 Hz and transmitting via Bluetooth LE to a mobile logger. The garment fabric (polyamide/elastane) tolerates laundering and is designed for comfort and safety.

Motor ability description and annotation: A phenomenological scheme captured two parallel tracks per second: five primary postures (supine, prone, sitting, crawling, standing; also side-lying intermediates) and movement qualities (still, proto, elementary, fluent; with intermediates such as pivoting, side-lying left/right, and transitions). Five trained researchers independently annotated second-level labels on video-synchronized data to establish ground truth and inter-rater reliability.

Annotation reliability: Postures showed very high inter-rater agreement (overall Fleiss' kappa k = 0.95; prone k = 0.97, supine k = 0.97, standing k = 0.98, sitting k = 0.95; crawl k = 0.88; side-lying k = 0.79).
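The agreement figures above are Fleiss' kappa values. As a reference for how such multi-rater agreement is computed, here is a generic sketch (not the authors' code) working from per-frame rating counts:

```python
from collections import Counter

def fleiss_kappa(ratings):
    """Fleiss' kappa for multi-rater agreement.

    ratings: list of per-item dicts mapping category -> number of raters
    who assigned it; every item must have the same total rater count.
    """
    n_items = len(ratings)
    n_raters = sum(ratings[0].values())

    # Mean per-item agreement: fraction of rater pairs that agree.
    p_bar = sum(
        (sum(c * c for c in r.values()) - n_raters)
        / (n_raters * (n_raters - 1))
        for r in ratings
    ) / n_items

    # Expected chance agreement from marginal category proportions.
    totals = Counter()
    for r in ratings:
        totals.update(r)
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals.values())

    return (p_bar - p_e) / (1 - p_e)
```

With five raters per frame (as in the study), perfect agreement on every frame yields kappa = 1, while agreement at chance level yields kappa = 0.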
Movement qualities had moderate overall agreement (k = 0.60), varying by class (still k = 0.67, fluent k = 0.71, transitions k = 0.51, elementary k = 0.24), with confusions mainly among conceptually adjacent categories.

Self-supervised learning (CPC): Contrastive predictive coding was trained on ~12 h of unannotated MAIJU data to learn latent representations; t-SNE visualizations showed clear clustering for posture categories and looser clustering for movement qualities, supporting the presence of these classes in the sensor data.

Supervised classifier: End-to-end CNN taking 2.3 s windows (120 samples) with 50% overlap; preprocessing included gyroscope bias removal, linear interpolation to a 52 Hz grid, and a seven-tap median filter. Architecture: an encoder producing a frame-level fused latent (160-D) and a temporal classifier module outputting softmax probabilities for posture and movement (carrying is handled by a separate detector). Training used ADAM (batches of 100 consecutive frames, lr 1e-4, beta1 0.999, beta2 0.999, epsilon 1e-8) with weighted categorical cross-entropy (inverse class frequency), sample dropout p = 0.3 and sensor dropout p = 0.3, for 200 epochs; 20% of the training data was used for validation to select the best model by unweighted average F1. Tenfold cross-validation was performed at the recording level on the combined annotated dataset (~91,449 frames). Inter-rater-informed relabeling (LRA) was used to mitigate inconsistencies in contested frames.

Carrying detection (ACD): A binary frame-level classifier trained to detect active carrying, so that externally caused movements can be filtered out prior to motor ability analysis. Dataset: 17 home recordings totaling ~17 h, with five annotated categories collapsed to binary active carrying vs. non-carrying; the architecture matched the motor ability classifier. Leave-one-subject-out CV: overall accuracy 97.2%; active carrying recall 54.5%, precision 58.1%; non-carrying recall 98.7%, precision 98.5%.
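The signal-conditioning and windowing steps described above (interpolation to a uniform 52 Hz grid, seven-tap median filtering, 2.3 s frames with 50% overlap) can be sketched for one sensor channel as follows. The bias-removal step here is a simple mean subtraction stand-in, since the summary does not specify the exact gyroscope bias method:

```python
import numpy as np

FS = 52.0          # target sampling rate (Hz), per the paper
WIN = 120          # 2.3 s window at 52 Hz
HOP = WIN // 2     # 50% frame overlap

def preprocess(t, x):
    """Resample one channel to a uniform 52 Hz grid and denoise.

    t: sample timestamps (s), possibly irregular; x: raw channel values.
    Bias removal is a crude mean-subtraction stand-in (assumption).
    """
    grid = np.arange(t[0], t[-1], 1.0 / FS)
    xi = np.interp(grid, t, x)              # linear interpolation
    xi = xi - xi.mean()                     # simple bias removal (assumption)
    # Seven-tap median filter via a sliding window (edge-padded).
    pad = np.pad(xi, 3, mode="edge")
    win = np.lib.stride_tricks.sliding_window_view(pad, 7)
    return np.median(win, axis=1)

def frames(x):
    """Split a channel into 120-sample frames with 50% overlap."""
    n = (len(x) - WIN) // HOP + 1
    return np.stack([x[i * HOP : i * HOP + WIN] for i in range(n)])
```

In the actual system this runs per axis for all four limb sensors before the frames are fed to the CNN encoder.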
Performance evaluation: Frame-level compounded confusion matrices versus human annotations; recording-level agreement assessed by Pearson correlations between annotated and predicted category proportions and by Bland–Altman analyses, with tests for bias (two-tailed t-tests) and error dispersion (±2 SD) relative to age-related monthly change rates.

BIMS development: From typically developing infants (N ≈ 55–56; N = 60 recordings), age-dependent multivariate Gaussian models were fit to posture and posture-conditional movement distributions at 1-month resolution. For a new recording, maximum-likelihood age estimation under these models yields a "motor ability age," which is then rescaled to a [0–100] BIMS score where 0 and 100 map to the least and most advanced performance within the cohort's age bounds. BIMS was compared with chronological age and with the clinician-assessed AIMS (same-day assessments) to evaluate clinical relevance.

Statistics and reproducibility: Analyses in Matlab R2021a; classifier in Python 3.6.9 / Tensorflow 1.12.0. Cross-validation at the participant level; correlations via Pearson's r (two-tailed), parental survey via Spearman's rho; Bland–Altman and t-tests for bias; correlation comparisons via multiple established tests. No a priori sample size calculation (observational study).
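The maximum-likelihood age estimation behind BIMS can be illustrated with a simplified sketch. For tractability it uses diagonal-covariance Gaussians and hypothetical function names; the paper fits full multivariate Gaussian models at 1-month resolution:

```python
import numpy as np

def motor_ability_age(features, age_models):
    """Pick the age whose Gaussian model best explains a recording.

    features: 1-D vector of posture/movement time proportions.
    age_models: {age_in_months: (mean, var)} with diagonal covariance --
    a simplification of the paper's per-month multivariate Gaussians.
    """
    def log_lik(x, mu, var):
        # Gaussian log-likelihood with independent dimensions.
        return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

    return max(age_models, key=lambda a: log_lik(features, *age_models[a]))

def bims(age_est, age_min, age_max):
    """Rescale an estimated motor ability age to a 0-100 index over the
    cohort's age bounds (the paper's exact rescaling may differ)."""
    return 100.0 * (age_est - age_min) / (age_max - age_min)
```

The key design point survives the simplification: the score is derived from the full distribution of observed postures and movements, not from a single milestone.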
- Feasibility: All wearable recordings were technically successful in both home and lab settings; infants moved freely with minimal disturbance.
- Annotation reliability: Very high inter-rater agreement for postures (overall Fleiss’ k = 0.95; prone k = 0.97, supine k = 0.97, standing k = 0.98, sitting k = 0.95; crawl k = 0.88; side-lying k = 0.79). Movement qualities showed moderate agreement overall (k = 0.60), higher for still (k = 0.67) and fluent (k = 0.71), lower for transitions (k = 0.51) and elementary (k = 0.24).
- Self-supervised validation: CPC-derived latent spaces showed clear separation for postures and relatively more overlap for movement qualities, aligning with human agreement levels and supporting that target classes are present in the wearable signals.
- Classifier performance: The algorithm achieved human-equivalent agreement for both posture and movement categories (average algorithm-vs.-human posture kappa ≈ 0.93). Confusions concentrated among conceptually related movement categories (e.g., fluent vs. elementary).
- Recording-level accuracy: Very high correlations between classifier and human for time proportions in key categories: prone posture r = 0.999 (p < 0.001), standing posture r = 0.999 (p < 0.001), fluent movement r = 0.96 (p < 0.001). Bland–Altman analyses indicated no systematic bias.
- Developmental trajectories: Annotated data showed strong age-related changes, e.g., decline in lying prone (r = -0.71, p < 0.001, N = 42) and stillness (rho = -0.62, p < 0.001, N = 42), increases in crawling and more advanced movement qualities with age.
- BIMS performance: Cross-validated motor ability age strongly correlated with chronological age (Pearson's r = 0.89, p < 1e-20), with mean absolute error (MAE) ≈ 1.6 months (median error ≈ -0.3 months; IQR −1.2 to 0.9). Longer recordings improved accuracy: MAE decreased from ~2.4 to ~1.9 months for recordings longer than 1 h.
- Clinical linkage: BIMS correlated more strongly with AIMS than chronological age did (Pearson’s r increased from ~0.56 [age vs. AIMS] to ~0.83 [BIMS vs. AIMS]; Spearman’s rho from ~0.60 to ~0.82; N ≈ 28; p < 0.05).
- Parental estimates: Parent-reported time in postures showed overall correspondence but substantial scatter relative to MAIJU-derived distributions, reflecting day-to-day variability and subjective estimation limits.
- Carrying detection: ACD filtered approximately half of active carrying frames with very low false detections for non-carrying (recall 98.7%, precision 98.5%), enabling realistic preprocessing for autonomous analyses.
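The Bland–Altman analyses used in the recording-level evaluation reduce to a few statistics on paired differences between human-annotated and classifier-predicted proportions. A minimal generic sketch (not the authors' code):

```python
import numpy as np

def bland_altman(a, b):
    """Bland-Altman agreement stats for paired measurements a, b
    (e.g., annotated vs. predicted category time proportions)."""
    d = np.asarray(a, float) - np.asarray(b, float)
    bias = d.mean()                      # systematic offset between methods
    sd = d.std(ddof=1)                   # spread of the differences
    # t statistic for H0: bias == 0 (the paper used two-tailed t-tests).
    t = bias / (sd / np.sqrt(len(d)))
    return {"bias": bias, "loa": (bias - 2 * sd, bias + 2 * sd), "t": t}
```

A bias near zero with narrow ±2 SD limits of agreement, as reported above, indicates the classifier can substitute for human annotation at the recording level.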
The study demonstrates that infants’ spontaneous motor abilities can be measured objectively and accurately outside clinical environments using a multi-sensor wearable and interpretable motor ability taxonomy. High inter-rater reliability for postures and human-equivalent classifier performance support replacing human annotators with the algorithm for large-scale analyses. The derived motor ability distributions track known developmental trajectories and enable computation of an overall maturity index (BIMS) that strongly aligns with age and relates more closely to clinical assessments (AIMS) than chronological age, suggesting added diagnostic value. The approach balances interpretability and performance by grounding automated analysis in physiologically reasoned, human-visible categories rather than black-box end-to-end diagnostic outputs. This provides transparent, temporally precise metrics suitable for individualized monitoring, benchmarking in clinical trials, and potential early identification of atypical development. The framework can be extended to context-specific analyses (e.g., asymmetries in crawling/walking) and integrated into growth chart–like references for motor development once larger normative datasets are collected.
The paper introduces and validates a wearable-based, deep learning–enabled system to quantify infants’ spontaneous motor abilities across the full milestone sequence with second-level resolution. It contributes a structured, interpretable motor ability description scheme, demonstrates human-equivalent automated classification, and proposes BIMS, a continuous maturity index that correlates strongly with age and established clinical scales. The solution is automatable and scalable for out-of-hospital use, promising objective benchmarking for individualized care and early intervention studies. Future work should include multi-center validation, larger cross-cultural normative datasets to establish reference values, longitudinal cohorts to assess sensitivity to change and clinical utility in early diagnostics, technical scaling and regulatory steps for clinical deployment, exploration of additional sensor modalities/placements, and expansion of descriptors beyond infancy.
- Recording environments varied (home vs. lab), potentially affecting behavior; however, home environments enhance ecological validity.
- Movement taxonomy emphasizes gross motor postures/qualities and 2.3 s frames; it does not capture intentionality or fine motor operations.
- Classifier performance for some movement categories is limited by inherent ambiguity and moderate human agreement (e.g., elementary, transitions), which may cap attainable accuracy.
- Age prediction saturates around ~16 months as descriptors reach ceiling, limiting utility of direct age estimates beyond this range (addressed by BIMS scaling).
- Single sensor type and placement configuration were used; alternative modalities/placements might capture additional aspects of movement.
- Predominantly cross-sectional design; longitudinal sensitivity to within-child change remains to be established.
- Generalizability requires larger, diverse, cross-cultural normative datasets and multi-center validation.
- Study cohort sizes for some analyses (e.g., AIMS comparisons, parental surveys) were modest.