Machine Learning in the Parkinson’s disease smartwatch (PADS) dataset

J. Varghese, A. Brenner, et al.

This summary covers the PADS dataset and the machine learning models trained on it. The dataset comes from a three-year study of 504 participants that combined a multimodal smartphone app with smartwatches. The models distinguish Parkinson's disease from healthy controls and from differential diagnoses, and the dataset is intended as a resource for movement disorder research. Authors: Julian Varghese, Alexander Brenner, Michael Fujarski, Catharina Marie van Alen, Lucas Plagwitz, and Tobias Warnecke.

Introduction
Parkinson's disease (PD) is a prevalent neurodegenerative disorder that causes a significant health burden and impairs quality of life. Diagnosis currently relies primarily on clinical examination, complemented by imaging techniques. Because PD symptoms and progression are heterogeneous, early diagnosis and treatment are difficult. Technology-based systems can derive digital biomarkers from data modalities such as hand movements, gait, and voice, which makes objective assessment and better diagnostic accuracy possible. Previous studies reported promising PD classification from such sources. However, they often lacked a comprehensive dataset covering PD, differential diagnoses (DD), and healthy controls (HC) with sufficient and balanced sample sizes. They also often lacked interactive assessments designed to provoke subtle movement abnormalities. The Smart Device System (SDS), which combines smartwatches with a smartphone app, was developed to address these limitations. It incorporates an interactive assessment designed by neurologists, electronic questionnaires, and simultaneous smartwatch recordings from both wrists. This study uses the SDS to create the PADS dataset, with the aim of advancing research and improving diagnostic capabilities for PD and related movement disorders.
Literature Review
Existing literature highlights the potential of smart devices in PD research. Studies have shown high accuracy in distinguishing PD from healthy controls (HC) using data from hand movements, gait, balance, eye movements, and voice. However, these studies often suffered from limitations including unbalanced datasets, limited representation of DD, passive monitoring approaches, and inconsistent assessment methodologies. While some studies included DD, sample sizes were often insufficient for robust analysis. The lack of a large, comprehensive, and well-annotated dataset represents a significant hurdle to developing reliable ML models for PD diagnosis and monitoring in real-world settings.
Methodology
A three-year cross-sectional study was conducted at a large tertiary care hospital, involving participants with PD, DD (including essential tremor, atypical Parkinsonism, multiple system atrophy, etc.), and HC. The SDS, a multimodal system comprising a smartphone app and two smartwatches, was used to collect data. The app guided participants through an interactive 15-minute assessment consisting of 11 neurological movement steps. Smartwatches simultaneously recorded wrist movements, while the app collected demographic data, medical history, and self-reported non-motor symptoms (NMS). Data preprocessing involved age-matching to mitigate age-related confounding effects. A nested cross-validation (CV) approach was used to train and evaluate various ML models, including classical approaches with manually defined features (SVM, NN, CatBoost), automatic feature extraction using the Bag-of-Symbolic-Fourier-Approximation-Symbols (BOSSA) algorithm, and deep learning (DL) with XceptionTime. Two classification tasks were performed: (1) PD vs. HC and (2) PD vs. DD. Classifier stacking combined smartwatch and questionnaire data to improve classification performance. Feature importance analysis, using grouped permutation importance, was conducted to understand the contribution of different features to the model's accuracy. Additionally, analysis was performed on a gender-matched subset of the data to assess the impact of gender imbalance.
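The nested cross-validation design can be illustrated with a minimal Python sketch that uses only the standard library. It stands in for the study's models (SVM, NN, CatBoost, BOSSA, XceptionTime) with a toy k-nearest-neighbour classifier, and for real smartwatch features with synthetic two-class data. All function names (`stratified_folds`, `knn_predict`, `nested_cv`) are hypothetical. The inner folds select a hyperparameter using only the outer-training data. Each outer fold is then scored once with balanced accuracy, the metric the study reports. The study's actual fold counts and search spaces are not given here, so `outer_k=5`, `inner_k=3`, and the grid are placeholders.

```python
import random
from statistics import mean


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally regardless of size."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return mean(recalls)


def stratified_folds(y, k, seed=0):
    """Split sample indices into k folds with roughly equal class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for c in sorted(set(y)):
        idx = [i for i, t in enumerate(y) if t == c]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def knn_predict(X_train, y_train, X_test, k):
    """Toy k-nearest-neighbour classifier (hypothetical stand-in for the study's models)."""
    preds = []
    for x in X_test:
        # Squared Euclidean distance to every training sample, nearest first.
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(x, xt)), yt)
            for xt, yt in zip(X_train, y_train)
        )
        votes = [label for _, label in dists[:k]]
        preds.append(max(sorted(set(votes)), key=votes.count))  # majority vote
    return preds


def nested_cv(X, y, param_grid, outer_k=5, inner_k=3, seed=0):
    """Inner folds pick the hyperparameter; outer folds estimate performance."""
    outer_scores = []
    for test_idx in stratified_folds(y, outer_k, seed):
        held_out = set(test_idx)
        X_tr = [X[i] for i in range(len(y)) if i not in held_out]
        y_tr = [y[i] for i in range(len(y)) if i not in held_out]

        # Inner loop: tune k using the outer-training data only.
        best_param, best_score = None, -1.0
        for param in param_grid:
            scores = []
            for val_idx in stratified_folds(y_tr, inner_k, seed + 1):
                val = set(val_idx)
                fit = [i for i in range(len(y_tr)) if i not in val]
                preds = knn_predict([X_tr[i] for i in fit], [y_tr[i] for i in fit],
                                    [X_tr[i] for i in val_idx], param)
                scores.append(balanced_accuracy([y_tr[i] for i in val_idx], preds))
            if mean(scores) > best_score:
                best_param, best_score = param, mean(scores)

        # Outer loop: refit on all outer-training data, score the held-out fold once.
        preds = knn_predict(X_tr, y_tr, [X[i] for i in test_idx], best_param)
        outer_scores.append(balanced_accuracy([y[i] for i in test_idx], preds))
    return mean(outer_scores), outer_scores


# Synthetic, well-separated two-class data (assumption, not PADS data).
rng = random.Random(42)
y = [0] * 40 + [1] * 40
X = [[5 * t + rng.gauss(0, 1), 5 * t + rng.gauss(0, 1)] for t in y]
score, fold_scores = nested_cv(X, y, param_grid=[1, 3, 5])
print(f"nested-CV balanced accuracy: {score:.3f}")
```

The held-out outer fold never takes part in hyperparameter selection, so the tuning step cannot inflate the reported score. The spread of `fold_scores` is also where per-fold standard deviations, like those mentioned under Limitations, come from.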
Key Findings
After age-matching, the study included 469 participants. Using classifier stacking, the ML models achieved 91.16% balanced accuracy for PD vs. HC and 72.42% for PD vs. DD. Individual model performance varied across approaches and classification tasks, and the BOSSA algorithm generally outperformed the other methods. Feature importance analysis showed that both smartwatch data and questionnaire data contributed substantially to classification accuracy. 'Kinetic tasks' stood out among the smartwatch features for PD vs. DD, and 'Sleep/Activity' among the questionnaire features for PD vs. HC. Combining both data modalities through classifier stacking consistently improved performance over either data source alone, particularly for PD vs. DD. Analysis of a gender-matched subset showed similar trends, indicating that the results were not substantially biased by the gender imbalance. In this subset, the average balanced accuracy was 89.25% for PD vs. HC and 69.56% for PD vs. DD.
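Grouped permutation importance, used to attribute accuracy to feature groups such as 'Kinetic tasks' or 'Sleep/Activity', can be sketched as follows. This is an illustrative, standard-library-only version built on stated assumptions. The model is a hypothetical pre-fitted threshold rule, the data are synthetic, and the group names `informative` and `noise` are placeholders rather than the study's feature groups. All columns in a group are shuffled with the same row permutation. A group's importance is the mean drop in balanced accuracy over the repeats.

```python
import random
from statistics import mean


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall."""
    recalls = []
    for c in sorted(set(y_true)):
        hits = sum(p == c for t, p in zip(y_true, y_pred) if t == c)
        recalls.append(hits / sum(t == c for t in y_true))
    return mean(recalls)


def grouped_permutation_importance(predict, X, y, groups, n_repeats=10, seed=0):
    """Mean drop in balanced accuracy when all columns of a group are shuffled together."""
    rng = random.Random(seed)
    baseline = balanced_accuracy(y, predict(X))
    importances = {}
    for name, cols in groups.items():
        drops = []
        for _ in range(n_repeats):
            perm = list(range(len(X)))
            rng.shuffle(perm)
            # Apply the SAME row permutation to every column in the group.
            X_perm = [list(row) for row in X]
            for i, j in enumerate(perm):
                for c in cols:
                    X_perm[i][c] = X[j][c]
            drops.append(baseline - balanced_accuracy(y, predict(X_perm)))
        importances[name] = mean(drops)
    return importances


# Synthetic data (assumption): column 0 carries the class signal, columns 1-2 are noise.
rng = random.Random(1)
y = [i % 2 for i in range(200)]
X = [[3 * t + rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)] for t in y]


def predict(rows):
    """Hypothetical 'already fitted' model: a threshold on column 0 only."""
    return [int(row[0] > 1.5) for row in rows]


imp = grouped_permutation_importance(predict, X, y, {"informative": [0], "noise": [1, 2]})
print(imp)
```

The toy model ignores the `noise` columns, so their importance is exactly zero. Shuffling the `informative` column, by contrast, pulls balanced accuracy toward chance (0.5). Shuffling a group jointly rather than column by column also avoids underestimating a group of correlated features, because the model cannot fall back on an unshuffled correlated column from the same group.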
Discussion
The study demonstrates the potential of a smart device-based system combined with advanced ML for classifying PD, including distinguishing PD from other movement disorders. The high accuracy achieved in the PD vs. HC classification suggests that such a system is feasible for diagnostic purposes. Although accuracy for PD vs. DD is lower, the system still shows promise for assisting clinicians in differentiating between similar movement disorders. Combining smartwatch data with NMS questionnaires proved advantageous. The study has limitations, such as the one-time in-clinic assessment and the potential for variable data quality in a home-based setting. These point to the need for future research on home-based data collection and automatic quality control.
Conclusion
The PADS dataset, with its size and detailed annotations, represents a significant contribution to the field of movement disorder research. The high classification accuracy achieved using ML models further underscores the potential of smart device-based systems for assisting in PD diagnosis and monitoring. Future research should address the limitations identified in this study, including exploring the use of the system in a home-based setting, investigating longitudinal data to analyze disease progression, and incorporating additional data modalities for improved accuracy and understanding of PD.
Limitations
The study's limitations include the one-time in-clinic assessment, which may not fully capture the variability of PD symptoms. The data quality in this study benefited from the supervision of a study nurse; in a less controlled home-based setting, data quality might be compromised. The gender imbalance, although addressed through sample weighting and gender-matched subset analysis, could still influence results. The relatively high standard deviation in some evaluation metrics also suggests that further research is needed to address potential overfitting and enhance model robustness.
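The summary does not specify the sample-weighting scheme used to address the gender imbalance. A common choice, assumed here purely for illustration, is inverse-frequency weighting. Each sample is weighted by n / (k · n_g), where n is the number of samples, k the number of groups, and n_g the size of the sample's group, so that every group contributes the same total weight. The cohort below (six male and two female participants) and the helper names `inverse_frequency_weights` and `weighted_error` are hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(groups):
    """Weight each sample by n_samples / (n_groups * size_of_its_group).

    Every group then carries the same total weight, and the weights sum to n_samples.
    """
    counts = Counter(groups)
    n, k = len(groups), len(counts)
    return [n / (k * counts[g]) for g in groups]


def weighted_error(y_true, y_pred, weights):
    """Weighted misclassification rate, as a sample-weighted loss would see it."""
    wrong = sum(w for t, p, w in zip(y_true, y_pred, weights) if t != p)
    return wrong / sum(weights)


# Hypothetical cohort: 6 male and 2 female participants.
sex = ["m"] * 6 + ["f"] * 2
w = inverse_frequency_weights(sex)  # male weight 8/12 ≈ 0.667, female weight 8/4 = 2.0

y_true = [1] * 8
y_pred = [1] * 6 + [0] * 2  # the model misses both female participants
print(weighted_error(y_true, y_pred, [1.0] * 8))  # unweighted: 0.25
print(weighted_error(y_true, y_pred, w))          # weighted: ≈ 0.5
```

With these weights, errors on the under-represented group cost more. A model that fails only on the female participants therefore sees its error rise from 0.25 to about 0.5, which pushes training toward fitting both groups.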