Introduction
Mental disorders represent a major global health burden, with substantial economic costs and considerable personal suffering. Accurate diagnosis is crucial for effective treatment, yet diagnostic delays and misdiagnosis hinder progress. The Diagnostic and Statistical Manual of Mental Disorders (DSM) has improved diagnosis, but objective diagnostic tools remain lacking, particularly for young children and for individuals with multiple comorbid disorders. Genetic factors play an important role in the etiology of mental disorders, with structural variations in both coding and non-coding regions emerging as potential biomarkers. Machine learning, especially deep learning, has shown promise in complex disease classification. This study aimed to determine whether a deep learning model using whole genome sequencing data could accurately differentiate African American individuals with mental disorders from controls and correctly classify patients with multiple diagnoses. It focused on eight common mental disorders: ADHD, depression, anxiety, autism, intellectual disabilities, speech/language disorder, developmental delays, and oppositional defiant disorder (ODD). The high prevalence of these disorders and the difficulty of diagnosing them, especially in young children, motivate this focus. The use of an African American cohort addresses the scarcity of research on this population.
Literature Review
Existing literature highlights the substantial economic and social burden of mental disorders, emphasizing the need for improved diagnostic tools. Studies have demonstrated an association between structural variation in the genome, including non-coding regions, and mental disorders, pointing to potential therapeutic targets. Previous research has explored the application of machine learning in mental health using various data types (clinical data, genetic data, vocal/visual expression, social media data). However, many studies of genomic data are limited to a single disorder and lack representation of minority populations such as African Americans. This study builds on prior work by using a deep learning approach to consider several mental disorders simultaneously in an understudied population.
Methodology
This study used whole genome sequencing (WGS) data from 4179 African American individuals in the NHLBI Trans-Omics for Precision Medicine (TOPMed) program, including 1384 patients diagnosed with at least one of the eight specified mental disorders. Mental health status was extracted from de-identified electronic health records (EHRs) using ICD-9 and ICD-10 codes. WGS data processing involved extraction of variant call format (VCF) files from the TOPMed database, alignment, and removal of common variants. The genome was divided into 587 pieces (~5 Mbp each), and the occurrence of seven variant types (nonsynonymous SNVs, frameshift SNVs, SNVs in UTR, non-coding RNA SNVs, SNVs in intronic regions, SNVs in intergenic regions, and SNVs producing a stop codon) was counted for each piece. These counts formed the feature vectors for a multi-layer perceptron (MLP) deep learning model. A random forest algorithm was used for feature selection, reducing dimensionality by prioritizing relevant genomic pieces. Model parameters were optimized, and model performance assessed, using a two-fold random shuffle test repeated over 50 iterations. Two prediction models were used: one for binary classification (mental disorder vs. control) and another for multi-label classification (presence/absence of each of the eight disorders). The models were evaluated using accuracy, Hamming loss, and exact match rate.
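The feature construction described above can be illustrated with a minimal sketch. It is not the study's pipeline: the per-chromosome binning, the short variant-type names, the input tuple format, and the function names are all assumptions made for illustration. It only shows how per-piece counts of the seven variant types could be flattened into a fixed-length vector.

```python
from collections import Counter

PIECE_SIZE = 5_000_000  # ~5 Mbp per piece, as described in the text

# Hypothetical short names for the seven variant types listed in the text.
VARIANT_TYPES = [
    "nonsynonymous", "frameshift", "utr", "ncrna",
    "intronic", "intergenic", "stopgain",
]

def count_variants(variants):
    """Count variants per (chromosome, piece index, variant type).

    `variants` is an iterable of (chrom, 1-based position, variant type).
    Binning per chromosome is an assumption, not the study's stated method.
    """
    counts = Counter()
    for chrom, pos, vtype in variants:
        piece = (pos - 1) // PIECE_SIZE  # 1-based position -> 0-based piece
        counts[(chrom, piece, vtype)] += 1
    return counts

def feature_vector(counts, pieces):
    """Flatten counts into a fixed order: one slot per (piece, variant type)."""
    return [
        counts.get((chrom, piece, vtype), 0)
        for chrom, piece in pieces
        for vtype in VARIANT_TYPES
    ]

# Toy input for one individual (made-up positions, not study data).
variants = [
    ("chr1", 1_200_000, "intronic"),
    ("chr1", 4_999_999, "intronic"),
    ("chr1", 5_000_001, "nonsynonymous"),
    ("chr2", 12_000_000, "intergenic"),
]
pieces = [("chr1", 0), ("chr1", 1), ("chr2", 2)]  # pieces kept as features

vec = feature_vector(count_variants(variants), pieces)
print(len(vec), vec)
```

With three pieces the toy vector has 3 × 7 = 21 entries. Under the same layout, 587 pieces and seven variant types would give 587 × 7 = 4109 features per individual before random forest feature selection.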
Key Findings
The deep learning model achieved approximately 65% accuracy in distinguishing African American individuals with mental disorders from controls. Performance in multi-label classification, measured by Hamming loss, was also encouraging: a score below 0.3 indicates that more than 70% of labels were correctly predicted. The exact match rate for multi-label classification was lower (7.2–9.3%), likely due to factors including limited sample sizes for some disorders and shared genetic risk between disorders. Notably, variants in non-coding regions (ncRNA, intronic, and intergenic regions) showed predictive performance comparable to that of coding region variants, but with more uniform weight distributions and no genomic hotspots, suggesting their potential as alternative markers. Enrichment analysis of genomic regions with high weights revealed significant associations with immune responses, antigen/nucleic acid binding, chemokine signaling pathways, and G-protein receptor activities.
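The two multi-label metrics reported above measure different things, which is why a low Hamming loss can coexist with a low exact match rate. The sketch below uses the standard definitions of both metrics on made-up labels (not study data); the function names and the toy predictions are illustrative assumptions.

```python
# Label order follows the eight disorders named in the text:
# ADHD, depression, anxiety, autism, intellectual disabilities,
# speech/language disorder, developmental delays, ODD.

def hamming_loss(y_true, y_pred):
    """Fraction of individual labels predicted incorrectly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / total

def exact_match_rate(y_true, y_pred):
    """Fraction of individuals whose full label set is predicted correctly."""
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)

# Toy data: four individuals, eight labels each.
y_true = [
    [1, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
y_pred = [
    [1, 0, 0, 0, 0, 0, 0, 0],  # all eight labels correct
    [1, 0, 0, 0, 0, 0, 0, 0],  # one label wrong
    [0, 1, 1, 0, 0, 0, 0, 0],  # one label wrong
    [1, 0, 0, 0, 0, 0, 1, 0],  # two labels wrong
]

print(hamming_loss(y_true, y_pred))      # 4 wrong out of 32 labels -> 0.125
print(exact_match_rate(y_true, y_pred))  # 1 of 4 rows fully correct -> 0.25
```

In this toy case 87.5% of labels are correct, yet only one individual in four is an exact match, because a single wrong label out of eight is enough to break the match for that individual.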
Discussion
The findings suggest that deep learning models applied to whole genome sequencing data can provide insights into the genetic architecture of mental disorders, even in the context of comorbidity. The accuracy achieved by both the binary and multi-label prediction models points to the potential of this approach for assisting clinicians. Although the exact match rate in multi-label classification was lower than ideal, this is likely influenced by sample size limitations and the inherent difficulty of classifying multiple overlapping disorders. The similar performance of non-coding and coding variants offers a new perspective on the role of non-coding regions in mental disorders and suggests that further exploration is warranted. The enriched biological pathways provide valuable clues to the mechanisms that may underpin these disorders.
Conclusion
This study demonstrates the potential of applying deep learning to WGS data to improve the diagnosis and understanding of mental disorders in African Americans. The model's ability to classify both single and multiple diagnoses underscores the utility of this approach. The identification of non-coding variants as potential biomarkers opens new avenues for research. Future studies that use larger datasets and focus on individual disorders are needed to validate these findings and further refine diagnostic tools. Further investigation of the identified biological pathways may reveal novel therapeutic targets.
Limitations
The study's limitations include the relatively small sample size for certain mental disorders, particularly ODD and autism. The reliance on ICD codes for diagnosis may also introduce some imprecision. The study is limited to African Americans, potentially affecting the generalizability of results to other populations. Finally, while the deep learning model performed relatively well, further research is needed to validate these findings and to explore potential confounding factors.