Parkinson's Disease Gene Biomarkers Screened by the LASSO and SVM Algorithms
Y. Bao, L. Wang, et al.
Parkinson's disease (PD) is characterized by loss of dopaminergic neurons in the substantia nigra and abnormal aggregation of α-synuclein. Current treatments, such as levodopa and dopamine receptor agonists, alleviate motor symptoms but do not prevent neurodegeneration and can lead to dyskinesia and motor fluctuations with long-term use, impacting quality of life. Increasing evidence indicates key roles for innate and adaptive immune responses in PD pathogenesis. α-Synuclein can trigger immune responses; immunosuppressants and anti-TNF therapies have been associated with reduced PD risk in epidemiological studies, supporting a role for neuroinflammation. Immune cells including microglia, astrocytes, and infiltrating T cells contribute to neuroinflammation and neuronal loss. Given these insights, identifying robust genetic biomarkers linked to immune mechanisms could advance diagnosis and inform immunomodulatory therapies. Traditional hub gene selection from bioinformatics networks can be subjective; integrating machine learning (ML) methods such as LASSO and SVM-RFE may improve accuracy and reproducibility. This study aims to identify PD-related hub genes using combined LASSO and SVM-RFE, evaluate their diagnostic performance, and explore associations with immune cell infiltration.
Prior work highlights the role of the immune system in PD: α-synuclein-associated immune activation, microglial mediation of neuroinflammation, and T cell infiltration contributing to neuronal degeneration. Population-based and case-control studies report reduced PD risk with immunosuppressants (e.g., corticosteroids, IMPDH inhibitors) and anti-TNF therapy in inflammatory bowel disease, with similar findings in rheumatoid arthritis cohorts, implicating immunomodulation in PD risk. Bioinformatics approaches have been used to infer hub genes via cytoHubba or STRING, but selection criteria can be arbitrary. ML techniques, notably LASSO regression and SVM-RFE, have improved biomarker selection accuracy in other diseases (e.g., lung and pituitary tumors), though few studies have combined them for PD biomarker discovery. These findings motivate applying robust ML-based feature selection to PD transcriptomic data and examining immune infiltration patterns with methods like CIBERSORT.
Data source and preprocessing: GEO microarray datasets were selected if they met four criteria: Homo sapiens, array expression profiling, substantia nigra samples, and available raw data. Four datasets (GSE7621, GSE20141, GSE20333, GSE49036) from the GPL570 and GPL201 platforms formed the training set, totaling 31 controls and 47 PD samples. GSE20164 (GPL96; 5 controls, 6 PD) was held out for external validation. Probes were mapped to gene symbols using platform annotations, and multiple probes mapping to the same gene were averaged. Data were log2-normalized. Batch effects were corrected with ComBat from the sva package, and PCA was used to visualize batches before and after correction.
Differential expression: limma identified DEGs using adj. p < 0.05 and |log2FC| > 1. Heatmaps and volcano plots were produced with pheatmap and ggplot2.
Enrichment analyses: GO (BP, CC, MF), KEGG, and Disease Ontology (DO) enrichment were performed using clusterProfiler, org.Hs.eg.db, DOSE, and limma. Terms with adjusted p < 0.05 and q < 0.05 (Benjamini–Hochberg) were considered enriched; results were visualized with ggplot2.
PPI network: The PPI network was built in STRING (Homo sapiens; interaction score > 0.15). Key modules were identified with MCODE in Cytoscape (v3.9.1) using degree cutoff = 2, node score cutoff = 0.2, k-core = 4, and max depth = 100.
Machine learning feature selection: Two methods were applied to the DEGs to identify candidate features: LASSO regression (glmnet in R; binomial family, alpha = 1, 10-fold cross-validation to select λ) and SVM-RFE (e1071, kernlab, caret; recursive feature elimination with cross-validation, minimizing error). Genes selected by both LASSO and SVM-RFE, identified with a Venn analysis, were defined as hub genes. Expression of these candidate biomarkers was then examined in the validation dataset GSE20164.
Diagnostic evaluation: ROC analyses computed AUCs in the validation dataset; AUC > 0.6 was considered acceptable.
Immune infiltration: CIBERSORT estimated the proportions of 22 immune cell types. Samples with CIBERSORT p < 0.05 were retained, and cell types with zero values were excluded. Cell proportions and group differences were visualized with corrplot and vioplot. Pearson correlations quantified associations among immune cells and between hub gene expression and immune cell fractions.
Statistics: Analyses were run in R 4.2.2. Student's t-test was applied to continuous variables and the Mann–Whitney U test to categorical variables, with p < 0.05 considered significant.
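The DEG criteria above (adj. p < 0.05 and |log2FC| > 1) can be illustrated with a minimal sketch. It covers only the final thresholding step, not limma's moderated statistics, and it assumes the adjusted p-values are Benjamini–Hochberg values, which the summary states for the enrichment step but not explicitly for the DEG step. The function names, gene labels, and numbers are hypothetical placeholders, not study data.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that each adjusted value is
    # capped by those ranked above it (keeps the result monotone).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(stats, p_cut=0.05, lfc_cut=1.0):
    """stats maps gene -> (log2FC, raw p). Returns gene -> 'up' or 'down'."""
    genes = list(stats)
    adj = benjamini_hochberg([stats[g][1] for g in genes])
    degs = {}
    for gene, q in zip(genes, adj):
        log2fc = stats[gene][0]
        # Both criteria must hold: significance and effect size.
        if q < p_cut and abs(log2fc) > lfc_cut:
            degs[gene] = "up" if log2fc > 0 else "down"
    return degs


# Toy input: placeholder gene names and made-up statistics (assumption).
toy_stats = {
    "GENE_A": (-1.6, 0.0004),
    "GENE_B": (1.3, 0.002),
    "GENE_C": (-0.4, 0.001),  # significant, but |log2FC| <= 1
    "GENE_D": (2.1, 0.2),     # large change, but not significant
}
toy_degs = select_degs(toy_stats)
print(toy_degs)
```

In this toy input only GENE_A and GENE_B pass both filters. GENE_C fails on effect size and GENE_D on significance, which is why the two thresholds are applied jointly rather than one after the other on separate gene lists.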
- Data integration and DEGs: After batch correction, 27 DEGs (25 upregulated, 2 downregulated) were identified in PD substantia nigra versus controls.
- Enrichment: GO terms enriched included neurotransmitter transport, dopamine biosynthetic process, synapse organization, presynapse, synaptic vesicle, exocytic vesicle, neuron projection terminus, and transport vesicle. KEGG pathways included cocaine addiction (hsa05030), dopaminergic synapse (hsa04728), amphetamine addiction (hsa05031), alcoholism (hsa05034), synaptic vesicle cycle (hsa04721), serotonergic synapse (hsa04726), and tyrosine metabolism (hsa00350). DO enrichment implicated autonomic nervous system neoplasm, neuroblastoma, peripheral nervous system neoplasm, Parkinson's disease, and synucleinopathy.
- PPI network: STRING-based network comprised 25 nodes and 96 edges (isolated nodes hidden). MCODE identified two modules: Subcluster 1 (10 nodes, 41 edges; score 9.111) and Subcluster 2 (5 nodes, 8 edges; score 4).
- ML-selected biomarkers: LASSO selected 8 genes; SVM-RFE selected 6 genes. Overlap yielded four hub genes: AGTR1, GBE1, TPBG, HSPA6.
- External validation (GSE20164): AGTR1 and GBE1 were significantly lower in PD than controls (AGTR1 p = 0.014; GBE1 p = 0.002). TPBG was lower (p = 0.15, not significant). HSPA6 was higher (p = 0.28, not significant).
- Diagnostic performance (ROC, validation dataset): AGTR1 AUC = 0.933; GBE1 AUC = 0.967; TPBG AUC = 0.767; HSPA6 AUC = 0.633.
- Immune infiltration differences (CIBERSORT): PD samples showed lower proportions of memory B cells (p = 0.035) and activated dendritic cells (p = 0.037), and a higher proportion of M2 macrophages (p = 0.024), than controls. Among immune cells, memory B cells correlated positively with activated dendritic cells (r = 0.42) and negatively with naive B cells (r = −0.61); M2 macrophages correlated negatively with M0 macrophages (r = −0.58) and positively with monocytes (r = 0.33); and activated dendritic cells correlated negatively with naive B cells (r = −0.26).
- Gene–immune cell correlations: AGTR1 correlated negatively with monocytes (r = −0.53, p = 0.0017) and M2 macrophages (r = −0.46, p = 0.0073). GBE1 correlated negatively with resting memory CD4 T cells (r = −0.35, p = 0.046) and monocytes (r = −0.38, p = 0.029). TPBG correlated negatively with monocytes (r = −0.46, p = 0.007). HSPA6 correlated negatively with plasma cells (r = −0.45, p = 0.0089).
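Two sets of figures in the results above can be cross-checked with simple arithmetic, sketched here under stated assumptions. First, the MCODE module scores are assumed to be density times node count, with density taken as edges divided by possible edges (no self-loops), as MCODE scoring is commonly described. Second, the AUCs are assumed to come from all 11 validation samples (5 controls, 6 PD) with no tied values, so that each AUC equals the fraction of correctly ordered control–PD pairs. The function names are illustrative, not taken from the study.

```python
def mcode_score(n_nodes, n_edges):
    """Assumed MCODE cluster score: density * node count."""
    possible_edges = n_nodes * (n_nodes - 1) / 2
    density = n_edges / possible_edges
    return density * n_nodes


def pairwise_auc(controls, cases):
    """AUC as the fraction of (control, case) pairs in which the case
    value is higher; ties count as half a pair."""
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))


def implied_pairs(auc, n_controls, n_cases):
    """Number of correctly ordered pairs implied by a reported AUC."""
    total = n_controls * n_cases
    return round(auc * total), total


# Module sizes as reported: (nodes, edges).
reported_modules = {"Subcluster 1": (10, 41), "Subcluster 2": (5, 8)}
module_scores = {
    name: round(mcode_score(n, e), 3)
    for name, (n, e) in reported_modules.items()
}

# Reported validation AUCs; GSE20164 has 5 controls and 6 PD samples.
reported_auc = {"AGTR1": 0.933, "GBE1": 0.967, "TPBG": 0.767, "HSPA6": 0.633}
auc_pairs = {g: implied_pairs(a, 5, 6) for g, a in reported_auc.items()}

print(module_scores)
print(auc_pairs)
```

Under these assumptions the module scores reproduce the reported 9.111 and 4, and the four AUCs correspond to 28, 29, 23, and 19 correctly ordered pairs out of 30. With only 30 possible pairs the AUC moves in steps of about 0.033, so a single differently ranked sample pair shifts each estimate noticeably.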
This study addressed the need for robust, reproducible biomarker discovery in PD by integrating multiple GEO substantia nigra microarray datasets, correcting batch effects, and applying complementary ML methods (LASSO and SVM-RFE). The four convergent hub genes (AGTR1, GBE1, TPBG, HSPA6) showed promising diagnostic value in an external dataset: AGTR1 and GBE1 were significantly downregulated in PD, and AUCs ranged from 0.633 (HSPA6) to 0.967 (GBE1). Functional enrichment placed DEGs in neuronal and synaptic processes and catecholaminergic pathways central to PD, while DO terms highlighted links to neuro-oncologic conditions and synucleinopathies, consistent with neuroimmune involvement.
Biologically, AGTR1 mediates renin-angiotensin signaling implicated in oxidative stress and dopaminergic neuron vulnerability; pharmacologic modulation of this axis has shown benefits in PD-related motor features. GBE1 deficiency underlies adult polyglucosan body disease (APBD), which features neuroinflammation and accumulation of polyglucosan bodies, suggesting a potential link between glycogen metabolism, neuroinflammatory stress, and PD pathology. TPBG (WAIF1), a Wnt signaling regulator, has been implicated as a PD candidate gene and relates to pathways relevant to neuronal survival. HSPA6 encodes a stress-inducible HSP70 family protein; elevated HSPA6 transcription has been observed in PD peripheral cells, and its upregulation in the substantia nigra here supports a stress response component.
Immune deconvolution indicated increased M2 macrophages and decreased memory B cells and activated dendritic cells in PD, highlighting complex innate and adaptive immune alterations. The observed negative correlations between hub genes and specific immune subsets (e.g., AGTR1 with monocytes and M2 macrophages; GBE1 with resting memory CD4 T cells; HSPA6 with plasma cells) suggest these genes may interface with or reflect immune infiltration dynamics.
Together, these findings support the hypothesis that AGTR1, GBE1, TPBG, and HSPA6 are tied to PD pathophysiology and immune microenvironment changes, offering potential for diagnostics and therapeutic targeting.
By integrating transcriptomic datasets and applying LASSO and SVM-RFE, this study identified four PD-associated hub genes (AGTR1, GBE1, TPBG, and HSPA6), of which AGTR1 and GBE1 were significantly downregulated in the external validation dataset and all four exceeded the AUC > 0.6 threshold. Immune profiling revealed increased M2 macrophage infiltration and decreased memory B cells and activated dendritic cells in PD, with significant correlations between hub gene expression and immune cell fractions. These results provide insights into neuroimmune mechanisms in PD and suggest candidate biomarkers and potential immunotherapeutic targets. Further experimental and clinical validation is needed to establish causality, mechanisms, and translational utility.
- Limited sample size and reliance on publicly available datasets may restrict generalizability; larger, independent cohorts are needed.
- External validation was performed on a single dataset; additional cross-platform and prospective validations are warranted.
- Observational bioinformatics and ML analyses cannot establish causality; functional in vivo and in vitro studies are required to elucidate mechanisms.
- Potential confounders (e.g., medication status, postmortem interval) may influence gene expression and immune estimates but were not fully controlled.
- CIBERSORT provides relative immune cell estimates from bulk tissue; single-cell validation would refine cellular resolution.