Explainable machine learning identifies multi-omics signatures of muscle response to spaceflight in mice

K. Li, R. Desai, et al.

This study applies explainable machine learning to multi-omics data from mice flown in space, linking muscle atrophy to calcium dysregulation and altered SERCA pump function. It highlights the proteins Acyp1 and Rps7 as candidate biomarkers of muscle resilience in microgravity. The research, conducted by Kevin Li and colleagues, offers leads for mitigating muscle loss during space travel.

Introduction
Prolonged exposure to microgravity induces muscle atrophy, posing a major challenge for astronauts. Current countermeasures such as intensive exercise are time-consuming and insufficient to fully offset microgravity’s effects. Dysregulation of cytoplasmic Ca2+ due to altered SERCA pump-mediated reuptake has been proposed as a contributor to atrophy. The soleus (SOL; slow-twitch, oxidative) and tibialis anterior (TA; fast-twitch, glycolytic) muscles are both affected by spaceflight and show atrophy; Ca2+ uptake is impaired in SOL but enhanced in TA during spaceflight, suggesting muscle-type-specific SERCA alterations. The molecular mechanisms underlying these differences remain incompletely understood, and additional drivers of atrophy beyond SERCA may exist. To address this, the study leverages explainable machine learning (ML) on multi-omics data to map molecular changes to calcium reuptake and to classify flight (FLT) vs ground control (GC) samples in mouse SOL and TA muscles. Given the importance of interpretability and generalizability for biomedical insight, the authors employ QLattice symbolic regression, which yields concise, interpretable models less prone to overfitting in high-dimensional, low-sample-size settings. The goal is to identify biomarkers and molecular interactions that explain spaceflight-induced muscle physiology changes and inform countermeasures.
Literature Review
Prior work has documented microgravity-induced muscle atrophy and highlighted exercise as a partial but insufficient countermeasure. Studies have implicated disrupted calcium handling, particularly SERCA-mediated Ca2+ reuptake, in atrophy, with spaceflight differentially affecting SOL (impaired uptake) and TA (enhanced uptake). Both SOL and TA in mice exhibit atrophy after spaceflight. Machine learning approaches have proven effective for complex, high-dimensional multi-omics biomarker discovery and are less constrained by distributional assumptions than traditional statistics, which is advantageous in space biology where datasets are small and heterogeneous. Symbolic regression, including QLattice, has been shown to perform well for small datasets and to identify biologically meaningful interactions. Prior reports also connect mitochondrial regulation and neuromuscular adaptations to spaceflight across tissues, supporting the relevance of pathways detected in this study.
Methodology
Data sources and cohorts: Multi-omics and physiological datasets were sourced from NASA’s Open Science Data Repository (OSDR). Omics datasets included OSD-104 (RR-1 SOL: bulk RNA-seq and bisulfite sequencing DNA methylation) and OSD-105 (RR-1 TA: bulk RNA-seq, bisulfite sequencing DNA methylation, and TMT-based proteomics). Calcium reuptake data came from OSD-488 and were measured with Indo-1 fluorophore assays on muscle homogenates. RR-1 samples were from female C57BL/6J mice, 16 weeks old, flown for 37 days (6 FLT and 6 GC per muscle for omics). RR-9 samples were from male C57BL/6J mice, 10 weeks old, flown for 35 days; these were used for TA calcium reuptake because no RR-1 TA calcium data were available.
Preprocessing: RNA-seq raw counts were filtered to remove lowly expressed genes, retaining genes that met the threshold of ≥10 reads in ≥3 samples. This left 15,848 genes (OSD-104) and 16,660 genes (OSD-105), of which 15,216 overlapped. A variance-stabilizing transformation (VST) via DESeq2 corrected for library size and heteroskedasticity. Proteomics data (OSD-105) were processed across two TMT runs using bridge-channel normalization (sample-to-bridge ratios), then log2-transformed, filtered for missing values, VST-normalized (DEP), KNN-imputed, and corrected for batch effects (limma), yielding 1,786 proteins. Bisulfite sequencing data were processed with nf-core methylseq (Bismark); CpG sites were retained and mapped to genes by genomic overlap, and percent CpG methylation was computed for each gene. Site-level features were tested but led to overfitting, so gene-level methylation features were retained (48,368 for OSD-104; 47,660 for OSD-105). Overlap analyses showed partial concordance between highly methylated loci and lowly expressed genes.
Calcium reuptake phenotype: Ca2+ reuptake was measured as a time series of Indo-1 fluorescence, and the area under the curve (AUC) was computed, with lower AUC indicating more efficient reuptake.
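The AUC phenotype described above can be illustrated with a minimal sketch. It assumes a plain trapezoidal integration over the whole trace; the study's actual baseline correction, integration window, and units are not stated in this summary, and the function name `reuptake_auc` and the toy traces are hypothetical.

```python
import numpy as np

def reuptake_auc(time_s, fluorescence):
    """Trapezoidal area under a fluorescence-vs-time trace.

    Assumption: the whole trace is integrated without baseline
    correction; the study's exact procedure is not described here.
    """
    t = np.asarray(time_s, dtype=float)
    f = np.asarray(fluorescence, dtype=float)
    if t.ndim != 1 or t.shape != f.shape or t.size < 2:
        raise ValueError("need two 1-D arrays of equal length >= 2")
    if np.any(np.diff(t) <= 0):
        raise ValueError("time points must be strictly increasing")
    # Trapezoidal rule: interval width times mean height, summed.
    return float(np.sum(np.diff(t) * (f[1:] + f[:-1]) / 2.0))

# Toy traces (synthetic, not study data): the signal decays as Ca2+
# is cleared, so a faster decay gives a smaller AUC.
t = np.linspace(0.0, 10.0, 101)
fast_auc = reuptake_auc(t, np.exp(-1.0 * t))
slow_auc = reuptake_auc(t, np.exp(-0.2 * t))
```

In this toy setting `fast_auc` is smaller than `slow_auc`, matching the convention that a lower AUC indicates more efficient reuptake.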
Because omics and calcium measurements were not from the same animals, calcium values were paired to omics samples via a perturbation analysis that optimized the pairing, on the expectation that FLT vs GC differences dominate over other sources of variation (mission, sex, age). SOL omics were paired with age/sex-matched RR-1 SOL calcium; TA omics (RR-1 females) were paired with RR-9 TA calcium (10-week males).
Modeling approach: QLattice (feyn v3.0.2) symbolic regression/classification was trained with leave-one-out cross-validation (LOOCV). Two tasks were performed per muscle: (1) regression to predict Ca2+ reuptake (AUC) from multi-omics features; (2) classification to distinguish FLT vs GC. For TA regression, RNA-seq and proteomics were used (methylation was excluded because including it reduced performance while yielding similar results). For SOL regression, RNA-seq and methylation were used (no proteomics available). For classification, TA used RNA-seq, proteomics, and methylation; SOL used RNA-seq and methylation. QLattice explored concise model architectures (computational graphs) built from functions including bivariate Gaussian, multiplication, univariate tanh, linear, exponential, addition, and logarithm. Maximum architectural complexity was set to 6 for TA and 4 for SOL. Varying the number of training epochs (10–100) yielded similar validation performance and feature rankings. Feature importance was assessed by recurrence across LOOCV models.
Statistics and enrichment: Gene set enrichment used Enrichr via gseapy (GO_Biological_Process_2021). Significance in box plots was assessed with two-sided Mann-Whitney-Wilcoxon tests. Model performance was summarized by cross-validated R² scores for the top-1 (T1) and top-10 (T10) models and by counts of features per modality among the top features. Muscle weights from RR-1 were compared between FLT and GC for SOL and TA using t-tests.
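The LOOCV and feature-recurrence logic can be sketched as follows. This is not QLattice: as a stand-in, each fold selects the single feature most correlated with the target and fits a straight line to it. The function name, the synthetic data, and the choice to compute R² from pooled held-out predictions are all assumptions made for illustration.

```python
from collections import Counter
import numpy as np

def loocv_single_feature(X, y, feature_names):
    """LOOCV with a stand-in model (best single feature + linear fit).

    Returns the R^2 of pooled held-out predictions and a Counter of
    how often each feature was selected across folds.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    preds = np.empty(n)
    recurrence = Counter()
    for i in range(n):
        train = np.arange(n) != i          # hold out sample i
        Xt, yt = X[train], y[train]
        # Feature selection uses the training fold only.
        Xc = Xt - Xt.mean(axis=0)
        yc = yt - yt.mean()
        denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
        corr = np.abs(Xc.T @ yc) / np.where(denom > 0, denom, np.inf)
        j = int(np.argmax(corr))
        recurrence[feature_names[j]] += 1
        slope, intercept = np.polyfit(Xt[:, j], yt, 1)
        preds[i] = slope * X[i, j] + intercept
    ss_res = np.sum((y - preds) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot, recurrence

# Synthetic data: 12 samples, 5 features, target driven by feature "f2".
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 5))
y = 3.0 * X[:, 2] + 0.01 * rng.normal(size=12)
names = [f"f{k}" for k in range(5)]
cv_r2, recurrence = loocv_single_feature(X, y, names)
```

Because every fold refits from scratch, a feature that is selected in most folds is robust to the removal of any single sample, which is the intuition behind ranking features by recurrence.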
Key Findings
Regression (TA): Using RNA-seq and proteomics (excluding methylation), QLattice identified the Acyp1 and Rps7 proteins as the top predictors of Ca2+ reuptake AUC. Feature recurrence across LOOCV models: Acyp1 (proteomics) appeared in 89 models and Rps7 (proteomics) in 27, with 24 of those co-occurring with Acyp1, suggesting a potential interaction. Representative model functions included bivariate Gaussian, bivariate multiplication, and tanh. Cross-validated performance: T1 CV R² = 0.894; T10 CV R² = 0.711. Among the top 50 features, 38 were proteomic and 12 were RNA-seq. Enrichment implicated apoptosis, endocytosis, and protein localization pathways. Biologically, Acyp1 has been reported to inhibit Ca2+ transporters such as SERCA-1 (fast-twitch); in the data, FLT TA showed lower Acyp1 protein and improved reuptake (lower AUC). Rps7 was positively associated with reuptake capacity, consistent with its reported downregulation under nitrosative stress.
Regression (SOL): With RNA-seq and methylation, the top predictive features were RNA-seq genes including Gm35576, Rspo3, Gpc4, Klhl31, Sox6, Auts2, Sobp, Mdga1, Aox1, Tle4, Klhl33, Eepd1, Rhbdl3 (and Gm21955). Models commonly used Gaussian, linear, and exponential relations among gene expression values. Enrichment highlighted cellular differentiation, synapse organization/assembly, and neuron migration. FLT SOL exhibited impaired reuptake and upregulation of genes such as Gpc4 and Tle4 (typically downregulated after injury), potentially contributing to reduced muscle quality; upregulation of Rspo3 and Klhl31 may reflect compensatory or adaptive responses.
Classification (TA, FLT vs GC): Using RNA-seq, proteomics, and methylation, QLattice achieved T1 CV R² = 1.0 and T10 CV R² = 0.997. Top recurrent features (all modalities) included Trak2 (RNA-seq), Tle4 (RNA-seq), Tspan4 (RNA-seq), Actin (proteomics), Gm22281 (methylation), Sell (RNA-seq), Ech1 (methylation), Fhod1 (RNA-seq), Egr2 (RNA-seq), Klhl21 (RNA-seq), and Lrp2bp (RNA-seq). Enrichment pointed to skeletal muscle cell differentiation, positive regulation of myelination, Schwann cell differentiation, mitochondrial regulation, and actomyosin structural regulation. Actin (proteomics), Trak2, and Tle4 were all upregulated in FLT TA; the co-occurrence of Trak2 and Tle4 with Actin in models suggests potential co-regulatory networks in response to spaceflight.
Classification (SOL, FLT vs GC): RNA-seq dominated across models (69–71 of ~80 features), with recurrent features including Fam220a, Lrp4, Osgin2, Gm29686, Gm22281 (methylation), Sema6a, Alpk3, Tmod1, and Bcam. Most models were single-feature relations (linear, log, inverse). Among 120 models, 18 involved two features; 11 combined methylation with RNA-seq features, indicating potential cooperation between methylation and expression in the SOL response. Performance: T1 CV R² = 1.0; T10 CV R² = 1.0. Enrichment implicated pre/post-synaptic membrane assembly/organization.
Physiological outcomes: RR-1 SOL muscle weight decreased significantly in FLT vs GC (7.9 mg vs 10.5 mg; t-test p < 0.05), while TA weight showed no significant change (13.3 mg vs 13.9 mg).
Modality contribution: In TA regression, including methylation reduced performance; proteomic and RNA-seq features were the most predictive, with proteomics generally stronger and more cohesive than RNA-seq. Overall, TA appeared more resilient to space conditions, with Acyp1 and Rps7 highlighted as candidate biomarkers, while SOL showed impaired reuptake associated with distinct gene expression changes.
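Co-occurrence statistics of the kind reported above (how often two features appear in the same model) can be tallied as in the sketch below. The function name and the placeholder feature names are hypothetical, and the toy model list is made up for illustration; it does not reproduce the study's models or counts.

```python
from collections import Counter
from itertools import combinations

def feature_cooccurrence(models):
    """Count single-feature recurrence and pairwise co-occurrence.

    `models` is an iterable of feature-name lists, one per fitted model.
    Pair keys are alphabetically sorted tuples.
    """
    single = Counter()
    pairs = Counter()
    for features in models:
        unique = sorted(set(features))      # ignore repeats within a model
        single.update(unique)
        pairs.update(combinations(unique, 2))
    return single, pairs

# Toy input with placeholder names (not study data).
toy_models = [
    ["protein_A", "protein_B"],
    ["protein_A"],
    ["protein_A", "protein_B"],
    ["protein_B", "gene_C"],
]
single, pairs = feature_cooccurrence(toy_models)
shared = pairs[("protein_A", "protein_B")]
```

Comparing `shared` with `single["protein_B"]` gives the fraction of one feature's appearances that also include the other, which is the kind of ratio that motivates an interaction hypothesis.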
Discussion
The study demonstrates that explainable ML (QLattice symbolic regression) can uncover concise, interpretable mathematical relationships between multi-omics features and muscle phenotypes in small, heterogeneous spaceflight datasets. For TA, improved Ca2+ reuptake in FLT is linked to decreased Acyp1 protein and increased Rps7 protein, aligning with known roles of SERCA regulation and stress responses. For SOL, impaired reuptake corresponds to altered expression of genes involved in differentiation, synaptic organization, and neuronal processes, suggesting broader neuromuscular adaptations impacting muscle quality. Classification analyses further implicate pathways in mitochondrial regulation, myelination, Schwann cell biology, and actomyosin structure, consistent with previously reported spaceflight responses. QLattice’s ability to model non-linear and sigmoidal relationships (e.g., partial Gaussian shapes) is biologically meaningful given bounded physiological ranges (e.g., calcium concentrations), offering improved interpretability over linear models. The findings support focusing on proteomic signals (more immediately linked to function) for predictive modeling, while methylation may capture slower, cumulative changes. Collectively, these insights narrow the search space for mechanistic studies and suggest testable hypotheses (e.g., Acyp1–Rps7 interaction in TA; regulatory roles of Tle4, Rspo3, Klhl31, Gpc4 in SOL).
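To make the point about bounded functional forms concrete, the sketch below contrasts a bivariate Gaussian and tanh with a linear combination. The parametrization (centers `c1`, `c2` and widths `w1`, `w2`) is an illustrative assumption and is not claimed to match how feyn defines these functions internally.

```python
import numpy as np

def gaussian2(x1, x2, c1=0.0, c2=0.0, w1=1.0, w2=1.0):
    """Bivariate Gaussian bump: equals 1 at (c1, c2), decays toward 0."""
    return np.exp(-(((x1 - c1) / w1) ** 2 + ((x2 - c2) / w2) ** 2))

def linear2(x1, x2, a1=1.0, a2=1.0, b=0.0):
    """Unbounded linear combination of two inputs."""
    return a1 * x1 + a2 * x2 + b

# Evaluate each form on the same grid of (standardized) input values.
grid = np.linspace(-5.0, 5.0, 11)
gauss_vals = gaussian2(grid, grid)   # stays within (0, 1]
tanh_vals = np.tanh(grid)            # stays within [-1, 1]
linear_vals = linear2(grid, grid)    # grows without bound
```

The Gaussian and tanh outputs saturate however extreme the inputs become, whereas the linear output keeps growing, which is why bounded forms can be a more natural description of quantities with limited physiological ranges.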
Conclusion
This work identifies candidate multi-omics biomarkers and interactions underlying differential calcium reuptake and broader molecular adaptations in mouse SOL and TA muscles during spaceflight. Using QLattice symbolic regression/classification, the study highlights Acyp1 and Rps7 proteins as key predictors of TA resilience and gene networks (including Gpc4, Tle4, Rspo3, Klhl31) associated with impaired SOL reuptake. The approach showcases the value of explainable ML for small, high-dimensional space biology datasets and underscores the strong predictive contribution of proteomics relative to RNA-seq, with limited impact from methylation in-flight. Future research should include experimental validation of top biomarkers and interactions; longitudinal methylation profiling pre- and post-flight in the same animals; deeper characterization of how QLattice-identified mathematical functions map to biological concentration-response relationships; and systematic assessment of modality-specific contributions to multi-omics prediction. These directions will help translate the identified signatures into effective countermeasures against spaceflight-induced muscle atrophy.
Limitations
Key limitations include: (1) Omics and calcium reuptake data were not measured in the same animals; pairing was inferred, which may introduce bias. (2) TA calcium data were from RR-9 male mice (10 weeks), while TA omics were from RR-1 female mice (16 weeks), creating sex and age mismatches across missions. (3) Small sample sizes typical of spaceflight studies increase the risk of overfitting and limit generalizability despite QLattice’s regularization and LOOCV. (4) Proteomics data were unavailable for SOL, limiting cross-modality comparisons for that muscle. (5) DNA methylation features contributed little to predictive performance in-flight, which may reflect temporal dynamics not captured in this study; lack of longitudinal methylation data constrains interpretation. (6) Code is not publicly available due to NASA release requirements, limiting full reproducibility of the computational workflow.