Proteomic analysis of cardiorespiratory fitness for prediction of mortality and multisystem disease risks
A. S. Perry, E. Farber-Eger, et al.
This study links circulating proteomic profiles to cardiorespiratory fitness (CRF) in about 14,000 individuals across four cohorts. It derives a proteomic CRF score that is associated with lower mortality and multisystem disease risk in UK Biobank and that responds to exercise training, suggesting potential for personalized exercise recommendations and for population-based proteomics in understanding health.
Introduction
Cardiorespiratory fitness (CRF) is a powerful prognostic marker associated with health, quality of life, and longevity, and is often considered a vital sign in clinical care. However, widespread clinical assessment of CRF is limited by test availability, cost, and patient factors that may preclude maximal exercise testing. An alternative is to develop accessible, training-responsive biomarkers that capture CRF and could also illuminate pharmacologic targets mimicking exercise benefits. Exercise induces broad metabolic changes across pathways related to regeneration, fibrosis, muscle structure, mitochondrial function, insulin resistance, and inflammation. Although molecular surrogates of CRF have been linked to prognosis, prior studies were often single-population with limited outcomes and modest additivity over standard risk factors. This study aimed to develop and validate a circulating proteomic signature of CRF across multiple cohorts and modalities, and to evaluate its associations with mortality and multisystem disease risk, its complementarity with polygenic risk, and its modifiability with exercise training.
Literature Review
Prior work has demonstrated associations between CRF and reduced mortality and cardiovascular events, and positioned CRF as a key clinical metric. Molecular studies have cataloged acute and chronic exercise-induced changes in metabolites, proteins, and transcripts, including exerkines that may mediate exercise benefits. Previous proteomic and metabolomic instruments related to CRF showed some prognostic value but often in limited samples and without strong incremental prediction over clinical factors. Nonexercise CRF prediction models have used questionnaires, resting heart rate, body composition, genetics, and wearable data, but typically lacked demonstration of robust associations with broad clinical outcomes. This study builds on these by integrating large-scale proteomics across diverse cohorts, linking to longitudinal outcomes, genetics, and training responses.
Methodology
Design and cohorts: Discovery in CARDIA (N=2,238; median age 51; 56% female; 43% Black) using SomaScan 7K proteomics, with duration of a symptom-limited exercise treadmill test (ETT time) as the CRF measure. Validation across Fenland (N=10,320; submaximal treadmill; SomaScan 5K), HERITAGE (N=742; cycle CPET with peak VO₂; SomaScan 5K), and BLSA (N=845; treadmill CPET with peak VO₂; SomaScan 7K). Clinical outcome associations and polygenic risk integration were tested in UK Biobank (UKB; N=21,988; Olink Explore 1536 panel) with a median follow-up of 13.7 years.
Proteomics: CARDIA quantified 7,524 aptamers; excluded nonhuman proteins and those with CV >20%. Fenland and HERITAGE used SomaScan 5K; BLSA used 7K; UKB used Olink Explore 1536, excluding proteins with >40% below LOD or >20% missingness.
CRF assessment: CARDIA used modified Balke treadmill ETT duration. Fenland used submaximal treadmill heart rate–workrate relationship extrapolated to age-predicted maximal HR to estimate peak VO₂. HERITAGE used cycle CPET (criteria including RER >1.1, VO₂ plateau, or HR near age-predicted max); participants underwent a supervised 20-week training program with pre/post assessments. BLSA used treadmill CPET (modified Balke), restricting to RER ≥1.0 to ensure near-maximal effort.
Score development: In CARDIA, LASSO linear regression modeled CRF (ETT time) as outcome with age, sex, race, and BMI unpenalized and the proteome penalized; proteins and CRF were log-transformed and standardized. The proteomic CRF score was a linear combination of selected protein coefficients (covariates excluded from score). Cross-validation selected hyperparameters. The model reduced candidates from 7,230 to 272 aptamers; calibration: Spearman’s ρ=0.79 (derivation, N=1,569) and ρ=0.67 (validation, N=669).
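The penalization scheme described above can be illustrated with a minimal sketch. It is not the authors' code: it uses a plain coordinate-descent LASSO written from scratch, synthetic data, a single hypothetical covariate (`age`), five hypothetical proteins, and a fixed penalty `lam` instead of a cross-validated one. The only points it aims to show are that a per-feature penalty of zero leaves a covariate unpenalized and that the score is built from the protein coefficients alone.

```python
import math
import random

def soft_threshold(z, t):
    """Soft-thresholding operator used by LASSO coordinate descent."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def standardize(col):
    """Center a column to mean 0 and scale it to unit variance."""
    n = len(col)
    mean = sum(col) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
    return [(v - mean) / sd for v in col]

def lasso_cd(columns, y, penalties, n_iter=100):
    """Minimize (1/2n)*||y - Xb||^2 + sum_j penalties[j]*|b_j|.
    A penalty of 0 leaves that feature unpenalized."""
    n = len(y)
    beta = [0.0] * len(columns)
    resid = list(y)
    for _ in range(n_iter):
        for j, col in enumerate(columns):
            # Correlation of feature j with the partial residual.
            rho = sum(c * (r + beta[j] * c) for c, r in zip(col, resid)) / n
            norm = sum(c * c for c in col) / n
            new = soft_threshold(rho, penalties[j]) / norm
            if new != beta[j]:
                delta = new - beta[j]
                resid = [r - delta * c for r, c in zip(resid, col)]
                beta[j] = new
    return beta

# Synthetic, hypothetical data (real inputs would be log-transformed first).
random.seed(0)
n = 200
age = [random.gauss(0, 1) for _ in range(n)]
proteins = [[random.gauss(0, 1) for _ in range(n)] for _ in range(5)]
crf = [-0.5 * age[i] + 0.8 * proteins[0][i] - 0.6 * proteins[1][i]
       + random.gauss(0, 0.3) for i in range(n)]

cols = [standardize(c) for c in [age] + proteins]
y = standardize(crf)
lam = 0.1  # assumed fixed here; the study selected it by cross-validation
beta = lasso_cd(cols, y, penalties=[0.0] + [lam] * 5)

# The score uses protein coefficients only; the covariate term is dropped.
protein_beta = beta[1:]
score = [sum(b * cols[k + 1][i] for k, b in enumerate(protein_beta))
         for i in range(n)]
print([round(b, 2) for b in beta])
```

In this toy setup the two informative proteins keep nonzero coefficients while the uninformative ones are shrunk toward zero, which is the mechanism behind the reduction from 7,230 candidates to 272 aptamers.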
Recalibration: To translate the score across platforms, a recalibration LASSO in CARDIA used the original score as dependent variable and overlapping proteins as predictors, generating coefficients applicable to Fenland, HERITAGE, and UKB. Recalibration accuracy in CARDIA: Pearson r=0.98 (HERITAGE), 0.99 (Fenland), 0.93 (UKB).
Validation against measured CRF: Spearman’s ρ with measured CRF: HERITAGE 0.71 (cycle CPET), BLSA 0.68 (treadmill CPET), Fenland 0.35 (submaximal estimate). Associations with demographics mirrored known CRF epidemiology: higher in men, inverse with age and BMI.
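For readers unfamiliar with the statistic, Spearman's ρ is the Pearson correlation of ranks. The sketch below computes it with the standard library only; the score and peak VO₂ values are invented for illustration and are not study data.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical proteomic scores and measured peak VO2 (ml/kg/min).
proteomic_score = [0.2, -1.1, 0.9, 1.5, -0.3]
peak_vo2 = [31.0, 22.5, 35.2, 33.8, 30.1]
rho = spearman(proteomic_score, peak_vo2)
print(round(rho, 2))  # 0.9
```

Because it depends only on ranks, the statistic is unaffected by the different units of ETT time, estimated VO₂, and measured peak VO₂ across cohorts.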
Outcomes analysis in UKB: Standard and fully adjusted Cox models assessed associations with all-cause and cause-specific mortality, and incident cardiovascular, metabolic, neurological, hepatic, and cancer outcomes; Fine–Gray competing risks for cause-specific deaths. Models compared C-index and net reclassification index (NRI) with and without the proteomic score. Interaction and additivity with polygenic risk scores (PRS) for common diseases were tested with multiplicative interaction terms (adjusted for age, sex, race, and genetic principal components).
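The comparison of discrimination with and without the score can be made concrete with a small concordance calculation. The summary does not state which C-index estimator was used, so the sketch below assumes Harrell's C and uses invented follow-up times, event indicators, and risk values.

```python
def harrell_c(times, events, risk):
    """Harrell's concordance index.

    A pair (i, j) is comparable when subject i has an observed event and a
    shorter follow-up time than subject j. The pair is concordant when the
    subject who failed earlier also has the higher predicted risk.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # ties in risk count as half
    return concordant / comparable

# Hypothetical follow-up (years), event flags (1 = died), and model risks.
times = [2, 4, 5, 7, 9]
events = [1, 1, 0, 1, 0]
risk_clinical = [0.9, 0.3, 0.5, 0.4, 0.1]    # clinical factors only
risk_with_score = [0.9, 0.6, 0.5, 0.4, 0.1]  # clinical factors + score

c_base = harrell_c(times, events, risk_clinical)
c_new = harrell_c(times, events, risk_with_score)
print(c_base, c_new)  # 0.75 1.0
```

The jump in this toy example is exaggerated by design; in the study the reported gains were on the order of 0.02 to 0.03.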
Abbreviated panel: Constructed a 21-protein abbreviated score (top 21 by absolute LASSO beta) to assess translational feasibility; compared effects with the 307-protein recalibrated score in UKB.
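Selecting the largest coefficients by absolute value is simple to express in code. The sketch below uses placeholder protein names and invented coefficients, and selects three proteins rather than 21 to keep the example short.

```python
def top_k_by_abs_beta(coefs, k):
    """Return the k (name, beta) pairs with the largest |beta|, largest first."""
    ranked = sorted(coefs.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return ranked[:k]

def panel_score(panel, z_values):
    """Linear combination of standardized protein levels over the panel."""
    return sum(beta * z_values[name] for name, beta in panel)

# Hypothetical LASSO coefficients; names and values are placeholders.
coefs = {
    "protein_a": -0.21,
    "protein_b": 0.08,
    "protein_c": -0.15,
    "protein_d": 0.03,
    "protein_e": 0.11,
}
panel = top_k_by_abs_beta(coefs, k=3)
names = [name for name, _ in panel]
print(names)  # ['protein_a', 'protein_c', 'protein_e']

# Hypothetical standardized protein levels for one participant.
z = {"protein_a": 1.0, "protein_b": 0.5, "protein_c": -2.0,
     "protein_d": 0.0, "protein_e": 1.0}
abbreviated = panel_score(panel, z)
print(round(abbreviated, 2))  # 0.2
```

Ranking by absolute value keeps strongly negative contributors as well as strongly positive ones, which matters because several markers are inversely related to fitness.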
Training response: In HERITAGE (N=643 with pre/post), tested change in proteomic CRF score after 20-week training and its association with change in peak VO₂, adjusted for demographics, BMI, and baseline measures. Also examined whether pretraining score predicted VO₂ response. Proteins significantly changed with training (FDR<5%) were correlated with cardiometabolic phenotypes in CARDIA.
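The pre/post comparison amounts to estimating a mean within-person change and its uncertainty. The sketch below is unadjusted and uses a normal approximation (z = 1.96) for the 95% interval; both are simplifying assumptions, since the study adjusted for demographics, BMI, and baseline measures. The four pre/post values are invented.

```python
import math
import statistics

def mean_change_with_ci(pre, post, z=1.96):
    """Mean paired change (post - pre) with a normal-approximation CI."""
    diffs = [b - a for a, b in zip(pre, post)]
    mean = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean, mean - z * se, mean + z * se

# Hypothetical standardized scores before and after training.
pre = [0.0, 0.1, -0.2, 0.3]
post = [0.2, 0.2, 0.0, 0.4]
mean, lower, upper = mean_change_with_ci(pre, post)
print(round(mean, 2), round(lower, 2), round(upper, 2))
```

With a sample this small a t-based interval would be more appropriate; the normal approximation is reasonable at the scale of the HERITAGE sample (N=643).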
Sensitivity: Derived a CRF score excluding CARDIA participants with prevalent CVD, diabetes, and hypertension to test for confounding by disease; translated to UKB and re-tested outcomes.
Key Findings
- Score development and validation: The LASSO-based proteomic CRF score selected 272 of 7,230 candidate aptamers (>95% reduction). Calibration with CRF (ETT time) in CARDIA showed Spearman’s ρ=0.79 (derivation) and ρ=0.67 (validation). External correlations with measured CRF: HERITAGE ρ=0.71 (cycle CPET), BLSA ρ=0.68 (treadmill CPET), Fenland ρ=0.35 (submaximal estimate).
- Associations in UKB (N=21,988; median follow-up 13.7 years): Each 1 s.d. higher proteomic CRF score was associated with a 47% lower all-cause mortality hazard (HR=0.53, 95% CI 0.50–0.56; P<0.0001), with consistent reductions in cause-specific mortality and in multiple disease outcomes (cardiovascular, metabolic, neurological), but not most cancers.
- Risk prediction improvement: Adding the proteomic CRF score improved discrimination and reclassification beyond standard risk factors for many endpoints. Examples: All-cause mortality C-index improved from 0.75 to 0.77 (P=4.17×10⁻¹³), cardiovascular mortality 0.79 to 0.82 (P=9.07×10⁻¹⁰); substantial NRI (e.g., all-cause death NRI=0.35 [0.31–0.41], respiratory death NRI=0.79 [0.64–0.98]).
- Polygenic risk: Proteomic CRF score and PRS effects were largely additive with minimal interaction; highest risks occurred in individuals with low proteomic CRF and high PRS. Standardized estimates for the proteomic score were comparable to or larger than PRS for several conditions (e.g., type 2 diabetes HR_proteome=0.37, 95% CI 0.35–0.40; HR_PRS=1.97, 95% CI 1.83–2.12).
- Abbreviated 21-protein panel: Correlated with CRF in CARDIA (ρ=0.71) and yielded effect sizes across UKB outcomes similar to the full 307-protein recalibrated score, with slight attenuation, supporting translational feasibility.
- Training plasticity (HERITAGE): The proteomic CRF score increased after 20-week training (mean change 0.14 s.d., 95% CI 0.11–0.18; P=2.5×10⁻¹⁵). Change in score associated with change in peak VO₂ (per 1 s.d. increase in score: +0.84±0.25 ml·kg⁻¹·min⁻¹; P=8.5×10⁻¹⁰), independent of demographics, BMI, and baseline measures. Higher pretraining score predicted greater VO₂ gains (+0.59±0.17 ml·kg⁻¹·min⁻¹ per 1 s.d.; P=6.4×10⁻¹⁰), attenuated with BMI adjustment (+0.30±0.17; P=0.08).
- Biological plausibility: Top-effect proteins implicated pathways in inflammation (e.g., C5a), atherosclerosis (AGER, RGMB), neuronal survival (CDNF, LSAMP), oxidative stress, energy metabolism (OLFM2, FABP3/4, HNF4A), adiposity (LEP), muscle response (MB, ATF6), and autophagy (GLIPR2). Proteins that changed with training correlated with cardiometabolic phenotypes (e.g., LEP, RARRES2, RGMB, CDNF).
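The hazard ratios quoted in these findings are per 1 s.d. and point in opposite directions for the proteomic score (protective) and the PRS (harmful), which makes them awkward to compare directly. The arithmetic below uses only the HRs reported above; the extrapolation to 2 s.d. assumes the log-linear effect of a Cox model holds over that range, which the summary does not verify.

```python
def percent_lower_hazard(hr):
    """Percent reduction in hazard implied by a hazard ratio below 1."""
    return (1.0 - hr) * 100.0

def hr_over_k_sd(hr_per_sd, k):
    """Hazard ratio for a k s.d. difference, assuming a log-linear effect."""
    return hr_per_sd ** k

def inverted_hr(hr):
    """Re-express a protective HR as the HR per 1 s.d. LOWER score."""
    return 1.0 / hr

hr_all_cause = 0.53   # all-cause mortality, per 1 s.d. higher score
hr_t2d_score = 0.37   # type 2 diabetes, per 1 s.d. higher score
hr_t2d_prs = 1.97     # type 2 diabetes, per 1 s.d. higher PRS

print(round(percent_lower_hazard(hr_all_cause)))  # 47
print(round(hr_over_k_sd(hr_all_cause, 2), 2))    # 0.28
print(round(inverted_hr(hr_t2d_score), 2))        # 2.7
```

Inverting the type 2 diabetes estimate gives roughly 2.7 per 1 s.d. lower score, compared with 1.97 per 1 s.d. higher PRS, which is the sense in which the proteomic estimate is "comparable to or larger than" the genetic one.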
Discussion
The study addresses the need for accessible, reproducible surrogates of CRF by developing a circulating proteomic score that captures multiorgan determinants of fitness. The score tracked measured CRF across diverse cohorts and exercise modalities (ρ≈0.7 against maximal tests, lower against the submaximal estimate in Fenland) and was supported by biologically plausible protein pathways. Clinically, the proteomic CRF score was strongly and independently associated with all-cause and cause-specific mortality and with multiple cardiovascular, metabolic, and neurological outcomes, improving risk discrimination and reclassification beyond standard factors. Its effects were largely additive to polygenic risk, indicating complementary roles for proteomic and genetic information in precision risk assessment. Importantly, the score was modifiable with exercise training and, before adjustment for BMI, predicted trainability, supporting its potential use in tailoring and monitoring interventions. The feasibility of a parsimonious 21-protein panel suggests a practical path toward clinical translation as a blood-based surrogate of CRF, particularly in settings or populations where direct CRF testing is impractical.
Conclusion
This work defines and validates a circulating proteomic biomarker of cardiorespiratory fitness across approximately 14,000 individuals and multiple modalities, links it to diverse clinical outcomes in ~22,000 UK Biobank participants, demonstrates additivity with polygenic risk, and shows training-induced plasticity. A reduced 21-protein panel preserved predictive utility, supporting clinical scalability. These findings position population-scale proteomics as a biologically grounded, clinically actionable surrogate of CRF that could augment risk stratification and personalize exercise interventions. Future research should expand to broader age ranges and exercise types, refine assay platforms, validate in clinical populations, and test implementation for risk-guided prevention and therapy.
Limitations
- Heterogeneity and non-standardization of CRF assessments across cohorts (maximal vs submaximal tests; treadmill vs cycle) may contribute to variability, although consistent performance across these cohorts and modalities also supports the score's robustness.
- In CARDIA, a ~5-year interval between proteomic and CRF assessments could introduce noise, though external replication and training responsiveness mitigate concern.
- Limited representation of older adults; generalizability to very elderly or clinical populations requires further study.
- Aptamer-based SomaScan platforms may have per-protein specificity limitations; however, clinical associations were validated using a different platform (Olink) in UKB.
- UKB outcomes derive from administrative data with potential misclassification/ascertainment biases, likely biasing toward the null.
- Molecular dimensionality remains substantial even after regularization; while a 21-protein panel showed promise, further optimization and standardization for clinical use are needed.