Proteomic aging clock predicts mortality and risk of common age-related diseases in diverse populations
M. A. Argentieri, S. Xiao, et al.
This summary covers a proteomic age clock developed from UK Biobank plasma protein data and validated in Chinese and Finnish cohorts. The study by M. Austin Argentieri and colleagues shows that proteomic aging predicts age-related functional status, disease risk and mortality.
Introduction
Aging drives risk for most common chronic diseases and death, but chronological age captures biological aging only imperfectly. Prior work has used DNA methylation (DNAm) clocks and smaller proteomic studies to estimate biological age and predict outcomes. However, existing proteomic clocks have been neither developed in large general-population cohorts nor comprehensively evaluated across a wide spectrum of diseases and functional aging traits, and cross-ancestry validation has been limited. This study had three aims: to develop a robust proteomic aging clock from plasma proteins in a large population sample; to test whether proteomic age acceleration predicts mortality, multimorbidity, multiple noncancer diseases, cancers, and aging-related biological, physical and cognitive traits; and to validate performance across diverse populations (UK, China, Finland).
Literature Review
DNAm-based clocks can capture epigenetic changes but often show limited correlation with functional and clinical aging outcomes unless trained on composite phenotypic measures. Proteins may reflect more proximal functional biology, and loss of proteostasis is a hallmark of aging. Previous proteomic clocks identified aging-related proteins and predicted select outcomes, but they were limited by smaller samples, partial disease coverage, and lack of independent validation across ancestries. Prior proteomic studies using SOMAscan reported discrepancies with Olink assays and limited associations with key biomarkers. This study addresses these gaps by using a large Olink-based dataset to develop a generalizable proteomic age clock, systematically testing associations across major diseases and functional measures, and validating the clock in Chinese and Finnish cohorts.
Methodology
Study design and cohorts: Participants were from UK Biobank (UKB; n=45,441; 54% female; ages 39–71), China Kadoorie Biobank (CKB; n=3,977; 54% female; ages 30–78; IHD case-cohort), and FinnGen (n=1,990; 52% female; ages 19–78; largely healthy). Follow-up: UKB 11–16 years; CKB 11–14 years; FinnGen had minimal events (1% mortality).
Proteomics: Plasma proteins were measured with the Olink Explore 3072 platform (Cardiometabolic, Inflammation, Neurology and Oncology panels). Only proteins available in all three cohorts were retained, and three proteins with >10% missing values in UKB were removed, leaving 2,897 proteins. Within-cohort normalization included MinMax scaling to [0,1] and centering on the median. Missing protein values were imputed in UKB and FinnGen using miceforest (up to 5 iterations). UKB non-proteomic covariates were imputed with missRanger.
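As a minimal sketch of the within-cohort normalization, the snippet below scales one protein's values to [0, 1] and then subtracts the median of the scaled values. The order of the two steps, the function name `minmax_then_median_center` and the example values are assumptions made for illustration; the summary does not specify these details.

```python
from statistics import median


def minmax_then_median_center(values):
    """Scale one protein's values to [0, 1], then center them on their median.

    Assumption: scaling is applied before centering, per protein and per cohort.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant protein has no spread to scale; map every value to 0.
        return [0.0 for _ in values]
    scaled = [(v - lo) / (hi - lo) for v in values]
    med = median(scaled)
    return [s - med for s in scaled]


# Hypothetical measurements for a single protein within one cohort.
example_values = [2.0, 4.0, 6.0, 10.0]
normalized = minmax_then_median_center(example_values)
```

After this transformation each protein spans a range of exactly 1 and has a median of 0 within its cohort, which puts the three cohorts on a comparable scale before the UKB-trained model is applied.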
Age modeling: UKB was split 70/30 into train (n=31,808) and test (n=13,633). Six ML methods were benchmarked to predict chronological age from 2,897 proteins: LASSO, elastic net, LightGBM, and three neural networks (MLP, ResNet, TabR). Fivefold cross-validation and Optuna-based hyperparameter tuning were used, optimizing R². LightGBM showed superior generalizability to CKB and FinnGen and was selected.
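The benchmark optimizes R², and the results are reported as both R² and Pearson r. The following is a minimal sketch of how those two metrics are computed for age prediction. The function names and the ages are invented for illustration and are not study data.

```python
from math import sqrt


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(sxx * syy)


# Hypothetical chronological ages and model-predicted ages.
chron_age = [40, 50, 60, 70]
pred_age = [43, 49, 58, 68]

r2 = r_squared(chron_age, pred_age)
r = pearson_r(chron_age, pred_age)
```

R² penalizes systematic offsets between predicted and true age, whereas Pearson r only measures linear association, so the two are reported side by side.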
Feature selection and final models: Boruta feature selection with SHAP values identified 204 aging-related proteins (APs). A refined LightGBM model trained on these 204 proteins predicted age with high accuracy. Protein-predicted age (ProtAge) was generated in UKB using fivefold CV; in CKB and FinnGen, the trained UKB model was applied. Proteomic age gap (ProtAgeGap) was defined as ProtAge minus chronological age in each cohort. Recursive SHAP-based feature elimination yielded a 20-protein model (ProtAge20) achieving ~95% of the performance of the 204-protein model; ProtAgeGap20 was also computed.
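The out-of-fold prediction and gap definition described above can be sketched as follows. The one-feature least-squares model is a toy stand-in for the LightGBM model, and `out_of_fold_predictions`, `fit_line`, `predict_line` and all data values are hypothetical names and numbers chosen for illustration.

```python
def out_of_fold_predictions(features, ages, fit, predict, k=5):
    """Predict each participant's age with a model that never saw that participant."""
    n = len(ages)
    preds = [None] * n
    for fold in range(k):
        train_idx = [i for i in range(n) if i % k != fold]
        test_idx = [i for i in range(n) if i % k == fold]
        model = fit([features[i] for i in train_idx], [ages[i] for i in train_idx])
        for i in test_idx:
            preds[i] = predict(model, features[i])
    return preds


def fit_line(xs, ys):
    """Toy model: ordinary least squares on a single feature."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


def predict_line(model, x):
    slope, intercept = model
    return slope * x + intercept


# Hypothetical data: one normalized protein level that rises with age.
protein = [0.1 * i for i in range(10)]
chron_age = [40 + 3 * i for i in range(10)]
protein[4] = 0.6  # participant 4 has a protein level typical of someone older

prot_age = out_of_fold_predictions(protein, chron_age, fit_line, predict_line, k=5)

# ProtAgeGap = ProtAge minus chronological age.
prot_age_gap = [p - c for p, c in zip(prot_age, chron_age)]
```

In this toy data, participant 4 receives a positive gap because their protein level resembles that of an older person. Using out-of-fold predictions keeps a participant's own data from influencing the model that scores them.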
Association analyses: In UKB, linear and logistic regression models tested ProtAgeGap associations with 27 aging and frailty markers: biological measures (e.g., telomere length, IGF-1, albumin, creatinine, cystatin C, AST, ALT, GGT, CRP), frailty index, and physical and cognitive function (e.g., walking pace, grip strength, reaction time, fluid intelligence, arterial stiffness, lung function, blood pressure, BMI, insomnia, sleep duration, self-rated health, self-rated facial aging). Models were adjusted for age, sex, ethnicity, Townsend deprivation index, assessment center, physical activity (IPAQ group), and smoking, and FDR correction was applied. Sensitivity analyses were restricted to participants with no lifetime diagnosis of any of the 26 studied diseases (n=20,315) and to never-smoker and normal-BMI subgroups.
Outcomes: Incident outcomes analyzed in UKB included all-cause mortality, 14 common noncancer diseases (IHD, type 2 diabetes, chronic kidney disease (CKD), chronic liver diseases, COPD, all stroke, ischemic stroke, osteoporosis, osteoarthritis, rheumatoid arthritis, macular degeneration, Parkinson’s disease, all-cause dementia, Alzheimer’s disease (AD)), and 12 cancers. Cox proportional hazards models with increasing covariate adjustment were used: Model 1 (age, sex), Model 2 (Model 1 + Townsend, assessment center, IPAQ, smoking, ethnicity), Model 3 (Model 2 + BMI, prevalent hypertension). Prevalent cases were excluded from incident analyses, and FDR correction was applied. Kaplan–Meier cumulative incidence was plotted by ProtAgeGap decile. Individual associations of the 20 proteins with diseases were assessed using Cox models adjusted as in Model 2.
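The summary states that FDR correction was applied but does not name the procedure. As a sketch under the assumption that the widely used Benjamini-Hochberg method is meant, adjusted p-values can be computed as below. The function name and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from five outcome models.
raw_p = [0.001, 0.008, 0.039, 0.041, 0.6]
adj_p = benjamini_hochberg(raw_p)
significant = [q < 0.05 for q in adj_p]
```

With these invented inputs, the third and fourth tests are nominally significant at 0.05 but no longer pass after adjustment, which is the situation the correction is designed to catch when many outcomes are tested at once.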
Network and enrichment analyses: Functional enrichment (GO BP/MF, KEGG, Reactome) and STRING PPI networks (v12; coexpression; confidence >0.7) were evaluated using all Olink proteins as background. SHAP interaction-based PPI networks from the trained model were constructed with a threshold of mean absolute interaction 0.0083.
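A rough sketch of the thresholding step for the SHAP interaction network: for each protein pair, average the absolute interaction values across participants and keep the pair as an edge when that mean exceeds 0.0083. The data layout, the strict inequality, the function name and the placeholder protein labels are assumptions; only the threshold value comes from the text.

```python
THRESHOLD = 0.0083  # mean absolute interaction threshold stated in the text


def interaction_edges(interactions, threshold=THRESHOLD):
    """Keep protein pairs whose mean absolute SHAP interaction exceeds the threshold.

    `interactions` maps (protein_a, protein_b) to a list of per-participant
    interaction values (a hypothetical layout for this sketch).
    """
    edges = {}
    for pair, values in interactions.items():
        mean_abs = sum(abs(v) for v in values) / len(values)
        if mean_abs > threshold:
            edges[pair] = mean_abs
    return edges


# Invented interaction values for placeholder proteins.
toy_interactions = {
    ("PROT_A", "PROT_B"): [0.02, -0.01, 0.015],
    ("PROT_A", "PROT_C"): [0.001, -0.002, 0.0005],
}
edges = interaction_edges(toy_interactions)
```

Taking the absolute value before averaging means that interactions pushing predicted age up for some participants and down for others still count as strong edges.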
Repeat measures: Stability of age associations for 149 APs was assessed across three UKB time points (baseline; 2014+; 2019+) using linear regression betas and Pearson correlations.
Ethics and data/code availability: Approvals and access per UKB, CKB, FinnGen policies. Code available at https://github.com/miargentieri/proteomic-age-ukb.
Key Findings
- Age prediction accuracy: The 204-protein LightGBM model predicted age with high accuracy: UKB test set R²=0.88, Pearson r=0.94; CKB R²=0.82, r=0.92; FinnGen R²=0.87, r=0.94. A 20-protein model (ProtAge20) retained ~95% performance (r=0.89, R²=0.78 in UKB).
- Stability over time: For 149 APs with repeat measures in 1,085 UKB participants, age-association betas correlated strongly across baseline, 2014+ and 2019+ visits (r=0.90–0.97), indicating temporal stability over 9–13 years.
- Distribution of proteomic age gap: In UKB, the top 5% vs bottom 5% of ProtAgeGap differed by ~12.3 years (6.3 vs −6.0 years). ProtAgeGap distributions were similar across sexes, UKB self-reported ethnicities, and CKB regions.
- Associations with biological aging markers and function: Higher ProtAgeGap was associated with worse kidney function (cystatin C, creatinine), higher AST and GGT, higher CRP, lower albumin and IGF-1, and shorter telomere length. It was also associated with higher frailty index, poorer self-rated health, slower walking pace, higher systolic/diastolic BP, higher arterial stiffness, higher BMI, slower reaction time, lower fluid intelligence, lower lung function, and weaker grip strength. Most associations persisted in a disease-free subset (n=20,315), mitigating reverse causation concerns. ProtAgeGap20 showed broadly similar associations.
- Mortality and disease risk: ProtAgeGap significantly predicted all-cause mortality and all studied noncancer diseases except Parkinson’s disease in fully adjusted models. Per 1-year increase in ProtAgeGap (Model 3): AD HR 1.16 (95% CI 1.12–1.20); all-cause dementia HR 1.12 (1.10–1.15); CKD HR 1.10 (1.08–1.11). Estimated risk multipliers: relative to ProtAgeGap=0, top 5% had ~1.9× mortality risk, ~2.6× AD risk, ~1.8× CKD risk; compared with bottom 5%, top 5% had ~3.6× mortality, ~5.8× AD, ~3.1× CKD. Associations remained in never smokers and normal BMI subsets for most outcomes.
- Cumulative incidence stratification: UKB participants in the top, median and bottom ProtAgeGap deciles showed divergent trajectories over 11–16 years of follow-up. Among participants aged 65 at recruitment in the top decile, cumulative incidence reached 59.4% for osteoarthritis, 55.2% for mortality, 50.6% for IHD, 35.3% for type 2 diabetes and 33.6% for CKD. CKB showed similar stratification for IHD, mortality, all stroke and ischemic stroke, albeit with wider CIs.
- Cancers: ProtAgeGap associated with higher risk of esophageal, lung, non-Hodgkin lymphoma and prostate cancers after full adjustment; multiple cancers showed separated cumulative incidence by ProtAgeGap deciles.
- Multimorbidity and self-rated health: ProtAgeGap increased linearly with the number of lifetime diagnoses; those with 4+ diagnoses had substantially higher ProtAgeGap. ProtAgeGap was lower among individuals reporting excellent health than among those reporting poor health.
- Biological pathways and networks: The 204 APs were enriched for anatomical structure development/developmental process (GO BP). STRING coexpression PPI revealed a 66-protein interconnected subnetwork (notable hubs: EGFR, CXCL12, ITGAV, CXCL9, CD8A). SHAP interaction-based PPI highlighted ELN, EDA2R, LTBP2, CXCL17, GDF15 as highly interactive. The 20 key proteins span ECM/cell adhesion (ELN, COL6A3, CDCP1, PODXL2, LTBP2, SCARF2, ENG), immune/inflammation (CXCL17, LECT2, SCARF2, GDF15), hormone/reproduction (FSHB, AGRP, ACRV1), signaling (EDA2R, SCARF2, PTPRR), protease/enzymatic (KLK3, KLK7), energy balance (GDF15, AGRP), neuronal structure (GFAP, NEFL), and development/differentiation (EDA2R, LTBP2, ENG).
- Comparison to existing clocks: Minimal gene overlap with leading DNAm clocks; 134/204 APs (64%) were novel relative to major proteomic aging studies (Johnson et al., Coenen et al., Lehallier et al.). ProtAge showed stronger associations with multiple clinical biomarkers and outcomes than reported in some SOMAscan-based clocks.
- Telomere: Proteomic aging showed a strong inverse association with leukocyte telomere length, contrasting with prior DNAm age findings.
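The relationship between the per-year hazard ratios and the risk multipliers quoted in the findings above can be illustrated with a short calculation. Under a Cox model, a hazard ratio per 1-year increase compounds multiplicatively, so a gap of g years corresponds to HR raised to the power g. The sketch below uses the rounded Model 3 hazard ratios and the 6.3 and −6.0 year gaps from the text; the function name is hypothetical, and the quoted multipliers were presumably derived from unrounded estimates, so small differences are expected.

```python
def risk_multiplier(hr_per_year, gap_years):
    """Hazard ratio implied by a gap of `gap_years`, given a per-year hazard ratio."""
    return hr_per_year ** gap_years


TOP5_GAP = 6.3      # ProtAgeGap for the top 5% (years), from the text
BOTTOM5_GAP = -6.0  # ProtAgeGap for the bottom 5% (years), from the text

# Top 5% relative to a ProtAgeGap of zero.
ad_vs_zero = risk_multiplier(1.16, TOP5_GAP)
ckd_vs_zero = risk_multiplier(1.10, TOP5_GAP)

# Top 5% relative to bottom 5%: the gap difference is 6.3 - (-6.0) = 12.3 years.
ad_top_vs_bottom = risk_multiplier(1.16, TOP5_GAP - BOTTOM5_GAP)
ckd_top_vs_bottom = risk_multiplier(1.10, TOP5_GAP - BOTTOM5_GAP)
```

With the rounded inputs, this gives roughly 2.5 for AD and 1.8 for CKD relative to a zero gap, in line with the quoted ~2.6× and ~1.8×. The top-versus-bottom values come out at about 6.2 for AD and 3.2 for CKD, somewhat above the quoted ~5.8× and ~3.1×.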
Discussion
The study demonstrates that a proteomic age clock trained solely to predict chronological age captures biological aging processes relevant to morbidity, mortality and functional decline, and generalizes across populations with distinct genetic backgrounds and morbidity profiles (UK, China, Finland). Proteomic aging integrates signals from diverse biological domains (ECM integrity, immune/inflammation, hormonal regulation and neurobiology) and may sit mechanistically closer to function than DNAm clocks do. The strong inverse association with telomere length and robust prediction of multiple diseases and mortality suggest that proteomic aging reflects systemic deterioration relevant to age-related pathology. Unlike first-generation DNAm clocks, the proteomic clock remained strongly associated with outcomes without being trained on composite phenotypes. Differences between mRNA and protein abundance, together with the long half-lives and post-translational modifications of key ECM proteins (e.g., elastin, collagens), may explain why proteomics captures aspects of aging not well represented by DNAm. The clock’s ability to stratify absolute risks and multimorbidity burden supports potential clinical utility in risk prediction and prevention strategies across diverse populations.
Conclusion
This work introduces a robust, generalizable proteomic aging clock (ProtAge) based on 204 plasma proteins that accurately predicts chronological age and strongly associates with mortality, multimorbidity, and a wide spectrum of age-related diseases and functional measures. A compact 20-protein model retains most predictive power. The clock’s biological underpinnings span ECM remodeling, immune/inflammation, hormonal and neuronal pathways, highlighting proteostasis and ECM dynamics as key features of human aging. Future research should: (1) evaluate clinical utility for screening and individualized risk stratification; (2) test interventions to modify proteomic aging trajectories and assess impact on morbidity and mortality; (3) extend validation to additional ancestries, environments and disease contexts; (4) explore causal mechanisms linking specific proteins/pathways to age-related diseases; and (5) harmonize proteomic platforms to enhance cross-study comparability.
Limitations
- Event scarcity in validation cohorts: CKB had smaller numbers for several outcomes (wider CIs), and FinnGen had few deaths and limited morbidity, constraining outcome analyses outside UKB.
- Primary reliance on UKB for disease associations: Most association analyses were performed in UKB due to sample size and case availability, which may limit generalizability of specific effect sizes.
- Platform differences: Prior discrepancies between Olink and SOMAscan proteomics may limit comparability with earlier proteomic clocks and protein–phenotype associations.
- Residual confounding and reverse causation: Although extensive covariate adjustment and disease-free analyses were performed, unmeasured confounding and subclinical disease cannot be fully excluded.
- Representativeness: The CKB nested case-cohort design and the FinnGen selection of mostly healthy participants reduce representativeness for some analyses; UKB proteomics participants were a random subsample rather than the full cohort.