Age is a major risk factor for most chronic diseases and causes of death. Chronological age is a strong indicator of risk, but it is an imperfect measure of biological aging. 'Omics' data, particularly proteomics, offer a more precise way to estimate biological age by measuring protein levels, and they provide mechanistic insight into aging processes. Loss of proteostasis is a key hallmark of aging, and previous studies have identified aging-related proteins (APs) and used them to develop proteomic age clocks. However, these studies lacked the large, diverse population samples needed to assess predictive performance comprehensively across a wide range of age-related diseases and functional traits. This study aimed to address these gaps by developing and validating a proteomic age clock in large, diverse biobanks and systematically assessing its ability to predict aging-related phenotypes, multimorbidity, and mortality risk.
Literature Review
Several previous studies have used proteomics to develop age clocks, identifying aging-related proteins (APs) and using them to predict disease risk and mortality. However, these studies often relied on smaller samples and lacked the population diversity needed to ensure generalizability. Furthermore, none had comprehensively assessed predictive power across all major chronic diseases and age-related functional traits, and none had independently validated its clock in populations of diverse ancestry. This study builds on that work by using substantially larger and more diverse datasets, allowing for a more robust and generalizable proteomic age clock.
Methodology
The study used data from three biobanks: the UK Biobank (UKB), the China Kadoorie Biobank (CKB), and FinnGen. The UKB cohort (n=45,441) served as the discovery cohort. The CKB (n=3,977) and FinnGen (n=1,990) cohorts provided independent validation. Plasma proteomic data from the Olink Explore 3072 panel (2,897 proteins) were used. Six machine-learning methods were compared to build a proteomic age clock, with LightGBM selected based on its superior generalizability. Feature selection (Boruta algorithm and SHAP values) identified 204 APs. The proteomic age gap (ProtAgeGap), the difference between protein-predicted age and chronological age, was calculated. Associations between ProtAgeGap and 27 aging-related phenotypes (biological, functional, cognitive), all-cause mortality, and incidence of 26 common age-related diseases were assessed using linear, logistic, and Cox proportional hazards models. A smaller 20-protein model (ProtAge20) was also developed, achieving 95% of the performance of the 204-protein model. Sensitivity analyses were performed among never smokers and those with normal BMI.
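The ProtAgeGap definition above (protein-predicted age minus chronological age) can be illustrated with a minimal sketch. The function name `prot_age_gap` and the example ages are hypothetical and are not taken from the study; the predicted ages stand in for the output of a trained age model such as the LightGBM clock.

```python
def prot_age_gap(predicted_age, chronological_age):
    """Return per-participant gaps: protein-predicted age minus chronological age.

    Both inputs are equal-length sequences of ages in years.
    A positive gap means the proteome looks older than the calendar age.
    """
    if len(predicted_age) != len(chronological_age):
        raise ValueError("inputs must have the same length")
    return [p - c for p, c in zip(predicted_age, chronological_age)]


# Hypothetical example values (not study data).
predicted = [52.0, 61.5, 47.0]
chronological = [50.0, 65.0, 47.0]
gaps = prot_age_gap(predicted, chronological)
```

In this illustration the first participant has a positive gap (proteome older than calendar age), the second a negative gap, and the third a gap of zero.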
Key Findings
The LightGBM model accurately predicted chronological age (Pearson r = 0.94 in UKB, r = 0.92 in CKB, r = 0.94 in FinnGen). ProtAgeGap was significantly associated with all investigated measures of biological, physical, and cognitive function except two liver biomarkers (ALT and total bilirubin). These associations largely persisted after excluding participants with pre-existing diseases. In the 204-protein model, ProtAgeGap was a strong predictor of all-cause mortality and of the incidence of all 14 common non-cancer diseases studied except Parkinson's disease; the 20-protein model was associated with all of these diseases. In the fully adjusted model, the largest effect sizes were for Alzheimer's disease, all-cause dementia, and chronic kidney disease. ProtAgeGap was also significantly associated with eight cancers in the UKB, four of which (esophageal, lung, non-Hodgkin lymphoma, and prostate cancer) remained significant after further adjustment. The top 20 proteins contributing to the clock were involved in diverse functions, including extracellular matrix interactions, immune response, hormone regulation, neuronal function, and development. There was minimal overlap between the genes coding for the selected proteins and those in leading DNA methylation clocks.
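The accuracy figures above are Pearson correlations between predicted and chronological age. As a rough illustration of how such a coefficient is computed, here is a minimal standard-library sketch; the function name `pearson_r` and the toy inputs are assumptions for illustration and do not reproduce the study's data or code.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares around the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / math.sqrt(var_x * var_y)


# Toy check: a perfectly linear relationship gives r = 1 (up to rounding).
r_linear = pearson_r([40.0, 50.0, 60.0], [42.0, 52.0, 62.0])
```

A value near 1 means predicted age tracks chronological age closely; it says nothing by itself about systematic offset, which is why the gap is analyzed separately.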
Discussion
The study provides strong evidence that proteomic aging is a common underlying factor for a wide range of age-related traits and diseases across diverse populations. The high generalizability of the proteomic age clock across populations highlights its potential as a universal biomarker of aging. Unlike DNA methylation clocks, the proteomic clock was strongly associated with telomere length, which further supports the use of proteomics in aging research. The study also demonstrates the potential of the proteomic clock for early disease detection, particularly for Alzheimer's disease. It addresses limitations of previous proteomic clocks by using a larger, more diverse sample and by analyzing aging-related phenotypes and disease outcomes more comprehensively.
Conclusion
This study demonstrates that a proteomic aging clock, built using data from a large and diverse population, is a reliable predictor of mortality and various age-related diseases. The identified 204 proteins highlight diverse biological pathways involved in aging, offering potential targets for interventions. The generalizability of the clock across populations suggests its broader applicability in aging research and clinical practice. Future research should focus on the mechanistic understanding of these proteins and their roles in the aging process, as well as the development of targeted interventions.
Limitations
The study relies on observational data, which limits causal inference about the relationship between proteomic age and disease. Reliance on a specific proteomic platform might influence the results. The UKB sample, while large, may not perfectly represent the global population. Some analyses were limited by smaller sample sizes in the validation cohorts.