Identifying individuals with high risk of Alzheimer’s disease using polygenic risk scores

G. Leonenko, E. Baker, et al.

This study examines how to calculate polygenic risk scores to identify individuals at high risk of Alzheimer's disease. The authors, including Ganna Leonenko and Joshua Stevenson-Hoare, emphasize choosing effective predictors and standardizing scores so that results can be compared across studies.

Introduction

Alzheimer’s disease (AD) is the most common cause of dementia and develops over many years before clinical diagnosis, making robust pre-symptomatic risk stratification valuable for trials and mechanistic studies. While lifestyle and vascular risk management can delay onset, genetic factors, particularly APOE, substantially influence risk and age at onset. Polygenic risk scores (PRS) aggregate the effects of many variants to estimate genetic liability, yet there is no consensus on whether AD risk architecture is oligogenic or polygenic, how best to include or model APOE, which SNP p-value threshold (pT) to use, or how to compare PRS across cohorts. This study asks: (1) what is the optimal pT and APOE modeling strategy for AD PRS, (2) how do different PRS computation methods compare in predictive performance and in identifying extreme-risk individuals, and (3) how should PRS be standardized to enable comparability and robust selection of individuals at high or low risk.

Literature Review

Evidence supports a largely polygenic architecture for AD risk, though some argue for an oligogenic model dominated by a few loci including APOE. Prior GWAS and PRS studies differ on SNP inclusion thresholds, LD handling, and APOE treatment, leading to inconsistent performance and interpretation. Methods span simple clumping+thresholding (C+T) to Bayesian approaches (e.g., LDpred-inf, PRS-CS, LDAK, SBayesR) and functionally informed models. Age-dependent APOE-ε4 frequency and case-control age mismatch can bias estimates. Prior work highlights the strong, age-related effects of APOE, overlap of pathways implicated by APOE and GWAS loci, and the potential utility of PRS for stratification beyond APOE, particularly among ε4 negatives.

Methodology

Datasets: Analyses used multiple cohorts, including 1000 Genomes Europeans (N=503) as a population reference, UK Biobank, HipSci, and AD-related cohorts (ADNI, ROSMAP, MSBB, MAYO), with standard quality control and adjustment for principal components. Case-control data comprised clinically defined AD cases and cognitively normal controls.
PRS construction: The primary PRS used AD GWAS summary statistics from Kunkle et al. (2019). C+T PRS were computed with PLINK across multiple p-value thresholds (e.g., 5e-8, 1e-5, 0.1, 0.5), with LD clumping (large windows) and alleles weighted by GWAS effect sizes. The APOE region (chr19:44.4–46.5 Mb) was excluded to form PRS.no.APOE. An APOE-weighted model (PRS.AD) combined PRS.no.APOE (pT≤0.1 unless stated) with APOE genotype effects (ε2 and ε4) using published betas. Oligogenic risk scores (ORS) used only the most strongly associated SNPs (e.g., p≤1e-5), with or without APOE.
Alternative PRS methods: PRS were computed using PRS(C+T), PRSice, LDpred-inf, PRS-CS, LDAK, and SBayesR. Methods that do not require p-value thresholding (LDpred-inf, PRS-CS, LDAK, SBayesR) were applied to genome-wide data.
Standardization: PRS were standardized either within each study sample (mean=0, SD=1) or against the 1000 Genomes European reference (using the population mean and SD) to enable cross-cohort comparability and to improve identification of extremes.
Statistical analysis: Logistic regression (R glm with a logit link) was used to estimate prediction metrics: area under the ROC curve (AUC) and variance explained (pseudo-R²). Odds ratios (OR) with 95% CIs were computed for individuals at PRS extremes (±2 SD from the mean, using in-sample or population-based standardization). The Haldane correction was applied when a contingency table cell was zero. A simulation study (10,000 cases; 10,000 controls) examined the effects of age structure and APOE-ε4 frequency (controls 0.12; cases 0.356) on PRS/ORS behavior, incorporating significant SNPs (including rs429358) and a polygenic background.
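The scoring and standardization steps described above can be sketched in a few lines. This is a minimal illustration, not the authors' PLINK pipeline: it assumes the SNP list has already been LD-clumped and the APOE region removed, and every SNP ID, effect size, p-value, and function name (`prs_ct`, `standardize`) is hypothetical.

```python
from statistics import mean, pstdev


def prs_ct(dosages, betas, pvalues, p_threshold=0.1):
    """Clumping+thresholding score for one person.

    dosages:  {snp_id: effect-allele count (0, 1 or 2)}
    betas:    {snp_id: GWAS effect size (log odds ratio)}
    pvalues:  {snp_id: GWAS p-value}
    Assumes the SNP list is already LD-clumped and, for PRS.no.APOE,
    already stripped of APOE-region SNPs.
    """
    return sum(
        betas[snp] * dose
        for snp, dose in dosages.items()
        if snp in betas and pvalues[snp] <= p_threshold
    )


def standardize(scores, reference_scores):
    """Z-score `scores` using the mean and SD of a reference panel."""
    mu = mean(reference_scores)
    sd = pstdev(reference_scores)  # population SD; a sample SD is also defensible
    return [(s - mu) / sd for s in scores]


# Hypothetical toy data (made-up SNP IDs, effect sizes and p-values).
betas = {"rs_a": 0.20, "rs_b": -0.10, "rs_c": 0.05}
pvalues = {"rs_a": 1e-6, "rs_b": 0.04, "rs_c": 0.30}
reference = [
    {"rs_a": 0, "rs_b": 1, "rs_c": 2},
    {"rs_a": 1, "rs_b": 0, "rs_c": 1},
    {"rs_a": 2, "rs_b": 2, "rs_c": 0},
]
study = [{"rs_a": 2, "rs_b": 0, "rs_c": 1}]

# Score both panels with the same weights, then scale the study sample
# by the reference panel rather than by its own mean and SD.
ref_scores = [prs_ct(p, betas, pvalues) for p in reference]
study_scores = [prs_ct(p, betas, pvalues) for p in study]
study_z = standardize(study_scores, ref_scores)
```

Passing the study sample's own scores as `reference_scores` reproduces in-sample standardization, so the two schemes differ only in which panel supplies the mean and SD.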

Key Findings

Optimal modeling: Best prediction accuracy was achieved using a two-predictor model combining APOE and a PRS excluding the APOE region (PRS.no.APOE) with pT≤0.1 for SNP selection.
Method comparison: Across PRS approaches applied to the same sample, overall predictive accuracy was similar, but individuals’ scores and rankings differed. PRS(C+T) and PRSice identified highly overlapping sets of extreme individuals. LDpred-inf and PRS-CS overlapped considerably with C+T/PRSice; LDAK overlapped less, and SBayesR overlapped least with the other methods.
Predictive performance: Typical whole-sample AUCs across methods were about 65–70% (pseudo-R² roughly 0.09–0.16), with lower performance for SBayesR (AUC ~54–61%, R² ~0.01–0.05). ORS was predominantly driven by APOE-ε4 and performed poorly for identifying negative-risk extremes.
Standardization: Standardizing PRS against a population reference (1000 Genomes) rather than within-sample improved comparability between studies and increased the number of detectable positive/negative extremes in case-control datasets (because population mean lies between case and control means and has smaller variance).
Extremes selection: In case-control data, focusing on PRS extremes (±2 SD) yielded high discrimination. Using PRS.AD, ORs up to ~124 with AUCs around 88–96% were observed for extremes; among APOE ε3/ε3 individuals, PRS.no.APOE identified extremes with OR ~95 and AUC ~95.7%. ORS.no.APOE performed poorly in ε3 homozygotes (AUC ~56.3%, OR <1), indicating that oligogenic modeling is inadequate for discrimination in this subgroup.
APOE and age: Mean ORS and PRS patterns across age groups mirrored age-dependent APOE-ε4 frequency; oligogenic scores decreased with age in cases and increased in controls, consistent with APOE-driven, age-related effects.
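The extremes analysis reported above rests on two standard quantities, which can be sketched as follows. This is an illustration under assumptions, not the authors' R code: the 2x2 layout (cases and controls in the high extreme versus the low extreme) is one plausible reading of the summary, the Haldane correction is implemented as adding 0.5 to every cell when any cell is zero, and all function names are hypothetical.

```python
import math


def extremes_table(z_scores, is_case, cutoff=2.0):
    """Count cases/controls beyond +cutoff and below -cutoff.

    Returns (a, b, c, d): cases high, controls high, cases low, controls low.
    This table layout is an assumption made for illustration.
    """
    a = b = c = d = 0
    for z, case in zip(z_scores, is_case):
        if z >= cutoff:
            if case:
                a += 1
            else:
                b += 1
        elif z <= -cutoff:
            if case:
                c += 1
            else:
                d += 1
    return a, b, c, d


def odds_ratio_haldane(a, b, c, d):
    """Odds ratio with a Wald 95% CI; adds 0.5 to all cells if any is zero."""
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
    return odds_ratio, lower, upper


def auc(scores, is_case):
    """Rank-based AUC: probability a random case outscores a random control."""
    cases = [s for s, c in zip(scores, is_case) if c]
    controls = [s for s, c in zip(scores, is_case) if not c]
    wins = sum((x > y) + 0.5 * (x == y) for x in cases for y in controls)
    return wins / (len(cases) * len(controls))
```

With few individuals beyond ±2 SD, cell counts are small and the interval returned by `odds_ratio_haldane` is wide, which is consistent with the broad confidence intervals noted under Limitations.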

Discussion

Modeling APOE separately from the polygenic background and using a moderately inclusive SNP threshold (pT≤0.1) best captures AD genetic risk while mitigating APOE-driven confounding, especially given age-dependent APOE-ε4 frequency differences between cases and controls. While average predictive performance across PRS methods is similar, the identity of extreme-risk individuals varies, underscoring that method choice can alter who is selected for downstream studies. Population-based standardization centers and scales PRS to a stable reference, improving cross-study comparability and enhancing detection of high/low-risk extremes in enriched case-control samples. ORS, being largely APOE-driven, is less effective at identifying low-risk individuals and underperforms in ε3 homozygotes, supporting a predominantly polygenic architecture beyond APOE. Selecting individuals at PRS extremes yields strong enrichment for cases or controls and could facilitate mechanistic and proof-of-concept therapeutic studies, although current whole-sample AUCs remain insufficient for clinical prediction.
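The variance part of the standardization argument follows from the law of total variance: when cases and controls have different mean scores, mixing them in roughly equal numbers inflates the in-sample SD relative to a low-prevalence population. The sketch below assumes equal within-group SDs, and the group means and the 5% prevalence are made-up illustrative values, not figures from the study.

```python
import math


def mixture_mean_sd(case_fraction, mean_case, mean_control, sd_within):
    """Mean and SD of a case/control mixture with equal within-group SD.

    Law of total variance:
        var = sd_within^2 + p * (1 - p) * (mean_case - mean_control)^2
    """
    p = case_fraction
    mu = p * mean_case + (1 - p) * mean_control
    var = sd_within ** 2 + p * (1 - p) * (mean_case - mean_control) ** 2
    return mu, math.sqrt(var)


# Hypothetical values: controls centered at 0, cases at 1, within-group SD of 1.
sample_mean, sample_sd = mixture_mean_sd(0.50, 1.0, 0.0, 1.0)  # 50:50 case-control sample
pop_mean, pop_sd = mixture_mean_sd(0.05, 1.0, 0.0, 1.0)        # assumed 5% prevalence

# The enriched sample has the larger SD, so the same raw deviation from
# the mean spans fewer SD units under in-sample scaling.
sd_inflation = sample_sd / pop_sd
```

This covers only the scaling effect; the choice of reference also shifts the mean, and how that affects each tail depends on where the reference mean sits relative to the case and control means.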

Conclusion

For AD, an optimal PRS strategy uses pT≤0.1 and models APOE separately from the genome-wide PRS (e.g., combining APOE with PRS.no.APOE). PRS(C+T) performed robustly and explained the most variance among evaluated approaches. To enable comparability and improve detection of extreme-risk individuals, PRS should be standardized against an appropriate population reference rather than within-sample. While whole-sample predictive accuracy remains moderate, focusing on extremes provides high discrimination and is useful for stratification in research. Future work should refine APOE-independent polygenic components, clarify age-dependent genetic effects, incorporate functional annotations and LD modeling, and replicate findings in larger, diverse, and clinically harmonized cohorts.

Limitations

Small case-control sample size reduces power and leads to broad confidence intervals for ORs, especially in extremes analyses.
Heterogeneous clinical definitions of AD across combined cohorts and inconsistent age measures (e.g., age at interview vs. age at death) may introduce misclassification and reduce power.
Excluding the entire APOE locus (a high-LD region) may remove SNPs with independent effects beyond APOE, potentially underestimating non-APOE polygenic signal.
Findings require replication in independent datasets to ensure generalizability.
