Genetic determinants and phenotypic consequences of blood T-cell proportions in 207,000 diverse individuals

Medicine and Health

H. Poisner, A. Faucon, et al.

This study, conducted by Hannah Poisner, Annika Faucon, Nancy Cox, and Alexander G. Bick, examines the genetic regulation of T-cell abundance and identifies 27 loci associated with T-cell fraction across diverse ancestries. The findings also link T-cell fraction to health outcomes, particularly respiratory diseases.

Introduction
White blood cells, including T-cells, B-cells, and natural killer cells, are central to immune function and disease. While clinical complete blood counts quantify total leukocytes and some subtypes, they do not typically distinguish T-cells, B-cells, and NK cells outside specific contexts (e.g., CD4+ counts in HIV). Consequently, large genetic studies of blood cell traits have not characterized T-cell–specific genetic architecture, despite evidence of ancestry-related differences in blood traits and immune responses. Recent work demonstrated that sequencing read-depth over the V(D)J recombination region of the T-cell receptor alpha (TRA) locus can estimate T-cell fraction from sequencing data. To address the gap in understanding genetic determinants and phenotypic consequences of T-cell proportions, the authors estimated T-cell fraction from WGS in >200,000 multi-ancestry individuals across TOPMed and All of Us, followed by genetic association and phenotypic analyses to characterize regulation of T-cell fraction and its clinical correlates.
Literature Review
Prior large-scale GWAS of blood cell counts have identified numerous loci and ancestry-specific associations but do not isolate T-cell abundance due to lack of routine T-cell–specific measurements. Studies have noted inter-ancestry differences in leukocyte traits and immune-related genetic variants. Bentham et al. showed that read-depth signals at the TRA locus, reflecting T-cell receptor excision circle (TREC) loss from V(D)J recombination, can estimate T-cell fraction from sequencing, suggesting a path to derive T-cell–related measures at scale. Additional prior work linked blood cell traits to variants with different frequencies across ancestries and highlighted the importance of including diverse populations in genetic discovery.
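To make the read-depth idea concrete, the sketch below is a deliberately simplified toy, not the T-cell ExTRACT algorithm. It assumes a hypothetical focal TRA segment (`focal_start` to `focal_end`) that every T cell has excised on both alleles, so depth inside the segment falls in proportion to the T-cell fraction, and it estimates the fraction as one minus the ratio of focal to flanking depth. The parser assumes the default three-column `samtools depth` output (reference, position, depth); the function names, coordinates, and depths are illustrative.

```python
import numpy as np


def parse_samtools_depth(lines):
    """Parse `samtools depth` text output: reference, 1-based position, depth."""
    positions, depths = [], []
    for line in lines:
        _, pos, depth = line.rstrip("\n").split("\t")[:3]
        positions.append(int(pos))
        depths.append(int(depth))
    return np.array(positions), np.array(depths, dtype=float)


def naive_tcell_fraction(positions, depths, focal_start, focal_end):
    """Toy T-cell fraction estimate from depth loss in a focal TRA segment.

    Hypothetical assumption: every T cell has excised [focal_start, focal_end]
    on both alleles, so focal depth = baseline depth * (1 - fraction).
    Real tools such as T-cell ExTRACT also correct for GC content and other biases.
    """
    in_focal = (positions >= focal_start) & (positions <= focal_end)
    focal_depth = depths[in_focal].mean()
    baseline_depth = depths[~in_focal].mean()
    fraction = 1.0 - focal_depth / baseline_depth
    # Clip to the valid range to absorb sampling noise.
    return float(np.clip(fraction, 0.0, 1.0))


# Synthetic data: 40x flanking coverage, 28x inside the focal segment.
lines = [f"chr14\t{p}\t{28 if 1100 <= p <= 1200 else 40}" for p in range(1000, 1301)]
pos, dep = parse_samtools_depth(lines)
print(round(naive_tcell_fraction(pos, dep, 1100, 1200), 3))
```

With 28x focal and 40x flanking depth, the toy estimate is 1 − 28/40 = 0.3.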
Methodology
Study cohorts: WGS data were obtained from TOPMed (Freeze 9; 109,619 individuals from >51 studies; aligned to GRCh38) and All of Us. Age at blood draw and sex were available for 86,107 TOPMed participants. Genetic ancestry was inferred with RFMix using HGDP reference panels in TOPMed, and in All of Us with a classifier trained on HGDP and 1000 Genomes samples that projects participants into PCA space and assigns super-population labels.
T-cell fraction estimation: T-cell fraction was estimated from WGS with a two-step pipeline: (1) per-base read depth across the TRA gene was computed with samtools depth (v1.14), and (2) T-cell ExTRACT (v1.0.1) was applied, which exploits the reduced read depth caused by TREC loss at the TRA locus during V(D)J recombination. The approach was checked by comparing exome capture kits and by assessing coverage across 90 random genes to rule out coverage artifacts; mean coverage over the TRA region was ~32–53x, adequate for detecting depth variation.
Validation across exome kits: In 100 randomly selected individuals, T-cell fraction estimates were strongly correlated across capture kits (Skillet/NimbleGen: Pearson r = 0.84, p < 5.2 × 10^-29; Clinical Twist Exome: r = 0.87, p < 2.43 × 10^-15), supporting the robustness of the estimates.
Epidemiologic correlates: Ordinary least squares (OLS) models tested associations of T-cell fraction with age and sex, and with genetic ancestry proportions and PCs. These analyses were replicated in All of Us.
Blood traits: In TOPMed, 14 harmonized blood traits (e.g., lymphocyte count, WBC count, neutrophil count, RBC indices, platelets) were available for up to 37,393 samples. In All of Us, 11 laboratory traits were extracted and correlated with WGS-derived T-cell fraction.
Single-variant GWAS: In TOPMed Freeze 10, variants with MAF > 0.1% were tested with SAIGE-QT linear mixed models adjusted for age, sex, and 10 ancestry PCs, in the overall sample and in EUR- and AFR-majority subsets.
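The per-variant regression behind a single-variant GWAS can be sketched as below. This is a simplified stand-in, not SAIGE-QT: SAIGE fits a linear mixed model with a genetic relationship matrix to account for relatedness, whereas this sketch uses plain OLS with the same covariate set (age, sex, 10 PCs) and a normal-approximation p-value. The function name `single_variant_ols` and all data are hypothetical and simulated.

```python
import math

import numpy as np


def single_variant_ols(y, dosage, covariates):
    """Test one variant with OLS: y ~ 1 + dosage + covariates.

    Simplified stand-in for a mixed-model test such as SAIGE-QT: no genetic
    relationship matrix is used, so relatedness is ignored, and the p-value
    uses a large-sample normal approximation.
    """
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), dosage, covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof
    cov_beta = sigma2 * np.linalg.inv(X.T @ X)
    b, se = float(beta[1]), math.sqrt(cov_beta[1, 1])
    p = math.erfc(abs(b / se) / math.sqrt(2))  # two-sided normal p-value
    return b, se, p


# Simulated cohort: true effect of 0.2 per allele, plus age, sex and PC effects.
rng = np.random.default_rng(0)
n = 5000
age = rng.uniform(20, 80, n)
sex = rng.integers(0, 2, n)
pcs = rng.normal(size=(n, 10))
dosage = rng.binomial(2, 0.3, n)
y = 0.2 * dosage - 0.01 * age + 0.3 * sex + 0.1 * pcs[:, 0] + rng.normal(size=n)
b, se, p = single_variant_ols(y, dosage, np.column_stack([age, sex, pcs]))
print(f"beta={b:.3f} se={se:.3f} p={p:.2e}")
```

Covariates enter as extra columns of the design matrix, so the dosage coefficient is the variant effect adjusted for them; the conditional analysis works the same way, with lead-variant dosages added as further columns.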
Extensive variant QC filters were applied (depth, missingness, centromeric regions, duplicate discordance, parental inconsistencies, heterozygosity, HWE with specified exceptions, and additional classifier-based filtering). In All of Us, Regenie v2 was used to fit linear mixed models, with QC including HWE filters, genotype quality, and minor allele count (MAC) thresholds; analyses adjusted for 10 ancestry PCs and included broadly multi-ancestry samples (N ≈ 95,951).
Meta-analysis: A fixed-effects meta-analysis in PLINK v1.9 combined GWAS summary statistics from TOPMed and All of Us; regression betas are reported.
Rare-variant analysis: SKAT-O was run in TOPMed with age, sex, dosage of rs2187478, and PCs as covariates. Regenie v3.2 (Docker) was used for step 1 (500k very common variants) and for step 2 tests restricted to coding variants with MAF < 0.01 and qualifying functional annotations (nonsynonymous, stop-gain/loss, splicing, exonic).
SNP-heritability: SNP-based heritability was estimated with the BLD-LDAK model in LDAK, stratified by ancestry (EUR and AFR) and using ancestry-specific GWAS summary statistics; variants were mapped to GRCh37 and matched to precomputed LD tagging references. Additional analyses excluded a 3 Mb region around a key locus to assess that locus's contribution to heritability.
Fine-mapping and functional follow-up: Following the approach of Maller et al., approximate Bayes factors (aBFs) and posterior inclusion probabilities (PIPs) were computed in ±250 kb windows around lead variants to define 95% credible sets. Putative causal genes were prioritized by overlapping credible-set variants with GeneHancer enhancers and promoters, chromatin accessibility and other regulatory annotations, and activity-by-contact (ABC) enhancer predictions in T-cell lines.
Conditional analysis: Dosages (0/1/2) of the lead-variant effect alleles at the 27 loci were included as covariates in cohort-specific GWAS, followed by meta-analysis, to identify secondary independent signals.
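The fixed-effects meta-analysis step pools per-cohort estimates by inverse-variance weighting. The sketch below is the textbook calculation, not PLINK's own code; it also returns Cochran's Q and I² as heterogeneity diagnostics. The function name and the two per-cohort estimates are hypothetical.

```python
import math


def fixed_effects_meta(betas, ses):
    """Inverse-variance-weighted fixed-effects meta-analysis for one variant.

    Returns the pooled beta, its standard error, a two-sided p-value, and
    Cochran's Q and I^2 as heterogeneity diagnostics.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total_w = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_w
    se = math.sqrt(1.0 / total_w)
    p = math.erfc(abs(beta / se) / math.sqrt(2))
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return beta, se, p, q, i2


# Hypothetical per-cohort estimates for one variant (e.g. TOPMed, All of Us).
beta, se, p, q, i2 = fixed_effects_meta([0.10, 0.14], [0.02, 0.02])
print(f"beta={beta:.3f} se={se:.4f} p={p:.1e} Q={q:.2f} I2={i2:.2f}")
```

With equal standard errors the pooled beta is the simple mean (0.12), and the pooled standard error shrinks by a factor of √2 relative to either cohort.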
Polygenic score and LabWAS: A polygenic score for T-cell fraction (PGSCT) was derived from TOPMed European-ancestry GWAS summary statistics with PolygenicRiskScores.jl (an implementation of PRS-CS auto) and applied to 72,828 BioVU participants of recent European ancestry (genotyped, imputed, and QC-filtered). LabHawk fit linear regressions of the PGS against median inverse-normal-transformed (INT) lab values (labs with N ≥ 50), adjusting for age, sex, and BMI; Bonferroni-corrected significance thresholds were applied.
EHR-based PheWAS of T-cell fraction: Among 69,409 sequenced individuals with ICD codes, ICD-9/10 codes were mapped to phecodes; individuals with fewer than 5 phenotypes and phecodes present in fewer than 500 individuals were excluded. Logistic regression tested associations between rank-inverse-normalized T-cell fraction and each phecode (covariates: age, sex, site, and 10 PCs). Phenotypes were grouped by agglomerative clustering, and OLS regression models were fit per cluster.
Pregnancy analysis: In All of Us, 1,295 individuals with WGS were identified as pregnant at blood draw; 694 had delivery dates within 40 weeks of the draw. Propensity score matching (PsmPy) yielded 850 pregnant individuals and 659 matched non-pregnant controls. T-cell fraction trajectories were plotted by weeks before delivery and compared with matched controls (binned in accordance with All of Us data-sharing policies).
Statistical environment: Analyses used Python (NumPy, scikit-learn, statsmodels) and plotting and analysis libraries (Seaborn, SciPy, Matplotlib, pandas), with versions specified for the TOPMed and All of Us pipelines. Data and code are available via NHLBI BioData Catalyst, All of Us workspaces, Zenodo, and GitHub.
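The rank inverse normalization applied to T-cell fraction before the PheWAS, and the Bonferroni threshold used to judge significance, can be sketched as follows. The summary does not state which rank offset was used; the Blom offset c = 3/8 below is an assumption, and ties receive their average rank. The transformed values would then serve as the predictor in the per-phecode logistic regressions, which are not shown. Function names are hypothetical.

```python
from statistics import NormalDist


def rank_inverse_normal(values, c=3.0 / 8):
    """Rank-based inverse normal transform; ties receive their average rank.

    c is the rank offset (assumed here: 3/8, the Blom transform).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1  # 0-based positions -> 1-based rank
        for k in range(start, end + 1):
            ranks[order[k]] = avg_rank
        start = end + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that controls family-wise error at alpha."""
    return alpha / n_tests


scores = rank_inverse_normal([0.21, 0.35, 0.12, 0.35])
print([round(s, 3) for s in scores])
print(bonferroni_threshold(0.05, 1000))
```

Because only ranks are used, the transform is insensitive to skew and outliers in the raw T-cell fraction estimates while yielding an approximately standard normal variable.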
Key Findings
- Robust estimation of T-cell fraction from WGS: Extending T-cell ExTRACT to WGS showed strong cross-platform concordance across exome capture kits (r = 0.84, p < 5.2 × 10^-29; r = 0.87, p < 2.43 × 10^-15). Coverage analyses across TRA and 90 random genes indicated adequate depth and minimal bias from sequencing artifacts.
- Demographic and ancestry correlates: T-cell fraction was significantly associated with sex and age in TOPMed (sex p < 1 × 10^-6; age p < 1.59 × 10^-6), and these associations replicated in All of Us (sex p < 1 × 10^-6; age p < 3.98 × 10^-6), with higher T-cell fractions in females and a decline with age. Genetic ancestry proportions and ancestry PCs were also significantly associated with T-cell fraction; among participants aged 50–80, Admixed American and East Asian ancestries were additionally significant.
- GWAS discoveries: In TOPMed, 1,453 variants reached genome-wide significance (p < 5 × 10^-8) for T-cell fraction. In All of Us, 8,059 genome-wide significant variants across 19 loci were identified; 8 loci replicated at genome-wide significance and 2 at nominal significance, and one novel locus that was not genome-wide significant in TOPMed was found. Meta-analysis across cohorts identified 27 unique loci overall. Many lead variants mapped to genes implicated in blood cell traits (e.g., CSF3R, IL7R, HLA-B, IL7, IRF8, JUN, CD55, ACKR1/Duffy, APOE, KLF3).
- Heritability: SNP-based heritability differed by ancestry, reaching ~42% in AFR ancestry (SD 0.04) versus ~10% in EUR ancestry (SD 0.02). In AFR individuals, chromosome 1 variants accounted for up to 33% of T-cell fraction heritability, and a 3 Mb region around a key chr1 locus (near rs214778) accounted for ~18%.
- Post-GWAS insights: Fine-mapping highlighted 14 putatively causal variants. Functional enrichment pointed to pathways in T-cell development, viability, proliferation, and apoptosis (e.g., IL6, KLF2, CD69, BFM/BIM) and to myeloid proliferation genes (CSF3/CSF3R) that likely affect T-cell fraction indirectly through lineage balance.
- Polygenic score associations: In the BioVU LabWAS, the T-cell fraction PGS was significantly associated with eight immune markers, two blood markers, and one metabolic marker. Effect directions were consistent with hematopoietic lineage balance: positive with neutrophil measurements and negative with lymphocyte-related measurements, reflecting trade-offs between lymphoid and myeloid differentiation.
- Pregnancy dynamics: Among All of Us participants, pregnancy was associated with markedly lower T-cell fraction than in matched controls: ~30% lower overall and ~33% lower in the first trimester, declining further through the second trimester to a nadir of ~43% lower, then partially recovering to ~27% lower by mid–third trimester and remaining there until delivery.
- Phenome-wide associations: EHR-based analyses identified clinical phenotypes associated with measured T-cell fraction, with respiratory disease categories enriched among phenotypes linked to notable differences in T-cell proportion.
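The Maller et al.–style fine-mapping used to define credible sets can be sketched as follows: compute a Wakefield approximate Bayes factor for each variant in the window, normalize them into PIPs under a single-causal-variant assumption, then add variants in decreasing PIP order until the cumulative PIP reaches 95%. The prior effect-size standard deviation (`prior_sd=0.2`) is an assumption, not a value from the study, and the variant IDs and statistics are hypothetical.

```python
import math


def wakefield_log_abf(beta, se, prior_sd=0.2):
    """Natural-log Wakefield approximate Bayes factor (association vs. null).

    prior_sd is an assumed prior standard deviation of the true effect size.
    """
    v = se ** 2
    r = prior_sd ** 2 / (v + prior_sd ** 2)  # shrinkage factor W / (V + W)
    z2 = (beta / se) ** 2
    return 0.5 * math.log(1.0 - r) + 0.5 * z2 * r


def credible_set(variants, coverage=0.95, prior_sd=0.2):
    """PIPs and a credible set under a single-causal-variant assumption.

    variants: list of (variant_id, beta, se) for SNPs in a window around a lead SNP.
    """
    log_abfs = [wakefield_log_abf(b, s, prior_sd) for _, b, s in variants]
    top = max(log_abfs)
    abfs = [math.exp(x - top) for x in log_abfs]  # rescale to avoid overflow
    total = sum(abfs)
    pips = {vid: a / total for (vid, _, _), a in zip(variants, abfs)}
    ranked = sorted(pips.items(), key=lambda kv: kv[1], reverse=True)
    members, cumulative = [], 0.0
    for vid, pip in ranked:
        members.append(vid)
        cumulative += pip
        if cumulative >= coverage:
            break
    return pips, members


# Hypothetical window: two SNPs in tight LD with identical statistics plus a null SNP.
window = [("snp_a", 0.30, 0.03), ("snp_b", 0.30, 0.03), ("snp_c", 0.01, 0.03)]
pips, cs = credible_set(window)
print({k: round(v, 3) for k, v in pips.items()}, cs)
```

When two variants are statistically indistinguishable, each receives about half of the posterior mass and both enter the credible set; this is why some loci yield a single putatively causal variant while others yield several candidates.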
Discussion
The study addresses the lack of scalable T-cell–specific measurements by deriving T-cell fraction directly from WGS read depth at the TRA V(D)J region, and shows that this approach is robust across platforms and cohorts. T-cell fraction is substantially heritable and its genetic architecture is population-dependent, with higher SNP-heritability in AFR ancestry and major contributions from chromosome 1, including a region near ACKR1/Duffy. Discovery and replication across diverse cohorts yielded 27 loci, many near genes previously implicated in hematopoiesis and immune regulation, supporting both direct effects on T-cell biology and indirect effects via myeloid lineage proliferation. The PGS-based LabWAS linked genetic predisposition to T-cell fraction with multiple immune and hematologic laboratory traits, in directions consistent with lineage trade-offs. Epidemiological analyses confirmed higher T-cell fractions in females and a decline with age, and the pregnancy analysis revealed dynamic, trimester-specific reductions in T-cell fraction, suggesting that hormonal or physiological changes (e.g., estrogen dynamics) influence hematopoietic lineage bias. Including diverse ancestries uncovered ancestry-specific associations (e.g., SHH, COL4A3, MYLP, DARS) whose lead variants are more common in individuals of African ancestry, highlighting the value of diversity for genetic discovery. Overall, the findings clarify genetic and phenotypic determinants of T-cell fraction at population scale and establish a framework for using WGS-derived immune cell proportions to study disease associations.
Conclusion
This work demonstrates that T-cell fraction can be accurately quantified at scale from WGS, enabling the first large study of the genetic architecture of T-cell proportions across diverse populations. The authors identified 27 loci, provided evidence of substantial and ancestry-dependent SNP-heritability, and showed that both individual variants and polygenic burden relate to immune and hematologic traits and to clinical phenotypes. The study revealed sex, age, ancestry, and pregnancy effects on T-cell fraction, offering biological insights into immune regulation and hematopoietic lineage balance. Future research should extend similar approaches to other lymphocyte populations (e.g., B-cells), incorporate structural variants and short tandem repeats (STRs) to capture missing heritability, validate WGS-based estimates more extensively, and broaden representation of under-studied ancestries (e.g., East and Southeast Asian) to refine the genetic architecture and disease links.
Limitations
- Validation: Although prior work validated ExTRACT in exome data, validation in WGS remains limited; current findings rely on internal concordance analyses and cross-cohort replication.
- Variant scope: Analyses focused on single nucleotide variants and small indels; structural variants and short tandem repeats (STRs) were not comprehensively assessed and may explain additional heritability.
- Ancestry representation: Despite the cohorts' diversity, East and Southeast Asian ancestries remain underrepresented, and samples skew older, limiting generalizability.
- Ancestry classification: The simplified majority-ancestry assignment used in TOPMed may introduce confounding in ancestry-specific GWAS and heritability estimates.
- Methodological differences across cohorts: Differences in pipelines (e.g., SAIGE vs. Regenie) and QC thresholds could affect cross-cohort comparability.