Mapping the human genetic architecture of COVID-19
C. H. G. Initiative
SARS-CoV-2 infection produces a wide range of outcomes, from asymptomatic infection to life-threatening pneumonia and acute respiratory distress syndrome. Established risk factors (age, male sex, higher body-mass index) do not fully explain inter-individual variability in COVID-19 severity. Genetic factors may provide biological insights into COVID-19 pathogenesis and reveal therapeutic targets or opportunities for drug repurposing. Prior work has implicated rare loss-of-function variants in type I interferon pathway genes in severe COVID-19, and several genome-wide association studies (GWAS) of common variants have identified robust loci for susceptibility and severity, most notably 3p21.31 for severity. However, the broader genetic basis of susceptibility to infection and progression to severe disease remains incompletely defined. This study addresses these gaps by conducting large-scale international GWAS meta-analyses across multiple COVID-19 phenotypes and ancestries to map host genetic architecture.
Previous studies have identified: (1) rare damaging variants affecting type I interferon responses associated with severe COVID-19; (2) common-variant GWAS loci linked to severity and susceptibility, with the strongest and most reproducible signal for severity at 3p21.31; and (3) overlaps between COVID-19 loci and traits related to lung, autoimmune, and inflammatory diseases. Reports also suggested ABO blood group associations with infection susceptibility, and implicated immune and lung-function pathways. Despite these findings, many causal variants and mechanisms remain unresolved, motivating expanded, diverse meta-analyses and integrative functional prioritization.
Design: The COVID-19 Host Genetics Initiative combined genetic data from 46 studies across 19 countries, totaling up to 49,562 COVID-19 cases and approximately 2 million controls. Participants represented multiple ancestries (European, admixed American, African, Middle Eastern, South Asian, East Asian). Median participant age was ~55 years. Controls were ancestry-matched individuals without known SARS-CoV-2 infection.
Phenotypes: Three primary meta-analyses: (1) critically ill COVID-19 (respiratory support in hospital or death); (2) hospitalized COVID-19 (moderate/severe disease requiring hospitalization); (3) all reported SARS-CoV-2 infection (regardless of symptoms).
GWAS and Meta-analysis: Study-level association analyses were harmonized and centrally meta-analyzed. Genome-wide significance threshold P < 5×10^-8. Sensitivity and leave-one-out analyses assessed overlap between cohorts (including UK Biobank) and potential biases. Principal components verified ancestry structure.
Sample sizes:
- Critical illness: 6,179 cases, 1,483,750 controls, 16 studies.
- Hospitalized: 13,641 cases, 2,070,709 controls, 29 studies.
- Infection: 49,562 cases, 1,770,206 controls, 44 studies.
Non-European ancestry proportions were 23%, 29%, and 22% for the three analyses, respectively.
Variant characterization and gene prioritization: Fine-mapping and LD analyses were used to define independent loci. Candidate genes prioritized by protein-altering variants in LD, tissue eQTLs (e.g., GTEx v8 lung), variant-to-gene (V2G) evidence, prior trait associations (PheWAS), and functional plausibility.
Polygenicity and enrichment: SNP heritability estimated in European-ancestry meta-analyses; tissue-specific expression enrichment assessed.
Genetic correlation and Mendelian randomization (MR): Genetic correlations among COVID-19 phenotypes and with 38 selected traits were estimated. Two-sample MR tested potentially causal effects of risk factors (e.g., BMI, smoking, blood cell counts, diabetes) on COVID-19 outcomes, with multiple-testing correction (FDR < 0.05) and sensitivity analyses to assess robustness.
Sensitivity to control selection: Analyses evaluated the validity of using population controls lacking confirmed infection status, considering widespread exposure and rarity of severe outcomes.
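To make the central meta-analysis step concrete, here is a minimal sketch of how per-study log odds ratios for a single variant could be pooled and compared against the P < 5×10^-8 threshold. It assumes a fixed-effect, inverse-variance-weighted scheme, which is a common choice for GWAS meta-analysis; the summary above does not say which scheme was used. The function name `ivw_meta` and the per-study numbers are hypothetical.

```python
import math

GENOME_WIDE_P = 5e-8  # genome-wide significance threshold stated in the text


def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance-weighted pooling of per-study log odds ratios.

    Assumption: studies are independent and share one true effect.
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    # Two-sided P value under a standard normal null
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p


# Hypothetical per-study estimates for one variant (illustrative, not from the study)
study_betas = [0.30, 0.22, 0.35]
study_ses = [0.08, 0.06, 0.10]

beta, se, z, p = ivw_meta(study_betas, study_ses)
print(f"pooled OR = {math.exp(beta):.2f}, z = {z:.2f}, P = {p:.2e}")
print("genome-wide significant:", p < GENOME_WIDE_P)
```

Larger studies have smaller standard errors and therefore dominate the pooled estimate, which is why leave-one-out analyses of large contributors such as UK Biobank are informative.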
- Loci discovered: Across analyses, 19 independent loci were associated with COVID-19 phenotypes, with 13 reaching genome-wide significance for infection susceptibility or disease severity.
- Phenotype-specific associations:
- Critical illness: Genome-wide significant loci identified using 6,179 cases vs 1,483,750 controls.
- Hospitalized COVID-19: Nine genome-wide significant loci (5 overlapping with critical illness) from 13,641 cases vs 2,070,709 controls.
- Reported SARS-CoV-2 infection: Seven genome-wide significant loci from 49,562 cases vs 1,770,206 controls.
- Susceptibility vs severity:
- ABO locus and signals within 3p21.31 showed stronger associations with susceptibility to SARS-CoV-2 infection than with progression to severe disease; e.g., rs2271616 was more strongly associated with infection (P = 1.79×10^-8; OR ~1.15–1.18) than with hospitalization (P = 1.05×10^-3; OR = 1.12 [1.06–1.19]).
- Nine of 13 loci showed significantly greater effect sizes for hospitalized (more severe) vs infection-only phenotypes (8 loci with P < 0.004 for effect-size difference). Effect sizes tended to increase further for critical illness, though power was limited.
- Diversity insights: Lead variants rs1868164 and rs7271116 had higher minor-allele frequencies in South Asian (~15%) and East Asian (~8%) ancestries, respectively, emphasizing discovery gains from diverse cohorts.
- Gene prioritization and cross-trait links:
- TYK2: Risk-increasing alleles were associated with critical illness (OR = 1.43 [1.29–1.59], P = 9.71×10^-10) and hospitalization (OR = 1.27 [1.18–1.36], P = 5.05×10^-10). The lead variant is correlated with the missense variant p.Pro1104Ala, consistent with reduced TYK2 function elevating severe COVID-19 risk while being protective for autoimmune diseases (e.g., rheumatoid arthritis OR = 0.74, P = 3.0×10^-15; hypothyroidism OR = 0.84, P = 1.5×10^-10).
- PRF1 (19q13.33): Lead variant rs1078786 associated with infection (OR ≈ 0.93, P = 1.9×10^-10); in LD with missense p.Gly163Ser.
- DPP9 (19p13.3): rs2109969 associated with critical illness; previously linked to interstitial lung disease (OR = 1.29, P = 2.0×10^-15).
- FOXA1 locus: COVID-19 lead variant correlated with lung adenocarcinoma variant (OR = 1.12, P = 6.1×10^-12).
- KANSL1 (17q21.31) and 17q21.31 inversion region implicated; region tied to lung function variation.
- 3p21 region: Multiple independent signals; for a severity lead variant (rs10490717), V2G prioritized CXCR6 despite proximity to LZTFL1; susceptibility signal rs2271616 lies within SLC6A20, a functional interactor of ACE2.
- Lung eQTLs support genes at several loci (e.g., PAX5, ABO, OAS1/OAS2/OAS3, IRF2/IL10RB) with COVID-19-associated variants modulating lung expression.
- Polygenic architecture: Heritability for reported SARS-CoV-2 infection showed significant enrichment in genes specifically expressed in lung tissue (P = 5.0×10^-6).
- Genetic correlation and MR:
- High genetic correlations among COVID-19 phenotypes, with lower correlation between hospitalization and infection-only phenotypes.
- Ischemic stroke liability was positively genetically correlated with hospitalization/critical illness but not infection per se.
- MR indicated causal effects of higher BMI on hospitalization (OR = 1.4 [1.3–1.6], P = 8.5×10^-11) and on infection (OR = 1.1 [1.1–1.1], P = 8.9×10^-11), and of smoking on hospitalization (OR = 1.9 [1.3–2.8], P = 0.002). Genetically predicted higher red blood cell count was associated with reduced infection risk (OR = 0.93 [0.89–0.96], P = 5.7×10^-5). Type 2 diabetes showed genetic correlation with COVID-19 outcomes but no MR evidence for causality, suggesting that the correlation reflects pleiotropy between BMI and type 2 diabetes.
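The MR odds ratios above come from combining variant effects on a risk factor with variant effects on the outcome. A minimal sketch of one standard two-sample estimator, the inverse-variance-weighted (IVW) method, is given below. The summary does not state which MR estimator produced the reported numbers, so this is an illustration of the general technique; the function name `mr_ivw` and the instrument values are hypothetical.

```python
import math


def mr_ivw(beta_exposure, beta_outcome, se_outcome):
    """IVW two-sample MR estimate.

    Equivalent to a weighted regression through the origin of
    variant-outcome effects on variant-exposure effects, with weights
    1 / se_outcome^2. Assumption: instruments are independent and valid
    (no horizontal pleiotropy), and uncertainty in the exposure effects
    is ignored (first-order weights).
    """
    numerator = sum(bx * by / s ** 2
                    for bx, by, s in zip(beta_exposure, beta_outcome, se_outcome))
    denominator = sum(bx ** 2 / s ** 2
                      for bx, s in zip(beta_exposure, se_outcome))
    beta = numerator / denominator
    se = math.sqrt(1.0 / denominator)
    return beta, se


# Hypothetical instruments: effects on the exposure (e.g., per-SD BMI) and
# log odds ratios for the outcome. Values are illustrative only.
bx = [0.10, 0.20, 0.15]
by = [0.04, 0.07, 0.06]
se_by = [0.02, 0.03, 0.025]

beta, se = mr_ivw(bx, by, se_by)
low, high = math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se)
print(f"OR per unit of exposure = {math.exp(beta):.2f} [{low:.2f}-{high:.2f}]")
```

Because the estimate is only as good as its instruments, sensitivity analyses that relax the no-pleiotropy assumption are usually run alongside it, as the methods describe.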
The study systematically maps host genetic contributors to SARS-CoV-2 susceptibility and COVID-19 severity across global cohorts, addressing the central question of which common variants influence infection risk versus disease progression. Findings delineate loci with predominantly susceptibility effects (e.g., ABO, components of 3p21.31) from those driving severity, with effect sizes escalating from infection to hospitalization and further to critical illness. Gene-prioritization points to immunological and pulmonary pathways (e.g., TYK2 signaling, DPP9, OAS gene cluster, IRF2/IL10RB), providing plausible therapeutic targets and mechanistic hypotheses. Genetic correlation and MR analyses integrate epidemiological observations with genetic evidence, supporting causal roles for BMI and smoking in severe COVID-19, while not supporting a direct causal role for type 2 diabetes despite shared genetic architecture. Lung-specific expression enrichment of heritability underscores the biological relevance of pulmonary tissues in susceptibility. Inclusion of diverse ancestries improved discovery power and revealed allele frequency differences that affect signal detectability, highlighting the necessity of broader representation. Sensitivity analyses suggest that population controls can be valid for infectious disease host-genetic discovery when severe outcomes are rare. Collectively, these results refine understanding of the genetic architecture underlying COVID-19 and inform prioritization of pathways for functional follow-up and therapeutic exploration.
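The distinction between susceptibility and severity loci rests on comparing effect sizes across phenotypes. The sketch below shows one way such a comparison could be made from published odds ratios and 95% confidence intervals: recover the log-OR standard error from the interval, then apply a z-test to the difference. The P < 0.004 cutoff reported for effect-size differences is consistent with a Bonferroni correction over 13 loci (0.05/13 ≈ 0.0038), although the summary does not state this explicitly. The function names and example values are hypothetical, and the test assumes the two estimates are independent, which overlapping cohorts would violate.

```python
import math

N_LOCI = 13
BONFERRONI = 0.05 / N_LOCI  # about 0.0038, close to the reported P < 0.004


def log_or_and_se(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Convert an OR with a 95% CI to a log odds ratio and its standard error."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    return beta, se


def effect_difference_p(beta_a, se_a, beta_b, se_b):
    """Two-sided z-test for a difference between two independent log ORs."""
    z = (beta_a - beta_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    return z, math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical locus: a stronger effect on hospitalization than on infection
beta_hosp, se_hosp = log_or_and_se(1.30, 1.20, 1.41)
beta_inf, se_inf = log_or_and_se(1.05, 1.02, 1.08)

z, p = effect_difference_p(beta_hosp, se_hosp, beta_inf, se_inf)
print(f"z = {z:.2f}, P = {p:.2e}, passes Bonferroni: {p < BONFERRONI}")
```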
This large-scale international GWAS meta-analysis identified 13 genome-wide significant loci for COVID-19 susceptibility and severity, clarified phenotype-specific effects across infection and clinical severity, and implicated immune and lung-related pathways. Integrative analyses provided causal evidence that higher BMI and smoking increase the risk of severe COVID-19, while type 2 diabetes is unlikely to be causal. The work demonstrates the power of collaborative, multi-ancestry genetics to uncover biologically actionable mechanisms. Future research should expand representation of underrepresented ancestries, fine-map loci to pinpoint causal variants, perform functional validation of prioritized genes and pathways (e.g., TYK2, DPP9, OAS cluster, IRF2/IL10RB), and evaluate therapeutic modulation of implicated mechanisms. Ongoing efforts should also integrate evolving viral variants and richer phenotyping to refine host–virus interaction models.
- Meta-analysis heterogeneity: Differences in cohort recruitment, case definitions, and ascertainment may introduce heterogeneity and modifier bias.
- Control selection: Use of population controls without confirmed infection status can bias effect sizes, though sensitivity analyses suggest validity for discovery when severe outcomes are rare.
- Power limitations: The critically ill subgroup had reduced power, making some associations suggestive rather than definitive.
- Causal inference constraints: MR assumptions and pleiotropy can confound causal estimates; some associations did not remain robust across sensitivity tests.
- Fine-mapping and causality: Several loci exhibit complex LD and multiple signals (e.g., 3p21 region), with likely missing causal variants; functional characterization is needed to confirm effector genes and mechanisms.
- Ancestry representation: Non-European populations remain underrepresented, which can limit discovery of ancestry-specific variants and generalizability.
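The heterogeneity concern in the first limitation can be quantified per variant. As a minimal sketch, Cochran's Q and the I² statistic measure how much per-study estimates disagree beyond what their standard errors would predict. These are standard meta-analysis diagnostics and are shown here as an illustration; the summary does not say which heterogeneity measures the study reported. The function name and example values are hypothetical.

```python
def cochran_q_i2(betas, ses):
    """Cochran's Q and I^2 for per-study effect estimates of one variant.

    Q is the weighted sum of squared deviations from the fixed-effect
    pooled estimate; I^2 = max(0, (Q - df) / Q) is the share of total
    variation attributed to between-study heterogeneity.
    """
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, i2


# Hypothetical log odds ratios from three cohorts (illustrative only)
consistent = cochran_q_i2([0.20, 0.22, 0.19], [0.05, 0.05, 0.05])
discordant = cochran_q_i2([0.10, 0.50, 0.90], [0.05, 0.05, 0.05])
print(f"consistent cohorts: Q = {consistent[0]:.2f}, I2 = {consistent[1]:.2f}")
print(f"discordant cohorts: Q = {discordant[0]:.2f}, I2 = {discordant[1]:.2f}")
```

A high I² at a locus would suggest that differences in case definition or ascertainment between cohorts may be affecting the pooled estimate.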