Characterizing and Predicting Post-Acute Sequelae of SARS-CoV-2 Infection (PASC) in a Large Academic Medical Center in the US

L. G. Fritsche, W. Jin, et al.

This study by Lars G. Fritsche, Weijia Jin, Andrew J. Admon, and Bhramar Mukherjee examines post-acute sequelae of SARS-CoV-2 infection (PASC), characterizing the symptoms and disorders that emerge after COVID-19. Using electronic health record data from more than 63,000 patients, it shows that pre-existing conditions and acute-phase symptoms can support risk stratification for PASC.

Introduction
The study addresses the burden of post-acute sequelae of SARS-CoV-2 infection (PASC), a heterogeneous condition affecting an estimated 20–40% of individuals after COVID-19. Prior work suggests multiple risk factors (e.g., severe acute illness, female sex, older age, diabetes) and potential biomarkers or electronic health record (EHR)-derived predictors for PASC. Given the novelty of PASC and limited predictive models, the authors aim to identify predisposing diagnoses across pre- and acute-COVID-19 periods via phenome-wide association studies (PheWAS) and to develop phenotype risk scores (PheRS) to predict PASC using a large EHR cohort from Michigan Medicine.
Literature Review
The paper summarizes evidence that COVID-19 vaccination may reduce PASC risk by 13–22%. Reported PASC risk factors include severe acute COVID-19, female gender, older age, and specific acute-phase symptoms (fatigue, headache, hoarse voice). Biomarkers such as immunoglobulin signatures (IgM, IgG3) and ML models incorporating healthcare utilization, age, dyspnea, diagnoses, and medications have been explored. Additional risk associations include type 2 diabetes, persistent SARS-CoV-2 RNA, Epstein-Barr virus reactivation, and specific autoantibodies. Prior studies have characterized long COVID sequelae across organ systems and suggested subtypes, but limited data and evolving diagnostic codes have hindered robust predictive modeling.
Methodology
Design and setting: Retrospective case-control study using EHR data from Michigan Medicine (MM), an academic medical center, including patients with a COVID-19 diagnosis or positive RT-PCR test between 10 March 2020 and 31 August 2022. The index date was the first COVID-19 diagnosis or positive test. Inclusion required at least 2 months of follow-up after index. Cases were patients with a recorded PASC diagnosis; controls had no PASC diagnosis.
PASC definition: Based on EHR Problem Summary List (PSL) entries for PASC and/or ICD-10-CM codes U09.9 (Post COVID-19 condition, unspecified) or B94.8 (Sequelae of other specified infectious and parasitic diseases). Cases without a prior positive test were excluded to ensure temporal definitions of phenome periods.
Covariates: Age, gender, race/ethnicity, neighborhood disadvantage index (NDI, quartiles), population density (quartiles), vaccination status, Elixhauser comorbidity score (AHRQ), COVID-19 severity (non-severe vs severe [hospitalized/ICU within 1 month or death within 2 months]), healthcare worker (HCW) status, EHR timespan pre- and post-index, and pre-pandemic records. Complete case analyses assumed covariate missingness completely at random.
Phenome construction: ICD-9/10 codes were mapped to 1813 PheCodes using the R PheWAS package. Three time-restricted phenomes were defined relative to index: pre-COVID-19 (to −14 days), acute-COVID-19 (−14 to +28 days), and post-COVID-19 (+28 days to +6 months).
Matching: Each PASC case was matched to up to 10 controls via nearest-neighbor matching on age at index and pre- and post-index EHR years, with exact matching on sex, primary care within 2 years (yes/no), race/ethnicity, and index quarter.
Analyses:
- Post-COVID-19 PheWAS: Firth bias-corrected logistic regression tested enrichment of PheCodes among PASC cases vs matched controls, adjusting for Elixhauser score, NDI, population density, HCW status, vaccination, and severity. PheCode 136 was excluded. Bonferroni correction was applied.
- Predisposing PheCodes: Separate PheWAS for the pre-COVID-19 and acute-COVID-19 phenomes used training data (index 2020–2021) to identify associations with PASC; testing data (index 2022) were reserved for prediction evaluation. Firth logistic regression was used with similar covariate adjustments. Sensitivity analyses included stratifications by sex, year (2020, 2021), severity (non-severe, severe), and temporal windows (within 2 years pre-index or pre-pandemic). Bonferroni correction was applied. Differences in subgroup effect sizes were assessed via t-tests.
- PheRS generation: From phenome-wide significant PheCodes, multivariable ridge-penalized logistic regression (glmnet) in training data estimated adjusted coefficients as weights. PheRS were calculated as weighted sums for individuals in testing data (PheRS1: pre-COVID-19; PheRS2: acute-COVID-19). Six acute phenotypes observed only in severe cases were excluded from PheRS2 to limit hospital-acquired complications.
- PheRS evaluation: Firth logistic regression assessed the association of PheRS with PASC, adjusting for covariates. Performance metrics: Nagelkerke's pseudo-R², Brier score, and covariate-adjusted AUC (AAUC) via ROCnReg. The combination of PheRS1 and PheRS2 was evaluated through logistic model linear predictors. Risk stratification was assessed by enrichment of PASC cases in higher PheRS bins vs the lower 50%.
All analyses were conducted in R 4.2.0.
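To illustrate the time-restricted phenomes and the weighted-sum PheRS described above, here is a minimal Python sketch. It is not the authors' R code. The boundary handling at −14 and +28 days, the 183-day approximation of six months, the function names `phenome_period` and `phenotype_risk_score`, and every weight shown are assumptions made for the example.

```python
from datetime import date

# Assumed boundaries in days relative to the index date.
# The source gives the windows but not which side of each boundary is inclusive.
PRE_END_DAYS = -14    # pre-COVID-19: earlier than 14 days before index
ACUTE_END_DAYS = 28   # acute-COVID-19: -14 to +28 days
POST_END_DAYS = 183   # post-COVID-19: +28 days to about 6 months (assumed 183 days)


def phenome_period(index_date, diagnosis_date):
    """Return 'pre', 'acute', 'post', or None for a diagnosis date."""
    offset = (diagnosis_date - index_date).days
    if offset < PRE_END_DAYS:
        return "pre"
    if offset <= ACUTE_END_DAYS:
        return "acute"
    if offset <= POST_END_DAYS:
        return "post"
    return None  # outside all three windows


def phenotype_risk_score(patient_phenotypes, weights):
    """Sum the weights of the scored phenotypes a patient has on record."""
    return sum(w for name, w in weights.items() if name in patient_phenotypes)


# Hypothetical weights; in the study these would be ridge-regression
# coefficients estimated in the training data.
example_weights = {
    "irritable_bowel_syndrome": 0.5,
    "concussion": 0.6,
    "shortness_of_breath": 0.4,
}

index = date(2021, 1, 1)
period = phenome_period(index, date(2020, 12, 1))
score = phenotype_risk_score(
    {"irritable_bowel_syndrome", "shortness_of_breath", "unrelated_code"},
    example_weights,
)
print(period, score)
```

Phenotypes without a weight contribute nothing to the score, which mirrors restricting the PheRS to phenome-wide significant PheCodes.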
Key Findings
Cohort: Among 63,675 COVID-19-positive patients with ≥2 months follow-up, 1,724 (2.7%) had a recorded PASC diagnosis. PASC prevalence within 3 months ranged from 0.18% (Q3 2020) to 1.8% (Q3 2021), with most cases in Q4 2021.
Post-COVID-19 PheWAS (1256 cases vs 12,492 controls): All 29 known PASC symptoms were enriched (OR>1); 27 were phenome-wide significant (p<5.2×10^-5). Examples: shortness of breath OR=9.03 (95% CI 7.77–10.50; p=2.94×10^-181), malaise/fatigue OR=6.17 (5.33–7.14; p=2.32×10^-132), cardiac dysrhythmias OR=2.75 (2.37–3.18; p=3.95×10^-41). Additional enrichments included musculoskeletal (e.g., costochondritis OR=6.88 [3.05–14.8]; p=6.72×10^-8), infectious (septicemia OR=2.31 [1.66–3.16]; p=2.67×10^-7), and digestive disorders (GERD OR=1.72 [1.50–1.99]; p=5.10×10^-14).
Pre-COVID-19 PheWAS (training: 1212 cases vs 11,919 controls; 1405 PheCodes tested): Seven phenome-wide significant predisposing phenotypes: irritable bowel syndrome OR=1.78 (1.44–2.18; p=4.00×10^-8), concussion OR=1.95 (1.51–2.49; p=1.24×10^-7), nausea/vomiting OR=1.45 (1.26–1.67; p=2.90×10^-7), shortness of breath OR=1.51 (1.29–1.76; p=3.38×10^-7), respiratory abnormalities OR=1.39 (1.22–1.59; p=1.10×10^-6), allergic reaction to food OR=1.94 (1.42–2.60; p=1.66×10^-5), and general circulatory disease OR=1.52 (1.24–1.85; p=3.30×10^-5). Sensitivity analyses supported robustness across sex, severity, and time windows.
Acute-COVID-19 PheWAS (874 cases vs 8671 controls; 664 PheCodes tested; excluding PASC diagnosed <28 days): Sixty-nine significant phenotypes (p<7.54×10^-5) predisposed to PASC, notably respiratory (e.g., shortness of breath, respiratory failure/insufficiency/arrest, oxygen dependence, cough), circulatory (e.g., orthostatic hypotension, hypotension), neurological (sleep disorder, migraine, pain), digestive (GERD, IBS), mental health (anxiety, depression), and systemic symptoms (malaise/fatigue, myalgia/myositis). The shortness of breath effect size differed between 2020 (OR=2.20 [1.60–2.99]) and 2021 (OR=4.59 [3.62–5.81]; p-difference=0.000234), and was significant in both years.
Cross-period comparison: Nearly all predisposing phenotypes from the pre- and acute periods were also enriched post-COVID, suggesting persistence or chronicity.
Prediction performance: In 2022 testing data (with ≥28 days to PASC): PheRS1 AAUC=0.555 (0.496–0.612), or 0.548 (0.516–0.580) in the full set; PheRS2 AAUC=0.605 (0.549–0.663); combined AAUC=0.615 (0.561–0.670). Pseudo-R²: PheRS1=0.0116; PheRS2=0.0547. Although individual-level discrimination was modest, risk stratification showed enrichment of PASC in higher-risk bins. The combined PheRS identified the top 25% at 3.48-fold higher risk (95% CI 2.19–5.55) vs the bottom 50%.
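The phenome-wide significance thresholds quoted above come from Bonferroni correction. The sketch below assumes the usual family-wise alpha of 0.05 divided by the number of PheCodes tested. The summary does not state the alpha, and the reported acute-period threshold of 7.54×10^-5 differs slightly from 0.05/664, so treat this as an approximation. The function name `bonferroni_threshold` is hypothetical. The p-values are the seven pre-COVID-19 results reported above.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold under Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# p-values of the seven pre-COVID-19 phenotypes reported in the text.
PRE_COVID_P_VALUES = {
    "irritable bowel syndrome": 4.00e-8,
    "concussion": 1.24e-7,
    "nausea/vomiting": 2.90e-7,
    "shortness of breath": 3.38e-7,
    "respiratory abnormalities": 1.10e-6,
    "allergic reaction to food": 1.66e-5,
    "general circulatory disease": 3.30e-5,
}

# 1405 PheCodes were tested in the pre-COVID-19 PheWAS.
threshold = bonferroni_threshold(1405)
significant = [name for name, p in PRE_COVID_P_VALUES.items() if p < threshold]

print(f"threshold = {threshold:.3g}")
print(f"{len(significant)} of {len(PRE_COVID_P_VALUES)} pass")
```

Under this assumption all seven reported phenotypes fall below the threshold, with general circulatory disease the closest to it.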
Discussion
The study identifies pre- and acute-COVID-19 phenotypes that predispose to PASC using a time-resolved PheWAS framework, aligning with and extending prior literature on respiratory, circulatory, neurological, and systemic contributors. The overlap of predisposing and post-COVID-19 enriched phenotypes suggests that some acute or pre-existing conditions may evolve into long-term sequelae. While PheRS-based prediction achieved limited individual-level discrimination (AAUC <0.7), the scores effectively stratified risk, highlighting potential clinical utility for identifying higher-risk groups for monitoring or early intervention. Variability in feature distributions across pandemic waves, vaccination effects, and evolving clinical awareness likely contributed to reduced predictive accuracy. The findings underscore the complexity of PASC and the need for improved case definitions, larger datasets, and incorporation of richer EHR features to refine prediction and explore PASC subtypes.
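The contrast drawn here between modest discrimination and useful stratification can be made concrete by comparing the share of cases in the top-scoring quarter with the share in the bottom half. The following Python sketch computes a crude, unadjusted ratio of case proportions on toy data. It is not the covariate-adjusted estimate reported in the study, and the function name `stratified_risk_ratio`, the bin fractions as parameters, and the example values are all assumptions.

```python
def stratified_risk_ratio(scores, is_case, top_fraction=0.25, reference_fraction=0.50):
    """Case proportion in the top-scoring bin divided by that in the lowest bin."""
    if len(scores) != len(is_case) or not scores:
        raise ValueError("scores and is_case must be non-empty and equal in length")

    # Rank patients from lowest to highest score.
    ranked = sorted(zip(scores, is_case), key=lambda pair: pair[0])
    n = len(ranked)
    n_ref = int(n * reference_fraction)
    n_top = int(n * top_fraction)
    if n_ref == 0 or n_top == 0:
        raise ValueError("too few patients for the requested bins")

    reference = ranked[:n_ref]   # lowest-scoring bin
    top = ranked[n - n_top:]     # highest-scoring bin

    ref_rate = sum(case for _, case in reference) / n_ref
    top_rate = sum(case for _, case in top) / n_top
    if ref_rate == 0:
        raise ValueError("no cases in the reference bin; ratio is undefined")
    return top_rate / ref_rate


# Toy data: eight patients with hypothetical scores and case labels (1 = PASC).
toy_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
toy_cases = [0, 1, 0, 0, 0, 0, 1, 1]

ratio = stratified_risk_ratio(toy_scores, toy_cases)
print(ratio)
```

A score can rank individuals imperfectly and still concentrate cases in its upper bins, which is the sense in which stratification remains useful here.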
Conclusion
PASC is a significant public health challenge. Using EHR-based PheWAS in a large health system cohort, the study identified known and potentially novel predisposing diagnoses in pre- and acute-COVID-19 periods. Aggregated into PheRSs, these features offer modest predictive performance but meaningful risk stratification to identify vulnerable subgroups. Future work should apply advanced machine learning, incorporate additional data modalities (laboratory, medications), refine PASC phenotyping (including subtypes), and extend the PheRS framework to alternative outcomes such as survival to better understand and manage long-term COVID-19 consequences.
Limitations
Key limitations include: matching and adjustment on demographic factors (age, sex, race/ethnicity) precluded evaluating their predictive contributions, though they may be true risk factors; evolving and non-specific PASC definitions and delayed adoption of ICD-10 code U09.9 may have led to underdiagnosis (observed 2.7% vs literature estimates of 19–35%); reliance on a single health system cohort introduces selection bias (patients may be older/less healthy) and underrepresentation of asymptomatic infections; limited sample size of severe cases and potential hospital-acquired complications; EHR-based measurement and coding limitations; temporal changes in variants, vaccination, and care affecting model transportability. These factors may reduce generalizability and attenuate predictive performance.