The global COVID-19 pandemic has left a substantial number of survivors with post-acute sequelae of SARS-CoV-2 infection (PASC), also known as long COVID. Estimates suggest that 20% to 40% of COVID-19 patients develop PASC. PASC spans three kinds of outcomes:
- persistent symptoms such as cough and fatigue
- new chronic conditions such as lung or neurological disease
- late complications such as autoimmune disorders

Although COVID-19 vaccination may reduce PASC risk by 13% to 22%, the continued prevalence of PASC places a significant burden on healthcare systems. Several demographic factors, pre-existing conditions, and biomarkers have been linked to increased PASC risk. These include severe acute COVID-19, female sex, older age, pre-existing diabetes, and specific acute symptoms. However, PASC is a new condition and research on it remains limited, so accurate risk prediction models have been slow to emerge.

This study aimed to address this gap in two ways: by identifying predisposing diagnoses of PASC through phenome-wide association studies (PheWAS), and by developing phenotype risk scores (PheRS) to predict PASC. The study drew on a cohort of more than 60,000 Michigan Medicine patients with a history of COVID-19. It used rich retrospective electronic health record (EHR) data, including socioeconomic status, demographics, and other relevant variables, to investigate PASC risk factors and improve risk stratification.
Literature Review
Previous studies have identified various demographic factors, pre-existing conditions, and biomarkers associated with PASC. Reported risk factors include:
- severe acute COVID-19
- female sex
- older age
- pre-existing diabetes
- specific acute-phase symptoms such as fatigue and headache

Some studies have explored potential biomarkers for predicting PASC, such as immunoglobulin signatures. Others have applied machine learning and identified predictive combinations of factors, including healthcare utilization rates, patient age, and specific diagnoses. Taken together, this literature highlights the heterogeneity of PASC. It also points to the need for more comprehensive approaches to characterize its risk factors and to build more accurate and reliable risk prediction models.
Methodology
This study used a case-control design. It drew on Michigan Medicine (MM) patients with a recorded COVID-19 diagnosis or a positive SARS-CoV-2 RT-PCR test between March 10, 2020, and August 31, 2022. The index date was defined as the date of the first COVID-19 diagnosis or positive test. Patients with at least one encounter two or more months after the index date were included. They were stratified into PASC cases (with a recorded PASC diagnosis) and controls (without one). PASC diagnoses were identified from ICD-10-CM codes (U09.9 or B94.8) or from entries in the EHR's Problem Summary List. Patients without a prior positive test were excluded.

Demographic, socioeconomic, and clinical covariates were collected, including age, gender, race/ethnicity, neighborhood disadvantage index, population density, vaccination status, Elixhauser comorbidity score, COVID-19 severity, healthcare worker status, and EHR record timespan. Adjusted analyses used complete cases only. Each subject's medical phenome was built by extracting ICD codes and mapping them to PheCodes with the R package "PheWAS." Three time-restricted phenomes were created relative to the index date:
- pre-COVID-19: before day –14
- acute-COVID-19: day –14 to day +28
- post-COVID-19: day +28 to 6 months

Matching was performed with the R package "MatchIt" to minimize confounding between PASC cases and controls. PheWAS was then conducted with Firth bias-corrected logistic regression to identify phenotypes associated with PASC in each time period. Sensitivity analyses assessed the robustness of the results. Phenotype risk scores (PheRSs) were built with ridge-penalized logistic regression for the pre- and acute-COVID-19 periods. PheRS performance was evaluated with three measures:
- Nagelkerke's pseudo-R²
- the Brier score
- the area under the covariate-adjusted receiver operating characteristic (AROC) curve (AAUC)

All analyses were performed in R 4.2.0.
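The Firth correction mentioned above can be illustrated with a short sketch. The study worked in R, and its exact implementation is not described here. The Python/NumPy function below, `firth_logistic`, is a hypothetical helper that fits one PheWAS-style model. The outcome is PASC case status, and the design matrix holds an intercept plus a single phecode indicator; covariates could be added as extra columns.

Firth's method penalizes the likelihood with the Jeffreys prior. In practice, this adds h_i(1/2 − p_i) to each residual in the score, where h_i is the i-th diagonal element of the hat matrix. The penalty keeps estimates finite when a rare phecode occurs only in cases, a situation in which ordinary maximum likelihood diverges.

```python
import numpy as np


def firth_logistic(X, y, max_iter=100, tol=1e-8):
    """Firth-penalized logistic regression (hypothetical helper, illustration only).

    X: (n, k) design matrix that already contains an intercept column.
    y: (n,) array of 0/1 outcomes, e.g. 1 = PASC case, 0 = control.
    Returns (coefficients, Wald standard errors).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))      # fitted probabilities
        w = p * (1.0 - p)                          # logistic weights
        info = X.T @ (X * w[:, None])              # Fisher information X'WX
        info_inv = np.linalg.inv(info)
        # Diagonal of the hat matrix W^1/2 X (X'WX)^-1 X' W^1/2
        h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)
        # Firth's modified score: each residual gains h_i * (1/2 - p_i)
        score = X.T @ (y - p + h * (0.5 - p))
        step = info_inv @ score                    # Fisher-scoring update
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    # Wald standard errors from the Fisher information at the final estimate
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    info = X.T @ (X * (p * (1.0 - p))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    return beta, se


# Toy PheWAS test: intercept + one phecode indicator.
# The phecode appears in 2 cases and 0 controls (quasi-complete separation).
phecode = np.array([0, 0, 0, 0, 1, 1])
pasc = np.array([0, 0, 0, 1, 1, 1])
X = np.column_stack([np.ones(len(phecode)), phecode])
beta, se = firth_logistic(X, pasc)
print("log-odds ratio:", beta[1], "SE:", se[1])
```

In this toy table, the ordinary maximum-likelihood slope has no finite value. The Firth estimate instead equals the log odds ratio computed after adding 0.5 to each cell of the 2×2 table, a known property of Firth's method for a single binary predictor. Production implementations usually add step-halving for robustness; this sketch omits it.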
Key Findings
Among 63,675 patients with a history of COVID-19, 1,724 (2.7%) had a PASC diagnosis. PASC prevalence within three months of infection ranged from 0.18% to 1.8%.

The PheWAS results by time period were:
- Post-COVID-19: all 29 PASC symptoms were enriched among cases, 27 of them at phenome-wide significance. Other enriched diagnoses included musculoskeletal, infectious, and digestive disorders.
- Pre-COVID-19: seven phenotypes were significantly associated, including irritable bowel syndrome (IBS), concussion, and nausea/vomiting.
- Acute-COVID-19: 69 phenotypes were significantly associated, predominantly respiratory and circulatory symptoms.

Sensitivity analyses largely supported these findings. Both the pre- and acute-COVID-19 PheRSs showed some discriminatory ability but low accuracy for predicting individual-level PASC risk (AAUC < 0.7). However, the combined PheRSs identified a top quarter of the cohort with a 3.5-fold higher PASC risk than the bottom 50%. This suggests potential clinical utility for risk stratification and for identifying vulnerable individuals.
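The stratification result can be made concrete with a small sketch: rank patients by a risk score, then compare the observed PASC rate in the top 25% of scores with the rate in the bottom 50%. The function name `top_vs_bottom_risk_ratio` and the toy data are hypothetical. The sketch also assumes one combined score per patient; how the study combined the pre- and acute-COVID-19 PheRSs is not specified here.

```python
import numpy as np


def top_vs_bottom_risk_ratio(scores, outcomes, top_frac=0.25, bottom_frac=0.50):
    """Ratio of outcome rates: highest-scoring top_frac vs lowest-scoring bottom_frac.

    scores: one risk score per patient (e.g. a combined PheRS; hypothetical input).
    outcomes: 1 if the patient has a PASC diagnosis, else 0.
    """
    scores = np.asarray(scores, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    n = len(scores)
    order = np.argsort(scores, kind="stable")   # ascending; ties keep input order
    n_top = int(round(top_frac * n))
    n_bottom = int(round(bottom_frac * n))
    top_rate = outcomes[order[n - n_top:]].mean()      # PASC rate, highest scores
    bottom_rate = outcomes[order[:n_bottom]].mean()    # PASC rate, lowest scores
    return top_rate / bottom_rate


# Toy example: 8 patients with scores 0..7.
scores = np.arange(8)
pasc = np.array([0, 0, 0, 1, 0, 0, 1, 0])
print(top_vs_bottom_risk_ratio(scores, pasc))  # prints 2.0
```

The sketch uses a rank-based split rather than quantile cutoffs, so each stratum has a fixed size even when scores are tied. Ties are broken by input order, which is an arbitrary choice. The ratio is undefined when no patient in the bottom stratum has PASC.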
Discussion
This study offers insight into the complex nature of PASC by identifying predisposing and presenting features across different time periods. The findings confirm previously reported associations with PASC risk, such as respiratory symptoms and circulatory diseases. Novel associations, such as IBS and concussion as predisposing factors, warrant further investigation.

The relatively low accuracy of the PheRSs for individual-level prediction highlights how hard PASC is to predict. Possible reasons include its heterogeneity, its evolving definitions, and the limitations of EHR data. Nevertheless, the combined PheRSs could stratify risk, which suggests these scores may still help identify high-risk individuals. Identifying a sizable group at elevated risk could enable targeted interventions and protective measures that may reduce the burden of PASC.
Conclusion
This study used a PheWAS approach to identify known and potentially novel predisposing conditions for PASC. Although the PheRSs showed limited accuracy for individual-level prediction, they show promise for risk stratification. Future research should explore more sophisticated predictive models and incorporate additional data types. It should also investigate whether the identified risk factors play a causal role in PASC development. Such work could support earlier detection of and intervention for PASC.
Limitations
The study's limitations include:
- possible underdiagnosis of PASC, due to evolving diagnostic criteria and awareness
- selection bias from restricting the cohort to patients seen at Michigan Medicine
- the inherent limitations of EHR data

The relatively low prevalence of PASC in the cohort may also have limited the models' predictive performance. Further studies are needed to improve PASC prediction and understanding. Such studies should use larger, more diverse cohorts and incorporate additional data such as laboratory results and medication information.