
Artificial intelligence in sepsis early prediction and diagnosis using unstructured data in healthcare
K. H. Goh, L. Wang, et al.
Sepsis is a life-threatening condition in which early detection matters. The SERA algorithm, developed by a team of researchers including Kim Huat Goh and Le Wang, uses both structured data and unstructured clinical notes to diagnose and predict sepsis. Compared with physicians, it raised the early-detection true positive rate by up to 32 percentage points and lowered the false positive rate by up to 17 percentage points.
Introduction
Sepsis is a leading cause of in-hospital mortality in the United States, accounting for roughly half of hospital deaths. Early identification is crucial because timely interventions such as fluid resuscitation within 3 hours and antibiotics ideally within 1 hour are associated with improved outcomes, and delays increase mortality. However, early diagnosis is challenging due to symptom overlap with less critical conditions and operational constraints in care delivery. Most existing sepsis prediction approaches rely primarily on structured EHR data, despite estimates that about 80% of EMR data are unstructured (e.g., free-text clinical notes, images), which often contain rich clinical insights not captured in structured fields. Prior work has incorporated text mining of notes to improve early prediction, typically focusing on extracting concepts or keywords. The present study advances this by leveraging topic modeling of clinical notes—arguing that topic-level representations are more stable and generalizable than individual words—and combining them with structured variables to improve both diagnosis and early prediction of sepsis up to 48 hours before onset. The authors develop the topic-based, NLP-enabled SERA algorithm to classify current sepsis status and, if not present, to predict risk at 4, 6, 12, 24, and 48 hours ahead. They evaluate it in a realistic prevalence setting and compare performance against physicians and standard scoring systems, with the goal of enabling earlier, more accurate detection and potentially reducing mortality.
Literature Review
Prior research on sepsis prediction has largely used structured EHR data, with performance improvements reported when unstructured text is incorporated via NLP. Existing NLP applications to clinical notes have focused on identifying or extracting medical events, medications, and workflows, often using word-level features. Studies that augmented models with text generally used common words from notes; however, word-level features can be unstable across clinicians because of stylistic differences. Topic modeling (e.g., LDA) offers a more robust representation by capturing higher-level themes that may generalize better. Reported AUCs for prior sepsis prediction models have typically ranged between 0.84 and 0.92 for diagnosis tasks, and standard clinician-used scoring systems (SIRS, SOFA, MEWS, qSOFA) often show modest predictive performance (AUCs around 0.50–0.78) near the time of onset. This study builds on that literature by combining topic-based representations of clinical notes with structured variables to improve predictive accuracy and extend lead times up to 48 hours.
Methodology
Study setting and cohort: Retrospective analysis of patients admitted to a government hospital in Singapore. Sepsis cases were identified using ICD-10 codes consistent with hospital practice; patients diagnosed with sepsis are transferred to the ICU, while all others form the non-sepsis cohort. There were 482 sepsis patients in the training/validation sets and 287 sepsis patients in the test sample. Sepsis onset time was defined as the time of ICU ward admission for sepsis (as per local practice). At the encounter level, sepsis prevalence was 6.15%, consistent with typical hospital prevalence (~6%). Because of this class imbalance, SMOTE (Synthetic Minority Over-sampling Technique) oversampling was applied for some model variants; corresponding non-oversampled (low-prevalence) models were also developed and evaluated.
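The oversampling idea mentioned above can be illustrated with a minimal sketch. This is not the study's implementation: the function name `smote_oversample`, the parameter `k`, and the toy rows are assumptions made for illustration. The core of SMOTE is to create synthetic minority-class rows by interpolating between a minority row and one of its nearest minority-class neighbours.

```python
import math
import random


def smote_oversample(minority, n_synthetic, k=3, seed=0):
    """Generate synthetic minority-class rows by SMOTE-style interpolation.

    Hypothetical helper for illustration only; `minority` is a list of
    equal-length numeric feature vectors.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority rows by Euclidean distance to the base row.
        others = [row for j, row in enumerate(minority) if j != i]
        others.sort(key=lambda row: math.dist(base, row))
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the segment
        # between the base row and the chosen neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class: (heart rate, temperature) for three sepsis encounters.
sepsis_rows = [[118.0, 38.9], [124.0, 39.4], [110.0, 38.5]]
new_rows = smote_oversample(sepsis_rows, n_synthetic=5, k=2)
```

Because every synthetic row lies on a line segment between two real minority rows, the new points stay inside the range already spanned by the observed sepsis cases.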
Predictor variables: Structured variables included patient information (age, gender), vital signs (blood pressure, heart rate, temperature, oxygen saturation, respiratory rate), investigations (total white cell count, culture results, lactate, high-sensitivity C-reactive protein, procalcitonin, arterial blood gas), and treatments (use of vasopressors, use of antibiotics). Culture types encompassed a wide range of specimen and pathogen tests per hospital practice.
Unstructured data processing (clinical notes): Free-text clinical notes were preprocessed (HIPAA-compliant de-identification and standard NLP preprocessing). Latent Dirichlet Allocation (LDA) topic modeling was applied to progress notes, yielding 100 topics categorized into seven groups: admission, communication, laboratory tests, non-clinical status, social relationships, symptoms, and treatment. Topic loadings (numerical weights) per note were combined with the structured variables to form the predictor set for both diagnosis and early prediction tasks.
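One way to picture the combined predictor set is a single feature row per note, with structured values and topic loadings side by side. The sketch below is illustrative only: `build_feature_row`, the field names, and the three-topic example are assumptions (the study used 100 topics), and the topic loadings are taken as given rather than computed by a fitted LDA model.

```python
def build_feature_row(structured, topic_loadings, structured_fields):
    """Concatenate structured values and topic loadings into one row.

    Hypothetical helper: `structured` maps field name -> numeric value and
    `topic_loadings` is the per-note topic weight vector produced by a
    topic model (not computed here).
    """
    names = list(structured_fields) + [
        f"topic_{i}" for i in range(len(topic_loadings))
    ]
    values = [float(structured[f]) for f in structured_fields] + [
        float(w) for w in topic_loadings
    ]
    return names, values


fields = ["age", "heart_rate", "temperature", "lactate"]
vitals = {"age": 67, "heart_rate": 112, "temperature": 38.6, "lactate": 2.4}
loadings = [0.70, 0.25, 0.05]  # toy example with 3 topics instead of 100

names, row = build_feature_row(vitals, loadings, fields)
```

Keeping the column names next to the values makes it easy to check which columns come from structured fields and which from the notes.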
Model architecture (SERA): The SERA framework comprises two linked components: (1) a diagnosis algorithm that determines whether a patient has sepsis at the time of consultation; and (2) an early prediction algorithm that estimates the risk of developing sepsis within the next 4, 6, 12, 24, and 48 hours for those not currently diagnosed. The primary estimator is a voting ensemble that averages probabilities from two base classifiers: stochastic gradient descent (SGD)-based logistic regression and random forest. Alternative estimators (for comparison) included dagging and gradient boosted trees (GBT). Model development followed standard training/validation with subsequent evaluation on an independent, hold-out test set. Both SMOTE-oversampled and non-SMOTE (realistic low-prevalence) conditions were assessed.
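The probability-averaging step of such a voting ensemble can be sketched as follows. The two base classifiers are stood in for by plain lists of predicted probabilities; `soft_vote`, the 0.5 threshold, and the sample values are assumptions for illustration, not details reported by the authors.

```python
def soft_vote(prob_lists, threshold=0.5):
    """Average per-patient probabilities across classifiers, then threshold.

    `prob_lists` holds one list of sepsis probabilities per base classifier,
    all in the same patient order. Names and threshold are illustrative.
    """
    if not prob_lists:
        raise ValueError("need at least one classifier's probabilities")
    n = len(prob_lists[0])
    if any(len(p) != n for p in prob_lists):
        raise ValueError("all classifiers must score the same patients")
    # zip(*prob_lists) yields one tuple of probabilities per patient.
    averaged = [sum(col) / len(prob_lists) for col in zip(*prob_lists)]
    labels = [int(p >= threshold) for p in averaged]
    return averaged, labels


# Stand-ins for predicted sepsis probabilities from the two base models.
logreg_probs = [0.25, 0.75, 0.5]
forest_probs = [0.25, 0.25, 1.0]

avg, flagged = soft_vote([logreg_probs, forest_probs])
```

Averaging probabilities rather than counting hard votes lets a confident classifier outweigh an uncertain one, which is the usual motivation for this kind of ensemble.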
Evaluation: Performance metrics included AUC, sensitivity, specificity, PPV, and NPV for diagnosis and for early prediction at each lead time. Additional analyses compared SERA against physicians’ implicit predictions (proxied by concurrent orders of lactate and cultures per sepsis guidelines) across time windows (48, 24, 12, 6, 4 hours before onset), as well as against standard scoring systems (SIRS, SOFA, MEWS, qSOFA). Ablation-like comparisons assessed the incremental value of adding clinical text to structured variables, especially at different lead times (4–48 hours). Simulations examined how varying sepsis prevalence affects PPV for potential clinical deployment. Operational workflows were proposed for background periodic scoring (e.g., during handovers or hourly) and ad-hoc scoring triggered by note updates. All ensemble modeling was implemented on the ENMIAC Analytics Platform v4.1.6.
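For reference, the threshold-based metrics listed above follow directly from confusion-matrix counts. The sketch below uses a hypothetical helper, `classification_metrics`, and made-up labels; AUC is omitted because it requires ranked scores rather than hard predictions.

```python
def classification_metrics(y_true, y_pred):
    """Return sensitivity, specificity, PPV and NPV for binary labels (1 = sepsis)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as None.
        return num / den if den else None

    return {
        "sensitivity": ratio(tp, tp + fn),  # true positive rate
        "specificity": ratio(tn, tn + fp),  # 1 - false positive rate
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Made-up labels: 4 sepsis and 6 non-sepsis encounters.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
```

The same counts give the true positive rate (sensitivity) and false positive rate (1 - specificity) used in the comparison against physicians.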
Key Findings
- Diagnosis performance (SMOTE test set): AUC 0.94, sensitivity 0.89, specificity 0.87, PPV 0.85, NPV 0.90.
- Early prediction performance (SMOTE test set):
  - 48 h: AUC 0.87, sensitivity 0.78, specificity 0.77, PPV 0.78, NPV 0.83.
  - 24 h: AUC 0.90, sensitivity 0.81, specificity 0.80, PPV 0.80, NPV 0.86.
  - 12 h: AUC 0.94, sensitivity 0.87, specificity 0.87, PPV 0.87, NPV 0.92.
  - 6 h: AUC 0.92, sensitivity 0.88, specificity 0.81, PPV 0.82, NPV 0.90.
  - 4 h: AUC 0.92, sensitivity 0.86, specificity 0.80, PPV 0.81, NPV 0.85.
- Compared with physicians (independent test sample):
  - SERA achieved higher true positive rates (TPR) than physicians across all lead times (48–4 h), improving early detection by 21–32% (absolute TPR gain 0.21–0.32).
  - SERA had lower false positive rates (FPR): reductions of 0.07–0.17 compared with physicians (FPR physicians ~0.34 to 0.27 from 48 h to 4 h; SERA FPR ~0.23 at 48 h improving to ~0.10 at 12 h).
- SERA generally outperformed standard scoring systems (SIRS, SOFA, MEWS, qSOFA) in ROC comparisons near the 4 h pre-onset window.
- Added value of unstructured text: Incorporating clinical notes provided marginal gains for diagnosis and very short lead times (≤6 h), where structured variables already reflect overt symptoms, but yielded substantial improvements for earlier prediction (12–48 h). In the 12–48 h window, adding text improved AUC by 0.10–0.15, sensitivity by 0.07–0.13, and specificity by 0.08–0.14.
- Low-prevalence environment (non-SMOTE): Models maintained high sensitivity and very high NPV but, as expected, exhibited low PPV due to the low base rate. Simulations demonstrated that PPV increases with higher prevalence, informing deployment expectations across settings.
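The dependence of PPV on prevalence noted in the last finding follows from Bayes' rule. The sketch below is a purely arithmetic illustration: `ppv_at_prevalence` is a hypothetical helper, and the sensitivity and specificity of 0.90 are assumed round numbers, not results from the study.

```python
def ppv_at_prevalence(sensitivity, specificity, prevalence):
    """Positive predictive value via Bayes' rule.

    PPV = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Assumed operating point (illustrative values only).
SENS, SPEC = 0.90, 0.90

prevalences = [0.02, 0.06, 0.10, 0.25, 0.50]
ppv_curve = [ppv_at_prevalence(SENS, SPEC, p) for p in prevalences]

for p, ppv in zip(prevalences, ppv_curve):
    print(f"prevalence {p:.2f} -> PPV {ppv:.2f}")
```

With sensitivity and specificity held fixed, PPV rises steadily as prevalence increases, which is why a model can look precise on a balanced (oversampled) test set and still produce many false alerts at a realistic base rate of about 6%.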
Discussion
The SERA algorithm, which integrates topic-modeled clinical notes with structured EHR variables, addresses a key challenge in early sepsis detection: subtle pre-onset signals that may not yet be reflected in vitals or labs. By leveraging unstructured text, SERA captures clinicians’ qualitative assessments and nuanced patient context, providing meaningful predictive gains particularly at 12–48 hours pre-onset. The strong diagnostic and early prediction performance (AUC up to 0.94 at 12 h) and improved TPR/FPR profiles relative to physicians and standard scoring systems suggest that SERA can serve as an effective early warning tool. Given the time-sensitive nature of sepsis care, such lead-time improvements could support earlier intervention, potentially reducing mortality associated with delays in antibiotics and resuscitation. The analysis further clarifies that while structured data suffice closer to onset when symptoms are overt, unstructured notes add critical value further upstream. Proposed clinical workflows (background periodic scoring and ad-hoc triggers after note updates) outline feasible integration paths to augment clinicians’ situational awareness without disrupting care processes.
Conclusion
This study introduces SERA, a topic-based NLP-enabled ensemble algorithm that combines structured EHR data with unstructured clinical notes to diagnose sepsis and predict risk up to 48 hours before onset. SERA demonstrates high accuracy, outperforming physicians and common scoring systems, and shows that unstructured clinical text substantially enhances predictive performance for earlier time horizons (12–48 h). The approach offers practical deployment options within EMR systems to provide timely alerts and decision support. Future work could validate generalizability across multiple hospitals and health systems, refine topic models with domain-adapted NLP advancements, and prospectively evaluate clinical impact on workflow, treatment timing, and patient outcomes.
Limitations
- Single-hospital study in Singapore, which may limit generalizability to other institutions or healthcare systems.
- Retrospective design; prospective validation and impact assessment were not reported.
- Low-prevalence settings naturally yield low PPV despite high sensitivity/NPV, affecting positive alert precision.
- Data-sharing constraints (privacy regulations) limit external replication with the original dataset.
- Sepsis labeling and onset timing are based on hospital practice and ICD coding/admission timing, which may introduce labeling or timing uncertainties.