Interpretable machine learning-based decision support for prediction of antibiotic resistance for complicated urinary tract infections

Medicine and Health

J. Yang, D. W. Eyre, et al.

Discover how a team of researchers including Jenny Yang and David W. Eyre developed machine learning algorithms that predict antibiotic resistance in urinary tract infections. Their interpretable models aim to improve treatment effectiveness and support personalized, data-driven antibiotic selection.

Introduction
Antimicrobial resistance (AMR) is rising rapidly, threatening the effectiveness of many antibiotic therapies and increasing the risk of treatment failure. New antibiotic development is limited by costs and regulatory hurdles, while reliance on broad-spectrum therapies can further select for resistance. Clinicians must align empirical treatment with pathogen susceptibilities, but culture results are delayed by days. This study presents interpretable machine learning (ML) methods to predict antibiotic resistance at the start of care to reduce non-susceptible treatments and enable faster, more effective interventions. The work focuses on urinary tract infections (UTIs)—common globally with high recurrence and significant resistance rates—where empirical treatment without susceptibility insight can lead to mismatched therapy. The primary objective is to predict antibiotic resistance for potentially complicated UTIs (more severe and/or with anatomic abnormalities or significant comorbidities), emphasizing interpretable models suitable for clinical decision support. A secondary objective is to assess generalizability to an independent cohort of uncomplicated UTI specimens.
Literature Review
Prior ML studies using EHR data have shown promise for predicting UTI antibiotic resistance. Yelin et al. demonstrated logistic regression and gradient-boosted trees predicting resistance to six antibiotics (AUROC 0.70–0.83) using demographics, microbiology history, and antibiotic purchase history, and showed reduced mismatched treatments via algorithmic recommendations. However, their models did not incorporate clinical comorbidities and hospitalizations. Kanjilal et al. trained models on an uncomplicated UTI cohort (15,806 specimens; females aged 18–55 with exclusion criteria) and achieved AUROCs of 0.56–0.64 across four antibiotics, outperforming clinicians and reducing recommendations of broad-spectrum second-line agents by 67%. Neither study evaluated neural network architectures for tabular EHR data. Although deep neural networks often excel in image/text tasks, tree-based ensembles typically outperform them on tabular data due to inductive biases and interpretability. Recent attention-based models such as TabNet introduce sequential attention over features for improved performance and interpretability on tabular data. This study builds on prior work by targeting potentially complicated UTIs and comparing interpretable models, including logistic regression, XGBoost, and TabNet with and without self-supervised pretraining.
Methodology
Data source and cohort: The AMR-UTI dataset (PhysioNet) includes >80,000 patients with UTIs presenting to Massachusetts General Hospital (MGH) and Brigham & Women's Hospital (BWH) from 2007–2016 (IRB-approved with consent waived). The analysis focuses on potentially complicated UTI specimens (101,096 samples), including all specimens tested for the first-line agents nitrofurantoin (NIT) and co-trimoxazole (SXT) and the second-line agents ciprofloxacin (CIP) and levofloxacin (LVX). For generalizability, models are also evaluated on an independent uncomplicated UTI cohort (15,806 specimens) per Kanjilal et al.

Features and labels: Data include antimicrobial susceptibility profiles (pathogen identity; MIC/disk diffusion converted to CLSI 2017 categorical phenotypes S/I/R, with intermediate treated as resistant), demographics (age; race/ethnicity encoded as white/non-white), prior antibiotic resistance and exposures, prior infecting organisms, comorbidities, care location at specimen collection (inpatient, outpatient, ER, ICU), colonization pressure (proportion resistant to each agent over 7–90 days prior at multiple hospital/location hierarchies), prior skilled nursing facility visits, infections at other sites on the same day, prior procedures, and basic laboratory results. Prior exposures, resistance, organisms, comorbidities, and hospitalizations were tabulated over 14-, 30-, 90-, and 180-day windows before specimen collection. Categorical variables were one-hot encoded, giving 787 features in total. Missingness mainly implied absence (binary features set to 0 when not observed). The dataset lacks antibiotic dosage/duration, urinalysis, allergies, and encounters outside MGH/BWH; empiric prescriptions for complicated UTIs were not available, and patients with asymptomatic bacteriuria could not be explicitly excluded.

Train/validation/test split: Temporal evaluation was used: models were trained on specimens from 2007–2013 and tested on specimens from 2014–2016. Within the training data, 90% was used for model development and hyperparameter tuning and 10% for continuous validation and threshold selection. Separate models were trained per antibiotic; specimen counts differ with the availability of susceptibility results.

Models: Interpretable models included logistic regression (LR), XGBoost (gradient-boosted decision trees), and TabNet. TabNet was trained in two modes: standard supervised learning and self-supervised pretraining (TabNetself), in which a reconstructive pretext task predicting masked columns learns improved representations before supervised fine-tuning.

Evaluation: Reported metrics were AUROC, AUPRC, sensitivity, specificity, PPV, and F1-score, each with 95% confidence intervals from 1000 bootstrap samples. Model comparisons used paired bootstrapping across 1000 iterations to compute p-values.

Hyperparameters and thresholds: Hyperparameters were selected via five-fold cross-validation and grid search, optimizing AUPRC because of class imbalance. Class-probability outputs were converted to binary predictions using a threshold chosen by grid search on the validation set to balance sensitivity and specificity; the selected thresholds were then applied to the held-out test set. To assess potential bias from race/ethnicity encoding, models were also evaluated with the race feature removed. Illustrative code sketches of these steps follow.
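To make the per-antibiotic training setup concrete, here is a minimal sketch assuming a temporal split (train on 2007–2013, test on 2014–2016) and a five-fold grid search over XGBoost hyperparameters scored by average precision as a stand-in for AUPRC. Column names such as year and resistant_NIT, and the hyperparameter grid itself, are illustrative assumptions rather than the authors' actual pipeline.

import pandas as pd
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

def train_for_antibiotic(df: pd.DataFrame, label_col: str):
    # Keep only specimens with a susceptibility result for this agent
    df = df.dropna(subset=[label_col])

    # Temporal evaluation: train on 2007-2013 specimens, hold out 2014-2016
    train = df[df["year"] <= 2013]
    test = df[df["year"] >= 2014]

    feature_cols = [c for c in df.columns if c not in ("year", label_col)]
    X_train, y_train = train[feature_cols], train[label_col].astype(int)
    X_test, y_test = test[feature_cols], test[label_col].astype(int)

    # Five-fold grid search optimizing average precision (an AUPRC proxy),
    # chosen because resistant specimens are the minority class
    search = GridSearchCV(
        estimator=XGBClassifier(eval_metric="logloss"),
        param_grid={"max_depth": [3, 5, 7],
                    "n_estimators": [100, 300],
                    "learning_rate": [0.05, 0.1]},
        scoring="average_precision",
        cv=5,
    )
    search.fit(X_train, y_train)
    return search.best_estimator_, X_test, y_test

# One model per antibiotic, e.g. nitrofurantoin:
# model, X_test, y_test = train_for_antibiotic(cohort_df, "resistant_NIT")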
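The two TabNet modes (standard supervised training versus self-supervised pretraining followed by fine-tuning) can be sketched with the open-source pytorch-tabnet package; the synthetic arrays, masking ratio, and training settings below are illustrative assumptions, not the study's configuration.

import numpy as np
from pytorch_tabnet.pretraining import TabNetPretrainer
from pytorch_tabnet.tab_model import TabNetClassifier

# Stand-in arrays; in practice these are the 787 one-hot/numeric EHR features
rng = np.random.default_rng(0)
X_train = rng.integers(0, 2, size=(2000, 787)).astype(np.float32)
y_train = rng.integers(0, 2, size=2000)
X_valid = rng.integers(0, 2, size=(500, 787)).astype(np.float32)
y_valid = rng.integers(0, 2, size=500)

# Reconstructive pretext task: mask feature columns and learn to predict them
pretrainer = TabNetPretrainer()
pretrainer.fit(
    X_train=X_train,
    eval_set=[X_valid],
    pretraining_ratio=0.8,  # fraction of features masked per sample (assumed)
)

# Supervised fine-tuning on the resistance label for a single antibiotic,
# warm-started from the pretrained encoder (the "TabNetself" variant)
clf = TabNetClassifier()
clf.fit(
    X_train=X_train, y_train=y_train,
    eval_set=[(X_valid, y_valid)],
    eval_metric=["auc"],
    from_unsupervised=pretrainer,
)

resistance_prob = clf.predict_proba(X_valid)[:, 1]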
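Threshold selection and uncertainty estimation can be sketched as follows: a grid search over probability cut-offs on the validation set, here using the maximum of min(sensitivity, specificity) as one plausible reading of "balancing" the two, plus percentile bootstrap 95% confidence intervals on the test set. The balancing rule and helper names are assumptions.

import numpy as np
from sklearn.metrics import roc_auc_score

def pick_threshold(y_valid, p_valid, grid=np.linspace(0.05, 0.95, 91)):
    # Choose the cut-off that best balances sensitivity and specificity
    best_t, best_score = 0.5, -1.0
    for t in grid:
        pred = (p_valid >= t).astype(int)
        tp = np.sum((pred == 1) & (y_valid == 1))
        fn = np.sum((pred == 0) & (y_valid == 1))
        tn = np.sum((pred == 0) & (y_valid == 0))
        fp = np.sum((pred == 1) & (y_valid == 0))
        sens = tp / (tp + fn) if tp + fn else 0.0
        spec = tn / (tn + fp) if tn + fp else 0.0
        score = min(sens, spec)
        if score > best_score:
            best_t, best_score = t, score
    return best_t

def bootstrap_ci(y_test, p_test, metric=roc_auc_score, n_boot=1000, seed=0):
    # Percentile bootstrap 95% CI over resampled test sets
    rng = np.random.default_rng(seed)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_test), len(y_test))
        if len(np.unique(y_test[idx])) < 2:  # need both classes for AUROC
            continue
        stats.append(metric(y_test[idx], p_test[idx]))
    return np.percentile(stats, [2.5, 97.5])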
Key Findings
Cohort characteristics: In the complicated UTI cohorts, median age was 64 years (IQR ~44–76) and ~73% of patients identified as white. Emergency room presentations were more frequent in the complicated UTI test cohort (27.8% vs 19.6% in training). Fluoroquinolone resistance prevalence in the training (2007–2013) and test cohorts was similar to US 2012 estimates, as was SXT resistance prevalence, while NIT resistance was higher than national estimates.

Model performance on complicated UTIs (temporal test, 2014–2016): Second-line antibiotics (CIP, LVX) had higher mean AUROC than first-line agents (NIT, SXT): mean AUROCs across models were approximately CIP 0.800 (95% CI ~0.784–0.916) and LVX 0.804 (0.786–0.810), versus NIT 0.674 (0.656–0.681) and SXT 0.686 (0.660–0.707). Across all antibiotics, XGBoost achieved the best AUROC and AUPRC; LR and TabNet (without pretraining) were lowest. Pretrained TabNet (TabNetself) improved over standard TabNet but did not surpass XGBoost (p<0.001 across antibiotics). Representative test metrics (AUROC/AUPRC):
- NIT: LR 0.662/0.381; XGBoost 0.686/0.411; TabNet 0.670/0.393; TabNetself 0.676/0.396.
- SXT: LR 0.666/0.467; XGBoost 0.701/0.524; TabNet 0.685/0.497; TabNetself 0.693/0.503.
- CIP: LR 0.789/0.590; XGBoost 0.811/0.617; TabNet 0.798/0.576; TabNetself 0.800/0.584.
- LVX: LR 0.791/0.592; XGBoost 0.814/0.624; TabNet 0.803/0.597; TabNetself 0.808/0.606.

Generalizability to uncomplicated UTIs: Models trained on the complicated cohorts generalized to the independent uncomplicated cohort (n=15,806) with AUROCs comparable to prior work trained specifically on uncomplicated UTIs. For all specimens (AUROC/AUPRC, XGBoost vs TabNetself): NIT 0.593/0.186 vs 0.575/0.172; SXT 0.612/0.318 vs 0.603/0.301; CIP 0.676/0.254 vs 0.670/0.249; LVX 0.667/0.244 vs 0.662/0.228. On the Kanjilal-equivalent test subset (n≈3,941), AUROCs remained in a similar range (e.g., NIT ~0.56, SXT ~0.59, CIP ~0.64–0.65, LVX ~0.62–0.64).

Feature importance and ablation: Prior antibiotic resistance and prior antibiotic exposure were consistently among the most important predictors across models and antibiotics. Leaving out the prior-resistance feature set significantly reduced AUPRC across antibiotics (approximate decreases of ~0.02 to ~0.09; p<0.001), and excluding prior antibiotic exposure also reduced AUPRC (approximate decreases of ~0.009 to ~0.06; p≤0.001); a rough sketch of this kind of leave-one-group-out comparison appears below. Prior infecting organisms had smaller and not always statistically significant effects. For fluoroquinolones, resistance to one agent was predictive of resistance to the other. Comorbidities (e.g., paralysis, renal disease), prior long-term care facility stays, and prior surgical procedures were also highly predictive.

Race/ethnicity feature analysis: Removing the binary race feature produced test-set results within the original 95% CIs, with p-values of 0.468 (NIT), 0.148 (SXT), 0.023 (CIP), and <0.001 (LVX), indicating minimal to modest impact depending on the antibiotic.
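The feature-set ablations and paired-bootstrap comparisons reported above could be approximated along the following lines: retrain with one feature group removed and compare test AUPRC against the full model over shared bootstrap resamples. The column-prefix convention, helper name, and one-sided p-value are assumptions for illustration, not the authors' exact procedure.

import numpy as np
from sklearn.metrics import average_precision_score
from xgboost import XGBClassifier

def ablation_pvalue(X_train, y_train, X_test, y_test, cols, group_prefix,
                    n_boot=1000, seed=0):
    # X_* are numpy arrays whose columns are named in `cols`; the ablated
    # model drops every feature whose name starts with `group_prefix`
    keep = [i for i, c in enumerate(cols) if not c.startswith(group_prefix)]

    full = XGBClassifier(eval_metric="logloss").fit(X_train, y_train)
    ablated = XGBClassifier(eval_metric="logloss").fit(X_train[:, keep], y_train)

    p_full = full.predict_proba(X_test)[:, 1]
    p_abl = ablated.predict_proba(X_test[:, keep])[:, 1]

    # Paired bootstrap: resample the same test indices for both models and
    # count how often the ablated model matches or beats the full model
    rng = np.random.default_rng(seed)
    wins, skipped = 0, 0
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_test), len(y_test))
        if len(np.unique(y_test[idx])) < 2:
            skipped += 1
            continue
        diff = (average_precision_score(y_test[idx], p_full[idx])
                - average_precision_score(y_test[idx], p_abl[idx]))
        wins += diff <= 0
    return wins / max(n_boot - skipped, 1)  # one-sided bootstrap p-value (assumed)

# e.g. ablation_pvalue(..., group_prefix="prior_resistance_")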
Discussion
The study demonstrates that interpretable ML models can leverage routinely collected EHR data to predict antibiotic resistance in potentially complicated UTIs, supporting earlier, more appropriate antibiotic selection. XGBoost outperformed LR and TabNet without pretraining, suggesting that non-linear relationships and feature interactions are important and effectively captured by tree ensembles on tabular data. TabNet with self-supervised pretraining improved performance to be close to XGBoost, highlighting neural networks’ potential when augmented with representation learning and offering advantages such as transfer/self-supervised learning and finetuning with new data. Models trained on complicated UTIs generalized to uncomplicated cohorts with AUROCs comparable to prior specialist models, likely aided by larger training datasets. Feature analyses corroborate clinical expectations: prior resistance/exposures and certain comorbidities are strong predictors; cross-resistance among fluoroquinolones is evident. These findings address the primary objective by showing that ML-derived resistance probabilities can differentiate susceptible vs non-susceptible options at the patient level, potentially reducing mismatched therapy and informing stewardship. The interpretability of all models (coefficients/importance/attention) supports clinical acceptance and insight into risk factors.
Conclusion
This work introduces and compares interpretable ML models (LR, XGBoost, TabNet with/without self-supervision) to predict antibiotic resistance for complicated UTIs, achieving strongest performance with XGBoost and competitive results with pretrained TabNet. Models generalized to an independent uncomplicated UTI cohort and highlighted clinically coherent risk factors (prior resistance/exposure, comorbidities). These results support the feasibility of ML-driven decision support to lower mismatched treatments and enable tailored empiric choices. Future research should: (1) incorporate richer EHR elements (symptoms, dosing/duration, outpatient purchase/dispensation data, lifestyle factors); (2) refine cohorts (e.g., exclude or explicitly model asymptomatic bacteriuria; distinguish CAUTI vs community/hospital-acquired UTI); (3) explore multilabel models capturing joint resistance patterns; (4) evaluate MIC-based outcomes; (5) develop adaptive thresholding and continual/transfer learning for temporal drift; and (6) advance fairness-aware modeling with improved, nuanced race/ethnicity representation.
Limitations
- Dataset coverage: Missing key EHR elements such as symptoms, antibiotic dose/duration and route, allergies, outpatient purchase/consumption status, and encounters outside MGH/BWH, limiting treatment context and potentially model performance.
- Cohort specificity and drift: Models were trained on 2007–2016 data from two hospitals; prevalence and practices vary by site and time, and temporal changes in resistance mechanisms may degrade performance without finetuning.
- Outcome simplification: CLSI categories collapsed I with R into a single non-susceptible class; MIC values were not modeled directly in the main task, potentially losing granularity.
- Class imbalance and thresholding: Imbalanced labels required validation-based threshold tuning; optimal thresholds may not transfer across settings with different prevalence.
- Potential inclusion of asymptomatic bacteriuria (ASB): Lack of ASB indicators may introduce label noise and reduce the specificity of learned associations.
- Limited race/ethnicity encoding: The binary white/non-white feature is coarse and may embed bias; although its exclusion minimally affected performance here, broader fairness concerns remain.
- Lack of empiric prescription data: The pipeline did not recommend specific treatments for complicated UTIs because clinician prescribing data were absent; evaluation focused on resistance prediction rather than end-to-end prescribing impact.