Prediction of ciprofloxacin resistance in hospitalized patients using machine learning

I. Mintz, M. Chowers, et al.

Igor Mintz, Michal Chowers, and Uri Obolski developed ensemble machine learning models that predict ciprofloxacin resistance in hospitalized patients. The models reached test-set ROC-AUCs of 0.737 when the infecting species was unknown and 0.837 when it was known, and could support empiric antibiotic administration decisions.

Introduction
Antimicrobial resistance (AMR) is a critical public health challenge that compromises empiric antibiotic therapy, particularly in hospital settings with high resistance levels and frequent bug–drug mismatches. Ciprofloxacin, a fluoroquinolone widely used for diverse infections, has seen rising resistance due to sustained high consumption, though sensitivity can recover when use declines. With the growth of electronic medical records (EMRs) and machine learning (ML) methods, predictive models can support clinicians in selecting effective empiric therapy. Prior efforts to predict ciprofloxacin resistance have been limited in scope (specific units, infections, or patient subgroups). This study aims to develop and evaluate ensemble ML models that predict ciprofloxacin resistance for hospitalized patients in two scenarios: the infecting species is unknown (bacteria-agnostic) or known (bacteria-gnostic).
Literature Review
The paper situates its work within growing ML applications for AMR prediction, noting that previous ciprofloxacin resistance models were often constrained: community-acquired UTIs only, ICU-only populations, specific infection sites, single species, or single sample sources. Reported performances vary, e.g., AUC ~0.726 in an internal medicine department conditioned on Gram stain and ~0.83 for community UTI settings. The study emphasizes incorporating local resistance trends (akin to antibiograms) and prior patient-specific resistance histories, aligning with literature on risk factors such as prior resistance, facility origin (e.g., nursing homes), and cross-resistance patterns.
Methodology
Setting and data: EMRs from Meir Medical Center (Israel), which serves ~600,000 residents, were used. Included were hospitalized patients with positive bacterial cultures tested for ciprofloxacin susceptibility from 2016 to 2019. Target organisms: Escherichia coli, Klebsiella pneumoniae, Morganella morganii, Pseudomonas aeruginosa, Proteus mirabilis, and Staphylococcus aureus.
Susceptibility testing: For gram-negative bacteria in urine or wound cultures, VITEK 2 (bioMérieux) was used; for all blood isolates and for gram-positive bacteria (urine, wounds, blood), disk diffusion with CLSI breakpoints was used. Intermediate results were considered resistant.
Feature engineering: Variables from EMRs included demographics, functional status, prior antibiotic use, prior hospitalizations (within the previous year), pathogen identity, and susceptibility. Engineered features summarized previous infections with resistant bacteria, prior antibiotic usage, and prior hospitalizations. The final dataset comprised 10,053 susceptibility tests from 5,540 patients and 73 variables. Two datasets were created: bacteria-gnostic (full feature set) and bacteria-agnostic (excluding 20 bacteria-related features).
Train-test split: A time-based split was used to emulate real-world deployment and reduce data leakage: 75% training, 25% test, partitioned by culture date, yielding mutually exclusive sets and representing a form of external validation.
Modeling: Four base learners were trained: LASSO-penalized logistic regression, random forest, gradient-boosted trees (XGBoost), and a neural network. Hyperparameters were tuned via 200 random searches with five-fold time series cross-validation.
Stacking: A logistic regression super learner was trained on the base learners' predictions to produce a single ensemble output: the predicted probability of ciprofloxacin resistance.
Evaluation: Performance was assessed by ROC-AUC, with 95% CIs computed from 5,000 bootstrap samples of the test set. Calibration was also examined.
Interpretability: Model-agnostic interpretation used Kernel SHAP with 300 background samples from the training data, computing SHAP values across the entire test set.
Decision curve analysis: Conducted on test-set predictions to estimate standardized net benefit (sNB) across thresholds (cost–benefit ratios), compared with two strategies: treating all infections as resistant or all as susceptible.
Implementation: Analyses were run in Python 3.7 using NumPy, pandas, scikit-learn, XGBoost, TensorFlow, Matplotlib, and SHAP.
Ethics: The study was approved by the Meir Medical Center IRB; informed consent was waived due to retrospective use of archived records.
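The time-based train-test split and the five-fold time series cross-validation described above can be illustrated with a minimal standard-library sketch. It is not the authors' code: the field name `culture_date`, the helper names `time_based_split` and `time_series_folds`, the expanding-window fold layout, and the toy records are all assumptions made for illustration.

```python
from datetime import date, timedelta


def time_based_split(records, date_key="culture_date", train_fraction=0.75):
    """Sort records by date; the earliest fraction becomes the training set.

    Assumption: ties on the date are not handled specially here, so records
    sharing a culture date could land on both sides of the cut.
    """
    ordered = sorted(records, key=lambda r: r[date_key])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


def time_series_folds(n_samples, n_folds=5):
    """Yield (train_indices, validation_indices) with an expanding window.

    The time-ordered samples are cut into n_folds + 1 consecutive blocks.
    Fold k trains on blocks 0..k-1 and validates on block k, so validation
    samples always come later in time than the samples used for fitting.
    """
    block = n_samples // (n_folds + 1)
    if block == 0:
        raise ValueError("not enough samples for the requested number of folds")
    for k in range(1, n_folds + 1):
        train_end = k * block
        val_end = n_samples if k == n_folds else train_end + block
        yield list(range(train_end)), list(range(train_end, val_end))


# Hypothetical toy records: one culture every 30 days starting in 2016.
records = [
    {"culture_date": date(2016, 1, 1) + timedelta(days=30 * i),
     "resistant": int(i % 3 == 0)}
    for i in range(48)
]
records.reverse()  # input order does not matter; the split sorts by date

train, test = time_based_split(records)
folds = list(time_series_folds(len(train)))

print(len(train), len(test))
for train_idx, val_idx in folds:
    print(len(train_idx), len(val_idx))
```

In a setup like the one described, each hyperparameter candidate would be scored on the validation part of every fold, and the held-out 25% would be touched only once for the final evaluation.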
Key Findings
- The ensemble model outperformed individual base learners and was well calibrated in both datasets. Bacteria-agnostic test-set ROC-AUC: 0.737 (95% CI 0.715–0.758); bacteria-gnostic: 0.837 (95% CI 0.821–0.854).
- Base learner AUCs (agnostic): neural network 0.716, LASSO 0.736, random forest 0.719, XGBoost 0.729. Base learner AUCs (gnostic): neural network 0.82, LASSO 0.835, random forest 0.812, XGBoost 0.832.
- SHAP analysis (agnostic): top influences included prior ciprofloxacin resistance within 60 days, arrival from an institution, recent resistance to any antibiotic in similar units, prior ciprofloxacin resistance 61–180 days earlier, and recent hospital-wide resistance to any antibiotic.
- SHAP analysis (gnostic): top influences included average hospital-wide resistance of the same species to any antibiotic in the past 30 days, number of previous fluoroquinolone-resistant infections in the past 60 days, whether the species was Pseudomonas aeruginosa, number of non-ciprofloxacin antibiotics to which the same species was resistant in the past 60 days in the same patient, and resistance of other species to fluoroquinolones.
- Decision curve analysis: Using the ensemble predictions provided greater or at least equal standardized net benefit compared with treating all infections as resistant or all as susceptible across a broad range of threshold probabilities (cost–benefit ratios).
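The confidence intervals reported with the ROC-AUC values come from bootstrapping the test set. The sketch below shows one common way to do this, a percentile bootstrap around a rank-based AUC, using only the standard library. The summary does not say which bootstrap variant the authors used, so the percentile method, the function names, and the synthetic data are assumptions.

```python
import random


def roc_auc(y_true, y_score):
    """ROC-AUC via the Mann-Whitney U statistic, with average ranks for ties."""
    n = len(y_true)
    order = sorted(range(n), key=lambda i: y_score[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and y_score[order[j + 1]] == y_score[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(y_true)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes are needed to compute ROC-AUC")
    rank_sum = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auc_ci(y_true, y_score, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap CI; resamples containing one class are redrawn."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:
            stats.append(roc_auc(ys, [y_score[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic, hypothetical test set: resistant cases tend to score higher.
rng = random.Random(42)
y_test = [int(rng.random() < 0.3) for _ in range(300)]
p_test = [min(1.0, max(0.0, rng.gauss(0.3 + 0.2 * y, 0.15))) for y in y_test]

auc = roc_auc(y_test, p_test)
# 1,000 resamples keep the demo fast; the study reports using 5,000.
ci_low, ci_high = bootstrap_auc_ci(y_test, p_test, n_boot=1000)
print(f"AUC {auc:.3f}, 95% CI {ci_low:.3f} to {ci_high:.3f}")
```

Resampling whole test rows with replacement leaves the fitted model untouched, so the interval reflects sampling variability in the test set only, not variability from retraining.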
Discussion
The study addresses the clinical challenge of selecting effective empiric ciprofloxacin therapy amid high and variable resistance. By leveraging EMRs and recent local resistance trends, the ensemble models deliver well-calibrated probabilities of resistance that can inform antibiotic choices whether the species identity is unknown or known. Stacked ensembling improved ROC-AUC by up to 0.025 over the individual base learners. Decision curve analysis indicates practical utility across diverse clinical preferences and risk tolerances. SHAP-based interpretation corroborates known risk factors (prior resistance history, facility origin, and local resistance frequencies), supporting model credibility and potential clinician acceptance. Comparisons with prior work suggest robust performance despite a heterogeneous dataset spanning multiple species, sources, and hospital departments.
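To make the decision curve comparison concrete, the sketch below computes standardized net benefit at a single threshold for a model and for the "treat all as resistant" strategy; "treat all as susceptible" has a net benefit of zero by construction. It uses the commonly cited net benefit definition, NB = TP/n − (FP/n) · p_t/(1 − p_t), divided by prevalence. The summary does not spell out the formula, so this definition, the function names, and the toy data are assumptions.

```python
def standardized_net_benefit(y_true, y_prob, threshold):
    """sNB of calling an infection resistant when y_prob >= threshold."""
    n = len(y_true)
    prevalence = sum(y_true) / n
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    odds = threshold / (1 - threshold)  # cost-benefit ratio implied by threshold
    return (tp / n - fp / n * odds) / prevalence


def snb_all_resistant(y_true, threshold):
    """sNB of treating every infection as resistant (all predicted positive)."""
    prevalence = sum(y_true) / len(y_true)
    odds = threshold / (1 - threshold)
    return (prevalence - (1 - prevalence) * odds) / prevalence


# Hypothetical toy test set: 1 = resistant, with predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.8, 0.3, 0.6, 0.2, 0.2, 0.1, 0.1, 0.1, 0.1]

for threshold in (0.1, 0.25, 0.5):
    model = standardized_net_benefit(y_true, y_prob, threshold)
    treat_all = snb_all_resistant(y_true, threshold)
    # "Treat all as susceptible" is the zero line on a decision curve.
    print(f"threshold {threshold}: model {model:.3f}, all resistant {treat_all:.3f}")
```

A decision curve repeats this calculation over a grid of thresholds; a model is useful at a given threshold when its sNB is at least as high as both reference strategies.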
Conclusion
Ensemble ML models predicting ciprofloxacin resistance in hospitalized patients achieved strong discrimination and calibration and demonstrated net clinical benefit across a range of decision thresholds. Incorporating recent, local resistance patterns alongside patient history enhanced performance and interpretability. These results advance integration of ML decision support into clinical practice. Future improvements may come from algorithmic refinements, tailored feature engineering, and inclusion of richer EMR and community-level data to further enhance accuracy and generalizability while reducing antibiotic misuse.
Limitations
- Missing community-level covariates such as community antibiotic consumption and exposure in patients’ surroundings (neighborhoods, households), which could improve predictions if added.
- Limited generalizability across different hospitals, regions, or time periods due to varying demographics, antibiotic use, and evolving AMR patterns; site-specific retraining may be required.
- Under-representation of younger patients in the dataset may affect performance in that subgroup.
- SHAP explanations, while useful for transparency, have known limitations and are not causal estimates.
- Some species (e.g., K. pneumoniae, M. morganii) had higher resistance proportions in the test set, potentially challenging generalization, though models still performed well.