Predictive modeling of proliferative vitreoretinopathy using automated machine learning by ophthalmologists without coding experience


F. Antaki, G. Kahwati, et al.

This study demonstrates that ophthalmologists without coding experience can design machine learning models to predict proliferative vitreoretinopathy (PVR) using automated machine learning (AutoML). Conducted by Fares Antaki, Ghofril Kahwati, Julia Sebag, and colleagues, the research achieved promising results, with an AUC of 0.90 for PVR prediction, illustrating how non-coding clinicians can tap into the power of machine learning in ophthalmology.
Introduction

Proliferative vitreoretinopathy (PVR) is a major cause of surgical failure after rhegmatogenous retinal detachment (RRD) repair, occurring in about 5–10% of cases and accounting for most primary failures. Although multiple clinical and biological risk factors have been identified (e.g., trauma, aphakia, vitreous hemorrhage, pre-existing PVR), existing predictive formulas based on clinical and genetic variables have had insufficient performance for routine use. Artificial intelligence has been widely applied to ophthalmic imaging, but fewer studies have used electronic health record (EHR) clinical data, which pose challenges such as low disease incidence and class imbalance. The study’s objective was to test the feasibility and discriminative performance of AutoML-designed ML models, built by ophthalmologists without coding experience, to predict postoperative PVR using preoperative clinical data from EHRs.

Literature Review

Prior work developed clinical and genetic formula-based models for PVR risk (e.g., Asaria et al.; Kon et al.), but reported performance and validation were limited and not clearly aligned with current standardized metrics. External validation efforts (e.g., Sala-Puigdollers et al.) highlighted generalizability issues. Some studies integrated biomarker data (e.g., vitreous or subretinal fluid proteins), which improved AUCs; pre-existing PVR was often a strong predictor. In ophthalmology, AI has been applied more frequently to imaging than to EHR-derived clinical data, where class imbalance is a common obstacle. The recent democratization of AI via AutoML has enabled non-programmers to build models (e.g., Faes et al. for deep learning), motivating investigation of AutoML for PVR prediction from clinical variables.

Methodology

Design: Retrospective cohort at a tertiary teaching hospital (CHUM, Montreal, Canada); IRB-approved with waiver of consent. Cohort: Consecutive eyes undergoing pars plana vitrectomy for RRD by a single surgeon from 2012–2019. Inclusion: Simple RRD and RRD with giant retinal tears, vitreous hemorrhage, and/or pre-existing PVR grade C or worse. Eyes with prior ocular surgery (e.g., failed pneumatic retinopexy, scleral buckle, vitrectomy for non-RRD indications, glaucoma surgery, cataract surgery) were also included. Exclusions: Age <18, tractional detachments, recent penetrating trauma (<6 months), prior RRD vitrectomy in the same eye, <3 months follow-up, second eye of the same patient when both were eligible, and cases where pre-existing vs postoperative PVR could not be reliably distinguished.

Outcome definition and labeling: Outcome was development of postoperative PVR within at least 3 months follow-up. Postoperative PVR defined per Kon et al.: new PVR Grade C >1 clock hour in detached retina post-vitrectomy, or new clinically visible membranes/bands >1 clock hour in an attached retina, or visually significant macular pucker requiring re-intervention. Cases labeled as “PVR” or “No PVR” by an ophthalmologist-in-training and a vitreoretinal surgeon based on EHR postoperative data and operative reports.

Features and data collection: Fifteen pre- and peri-operative clinical variables were extracted from EHRs (Table 1 variables). Categorical variables were assessed at the preoperative visit except postoperative lens status (day 1). Pre-existing PVR was defined as PVR Grade C >1 clock hour. Intraoperative variables (e.g., endotamponade type, endolaser, cryotherapy) were excluded to limit confounding. The 15 variables included: age, sex, previous ocular surgery, duration of symptoms, subtotal/total RRD (≥3 quadrants), macular status, pre-existing PVR, vitreous hemorrhage, number of retinal breaks, giant retinal tear, macular hole, anterior uveitis, quadrants of lattice degeneration (0–4), intraocular pressure (Goldmann), postoperative lens status (phakic vs pseudophakic/aphakic).

Missing data handling: Continuous variables with missingness: duration of symptoms (15.2%), intraocular pressure (17.8%). Little’s MCAR test indicated MCAR (p=0.933). Median imputation was used for missing continuous values.
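Median imputation as described above can be sketched as follows. This is an illustrative standalone version (the study's actual tooling and column values are not reproduced here; the sample data are hypothetical):

```python
# Minimal sketch of median imputation for a continuous variable with
# missing entries (e.g., duration of symptoms or intraocular pressure).
from statistics import median

def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

# Hypothetical symptom-duration column (days) with missing entries
duration = [7, 21, None, 14, 3, None, 30]
imputed = impute_median(duration)  # missing values replaced by the median, 14
```

Median (rather than mean) imputation is robust to the skew typical of duration-type variables; it is appropriate here because Little's test supported the MCAR assumption.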

Class imbalance handling: Original cohort: 506 eyes (460 No PVR, 46 PVR; prevalence 9.1%). To address imbalance before physician-led model building, random undersampling (RUS) of the majority class was performed in WEKA, targeting a 2:1 control:case ratio and yielding 92 controls and 46 PVR cases (prevalence 33.3%). Distributions of the clinically significant variables remained comparable between the original and RUS datasets.
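The undersampling step can be sketched as below. The study performed RUS in WEKA; this standalone version only illustrates the 2:1 resampling logic, with eye indices standing in for records:

```python
# Sketch of random undersampling (RUS) of the majority class to a
# 2:1 control:case ratio, mirroring the cohort sizes described above.
import random

def undersample(controls, cases, ratio=2, seed=42):
    """Randomly keep ratio * len(cases) controls; cases are untouched."""
    rng = random.Random(seed)
    kept = rng.sample(controls, ratio * len(cases))
    return kept, cases

controls = list(range(460))   # 460 "No PVR" eyes (indices as stand-ins)
cases = list(range(46))       # 46 "PVR" eyes
rus_controls, rus_cases = undersample(controls, cases)
prevalence = len(rus_cases) / (len(rus_controls) + len(rus_cases))  # 1/3
```

Fixing the random seed makes the resampled cohort reproducible, which matters when class distributions are later compared against the original dataset.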

Feature selection: Univariate analyses on the original 506-eye cohort identified features significantly associated with postoperative PVR. Selected features for modeling (Feature Set 1, 8 variables): age, duration of symptoms, intraocular pressure, subtotal/total RRD, macular status, giant retinal tear, vitreous hemorrhage, pre-existing PVR. Uveitis, although significant, was excluded due to association with intraocular pressure to reduce multicollinearity. Feature Set 2 included the same variables excluding pre-existing PVR (7 variables).

Model development (AutoML by ophthalmologists): Two ophthalmologists without coding experience used MATLAB’s Classification Learner App to train ML classifiers. They consulted app documentation, trained all available classifiers, compared performance, and selected SVM and Naïve Bayes as best-performing families. AutoML-based hyperparameter optimization was used to tune models. Fivefold cross-validation was used for internal validation; combined confusion matrices were generated.
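The workflow above (train SVM and Naïve Bayes classifiers, internally validate with fivefold cross-validation, and pool per-fold predictions into a combined confusion matrix) has a close analogue in scikit-learn. This is an illustrative sketch on synthetic data, not the study's MATLAB Classification Learner pipeline:

```python
# Illustrative analogue of the described workflow: fivefold CV for an SVM
# and a Gaussian Naive Bayes model, with pooled out-of-fold predictions
# combined into one confusion matrix per model. Data are synthetic.
import numpy as np
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)
X = rng.normal(size=(138, 8))        # 138 eyes x 8 features (Feature Set 1 size)
y = rng.integers(0, 2, size=138)     # synthetic binary PVR labels

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for name, model in [("Quadratic SVM", SVC(kernel="poly", degree=2)),
                    ("Naive Bayes", GaussianNB())]:
    pooled = cross_val_predict(model, X, y, cv=cv)   # out-of-fold predictions
    tn, fp, fn, tp = confusion_matrix(y, pooled).ravel()
    results[name] = dict(tp=int(tp), fp=int(fp), tn=int(tn), fn=int(fn))
```

Pooling out-of-fold predictions, as `cross_val_predict` does, yields a single combined confusion matrix in which every eye is counted exactly once, matching the internal validation scheme described above.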

Benchmarking: Independently, a data scientist implemented manually coded versions of comparable SVM and NB models in MATLAB for redundancy/benchmarking. Due to lack of variability measures from the GUI, formal statistical comparison was not performed; instead, F1 scores were compared for order-of-magnitude agreement. Code available at https://github.com/ghofok/PVR_Prediction.

Statistical analysis and metrics: Nonparametric tests for continuous variables (Mann-Whitney U) due to non-normality; Chi-square/Fisher’s exact tests for categorical variables; significance at p<0.05. Performance metrics included AUC, sensitivity, specificity, PPV, NPV, and F1 score from confusion matrices. PPV/NPV were also adjusted to the real prevalence (9.1%) using Bayes’ theorem.
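The Bayes' theorem adjustment of PPV/NPV to the true 9.1% prevalence can be sketched as follows, using Model 1's confusion-matrix counts (sensitivity 29/46, specificity 90/92) reported in the Key Findings:

```python
# Prevalence-adjusted predictive values via Bayes' theorem: PPV and NPV
# computed on the resampled (33.3% prevalence) data are re-expressed at
# the cohort's real 9.1% prevalence.
def adjusted_ppv(sens, spec, prev):
    return sens * prev / (sens * prev + (1 - spec) * (1 - prev))

def adjusted_npv(sens, spec, prev):
    return spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)

ppv = adjusted_ppv(29 / 46, 90 / 92, 0.091)   # ≈ 0.744, the reported 74.4%
npv = adjusted_npv(29 / 46, 90 / 92, 0.091)   # ≈ 0.964, the reported 96.4%
```

This adjustment matters because undersampling inflates apparent prevalence: the raw PPV of 93.5% on the 2:1 dataset falls to 74.4% at the real-world 9.1% prevalence, while sensitivity and specificity are unaffected.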

Key Findings

Cohort: 506 eyes analyzed; 46 (9.1%) developed postoperative PVR after median 52 months follow-up (range 3–105). Among PVR cases: macular pucker requiring re-intervention in 30.4%, Grade C posterior PVR in 39.1%, Grade C anterior PVR in 30.4%; primary surgical failure in 60.9%; final anatomic success in 71.7%.

Risk factors (PVR vs No PVR): PVR cases were older (mean 68.8 vs 59.9 years, p<0.001), had longer symptom duration (median 21 vs 7 days, p<0.001), more extensive detachments (subtotal/total RRD 47.8% vs 12.8%, p<0.001; macula-off 82.6% vs 58.0%, p=0.001), higher rates of giant retinal tears (8.7% vs 2.0%, p=0.023), vitreous hemorrhage (21.7% vs 6.7%, p=0.002), pre-existing PVR (37.0% vs 0.2%, p<0.001), lower intraocular pressure (median 12 vs 15 mmHg, p=0.002), and more uveitis (6.5% vs 0.4%, p=0.006). Number of tears, macular hole, lattice degeneration quadrants, sex, previous surgery, and postoperative lens status were not significantly different.

Model performance (RUS dataset, fivefold CV):

  • Feature Set 1 (8 features, including pre-existing PVR):
      • Model 1 (Quadratic SVM): TP 29, FP 2, TN 90, FN 17; AUC 0.90; F1 0.75; sensitivity 63.0%; specificity 97.8%; PPV 93.5% (adjusted 74.4%); NPV 84.1% (adjusted 96.4%).
      • Model 2 (Optimized Naïve Bayes): TP 32, FP 4, TN 88, FN 14; AUC 0.86; F1 0.78; sensitivity 69.6%; specificity 95.7%; PPV 88.9% (adjusted 61.6%); NPV 86.3% (adjusted 96.9%).
  • Feature Set 2 (7 features, excluding pre-existing PVR):
      • Model 3 (Optimized SVM): TP 21, FP 5, TN 87, FN 25; AUC 0.81; F1 0.58; sensitivity 45.7%; specificity 94.6%; PPV 80.8% (adjusted 45.7%); NPV 77.7% (adjusted 94.6%).
      • Model 4 (Optimized Naïve Bayes): TP 25, FP 7, TN 85, FN 21; AUC 0.81; F1 0.64; sensitivity 54.3%; specificity 92.4%; PPV 78.1% (adjusted 41.7%); NPV 80.2% (adjusted 95.3%).
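The per-model metrics above follow directly from each combined confusion matrix; as a check, this sketch reproduces Model 1's figures from its counts:

```python
# Derive sensitivity, specificity, PPV, NPV, and F1 from confusion-matrix
# counts; Model 1's reported counts are used as the worked example.
def metrics(tp, fp, tn, fn):
    sens = tp / (tp + fn)            # true positive rate
    spec = tn / (tn + fp)            # true negative rate
    ppv = tp / (tp + fp)             # precision
    npv = tn / (tn + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return sens, spec, ppv, npv, f1

sens, spec, ppv, npv, f1 = metrics(tp=29, fp=2, tn=90, fn=17)
# sens ≈ 0.630, spec ≈ 0.978, ppv ≈ 0.935, npv ≈ 0.841, f1 ≈ 0.75
```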

Including pre-existing PVR as a feature improved AUC and overall discriminative performance. Benchmarking F1 scores for manually coded counterparts were of similar magnitude (Model 1: 0.76; Model 2: 0.81; Model 3: 0.60; Model 4: 0.69).

Discussion

The study demonstrates that ophthalmologists without coding experience can feasibly build ML classifiers using AutoML tools to predict postoperative PVR from EHR-derived clinical variables. The best-performing model (quadratic SVM with all selected features, including pre-existing PVR) achieved high specificity (97.8%) and acceptable sensitivity (63.0%), making it useful for ruling in PVR risk when positive and aiding clinical decision-making for higher-risk patients. Excluding pre-existing PVR reduced discriminative performance (AUC 0.81), underscoring its importance as a predictor. The work addresses the challenge of class imbalance—common with low-incidence complications—by applying data-level solutions (random undersampling) prior to training; such preparatory steps benefited from data science expertise, suggesting ongoing collaboration is valuable even with AutoML. The findings align with and in some cases compare favorably to prior PVR prediction efforts based on clinical and genetic variables, while using standardized performance metrics and internal cross-validation. Clinically, a highly specific model can help identify patients for closer follow-up, targeted counseling, or inclusion in trials evaluating prophylactic therapies for PVR. However, negative predictions should be interpreted cautiously given moderate sensitivity. External validation is still needed to assess generalizability before clinical deployment.

Conclusion

AutoML-enabled development of PVR prediction models by ophthalmologists without coding experience is feasible. The top-performing quadratic SVM using key clinical features (including pre-existing PVR) achieved AUC 0.90 with high specificity. These models could help identify high-risk patients for counseling, follow-up planning, and research on prophylactic interventions. Future work should include external validation on independent cohorts, assessment of model-guided follow-up strategies (including resource implications of false positives), exploration of retinal redetachment as a separate endpoint, and continued refinement of imbalance handling strategies. As AutoML matures, reliance on manually coded benchmarking may diminish, but collaboration with data scientists will likely remain important for data preparation and validation.

Limitations
  • No external validation; generalizability to other centers and surgeons is untested.
  • AutoML interface did not provide variability measures (e.g., confidence intervals), limiting statistical comparisons among models and with manually coded benchmarks.
  • Class imbalance required undersampling, which may affect model calibration and information retention from the majority class.
  • Single-surgeon, single-center retrospective design may limit external validity and introduce center-specific practice patterns.
  • Moderate sensitivity means some PVR cases would be missed; thresholds and optimization targets may have differed between automated and manual implementations.
  • Potential residual confounding; intraoperative variables were excluded to avoid confounding but may carry predictive information.