
Artificial intelligence guided screening for cardiomyopathies in an obstetric population: a pragmatic randomized clinical trial
D. A. Adedinsewo, A. C. Morales-lara, et al.
This pragmatic randomized clinical trial tests whether AI-guided screening improves the diagnosis of left ventricular systolic dysfunction (LVSD) in pregnant and postpartum women in Nigeria. Using a digital stethoscope and 12-lead electrocardiograms, AI-guided screening identified more LVSD cases than usual care, with a statistically significant gain for the digital stethoscope.
Introduction
Cardiomyopathy is a leading cause of maternal mortality in the United States and the top cause of death in the postpartum period, with an estimated incidence of 1 in 2,000 overall and up to 1 in 700 among African American women. Nigeria has the highest reported incidence globally (~1 in 96 deliveries). Diagnosis during pregnancy and the postpartum period is challenging because cardiomyopathy symptoms overlap with those of normal pregnancy, leading to delays and adverse outcomes. AI-enabled ECG technologies have demonstrated strong performance for detecting low left ventricular ejection fraction (LVEF) in prior retrospective and pilot prospective studies, including in perinatal populations. The trial asked whether AI-guided screening using a digital stethoscope and 12-lead ECG improves detection of pregnancy-related LV systolic dysfunction (LVSD) beyond usual care in an obstetric population in Nigeria.
Literature Review
Prior work has shown AI-enabled ECGs can identify cardiovascular pathologies and detect low LVEF. Retrospective studies reported AUCs around 0.89 for perinatal LVSD detection, and a pilot prospective study in the U.S. obstetric population demonstrated high accuracy (AUC 1.00 with 12-lead ECG and 0.98 with a digital stethoscope) for identifying LVSD with LVEF <45%. Additional retrospective validations in the U.S. and Korea support AI-ECG performance for perinatal LVSD. Separate studies have developed and validated AI ECG models for detecting hypertrophic cardiomyopathy and valvular disease. Collectively, these data suggest AI-based ECG analysis could enhance screening and early detection of LVSD during the peripartum period, but prior to this trial it was unknown if such screening improves detection beyond standard care in obstetric settings.
Methodology
Design: SPEC-AI Nigeria was an investigator-initiated, pragmatic, multicenter, open-label, randomized clinical trial comparing AI-guided screening versus usual care among obstetric patients in Nigeria. The trial followed CONSORT-AI guidelines, was IRB-approved, and registered (NCT05438576). Funding was provided by Mayo Clinic and NIH-supported programs.
Settings and participants: Women aged 18–49 who were pregnant or within 12 months postpartum and receiving obstetric care at six Nigerian tertiary hospitals (Aminu Kano Teaching Hospital, Lagos University Teaching Hospital, Olabisi Onabanjo University Teaching Hospital, Rasheed Shekoni Specialist Hospital, University College Hospital Ibadan, and University of Ilorin Teaching Hospital) were enrolled. Exclusions included complex congenital heart disease, notable conduction abnormalities (e.g., complete heart block or pacemaker), and inability to consent.
Randomization: Participants were randomized 1:1 via web-based dynamic minimization stratified by site to intervention or control.
Interventions: All participants received a standard 12-lead ECG at enrollment; AI predictions for age and sex were provided asynchronously as attention control. Intervention arm additionally received: (a) digital stethoscope (Eko DUO) recordings of 15-second ECG and phonocardiogram at V2, angled, and a handheld ECG, with real-time point-of-care binary AI predictions for LVSD (positive/negative), and (b) AI-enabled 12-lead ECG predictions for LVSD provided asynchronously (usually within 1 week), plus a confirmatory echocardiogram at baseline for AI validation. ECG acquisition used GE Marquette 2000 machines; raw XML files were uploaded for AI analysis. For primary endpoint adjudication, the maximum positive digital stethoscope prediction across recording sites was used, mirroring clinical auscultation practice.
AI algorithms: Convolutional neural networks trained on >100,000 adults were used. The original Mayo 12-lead model was developed for LVEF ≤35%, later retrained for LVEF <40% (US FDA-cleared version with ECG quality checks, algorithm lvef v2.2.0). The stethoscope model was adapted to single-lead ECG and phonocardiogram for LVEF <40% (ELEFT 7.2.0) with built-in quality checks. Poor-quality recordings without reliable predictions were considered negative.
Variables and imaging: Demographics and clinical data were collected in REDCap. Echocardiograms were performed and interpreted locally by cardiologists; studies with low EF or a sample of normals were uploaded for coordinating-center review with repeat imaging when discordant or inadequate.
Outcomes: Primary outcome was identification of cardiomyopathy defined as LVEF <50% by echocardiography. Control arm: count of clinically recognized and documented LVSD on echo per usual care. Intervention arm: count of LVSD with a positive AI screen (digital stethoscope maximum prediction and/or 12-lead AI result for LVSD at the encounter), confirmed by echocardiography at time of ECG acquisition. If AI screen was negative/not computed/insufficient quality at the first encounter but echo showed LVEF <50%, the case did not count toward the primary endpoint. Secondary outcomes included subgroup performance (age, ethnicity, region, hypertensive disorder of pregnancy, pregnancy/postpartum status) and AI effectiveness at alternative EF thresholds (<45%, <40%, ≤35%) in the intervention arm at baseline, plus exploratory composite adverse outcomes, composite cardiovascular outcomes, and all-cause mortality.
Follow-up: Participants could enroll at any perinatal stage up to 12 months postpartum, with up to seven possible visits (three during pregnancy and four postpartum time windows).
Statistical analysis: Power assumed 4% LVSD detection in intervention vs 1% control; initial target 848 (424/group), increased to 1,200 (200 per site).
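The initial target of 424 per group can be sanity-checked with a standard two-proportion sample-size formula. The sketch below is illustrative only: the summary gives the assumed detection rates (4% vs 1%) and the two-sided α of 0.05, but it does not state the power or the exact formula the investigators used. The 80% power and the pooled-variance normal approximation are assumptions, and `n_per_group` is a hypothetical helper name.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group sample size for comparing two proportions (normal approximation).

    Assumptions not stated in the summary: 80% power and the pooled-variance
    form of the formula under the null hypothesis.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2                          # pooled proportion under H0
    null_sd = sqrt(2 * p_bar * (1 - p_bar))
    alt_sd = sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    n = (z_alpha * null_sd + z_beta * alt_sd) ** 2 / (p1 - p2) ** 2
    return ceil(n)


# Detection rates assumed in the trial's power calculation: 4% vs 1%.
initial_target_per_group = n_per_group(0.04, 0.01)
print(initial_target_per_group, 2 * initial_target_per_group)
```

Under these assumptions the formula gives 424 per group (848 in total), which agrees with the initial target quoted above. Other power levels or continuity corrections would give different numbers.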
The modified intention-to-treat (mITT) analysis excluded participants who did not complete baseline testing, died before baseline, or withdrew; poor-quality AI recordings were treated as negative. Odds ratios and 95% CIs were estimated by logistic regression; Pearson chi-squared tests assessed significance (two-sided α=0.05). Unplanned analyses added during peer review included a full ITT analysis (all excluded participants assumed to have normal LVEF) and site-adjusted logistic regression. Diagnostic performance metrics (AUC, sensitivity, specificity, predictive values) were reported within the intervention arm at baseline, following STARD criteria.
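The counting rule for the digital stethoscope primary endpoint described under Outcomes can be written as a short sketch. This is an illustration of the rule as summarized here, not the trial's adjudication code. The `Encounter` class, its field names, and the use of `None` for a poor-quality recording are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

LVSD_THRESHOLD = 50.0  # primary outcome: LVEF < 50% on echocardiography


@dataclass
class Encounter:
    # One binary AI prediction per stethoscope recording position.
    # None marks a poor-quality recording with no reliable prediction
    # (hypothetical encoding; the summary only says these count as negative).
    stethoscope_preds: Sequence[Optional[bool]]
    # Echo LVEF in percent, or None if no echocardiogram was obtained.
    echo_lvef: Optional[float]


def stethoscope_screen_positive(preds: Sequence[Optional[bool]]) -> bool:
    """Maximum prediction across positions: positive if any position is positive.

    Poor-quality recordings (None) are treated as negative.
    """
    return any(p is True for p in preds)


def counts_toward_primary(enc: Encounter) -> bool:
    """Intervention-arm rule: positive AI screen AND echo-confirmed LVEF < 50%."""
    if enc.echo_lvef is None:
        return False
    return stethoscope_screen_positive(enc.stethoscope_preds) and enc.echo_lvef < LVSD_THRESHOLD


# A low LVEF found on echo but missed by the screen does not count.
missed = Encounter(stethoscope_preds=[False, None, False], echo_lvef=38.0)
detected = Encounter(stethoscope_preds=[False, True, None], echo_lvef=42.0)
print(counts_toward_primary(missed), counts_toward_primary(detected))
```

The design choice worth noting is that the rule is conservative for the intervention arm: an echo-confirmed case with a negative or unusable AI screen is not credited to AI-guided screening.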
Key Findings
Enrollment and analysis: 1,232 randomized (616/arm); 1,195 completed baseline and were included in mITT (587 intervention, 608 control) with follow-up through 15 May 2024.
Primary outcome (LVSD: LVEF <50%):
• Digital stethoscope AI (max prediction across recording sites): 24/587 (4.1%) vs 12/608 (2.0%); OR 2.12 (95% CI 1.05–4.27); P=0.032; NNS=47. Site-adjusted OR 2.25 (95% CI 1.09–4.66); P=0.029. Full ITT (assuming excluded normal): unadjusted OR 2.04 (95% CI 1.01–4.12); P=0.042; site-adjusted OR 2.13 (95% CI 1.03–4.41); P=0.041.
• 12-lead AI-ECG (US FDA-cleared model): 20/587 (3.4%) vs 12/608 (2.0%); OR 1.75 (95% CI 0.85–3.62); P=0.125 (not statistically significant).
• 12-lead AI-ECG (original Mayo model): 18/587 (3.1%) vs 12/608 (2.0%); OR 1.57 (95% CI 0.75–3.29); P=0.227 (not significant).
Subgroups: Directionally consistent benefit for digital stethoscope across prespecified subgroups. For US FDA-cleared 12-lead AI-ECG, stronger effect in age ≥30 years (OR 4.2; 95% CI 1.2–15.0) vs <30 years (OR 0.9; 95% CI 0.4–2.5).
Diagnostic performance at baseline (intervention arm):
• Digital stethoscope (max prediction) for LVEF <50%: AUC 0.976 (95% CI 0.953–0.998); sensitivity 95.7% (22/23); specificity 82.0%; PPV 18.0%; NPV 99.8%. For LVEF <40%: AUC 0.985 (95% CI 0.974–0.996).
• US FDA-cleared 12-lead AI-ECG: AUC 0.928 (95% CI 0.875–0.981) for LVEF <50%; AUC 0.928 (95% CI 0.865–0.990) for LVEF <40%.
• Original Mayo 12-lead AI-ECG: AUC 0.892 (95% CI 0.825–0.960) for LVEF <50%; AUC 0.921 (95% CI 0.864–0.979) for LVEF <40%.
Exploratory clinical outcomes:
• Composite adverse events: 100/587 vs 104/608; OR 1.00 (95% CI 0.74–1.35); P=0.975.
• Composite cardiovascular events: 56/587 vs 53/608; OR 1.10 (95% CI 0.74–1.64); P=0.621.
• All-cause mortality: 12/587 (2.0%) vs 3/608 (0.5%); HR 4.20 (95% CI 1.18–14.87); P=0.026.
• Cardiovascular mortality: 5/587 vs 3/608; HR 1.75 (95% CI 0.42–7.33); P=0.442 (not significant).
Safety: No serious adverse events related to participation; five instances of minor skin irritation from ECG electrodes (1 intervention, 4 control).
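The unadjusted primary-outcome statistics for the digital stethoscope can be recomputed from the reported counts (24/587 vs 12/608). The sketch below uses the 2×2 cross-product odds ratio with a Wald (Woolf) interval and a Pearson chi-squared test without continuity correction. The trial reports logistic regression; for a single binary predictor the unadjusted estimate should coincide with this calculation, but the exact software and options used are not given in the summary. The function name `two_by_two_summary` is hypothetical.

```python
from math import erfc, exp, log, sqrt
from statistics import NormalDist


def two_by_two_summary(events_a: int, n_a: int, events_b: int, n_b: int, level: float = 0.95) -> dict:
    """Odds ratio, Wald CI, Pearson chi-squared p-value and number needed to screen."""
    a, b = events_a, n_a - events_a          # arm A: detected / not detected
    c, d = events_b, n_b - events_b          # arm B: detected / not detected
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    ci_low = exp(log(odds_ratio) - z * se_log_or)
    ci_high = exp(log(odds_ratio) + z * se_log_or)
    n = n_a + n_b
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = erfc(sqrt(chi2 / 2))           # chi-squared survival function, 1 df
    nns = 1 / (a / n_a - c / n_b)            # reciprocal of the absolute difference
    return {"or": odds_ratio, "ci": (ci_low, ci_high), "chi2": chi2, "p": p_value, "nns": nns}


# Digital stethoscope arm vs usual care, mITT counts from the summary.
result = two_by_two_summary(24, 587, 12, 608)
print({k: (tuple(round(x, 2) for x in v) if isinstance(v, tuple) else round(v, 3))
       for k, v in result.items()})
```

Run on these counts, the calculation lands on an odds ratio of about 2.12 with an interval of roughly 1.05 to 4.27, a p-value of about 0.03, and a number needed to screen of about 47, in line with the figures reported above. The site-adjusted and full ITT estimates require participant-level data and cannot be reproduced this way.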
Discussion
This pragmatic randomized trial in Nigeria demonstrates that AI-guided screening using a digital stethoscope significantly increases detection of pregnancy-related LV systolic dysfunction versus usual care, addressing the challenge of delayed diagnosis due to symptom overlap with normal pregnancy. The relatively high LVSD prevalence observed supports the need for screening in this population. The digital stethoscope’s point-of-care, real-time AI predictions, robustness in low-resource settings, and strong diagnostic performance metrics make it attractive for scalable implementation, potentially improving risk stratification and facilitating timely cardiology referral in settings with limited specialist availability. Although 12-lead AI-ECG models showed numerically higher detections, the difference did not reach statistical significance, and subgroup analyses suggest age-related performance differences for the ECG model. The number needed to screen (47) compares favorably with other obstetric screening interventions, highlighting potential utility. Observed higher all-cause mortality in the intervention arm warrants further investigation; hypotheses include differential ascertainment due to increased contact and limitations of mortality reporting systems. Overall, AI-guided screening could reduce diagnostic delays, improve cardio-obstetric care, and enable large population studies on peripartum cardiac dysfunction.
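The combination of a low positive predictive value and a very high negative predictive value follows from the low prevalence of LVSD, and a short calculation makes this concrete. The sketch below applies Bayes' rule to the reported sensitivity (22/23) and specificity (82.0%). The prevalence of 23/587 is an assumption, since the summary does not state the denominator for baseline echocardiograms, and `predictive_values` is a hypothetical helper.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) for a binary screen at a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Reported digital stethoscope performance for LVEF < 50%.
# Prevalence is an assumption: 23 cases among 587 intervention-arm participants.
ppv, npv = predictive_values(sensitivity=22 / 23, specificity=0.82, prevalence=23 / 587)
print(f"PPV {ppv:.1%}, NPV {npv:.1%}")

# The same test at a hypothetical 10% prevalence, to show the dependence.
ppv_10, npv_10 = predictive_values(22 / 23, 0.82, 0.10)
print(f"PPV {ppv_10:.1%}, NPV {npv_10:.1%}")
```

With the assumed prevalence, the result is close to the reported PPV of 18.0% and NPV of 99.8%. In practical terms, a negative screen is strongly reassuring, while most positive screens still need confirmatory echocardiography.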
Conclusion
Among pregnant and postpartum women in Nigeria, AI-guided screening with a digital stethoscope doubled the detection of LV systolic dysfunction compared to usual care, with strong diagnostic performance and feasibility in low-resource obstetric settings. While 12-lead AI-ECG showed a consistent directional effect, it did not achieve statistical significance for the primary outcome. These results support integrating AI-enabled tools to enhance early detection of peripartum cardiomyopathy and help close disparities in cardiovascular care. Future research should assess impacts on clinical management, costs, healthcare utilization, maternal and infant outcomes, explore model threshold optimization for specific populations, and address implementation strategies, including workflow integration and equitable access to confirmatory echocardiography.
Limitations
Key limitations include the pragmatic, open-label design and enrollment at tertiary centers with cardiology/echocardiography capabilities, potentially limiting generalizability. Most participants entered during late pregnancy or postpartum, reducing follow-up opportunities; only 61% completed a second visit. Attrition and variable visit schedules led to similar baseline and end-of-study detection counts. Echocardiograms in the control arm were performed at the physician’s discretion, so true LVSD prevalence in control is unknown; out-of-pocket costs and socioeconomic factors were not assessed and may have influenced echo uptake. Mortality ascertainment relied on medical records and family contact without national registries, introducing potential reporting bias. The primary outcome definition (LVEF <50%) did not perfectly align with original model training thresholds (≤35% or <40%), affecting sensitivity/specificity trade-offs. Site enrollment at teaching hospitals and the use of devices with built-in quality checks (treating poor-quality recordings as negative) may have influenced detection rates. The study did not tailor interventions to modify clinical decisions, limiting inference about downstream clinical outcomes.