Blinded, randomized trial of sonographer versus AI cardiac function assessment

Medicine and Health


B. He, A. C. Kwan, et al.

A randomized clinical trial comparing AI with sonographer initial assessment of left ventricular ejection fraction (LVEF) in echocardiography. Conducted by researchers including Bryan He and Susan Cheng, the study finds that AI-guided initial assessment is non-inferior to sonographer assessment and superior on both the proportion of studies substantially changed by the reviewing cardiologist and the mean absolute difference between initial and final LVEF.

Introduction
Accurate quantification of cardiac function is essential for diagnosis, risk stratification, and treatment monitoring. Left ventricular ejection fraction (LVEF) is a key metric guiding eligibility for therapies and interventions, yet conventional echocardiographic LVEF assessment is subject to heterogeneity and inter-/intraobserver variability due to reliance on manual, subjective ratings. Guidelines recommend repeated measurements over multiple cardiac cycles to mitigate variability, but this is rarely feasible in clinical practice, leading to frequent reliance on visual estimation, which is suboptimal for detecting subtle changes critical to therapeutic decisions (for example, chemotherapy continuation or defibrillator implantation). With advances in AI, numerous algorithms have been developed to automate LVEF assessment, showing improved precision in retrospective datasets. However, prior to this study, no cardiovascular AI technologies had been validated in blinded, randomized clinical trials, and the impact of AI-human interaction on clinical interpretation remained underexplored. This study aimed to evaluate, in a blinded, randomized non-inferiority clinical trial, whether initial AI assessment of LVEF influences final cardiologist interpretation compared with conventional sonographer initial assessment.
Literature Review
The authors note longstanding variability in manual LVEF assessment. Guidelines recommend repeated measurements, but these are often impractical, so visual estimates are widely used despite their limited ability to detect clinically meaningful changes. Prior AI efforts in echocardiography showed improved precision in retrospective datasets but had not been validated in blinded, randomized clinical trials. The effect of AI prompting on interpretation workflows had not been adequately studied, which motivated this prospective randomized evaluation.
Methodology
Design: Blinded, randomized non-inferiority clinical trial comparing AI-guided with sonographer-guided initial LVEF assessment within the echocardiography interpretation workflow. ClinicalTrials.gov ID: NCT05140642. No outside funding.
Population and setting: 3,769 transthoracic echocardiogram studies performed at an academic medical center (1 June 2019–8 August 2019) were prospectively re-evaluated. After excluding 274 studies whose image quality was insufficient to contour the left ventricle, 3,495 studies from 3,035 patients were included. Twenty-five cardiac sonographers (mean 14.1 years of practice) and ten cardiologists (mean 12.7 years of practice) participated.
Randomization and blinding: Eligible studies were randomized 1:1 to initial evaluation by AI (n=1,740) or by sonographer (n=1,755). Cardiologists, at a separate workstation and time, reviewed the full echocardiogram with the initial annotations and made the final LVEF assessment, blinded to the source of the initial assessment. Cardiologists could not reliably identify whether initial assessments came from AI or a sonographer (blinding index 0.088), indicating effective blinding.
Interventions and assessments: In the sonographer arm, sonographers annotated LVEF (Simpson’s method). In the AI arm, an AI algorithm provided the initial LVEF with corresponding annotations. Cardiologists adjudicated the initial assessments to produce final LVEF reports. Workflow timing metrics for sonographers and cardiologists were recorded.
Endpoints:
- Primary efficacy endpoint: Proportion of studies with a substantial change (>5% absolute change) between the initial (AI or sonographer) and final cardiologist LVEF assessments, together with the mean absolute difference (MAD) between initial and final LVEF.
- Key secondary safety endpoint: Substantial difference and MAD between the final cardiologist LVEF and the previously clinically reported LVEF.
- Other outcomes: Any change between initial and final assessments, adjudication time for cardiologists, and subgroup analyses by LVEF method (single-plane versus biplane), race/ethnicity, sex, image quality, and care setting (inpatient versus outpatient). A post hoc analysis assessed crossing of the 35% LVEF threshold.
Statistical analysis: Non-inferiority and superiority tests were performed for the primary outcome; all other tests assessed superiority. Fisher’s exact test was used for categorical outcomes and a two-sided Student’s t-test for quantitative outcomes. Results are reported with 95% confidence intervals.
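The two quantities in the primary endpoint can be sketched in a few lines of Python. This is only an illustration of the definitions given above, not the trial's analysis code: the function name `summarize_changes` and the LVEF values are hypothetical, and the threshold is taken from the ">5% absolute change" rule, read as more than 5 percentage points.

```python
from statistics import mean

# ">5% absolute change" from the endpoint definition, in percentage points.
SUBSTANTIAL_THRESHOLD = 5.0


def summarize_changes(initial, final, threshold=SUBSTANTIAL_THRESHOLD):
    """Return (proportion of studies substantially changed, mean absolute difference)."""
    if len(initial) != len(final) or not initial:
        raise ValueError("need two non-empty sequences of equal length")
    # Absolute change between the initial and final LVEF of each study.
    abs_diffs = [abs(f - i) for i, f in zip(initial, final)]
    # A study counts as substantially changed only if the change exceeds the threshold.
    substantial = sum(d > threshold for d in abs_diffs) / len(abs_diffs)
    return substantial, mean(abs_diffs)


# Hypothetical LVEF values (%), invented purely for illustration.
initial_lvef = [55.0, 40.0, 62.0, 30.0, 48.0]
final_lvef = [57.0, 47.0, 62.0, 33.0, 41.0]

proportion, mad = summarize_changes(initial_lvef, final_lvef)
print(f"substantially changed: {proportion:.0%}, MAD: {mad:.1f} points")
```

Applied separately to each arm, the first value corresponds to the proportion of studies substantially changed and the second to the MAD between initial and final LVEF.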
Key Findings
- Primary outcome: Substantial change (>5% absolute change) between initial and final LVEF occurred in 16.8% (292/1,740) of AI-guided studies vs 27.2% (478/1,755) of sonographer-guided studies; difference -10.4% (95% CI -13.2% to -7.7%); P<0.001 for non-inferiority and P<0.001 for superiority. The MAD between initial and final LVEF was 2.79% (AI) vs 3.77% (sonographer); difference -0.97% (95% CI -1.33% to -0.54%); P<0.001.
- Key secondary (final vs previous clinical report): Substantial change occurred in 50.1% (871/1,740) of AI-guided studies vs 54.5% (957/1,755) of sonographer-guided studies; difference -4.5% (95% CI -7.8% to -1.2%); P=0.008. The MAD was 6.29% (AI) vs 7.23% (sonographer); difference -0.94% (95% CI -1.34% to -0.54%); P<0.001.
- Other outcomes: Any change between initial and final assessments occurred in 63.2% (AI) vs 69.4% (sonographer); difference -6.2% (95% CI -9.3% to -3.1%); P<0.001. Median cardiologist adjudication time was 54 s (IQR 31–95) for AI vs 64 s (IQR 36–105) for sonographer; mean difference -8 s (95% CI -12 to -4); P<0.001. In the AI arm, sonographer time for initial assessment was effectively 0 s; in the sonographer arm, median sonographer time was 119 s (IQR 77–173). Post hoc, 1.3% (22/1,740) of AI-arm studies crossed the 35% LVEF threshold between initial and final assessment.
- Subgroup analyses: The reduction in the primary endpoint with AI was consistent across major subgroups, including method (single-plane and biplane), race/ethnicity, sex, image quality categories, and care location, with differences in MAD generally favoring AI.
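As a rough check on the reported figures, the sketch below recomputes the primary-outcome difference in proportions from the published counts and applies a two-sided Fisher's exact test to the key secondary counts. Two points are assumptions rather than details taken from the source: the confidence interval uses a simple Wald (normal-approximation) formula, since the summary does not say which interval method the authors used, and the two-sided Fisher p-value sums all tables no more probable than the observed one, which is one common convention. The helper names are hypothetical.

```python
from math import exp, lgamma, sqrt


def risk_difference_wald(x1, n1, x2, n2, z=1.96):
    """Difference in proportions (group 1 minus group 2) with a Wald 95% CI.

    Assumption: the Wald interval is used here for simplicity only.
    """
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, diff - z * se, diff + z * se


def _log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    log_denom = _log_comb(row1 + row2, col1)

    def log_prob(x):
        # Hypergeometric log-probability of x events in row 1 with margins fixed.
        return _log_comb(row1, x) + _log_comb(row2, col1 - x) - log_denom

    log_obs = log_prob(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    # Sum the tables that are no more probable than the observed one.
    total = sum(exp(log_prob(x)) for x in support if log_prob(x) <= log_obs + 1e-7)
    return min(1.0, total)


# Primary outcome counts as reported: 292/1,740 (AI) vs 478/1,755 (sonographer).
diff, lower, upper = risk_difference_wald(292, 1740, 478, 1755)
print(f"difference {diff:.1%} (95% CI {lower:.1%} to {upper:.1%})")

# Key secondary counts as reported: 871/1,740 (AI) vs 957/1,755 (sonographer).
p_secondary = fisher_exact_two_sided(871, 1740 - 871, 957, 1755 - 957)
print(f"Fisher's exact P = {p_secondary:.3f}")
```

Because the raw counts are used rather than the rounded percentages, the recomputed difference can differ from the published -10.4% in the last digit. The Fisher result can be compared with the reported P=0.008.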
Discussion
This randomized, blinded trial demonstrates that AI-guided initial assessment of LVEF is non-inferior and statistically superior to sonographer-guided initial assessment in terms of reducing substantial changes required by cardiologists for final reports. AI guidance led to fewer and smaller revisions, shorter cardiologist adjudication times, and closer agreement with prior cardiologist-reported LVEF, indicating improved consistency. Blinding was effective, supporting unbiased comparisons. These findings address the central question of whether integrating AI at the initial assessment stage can enhance the echocardiography interpretation workflow, suggesting that AI can streamline clinical practice while maintaining or improving accuracy. The study also contributes contemporary evidence on variability in LVEF assessment and how AI may mitigate it in routine workflows.
Conclusion
In this blinded, randomized trial, initial AI assessment of LVEF was non-inferior and, on the primary endpoint, superior to initial sonographer assessment. It reduced the substantial changes made by cardiologists, shortened adjudication time, and improved agreement with prior reports. These results support incorporating AI into echocardiography workflows for cardiac function assessment. Future work should evaluate generalizability across institutions and equipment, measure downstream clinical outcomes and decision-making, and extend AI-guided workflows to additional echocardiographic measurements and patient subgroups.
Limitations