Diagnostic accuracy of point-of-care ultrasound with artificial intelligence-assisted assessment of left ventricular ejection fraction

P. Motazedian, J. A. Marbach, et al.

This study by Pouya Motazedian and colleagues evaluates the accuracy of AI-assisted focused cardiac ultrasound (FoCUS) for assessing left ventricular ejection fraction (LVEF). FoCUS measurements agreed closely with comprehensive transthoracic echocardiography (TTE) for both novice and experienced operators, supporting its use for rapid bedside cardiac assessment.

Introduction
Cardiovascular disease is the leading cause of mortality worldwide and its rising prevalence has increased healthcare utilization and costs. Transthoracic echocardiography (TTE) is the most frequently used cardiovascular test and represents a substantial portion of imaging expenditures. Advances in ultrasound technology have enabled pocket-sized devices used outside echocardiography laboratories, leading to growing adoption of point-of-care ultrasound (PoCUS), which can outperform the physical examination and improve diagnostic accuracy. Assessing left ventricular ejection fraction (LVEF) is a key element of focused cardiac ultrasound (FoCUS). Although TTE is the standard for LVEF assessment, access can be limited, particularly for bedside decision-making and in some communities. FoCUS has been used to screen for left ventricular dysfunction, but most prior studies evaluated users with formal echocardiography training, whereas in real-world settings FoCUS is often performed by providers with limited training. This raises concerns about the accuracy and impact of FoCUS-derived LVEF on patient care. Artificial intelligence (AI) could help improve accuracy and standardization of FoCUS, yet many AI algorithms lack validation in real-world settings and have primarily been studied with images acquired by experienced echocardiographers. The study’s objective was to determine the diagnostic accuracy of AI-assisted FoCUS LVEF assessment compared with comprehensive TTE and to compare performance between novice and experienced users across real-world clinical environments.
Literature Review
Prior work shows FoCUS without AI can detect abnormal LVEF with pooled sensitivity and specificity of approximately 84% and 89%, respectively, but most studies involved users trained as echocardiographers and few classified severity of dysfunction. Early AI studies have largely been performed in controlled settings with experienced sonographers. A validation study with 100 participants showed AI-assisted FoCUS achieved a correlation coefficient of 0.87 and sensitivity/specificity of 90%/87% for LVEF <50%, though images were acquired in an echocardiography lab. Open-source tools such as EchoNet-based models in emergency settings have reported an AUC around 0.81 for reduced LV function, with some trained on physician visual assessments rather than quantitative biplane methods and not reporting severity categories. These limitations highlight the need for pragmatic, real-world evaluations of AI-assisted FoCUS across diverse users and settings, including the ability to grade severity of LV dysfunction.
Methodology
Design: Prospective, multicenter, observational cohort at The Ottawa Hospital and the University of Ottawa Heart Institute (Ottawa, Canada) and Tufts Medical Center (Boston, USA).
Eligibility: Adults (≥18 years) undergoing clinically indicated TTE between September 2020 and March 2022.
Recruitment: Convenience sampling from inpatient (emergency department, wards, ICU) and outpatient settings. FoCUS with AI-assisted LVEF was performed within 48 hours of TTE.
Consent and ethics: Informed consent was obtained; ethics approval was granted by the Ottawa Health Science Network Research Ethics Board and the Tufts IRB.
Operators: FoCUS was performed by either novice or experienced users. Novice: deemed competent for FoCUS, with fewer than 100 assessments. Experienced: at least 100 assessments and a minimum of 10 years’ experience in LV function assessment.
Device and AI: The EchoNous KOSMOS handheld 64-channel ultrasound system was used. The device has FDA 510(k) clearance. The AI assisted only with interpretation, not acquisition.
Workflow: The operator acquires 5-second apical four-chamber (A4C) and apical two-chamber (A2C) clips; the AI automatically identifies end-diastolic and end-systolic frames, traces LV endocardial borders, and computes LVEF using the modified Simpson’s biplane method of disks. Users could modify frames and tracings, but for the study the AI outputs were left unmodified.
Reference standard: Comprehensive TTE acquired by trained sonographers on cart-based systems and interpreted by a level-3 echocardiographer; LVEF was calculated via the biplane method with manual endocardial tracings in A4C and A2C views per ASE recommendations.
Image quality and exclusions: Non-diagnostic FoCUS studies were excluded from analyses. If lateral decubitus positioning was not possible, modified supine or upright images were obtained.
Outcomes and definitions: Primary outcome was agreement of AI-assisted FoCUS LVEF with TTE LVEF. LVEF categories: normal (≥50%), mild (40–49%), moderate (30–39%), severe (≤30%).
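To make the workflow and category definitions concrete, here is a minimal Python sketch of the biplane method of disks and the severity grading described above. It is an illustration under stated assumptions, not the KOSMOS implementation: the function names are hypothetical, the disk diameters are assumed to have already been measured from the A4C and A2C tracings, and because the summary lists both "30–39%" and "≤30%", the sketch assigns exactly 30% to the severe category.

```python
import math
from typing import Sequence


def biplane_volume_ml(a4c_diameters_cm: Sequence[float],
                      a2c_diameters_cm: Sequence[float],
                      long_axis_cm: float) -> float:
    """LV volume by the biplane method of disks (1 cm^3 = 1 mL)."""
    if not a4c_diameters_cm or len(a4c_diameters_cm) != len(a2c_diameters_cm):
        raise ValueError("need the same non-zero number of disks in both views")
    disk_height = long_axis_cm / len(a4c_diameters_cm)
    # Each disk is an elliptical cylinder: (pi / 4) * a_i * b_i * height.
    return (math.pi / 4.0) * disk_height * sum(
        a * b for a, b in zip(a4c_diameters_cm, a2c_diameters_cm)
    )


def ejection_fraction_pct(edv_ml: float, esv_ml: float) -> float:
    """LVEF (%) from end-diastolic and end-systolic volumes."""
    if edv_ml <= 0:
        raise ValueError("end-diastolic volume must be positive")
    return 100.0 * (edv_ml - esv_ml) / edv_ml


def lvef_category(lvef_pct: float) -> str:
    """Severity grade using the thresholds listed in the summary.

    Assumption: a value of exactly 30% is graded as severe.
    """
    if lvef_pct >= 50.0:
        return "normal"
    if lvef_pct >= 40.0:
        return "mild"
    if lvef_pct > 30.0:
        return "moderate"
    return "severe"


# Hypothetical volumes, for illustration only.
lvef = ejection_fraction_pct(edv_ml=120.0, esv_ml=78.0)
print(lvef, lvef_category(lvef))
```

In practice the method typically uses 20 disks per view; the sketch accepts any matching number of disks so it can be checked against simple shapes such as a cylinder.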
Secondary outcomes: Diagnostic accuracy for abnormal LVEF (<50%) and severe dysfunction (≤30%), and performance stratified by operator experience.
Statistical analysis: Continuous variables were summarized by mean±SD or median (IQR); categorical variables by counts and percentages. Agreement was assessed with simple linear regression, intraclass correlation coefficient (ICC), and Bland–Altman analysis (bias and limits of agreement). Categorical agreement was assessed with Cohen’s weighted kappa. ROC analyses evaluated identification of abnormal (<50%) and severe (≤30%) LVEF, reporting AUC, sensitivity, specificity, PPV, and NPV. Two-sided p<0.05 was considered significant. Analyses were conducted in SAS 9.4. Reporting followed STARD guidelines.
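As an illustration of the Bland–Altman analysis named above, the following Python sketch computes the bias (mean paired difference) and the 95% limits of agreement (bias ± 1.96 × SD of the differences). The study's analyses were run in SAS; this is only a sketch of the standard calculation. The function name, the sign convention (FoCUS minus TTE), and the paired values are all assumptions made for the example and are not study data.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def bland_altman(method_a: Sequence[float],
                 method_b: Sequence[float]) -> Tuple[float, float, float]:
    """Return (bias, lower limit, upper limit) for paired measurements.

    Bias is the mean of (a - b); limits are bias +/- 1.96 * sample SD.
    """
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)  # sample SD (n - 1 denominator)
    return bias, bias - half_width, bias + half_width


# Hypothetical paired LVEF readings (%), NOT taken from the study.
focus_lvef = [55.0, 40.0, 30.0, 62.0]
tte_lvef = [53.0, 42.0, 29.0, 60.0]
bias, lower, upper = bland_altman(focus_lvef, tte_lvef)
print(f"bias={bias:.2f}, limits of agreement=({lower:.2f}, {upper:.2f})")
```

A bias near zero with narrow limits indicates that the two methods can be used interchangeably for most patients; a wide interval means individual readings may differ substantially even when the average difference is small.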
Key Findings
- Enrollment: 449 participants enrolled; 424 studies included after excluding 25 non-diagnostic FoCUS studies. The novice (NS) and experienced (ES) groups contributed 212 studies each to the final analysis.
- Baseline: Median age 65 years (IQR 20); 34% female; median BMI 27.1 kg/m² (IQR 6.8). Prior LV dysfunction present in 23.8%; LV dysfunction on current TTE in 29.6%.
- Operator context: NS predominantly recruited from inpatient settings; ES from outpatient settings.
- Agreement and correlation: Overall ICC 0.904 (excellent). NS ICC 0.921 (excellent); ES ICC 0.845 (good). Bland–Altman bias 0.73% towards TTE (p = 0.005) with limits of agreement ~11.2%. Linear regression overall R² = 0.82 (RMSE 5.31, MAE 4.25, p < 0.0001); NS R² = 0.85 (RMSE 5.31, MAE 3.86); ES R² = 0.72 (RMSE 5.23, MAE 4.48); all p < 0.0001.
- Categorical agreement: Weighted kappa 0.83 (CI ~0.76–0.91). Only 0.5% of cases differed by more than one severity category versus TTE.
- Abnormal LVEF (<50%): AUC 0.98 (95% CI 0.96–0.99); sensitivity 92.8% (86.4–96.1), specificity 92.3% (88.5–95.0), NPV 0.97 (0.94–0.98), PPV 0.83 (0.76–0.89).
- Severe dysfunction (≤30%): AUC 0.99 (0.98–1.00); sensitivity 78.1% (69.1–86.1), specificity 98.0% (95.9–99.0), NPV 0.98 (0.96–0.99), PPV 0.76 (0.57–0.88).
- Subgroup categorical agreement: Weighted kappa NS 0.83 (0.77–0.88); ES 0.80 (0.72–0.88).
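The predictive values for abnormal LVEF can be sanity-checked from the reported sensitivity, specificity, and prevalence using Bayes' rule. The Python sketch below is a reader-side consistency check, not the authors' calculation, and it assumes that the 29.6% rate of LV dysfunction on current TTE is the relevant prevalence; the function name is hypothetical.

```python
from typing import Tuple


def predictive_values(sensitivity: float,
                      specificity: float,
                      prevalence: float) -> Tuple[float, float]:
    """Return (PPV, NPV) from sensitivity, specificity, and prevalence."""
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


# Reported values for abnormal LVEF (<50%); prevalence is an assumption.
ppv, npv = predictive_values(sensitivity=0.928, specificity=0.923,
                             prevalence=0.296)
print(f"PPV={ppv:.3f}, NPV={npv:.3f}")
```

Under that assumption the check gives a PPV of about 0.84 and an NPV of about 0.97, close to the reported 0.83 and 0.97; small differences are expected because the inputs are rounded. The same function shows how the PPV would fall in a lower-prevalence screening population even with unchanged sensitivity and specificity.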
Discussion
AI-assisted FoCUS LVEF assessment demonstrated high agreement with comprehensive TTE across a large, multicenter cohort representative of real-world clinical practice. Performance was strong regardless of operator experience, indicating that novice users without formal echocardiography training can accurately identify the presence and severity of LV dysfunction using AI support. This suggests AI-assisted FoCUS can serve as a practical surrogate for formal TTE when rapid bedside LVEF assessment is needed, potentially expediting diagnosis and guiding management based on degree of LV impairment. Compared with traditional visually estimated FoCUS, which may be prone to misclassification, AI-derived measurements provide reproducible quantification and severity grading. The study’s pragmatic design, inclusion of diverse clinical settings, and international, multicenter recruitment enhance generalizability. Interestingly, diagnostic accuracy was numerically higher among novices than experienced users, possibly reflecting differences in case mix (inpatient vs. outpatient recruitment), selection bias, and image interpretability in more challenging windows managed by experienced users. The findings align with and extend prior evidence supporting AI in echocardiography by demonstrating robust performance in FoCUS outside controlled echolab environments and by validating severity classification.
Conclusion
AI-assisted FoCUS performed by both novice and experienced users can accurately quantify LVEF compared with comprehensive TTE and reliably classify severity of LV dysfunction. These results support the use of AI-assisted FoCUS as a rapid, accessible tool for bedside cardiac assessment across inpatient and outpatient settings. Future work should evaluate generalizability across different AI platforms and devices, assess integration of AI guidance for image acquisition, and compare against additional reference standards such as cardiac MRI, as well as measure clinical impact on decision-making and outcomes.
Limitations
- Convenience sampling may introduce selection bias, potentially underrepresenting critically ill patients requiring urgent expert assessment.
- Findings are specific to the EchoNous KOSMOS platform; generalizability to other AI-assisted ultrasound systems may be limited.
- AI was used only for interpretation, not image acquisition; performance may vary with acquisition quality and could differ with AI-guided acquisition tools.
- Heterogeneity between novice and experienced cohorts (e.g., inpatient vs. outpatient recruitment) may confound comparisons, including differences in non-diagnostic study rates.
- TTE, while the clinical reference standard, has inter-observer variability; more precise modalities (e.g., cardiac MRI) were not used for validation.
- Non-diagnostic FoCUS studies were excluded, which may introduce bias in performance estimates.