
Enhancing the diagnosis of functionally relevant coronary artery disease with machine learning
C. Bock, J. E. Walter, et al.
This paper shows how machine learning (ML) can enhance the diagnosis of functionally relevant coronary artery disease (fCAD), outperforming cardiologists' assessments and potentially reducing unnecessary imaging procedures. The work is by Christian Bock, Joan Elias Walter, Bastian Rieck, and colleagues.
Introduction
Coronary artery disease (CAD) is a leading cause of death globally. Functionally relevant CAD (fCAD) causes myocardial ischemia, which impairs quality of life and can lead to serious cardiac events. Early detection of fCAD is crucial, but exercise electrocardiography (ECG) stress testing has limited diagnostic accuracy, while more advanced imaging techniques such as myocardial perfusion imaging (MPI) and coronary computed tomography angiography are costly and involve radiation exposure. This study explores whether machine learning (ML) can address this unmet clinical need with a more accurate, efficient, and safer method for fCAD detection. Traditional methods for automated cardiac event detection rely on ECG signal delineation, which can be inaccurate for abnormal heartbeats. Deep learning (DL) can instead analyze ECG signals without engineered features and has achieved cardiologist-level accuracy in other cardiac applications. DL has been explored in cardiac stress testing, but previous studies used so many input variables that transferability suffered, relied on inaccurate outcome definitions, or lacked external validation. This research develops and validates two ML models: one using readily available clinical data and another that additionally uses ECG signals from stress testing. The models are compared with cardiologists' assessments, and their performance is examined in several subcohorts to assess generalizability and robustness. A combined approach that integrates ML and cardiologist judgment is also explored to determine whether the two complement each other.
Literature Review
Automated cardiac event detection has a long history. Traditional methods often rely on manual interpretation of ECG changes such as ST-segment deviations, and automated variants depend on ECG delineation algorithms that are susceptible to error in abnormal heartbeats, which limits their accuracy. Recent years have seen a rise in deep learning for ECG analysis, which offers a way around these limitations. Deep learning models have detected various cardiac arrhythmias with accuracy comparable to that of cardiologists. Deep learning has also been investigated in cardiac stress testing, but prior studies used large numbers of variables, relied on automated ECG delineation and imprecise outcome definitions, and lacked comprehensive performance evaluation and external validation. These shortcomings highlight the need for robust, validated ML models for fCAD detection. This study addresses them with a small set of clinical variables, a comprehensive performance evaluation that includes subcohort analyses, and external validation.
Methodology
This study uses data from the BASEL VIII study (NCT01838148), including stress test ECG data from 3522 patients who underwent a standard rest/stress myocardial perfusion single-photon emission computed tomography (SPECT) protocol. Patients were referred with symptoms suggesting inducible myocardial ischemia. Data included stress test ECGs, myocardial perfusion scans (rest and stress), and cardiologists' pre- and post-test probability assessments of fCAD on a visual analogue scale (VAS). The data were split into a development set (75%) and a held-out test set (25%), with temporal separation to assess model robustness. External validation was performed using data from two Israeli medical centers (916 patients).
Two ML models were developed: CARPEclin, an ensemble learning model using eight static clinical variables (age, weight, sex, height, resting heart rate, systolic and diastolic blood pressure, previous CAD history), and CARPEECG, a deep learning model using the clinical variables and ECG signals from stress testing. CARPEECG employed a multi-task learning architecture with residual layers, trained on auxiliary tasks (MPSSR and MPSSS scores, pharmacological support) to improve fCAD prediction performance. Model performance was compared with cardiologists' post-test judgment and an ST-segment depression algorithm. A combined approach (CARPEColl) was also developed, combining the ML models' predictions with the cardiologists' post-test judgment using logistic regression.
Model evaluation included receiver operating characteristic (ROC) curves, precision-recall curves, decision curve analysis, and subcohort analyses to assess performance in different patient groups (age, sex, CAD history, pharmacological vs. exercise stress test). SHAP (SHapley Additive exPlanation) values were used for model interpretability. The external validation evaluated the models' generalizability across different institutions, modalities (treadmill vs. bicycle ergometry), and patient populations.
ECG signals were preprocessed through downsampling, smoothing, outlier removal, and phase sampling and concatenation to create 2-6-2 sequences (2 s pre-stress, 6 s stress, 2 s recovery). The best-performing conventional machine learning model was selected through 5-fold cross-validation.
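The 2-6-2 concatenation step can be illustrated with a minimal sketch. The summary does not say how samples are drawn within each phase, so everything below is an assumption: the function name `build_2_6_2_sequence`, the phase boundary indices `stress_start` and `stress_end`, and the choice to take the final 6 s of the stress phase are hypothetical and are not the authors' implementation.

```python
from typing import List, Sequence


def build_2_6_2_sequence(
    signal: Sequence[float],
    sampling_rate_hz: int,
    stress_start: int,
    stress_end: int,
) -> List[float]:
    """Concatenate 2 s pre-stress, 6 s stress, and 2 s recovery samples.

    stress_start and stress_end are sample indices (end exclusive) that
    delimit the stress phase. Both are hypothetical inputs.
    """
    pre_len = 2 * sampling_rate_hz
    stress_len = 6 * sampling_rate_hz
    rec_len = 2 * sampling_rate_hz

    if stress_start < pre_len:
        raise ValueError("not enough pre-stress samples")
    if stress_end - stress_start < stress_len:
        raise ValueError("stress phase shorter than 6 s")
    if len(signal) - stress_end < rec_len:
        raise ValueError("not enough recovery samples")

    # Last 2 s before the stress phase begins.
    pre = signal[stress_start - pre_len:stress_start]
    # Assumption: use the final 6 s of the stress phase.
    stress = signal[stress_end - stress_len:stress_end]
    # First 2 s after the stress phase ends.
    recovery = signal[stress_end:stress_end + rec_len]
    return list(pre) + list(stress) + list(recovery)


# Toy usage: a 30 s single-lead signal at 10 Hz with stress from 10 s to 25 s.
toy_signal = [float(i) for i in range(300)]
sequence = build_2_6_2_sequence(toy_signal, 10, 100, 250)
print(len(sequence))  # 10 s worth of samples at 10 Hz
```

Fixing every sequence to the same 10 s length gives the network a constant input size regardless of how long each patient's stress test lasted.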
Key Findings
Both ML models significantly outperformed the cardiologists' assessments in predicting fCAD. CARPEECG (AUROC 0.71) and CARPEclin (AUROC 0.70) surpassed the cardiologists' AUROC of 0.64 (p = 4.0E-13). Decision curve analysis showed that the ML models yielded higher net benefit than the cardiologists' judgment across probability thresholds, particularly at thresholds relevant to clinical decision-making (5-15%). At a 15% threshold, CARPEECG showed a potential reduction in myocardial perfusion imaging (MPI) of 15.3% without increasing false negatives. CARPEColl, which combines both ML models with the cardiologists' judgment, increased the potential reduction to 17.3%. Subcohort analysis revealed that the ML models perform particularly well in younger patients and those without a CAD history, where the advantage over the cardiologists' assessment is most pronounced. In older patients or those with a CAD history, CARPEColl outperformed both individual ML models and the cardiologists' judgment, suggesting synergistic value of expert clinicians and ML algorithms. SHAP value analysis showed that the conventional ML model relied strongly on age and CAD history, while the deep learning model made more use of information from the ECG signals, which explains its better generalizability. The external validation confirmed the deep learning model's superior performance and robustness across institutions and testing modalities. The deep learning model also outperformed the conventional ML model, highlighting the added value of ECG information in this application.
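For readers unfamiliar with decision curve analysis, the net benefit at a probability threshold can be computed as in the sketch below. It uses the standard definition, true positives per patient minus false positives per patient weighted by the odds of the threshold. The function names and the toy data are illustrative assumptions and are not taken from the study.

```python
from typing import Sequence


def net_benefit(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float
) -> float:
    """Net benefit of referring every patient with y_prob >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true: Sequence[int], threshold: float) -> float:
    """Baseline strategy: refer every patient regardless of prediction."""
    return net_benefit(y_true, [1.0] * len(y_true), threshold)


# Toy data (made up): 1 = fCAD present, 0 = absent.
labels = [1, 0, 0, 1, 0]
predicted = [0.9, 0.1, 0.2, 0.05, 0.3]
for pt in (0.05, 0.10, 0.15):
    print(pt, net_benefit(labels, predicted, pt), net_benefit_treat_all(labels, pt))
```

Plotting these values over a range of thresholds gives the decision curve. A model is preferable at a given threshold when its net benefit exceeds both the refer-everyone baseline and zero, which corresponds to referring no one.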
Discussion
This study demonstrates the potential of ML to improve the diagnosis of fCAD. Both ML models consistently outperformed cardiologists' assessments, which could reduce unnecessary and costly MPI procedures. The superior performance of the deep learning model, especially in younger patients, highlights the value of using raw ECG data. The combined approach, CARPEColl, further improves accuracy, suggesting a synergistic effect of integrating ML with clinical expertise. For clinical practice, the results point to a potentially more accurate, efficient, and cost-effective approach to fCAD risk stratification. Future research could validate the models in larger, more diverse populations and evaluate their actual impact on patient outcomes. Attention-based neural networks and alternative ensemble methods could further improve predictive performance. Integrating interpretability tools such as SHAP values into clinical workflows may help clinicians understand and accept these models and could improve decision-making in real-world settings.
Conclusion
This research demonstrates the potential of machine learning, particularly deep learning, to significantly improve the accuracy and efficiency of fCAD diagnosis. Both proposed models outperform cardiologists' assessment, with the deep learning model showing particular promise due to its superior generalizability. A combined model that leverages both machine learning and clinical judgment results in further gains in predictive power. Future work should focus on large-scale clinical trials to validate the clinical utility of these models and their impact on patient care.
Limitations
While the study employed a rigorous methodology, several limitations should be considered. The experts who interpreted fCAD were not blinded to clinical and ECG data, which could lead to overestimation of the influence of these data in the models. The study population consisted primarily of symptomatic patients referred to a tertiary care center, potentially limiting generalizability to other populations. Women and certain ethnic groups were underrepresented, and the results may not be fully applicable to very young patients. Finally, although logistic regression was used to combine ML predictions and physician judgment, it remains unclear how strongly clinical judgment influences the combined model's output, which may affect real-world applicability.