Introduction
Prostate cancer is a prevalent malignancy, and radical prostatectomy is a common curative treatment. Biochemical recurrence (BCR), indicated by rising PSA levels after surgery, signals regrowth of cancer cells and predicts metastasis and death. Current risk assessment relies on the ISUP grade (based on the Gleason grading system), PSA levels at diagnosis, and TNM staging. However, the ISUP grade has limitations, including inter-observer variability and the omission of subtle histopathological features. This study hypothesizes that deep learning can identify prognostic information in tissue morphology beyond the current grading system. Deep learning, a type of artificial intelligence, excels at discovering complex patterns in data. Several studies have explored deep learning for cancer prognosis, but often with manual feature selection or without optimizing directly for the recurrence outcome. This study aimed to develop a deep learning-based biomarker that predicts BCR directly from H&E-stained tissue sections, with the goals of improving risk stratification for patients and providing interpretable results for pathologists. Because cases and controls in the nested case-control design are matched on Gleason sum, the model must learn prognostic patterns beyond Gleason patterns, which allows it to uncover more fine-grained morphological features. If validated, this biomarker could improve the accuracy of prostate cancer prognosis and patient management.
Literature Review
Existing methods for predicting biochemical recurrence of prostate cancer primarily rely on the ISUP grade, preoperative PSA levels, and TNM staging. However, the ISUP grading system suffers from inter-observer variability and omits subtle histopathological features. Previous research has applied deep learning to prostate cancer prognosis, but some studies used manual feature selection or fitted classical regression models to deep learning encodings instead of training the network directly on the outcome, which limits the discovery of novel prognostic features. The black-box nature of deep learning models is a common concern, especially in medical applications, so methods for interpreting the learned features are crucial for increasing transparency and trust in the models. This study uses Automatic Concept Explanations (ACE) to address this concern by providing visual representations of the tissue patterns that the model identifies as relevant for prediction.
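In this study, ACE groups image patches by the network's intermediate features so that each group can be reviewed as a visual "concept". The grouping step can be illustrated with a minimal sketch. It assumes that each patch has already been reduced to a feature vector taken from an intermediate layer of the network, and it clusters those vectors with plain k-means (Lloyd's algorithm). The function names, the first-k initialization, and the toy two-dimensional features are illustrative assumptions, not the study's implementation. The full ACE procedure also includes steps not shown here, such as choosing the layer and ranking the resulting concepts.

```python
from typing import List, Sequence, Tuple


def _squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_patch_features(
    features: Sequence[Sequence[float]], k: int, max_iters: int = 100
) -> Tuple[List[int], List[List[float]]]:
    """Group patch feature vectors into k candidate concepts with Lloyd's k-means.

    Returns one cluster index per patch and the final cluster centroids.
    """
    if not 0 < k <= len(features):
        raise ValueError("k must be between 1 and the number of patches")
    # Deterministic initialization for this sketch: the first k patches.
    centroids = [list(f) for f in features[:k]]
    assignments: List[int] = []
    for _ in range(max_iters):
        # Assign every patch to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: _squared_distance(f, centroids[c]))
            for f in features
        ]
        if new_assignments == assignments:
            break  # no patch changed cluster: converged
        assignments = new_assignments
        # Move each centroid to the mean of its member patches.
        for c in range(k):
            members = [f for f, a in zip(features, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids


# Hypothetical toy features for four patches forming two obvious groups.
patches = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
labels, centers = cluster_patch_features(patches, k=2)
print(labels)   # [0, 0, 1, 1]
print(centers)  # [[0.0, 0.5], [10.0, 10.5]]
```

Each resulting cluster would then be shown to a pathologist as a set of example patches. In practice, a library implementation of k-means with a more robust initialization would typically replace this hand-written loop.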
Methodology
Two independent cohorts of patients who underwent radical prostatectomy were used: one from Johns Hopkins Hospital (development cohort) and another from New York Langone Medical Center (validation cohort). The Johns Hopkins cohort was used for biomarker development in a nested case-control design, using tissue microarray (TMA) cores (0.6 mm diameter) from the highest-grade tumor nodule. For each case (a patient with recurrence), a control (a patient without recurrence at the time of the case's recurrence) was matched on age, race, pathological stage, and Gleason sum. Images were scanned at high resolution, and cores containing less than 25% tissue were discarded. The Johns Hopkins dataset was divided into a development set and a test set, and the development set was further split into three folds for cross-validation. The New York Langone cohort served as an independent validation set, with TMA cores sampled from the largest or highest-grade tumor foci.
For model development, a ResNet50-D convolutional neural network (CNN) pre-trained on ImageNet was used, and extensive data augmentation was applied to improve generalization. The model was trained to predict the time to biochemical recurrence on a 0-4 year scale, with patients without recurrence assigned the label 4. Model selection was based on the concordance index. The final deep learning system (DLS) biomarker was an ensemble of 15 CNNs.
Statistical analysis for the Johns Hopkins cohort used conditional logistic regression, adjusting for ISUP grade, PSA, surgical margins, and year of surgery to account for residual differences after matching. For the New York Langone cohort, Cox proportional hazards regression was used, with ISUP grade, pathological stage, surgical margins, and preoperative PSA as covariates. Automatic Concept Explanations (ACE) were used to visualize the learned patterns by clustering patches of the input images based on the model's intermediate features, and a pathologist visually inspected the resulting concepts.
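The regression target, the model-selection metric, and the ensembling step can be sketched in a few lines of Python. The sketch makes several assumptions. The target is the time to recurrence in years, capped at 4; the text states only that patients without recurrence receive the label 4, so capping later recurrences at 4 is an assumption. Model selection uses Harrell's concordance index with right-censoring, where a pair is comparable when the patient with the shorter observed time had a recurrence. Averaging member predictions is an assumed way of combining the 15 ensemble members. All function names are hypothetical.

```python
from typing import List, Optional, Sequence

MAX_YEARS = 4.0  # label assigned to patients without recurrence


def bcr_target(years_to_bcr: Optional[float]) -> float:
    """Training label: years to BCR in [0, 4]; None (no recurrence) maps to 4.

    Capping recurrences after 4 years at 4 is an assumption of this sketch.
    """
    if years_to_bcr is None:
        return MAX_YEARS
    return min(max(years_to_bcr, 0.0), MAX_YEARS)


def ensemble_mean(member_predictions: Sequence[Sequence[float]]) -> List[float]:
    """Average per-patient predictions across ensemble members (assumed rule)."""
    return [sum(col) / len(col) for col in zip(*member_predictions)]


def concordance_index(
    times: Sequence[float], predicted: Sequence[float], events: Sequence[bool]
) -> float:
    """Harrell's C for predicted times (lower prediction = earlier recurrence)."""
    concordant, comparable = 0.0, 0
    for t_i, p_i, e_i in zip(times, predicted, events):
        if not e_i:
            continue  # a censored patient cannot anchor a comparable pair
        for t_j, p_j in zip(times, predicted):
            if t_i < t_j:  # patient i recurred before patient j's observed time
                comparable += 1
                if p_i < p_j:
                    concordant += 1.0
                elif p_i == p_j:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: two ensemble members scoring four patients.
times = [1.0, 2.0, 3.0, 4.0]
events = [True, True, False, False]
preds = ensemble_mean([[0.4, 2.6, 2.1, 3.8], [0.6, 2.4, 1.9, 4.0]])
print(concordance_index(times, preds, events))  # 0.8
```

A C-index of 0.5 corresponds to random ordering and 1.0 to perfect ordering. Because the index depends only on ranks, it is a natural selection criterion for a model whose output is used as a risk score rather than as a calibrated time estimate.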
The source code was made available online.
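The conditional logistic regression used for the matched Johns Hopkins pairs has a simple form in the 1:1 case. Given that exactly one member of a pair recurred, the conditional probability that it was the case is a logistic function of the case-minus-control covariate difference with no intercept. The baseline risk of each matched set therefore cancels out, and exp(beta) is the odds ratio per unit increase. The sketch below fits a single covariate by Newton's method. The single-covariate restriction, the function names, and the toy values are simplifying assumptions; the study's models also adjusted for ISUP grade, PSA, surgical margins, and year of surgery.

```python
import math
from typing import Sequence, Tuple


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_matched_pairs(
    case_x: Sequence[float],
    control_x: Sequence[float],
    max_iters: int = 50,
    tol: float = 1e-10,
) -> Tuple[float, float]:
    """Conditional logistic regression for 1:1 matched pairs, one covariate.

    Each pair contributes log sigmoid(beta * (x_case - x_control)); there is
    no intercept because the matched set's baseline risk cancels out.
    Returns (beta, odds ratio per unit increase = exp(beta)).
    """
    diffs = [c - k for c, k in zip(case_x, control_x)]
    beta = 0.0
    for _ in range(max_iters):
        probs = [_sigmoid(beta * d) for d in diffs]
        grad = sum(d * (1.0 - p) for d, p in zip(diffs, probs))
        hess = -sum(d * d * p * (1.0 - p) for d, p in zip(diffs, probs))
        if hess == 0.0:
            break  # e.g. all differences are zero, or complete separation
        step = grad / hess
        beta -= step  # Newton update on the concave log-likelihood
        if abs(step) < tol:
            break
    return beta, math.exp(beta)


beta, odds_ratio = fit_matched_pairs(
    [2.0, 3.0, 0.0, 1.5, 2.5, 0.5],  # hypothetical case biomarker scores
    [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],  # hypothetical matched-control scores
)
print(f"beta={beta:.3f}, OR per unit={odds_ratio:.2f}")
```

If every case scored higher than its control, the maximum-likelihood estimate would be infinite (complete separation). A statistics package such as statsmodels' `ConditionalLogit` handles multiple covariates and reports confidence intervals, which this sketch does not.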
Key Findings
The deep learning system (DLS) biomarker showed a strong association with biochemical recurrence in both cohorts. In the Johns Hopkins test set (nested case-control study), the DLS biomarker had an odds ratio (OR) of 3.28 (95% CI 1.73-6.23; p<0.005) per unit increase; after adjusting for other covariates, the OR was 3.32 (95% CI 1.63-6.77; p=0.001). In the independent New York Langone validation cohort, the DLS biomarker had a hazard ratio (HR) of 5.78 (95% CI 2.44-13.72; p<0.005) in univariable analysis and 3.02 (95% CI 1.10-8.29; p=0.03) in multivariable analysis adjusting for ISUP grade and other clinical factors. Kaplan-Meier curves showed a clear separation between low-risk and high-risk groups defined by the DLS biomarker score. Automatic Concept Explanations revealed interpretable tissue patterns: concepts associated with early recurrence mainly showed Gleason patterns 4 and 5 with cribriform configurations, whereas those associated with later recurrence showed primarily Gleason pattern 3 with well-formed glands. These concepts indicate that the DLS captures the expected morphological patterns, while its independent association with recurrence after adjustment for ISUP grade indicates that it also carries prognostic information beyond the ISUP grade.
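Kaplan-Meier curves estimate, at each recurrence time, the fraction of patients still free of recurrence. The estimate is obtained by multiplying together the conditional probabilities of remaining recurrence-free at each event time. Patients censored at a given time are still counted as at risk at that time. The sketch below computes the estimator for one group and splits patients into low- and high-risk groups by a score threshold. The threshold is a placeholder rather than the study's cut-off, and the function names and toy data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def kaplan_meier(
    times: Sequence[float], events: Sequence[bool]
) -> List[Tuple[float, float]]:
    """Return (time, recurrence-free probability) at each distinct recurrence time."""
    recurrences = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # patients leaving the risk set (recurred or censored)
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = recurrences.get(t, 0)
        if d:
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Censored patients at time t were counted as at risk at t above.
        at_risk -= leaving[t]
    return curve


def split_by_score(scores: Sequence[float], threshold: float) -> Dict[str, List[int]]:
    """Indices of low-risk (score < threshold) and high-risk patients."""
    groups: Dict[str, List[int]] = {"low": [], "high": []}
    for idx, s in enumerate(scores):
        groups["high" if s >= threshold else "low"].append(idx)
    return groups


# Hypothetical follow-up data (years) for one group.
times = [1.0, 2.0, 2.0, 3.0, 4.0]
events = [True, True, False, True, False]
print(kaplan_meier(times, events))  # approximately [(1.0, 0.8), (2.0, 0.6), (3.0, 0.3)]
```

To compare groups, `kaplan_meier` would be run separately on the patients indexed by each entry of `split_by_score`. A log-rank test is the usual companion test for whether the curves differ.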
Discussion
This study demonstrates the successful development and validation of a deep learning-based biomarker for predicting biochemical recurrence of prostate cancer. The findings indicate that the model can extract prognostic information from tissue morphology beyond that captured by the current ISUP grading system. The strong association between the DLS biomarker and recurrence in both independent cohorts supports the robustness and generalizability of the model. ACE provided interpretable results, enhancing transparency and trust in the model's predictions. The ability to distinguish patients with relatively rapid recurrence from those without recurrence highlights the biomarker's potential clinical utility for improved risk stratification and personalized treatment planning. The shorter follow-up time in the development cohort may limit the model's ability to identify patients with very late recurrence, which should be taken into account when interpreting long-term results. Future work may also explore other deep learning architectures, improved methods for concept extraction, or alternative data augmentation approaches.
Conclusion
This study presents a novel deep learning-based visual biomarker for predicting prostate cancer recurrence that offers prognostic information beyond the current ISUP grade. The biomarker's strong performance in both independent cohorts, together with the interpretable concepts derived from ACE, makes it a promising tool for improving risk stratification in prostate cancer patients. Future research should focus on validating the biomarker on whole prostatectomy sections, investigating its association with time to metastasis or death, and exploring its integration into clinical decision-making workflows.
Limitations
The study is limited by the use of TMA cores, which represent only a small sample of the entire tumor and may not capture tumor heterogeneity or more aggressive patterns present outside the sampled regions. The nested case-control design may have introduced some bias, although matching helped to mitigate it. The median follow-up time in the Johns Hopkins cohort (4 years) might limit the model's ability to predict very late recurrences accurately. Granular follow-up information could be better leveraged with survival-based loss functions. Furthermore, data on cribriform growth and intraductal carcinoma were of limited availability for the multivariable analysis, which may affect the interpretation of the results.
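One common survival-based loss is the negative Cox partial log-likelihood. It uses each patient's actual follow-up time and censoring status instead of a capped regression target. The sketch below is a swapped-in illustration, not the study's training objective. It handles tied times with the Breslow approximation and assumes the network outputs one scalar log-risk score per patient, where higher means earlier expected recurrence. The function name and toy values are hypothetical.

```python
import math
from typing import Sequence


def cox_neg_partial_log_likelihood(
    log_risk: Sequence[float], times: Sequence[float], events: Sequence[bool]
) -> float:
    """Negative Cox partial log-likelihood (Breslow ties), averaged over events.

    log_risk: model output per patient; higher means earlier expected recurrence.
    times: follow-up time; events: True if BCR was observed, False if censored.
    """
    total, n_events = 0.0, 0
    for r_i, t_i, e_i in zip(log_risk, times, events):
        if not e_i:
            continue  # censored patients only contribute through risk sets
        # Risk set: patients still under follow-up at this recurrence time.
        risk_set = [r_j for r_j, t_j in zip(log_risk, times) if t_j >= t_i]
        m = max(risk_set)  # log-sum-exp shift for numerical stability
        log_denominator = m + math.log(sum(math.exp(r - m) for r in risk_set))
        total += r_i - log_denominator
        n_events += 1
    if n_events == 0:
        raise ValueError("at least one observed recurrence is required")
    return -total / n_events


times = [1.0, 2.0, 3.0]
events = [True, True, True]
print(cox_neg_partial_log_likelihood([2.0, 1.0, 0.0], times, events))  # well-ordered risks
print(cox_neg_partial_log_likelihood([0.0, 1.0, 2.0], times, events))  # reversed: larger loss
```

For training a CNN, the same computation would be written with the framework's tensor operations so that it can be differentiated. Patients censored early still inform the loss through the risk sets, which is the information a capped regression label discards.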