Introduction
Digital pathology, the digitization of conventional microscopy, is rapidly gaining traction and has led to the creation of large whole-slide image (WSI) databases. Artificial intelligence (AI), particularly deep learning, offers powerful pattern-recognition capabilities for analyzing these databases. While AI-driven classification and segmentation methods are beneficial, image search represents a significant advancement. Content-based image retrieval (CBIR), which uses the image pixels themselves as the query, offers an unsupervised way to find similar images and their corresponding metadata, providing pathologists with decision support. This approach aims to address the well-known inter- and intra-observer variability in medical image interpretation, which can lead to diagnostic inaccuracies. Current AI solutions focus mainly on classification, which does not inherently support consensus building. In contrast, retrieving visually similar, already-diagnosed cases provides a new form of decision support, essentially enabling "virtual" peer review. CBIR systems have been explored for decades, but only recently, with the advent of digital pathology and deep learning, has research focused on image search within histopathology. This paper presents a validation study of Yottixel, an image search engine for pathology, using the TCGA dataset.
Literature Review
The introduction extensively reviews the literature on deep learning in digital pathology, highlighting the success of supervised AI for classification and segmentation. It contrasts this with the novelty of AI-based image search and retrieval as a new approach in computational pathology. The authors discuss the existing CBIR systems in medical applications and the challenges posed by inter- and intra-observer variability in diagnosis. They cite studies emphasizing high rates of diagnostic inaccuracy due to discordance among physicians. The limitations of AI-driven classification methods in addressing consensus building are noted, leading to the proposal of image search as a decision support tool for "virtual" peer review.
Methodology
The study used the TCGA repository, processing nearly 30,000 WSIs (16 terabytes of data) comprising roughly 20 million image patches of 1,000 × 1,000 pixels. High-performance storage and GPU computing were utilized. The Yottixel search engine, which combines patch clustering, deep networks, and gradient barcoding, was used. The engine generates a "bunch of barcodes" (BoB) for each WSI, converting tissue patterns into a computationally efficient index. Two types of experiments were conducted: horizontal search (comparing query WSIs against all cases regardless of anatomic site) and vertical search (comparing within a specific anatomic site to identify the cancer subtype). A "leave-one-patient-out" approach was used to avoid bias. Accuracy was assessed using "majority voting": a search was considered successful only if the majority of the top-n results carried the correct diagnosis. Results from both horizontal and vertical searches are reported, including detailed accuracy and recall values for various top-n searches (e.g., top-3, top-5, top-10) and majority-n votes (e.g., majority-5, majority-10). Additional analyses included t-SNE visualization of search results and heatmap analysis of the confusion matrix to explore similarities and patterns.
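The barcode-indexing idea described above can be sketched in a few lines of Python. This is an illustrative simplification, not the authors' implementation: the sign-of-consecutive-differences binarization, the function names, and the median-of-minimum matching rule between two barcode sets are assumptions based on the general description of gradient barcoding and BoB indexing.

```python
import numpy as np

def feature_to_barcode(feat):
    # Binarize the sign of consecutive feature differences -- a simple
    # form of "gradient barcoding" (illustrative assumption).
    return (np.diff(feat) >= 0).astype(np.uint8)

def wsi_to_bob(patch_features):
    # Convert a WSI's patch feature vectors into a "bunch of barcodes".
    return np.stack([feature_to_barcode(f) for f in patch_features])

def hamming(a, b):
    # Hamming distance between two binary barcodes.
    return int(np.count_nonzero(a != b))

def wsi_distance(bob_query, bob_case):
    # Median of each query barcode's minimum Hamming distance to the
    # case's barcodes (a hypothetical WSI-to-WSI matching rule).
    mins = [min(hamming(q, c) for c in bob_case) for q in bob_query]
    return float(np.median(mins))
```

Because barcodes are binary, comparing a query against thousands of archived WSIs reduces to bitwise operations, which is what makes this style of index computationally cheap relative to comparing raw deep features.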
Key Findings
The horizontal search (cancer-type recognition) achieved high accuracies, with majority-vote accuracy generally increasing as more results were considered. For frozen sections, majority-10 accuracy ranged from 86.42% (brain) down to 49.17% (hematopoietic), while for permanent slides it ranged from 94.8% (brain) down to 61.09% (hematopoietic). The vertical search (cancer-subtype identification) also demonstrated high majority-vote accuracies for both frozen-section and permanent slides, exceeding 90% for several cancer subtypes (e.g., KIRC, GBM, COAD, UCEC, PCPG). A strong positive correlation was observed between the number of diagnosed WSIs available for a cancer type and consensus accuracy. The t-SNE visualization showed grouping of similar subtypes, while the heatmap analysis identified specific subtypes where misclassifications were more frequent (e.g., MESO, READ/COAD, LUAD/LUSC). A chord diagram visually represented the relationships between different cancer types based on search results, revealing unexpected similarities between certain tumor subtypes.
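The conservative majority-voting criterion behind these accuracy figures can be sketched as follows: a search counts as successful only when the query's true diagnosis wins a strict majority among the top-n retrieved cases. The function names and the list-based interface are illustrative assumptions, not the paper's code.

```python
def consensus_correct(true_label, retrieved_labels, n):
    # Conservative majority vote: success only if the true label
    # accounts for more than half of the top-n retrieved labels.
    top_n = retrieved_labels[:n]
    return top_n.count(true_label) > n / 2

def majority_accuracy(queries, n):
    # Fraction of queries whose top-n results reach consensus on the
    # correct label; `queries` is a list of
    # (true_label, retrieved_labels) pairs.
    hits = sum(consensus_correct(t, r, n) for t, r in queries)
    return hits / len(queries)
```

Note how strict this criterion is: a query whose correct diagnosis appears in, say, 4 of the top 10 results still counts as a failure, which is why majority-n accuracy is a conservative lower bound on retrieval quality.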
Discussion
The results support the feasibility of using AI-based image search to build computational consensus for cancer-subtype diagnosis. The high accuracies, particularly under majority voting, suggest that "virtual peer review" is achievable with sufficiently large and well-characterized datasets. The positive correlation between the number of patients and accuracy underscores the importance of large datasets. The analysis of the confusion matrix and chord diagram provides insight into the strengths and limitations of the approach. While the approach is effective, limitations of the TCGA dataset, such as the inclusion of frozen sections and the underrepresentation of hematopathology, were acknowledged. The BoB indexing method makes the image search engine computationally efficient and practical to implement.
Conclusion
This study demonstrates the potential of AI-powered image search for improving diagnostic accuracy in pathology. The high accuracy achieved in identifying cancer subtypes using a conservative majority voting approach suggests that this technology can provide valuable decision support for pathologists. Future research should focus on more detailed subtype consensus analysis using larger and curated datasets, including hematological specimens, and directly measure the impact on inter- and intra-observer variability.
Limitations
The study's reliance on the TCGA dataset, which has limitations in terms of representation of various cancer types and use of frozen sections, should be considered. The study also primarily focused on visually similar tumors, overlooking other diagnostic aspects. The assessment of similarity may not perfectly align with how pathologists perceive it. Additionally, the study acknowledges the need for a more comprehensive assessment of inter- and intra-observer variability reduction and evaluation of financial and intellectual property implications.