A foundation model for clinical-grade computational pathology and rare cancers detection

E. Vorontsov, A. Bozkurt, et al.

Virchow is a pathology foundation model developed by a team including Eugene Vorontsov and Kristen Severson. Trained on whole-slide images from roughly 100,000 patients, it supports pan-cancer detection and remains accurate on rare cancer variants, which makes it a candidate building block for new applications in computational pathology.

Introduction

Pathologic analysis of tissue is crucial for cancer diagnosis and treatment. The increasing use of digital whole-slide images (WSIs) enables computational pathology, applying artificial intelligence (AI) to support diagnosis and disease understanding. While initial work focused on clinical decision support tools, recent advancements in computer vision, particularly the development of large-scale foundation models, offer the potential to unlock new insights from routine WSIs. Foundation models, trained on massive datasets using self-supervised learning, generate data representations (embeddings) that generalize well to diverse tasks. This contrasts with current diagnostic-specific methods, which are limited by smaller datasets and less likely to reflect the full spectrum of tissue variations. The advantages are particularly significant for applications with limited data, such as rare cancer detection. A successful pathology foundation model should capture a broad range of patterns in WSIs, facilitating prediction of various WSI characteristics. This study aims to create such a model, capable of robustly predicting both common and rare cancers, and performing other critical tasks. The performance of foundation models is heavily influenced by dataset and model size, with modern models in natural image domains using millions of images and billions of parameters. While the pathology domain faces challenges in data collection, recent studies have shown promising results with foundation models trained on thousands to hundreds of thousands of WSIs. This study aims to significantly increase the scale of training data to improve performance.

Literature Review

The literature review section details previous research in computational pathology, highlighting the transition from primarily academic proof points to routine clinical tools. It discusses the development of clinical decision support systems and the first FDA-approved AI pathology system. The review then focuses on recent studies attempting to leverage AI for broader applications beyond diagnosis, including prognosis and therapeutic response prediction. It emphasizes the potential to reduce reliance on specialized and expensive tests like immunohistochemistry (IHC) and genomic testing. The review also explores the emergence of foundation models in computer vision, their application to large-scale datasets, and the use of self-supervised learning techniques to generate generalized embeddings. Several pioneering works on pathology foundation models, utilizing datasets ranging from 30,000 to 400,000 WSIs, are discussed, demonstrating the benefits of self-supervised learning and scaling effects on performance.

Methodology

This study introduces Virchow, a pathology foundation model trained on 1.5 million H&E-stained WSIs from approximately 100,000 patients at Memorial Sloan Kettering Cancer Center (MSKCC). This dataset, 4 to 10 times larger than previous pathology datasets, comprises cancerous and benign tissues from 17 high-level tissue groups, collected via biopsy and resection. Virchow is a 632-million-parameter vision transformer (ViT) trained with the DINO v2 self-supervised learning algorithm. DINO v2 uses global and local regions of tissue tiles to learn tile-level embeddings, which can then be aggregated to predict slide-level attributes. The study evaluates Virchow on two key applications: pan-cancer detection and biomarker prediction. For pan-cancer detection, a weakly supervised aggregator model uses Virchow embeddings to predict specimen-level cancer across various tissues, including rare cancers. Its performance is compared with that of three specialized clinical-grade AI products (Paige Prostate, Paige Breast, and Paige Breast Lymph Node) on both the product testing datasets and rare cancer variant datasets. For biomarker prediction, the model predicts the status of nine biomarkers from routine H&E-stained images, which could reduce reliance on additional testing. Finally, tile-level benchmarks using linear probing assess the quality and generalizability of Virchow embeddings, and unsupervised feature analysis explores whether the learned features are semantically meaningful.
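To make the aggregation step concrete, here is a minimal sketch of how tile-level embeddings might be pooled into a single slide-level cancer probability. This summary does not describe the aggregator's architecture, so the sketch substitutes simple attention-based pooling as a stand-in. The function names, the toy dimensions, and the random weights are all assumptions for illustration; a real aggregator would be trained on specimen-level labels.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_pool(tile_embeddings, w_attn, v_attn):
    """Pool (n_tiles, dim) embeddings into one (dim,) slide embedding.

    Each tile gets a scalar score; the softmax of the scores gives the
    weight of that tile in the slide-level average.
    """
    hidden = np.tanh(tile_embeddings @ w_attn)   # (n_tiles, hidden)
    scores = hidden @ v_attn                     # (n_tiles,)
    weights = softmax(scores)                    # sums to 1 over tiles
    slide_embedding = weights @ tile_embeddings  # (dim,)
    return slide_embedding, weights

def predict_cancer_probability(tile_embeddings, params):
    """Hypothetical slide-level head: attention pooling + logistic output."""
    slide_embedding, weights = attention_pool(
        tile_embeddings, params["w_attn"], params["v_attn"]
    )
    logit = slide_embedding @ params["w_out"] + params["b_out"]
    return 1.0 / (1.0 + np.exp(-logit)), weights

# Toy sizes and random (untrained) weights, for illustration only.
rng = np.random.default_rng(0)
dim, hidden_dim, n_tiles = 16, 8, 50
params = {
    "w_attn": rng.normal(size=(dim, hidden_dim)) * 0.1,
    "v_attn": rng.normal(size=hidden_dim) * 0.1,
    "w_out": rng.normal(size=dim) * 0.1,
    "b_out": 0.0,
}

# Stand-in for the embeddings a foundation model would produce per tile.
tiles = rng.normal(size=(n_tiles, dim))
prob, weights = predict_cancer_probability(tiles, params)
print(f"slide-level probability: {prob:.3f}")
print(f"most attended tile index: {int(np.argmax(weights))}")
```

Because the pooling is a weighted average over tiles, the prediction does not depend on tile order, and the per-tile weights give a rough indication of which regions drove the slide-level score.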

Key Findings

Virchow significantly outperforms baseline models in pan-cancer detection, achieving an overall AUC of 0.95. Its performance is particularly strong on rare cancers (AUC of 0.937), demonstrating excellent generalization capabilities. Compared to three commercial clinical-grade AI products, the Virchow-based pan-cancer model shows comparable overall performance, sometimes even surpassing them in detecting rare cancer variants. The pan-cancer model demonstrates robustness to out-of-distribution (OOD) data from external institutions. Tile-level benchmarks reveal that Virchow embeddings match or exceed the performance of other models on various tasks, including OOD benchmarks. Unsupervised feature analysis suggests that Virchow's learned features are semantically meaningful, capturing distinct cell types. Biomarker prediction using Virchow embeddings outperforms other models for several biomarkers, indicating potential to reduce the need for additional, more expensive testing. Detailed analysis of error patterns reveals challenges related to minimal cancer presence, borderline malignancy, subtle malignant features, and artifacts.
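For readers less familiar with the metric, AUC (area under the receiver operating characteristic curve) can be read as the probability that a randomly chosen cancerous specimen receives a higher score than a randomly chosen benign one. The sketch below computes it directly from that definition. The labels and scores are made-up toy values, not data from the study, and `roc_auc` is an illustrative helper rather than the evaluation code used by the authors.

```python
import numpy as np

def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a pair.
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels]
    neg = scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    # Compare every positive score with every negative score.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)

# Toy example: 3 cancerous (1) and 3 benign (0) specimens.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")  # 8 of the 9 pairs are ranked correctly
```

The pairwise comparison is quadratic in the number of specimens, which is fine for a sketch; rank-based implementations are preferable for large evaluation sets.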

Discussion

The results demonstrate the two key benefits of a pathology foundation model: generalizability and data efficiency. Virchow exhibits strong generalization to unseen tissue types, institutions, and rare cancer subtypes. Its ability to achieve near clinical-grade performance with less tissue-specific data highlights its potential to accelerate development of diagnostic tools for less common cancers. Biomarker prediction from routine H&E images offers the potential to improve screening rates and reduce the need for invasive testing. The study's findings suggest that large-scale foundation models can serve as robust building blocks for a wide array of downstream tasks in computational pathology. Further research could explore optimizing algorithms and training settings for the specific characteristics of pathology data.
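One common way to use a foundation model as a building block is to keep its embeddings frozen and train only a small task-specific head, as in the linear probing the study uses for tile-level benchmarks. The sketch below trains a logistic-regression probe with plain gradient descent. The embeddings here are synthetic Gaussian clusters standing in for real model outputs, and the function name, learning rate, and epoch count are assumptions chosen for illustration.

```python
import numpy as np

def train_linear_probe(embeddings, labels, lr=0.1, epochs=200):
    """Fit logistic regression on fixed embeddings by gradient descent.

    Only the linear head (w, b) is learned; the embeddings are treated
    as constants, which is what "frozen" means in linear probing.
    """
    n_samples, dim = embeddings.shape
    w = np.zeros(dim)
    b = 0.0
    for _ in range(epochs):
        logits = embeddings @ w + b
        probs = 1.0 / (1.0 + np.exp(-logits))
        grad = probs - labels  # gradient of the log loss w.r.t. the logits
        w -= lr * (embeddings.T @ grad) / n_samples
        b -= lr * grad.mean()
    return w, b

# Synthetic stand-in for frozen tile embeddings: two Gaussian clusters.
rng = np.random.default_rng(0)
dim, n_per_class = 8, 100
benign = rng.normal(loc=-1.0, size=(n_per_class, dim))
cancer = rng.normal(loc=1.0, size=(n_per_class, dim))
X = np.vstack([benign, cancer])
y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])

w, b = train_linear_probe(X, y)
accuracy = ((X @ w + b > 0) == (y == 1)).mean()
print(f"training accuracy of the linear probe: {accuracy:.2f}")
```

Because the head has so few parameters, it can be fit with relatively little labeled data, which is one way to picture the data-efficiency argument made above. A real evaluation would measure accuracy on held-out tiles rather than on the training set.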

Conclusion

This study presents Virchow, a large-scale foundation model for computational pathology, demonstrating high performance in pan-cancer detection and biomarker prediction. The model's generalizability and data efficiency suggest its potential for accelerating development of clinically relevant tools, particularly for rare cancers and biomarker analysis. Future work should focus on enhancing model architecture, exploring more sophisticated aggregation methods, and addressing the long-tailed distribution challenges inherent in pathology datasets.

Limitations

The study's limitations include the use of a training dataset acquired from a single center with limited scanner types. The model's reliance on tile-level embeddings necessitates the use of an aggregation model for slide-level predictions. The effect of data balancing and distillation strategies was not fully explored due to the scale of training. The study's analysis of rare cancer subtypes might be limited by the small number of samples.
