Introduction
Cancer of unknown primary site (CUP) presents a significant diagnostic challenge, accounting for 3-5% of all cancer diagnoses. These cancers, most often adenocarcinoma, squamous cell carcinoma, or undifferentiated carcinoma, are characterized by early dissemination, aggressive progression, and involvement of multiple organs. Current diagnostic methods, including immunohistochemistry, are often insufficient to identify the primary site, which contributes to a poor prognosis (only 20% of patients achieving a median survival of 10 months). A substantial proportion of newly diagnosed CUP patients present with pleural or peritoneal metastasis, making cytological examination of serous effusions a crucial diagnostic tool. Although pathologists can often distinguish adenocarcinoma from squamous carcinoma cytologically, pinpointing the tumor origin remains difficult. This study aimed to address this limitation by developing a deep-learning model that predicts tumor origin from cytological images in patients with metastatic hydrothorax or ascites. Deep convolutional neural networks have shown promise in pathological diagnosis, performing comparably to pathologists in applications such as breast cancer metastasis detection and prostate cancer Gleason grading. However, deep-learning models that use cytological images to predict tumor origin remain uncommon. In addition, cytological examination itself suffers from suboptimal diagnostic accuracy because of inadequate sampling, cellular degeneration, and inter-examiner variability. An AI-based approach that aids in interpreting cytological data is therefore highly desirable for improving diagnostic capability in this setting.
Literature Review
The existing literature highlights the challenges in diagnosing and treating CUP. Studies consistently report a poor prognosis for CUP, largely because the difficulty of identifying the primary tumor site hinders targeted therapy. Immunohistochemistry has been used, but its success rate in identifying the primary site is limited. Several studies have applied AI to cancer diagnosis, with promising results in specific areas such as breast and prostate cancer. However, the application of AI to cytological images for predicting tumor origin in CUP remains largely unexplored: existing AI models focus primarily on histological or whole-slide images, and few reports show that deep learning on cytological data can accurately predict tumor origin. Cytological images are clinically significant, particularly in advanced cancers where surgical or needle biopsy is not feasible. A robust model that uses readily available cytological data to help localize the cancer origin and guide treatment decisions would therefore be of considerable clinical value.
Methodology
The study used a large dataset of 57,220 cytological smear images from 43,688 patients across four tertiary hospitals. After excluding images lacking clinical or pathological evidence of origin and images of poor quality, the data were divided into training and testing sets. The training set comprised 29,883 images from 20,638 individuals, encompassing 12 tumor subtypes. Three internal testing sets and two external testing sets were used for validation.

The model, termed TORCH, was developed by training four deep neural networks based on multiple instance learning (MIL): attention-based MIL (AbMIL), AbMIL with multiple attention branches, transformer-based MIL (TransMIL), and TransMIL with cross-modality attention. Each network was trained on three input types: (1) cytological image features alone, (2) cytological image features plus age, sex, and specimen sampling site, and (3) a combination of cytological and histological features. The model outputs a probability for each of five categories: benign, digestive system, female reproductive system, respiratory system, and blood/lymphatic system.

Model performance was evaluated with AUROC, accuracy, sensitivity, specificity, and top-n accuracy. TORCH's performance was compared with that of four pathologists (two senior, two junior), whose performance was assessed both with and without TORCH assistance. Survival analysis assessed the association between treatment concordance with TORCH predictions and overall survival in CUP patients. Ablation studies examined the contribution of clinical data to the model's performance, and attention heatmaps were generated to visualize the model's decision-making process and identify key histomorphological features.
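The summary names attention-based MIL but does not describe how tile features are pooled or how age, sex, and sampling site enter the model. The following is a minimal, untrained sketch of generic ungated attention-MIL pooling (in the style of Ilse et al., 2018), with clinical covariates fused by simple concatenation before a five-way softmax. It is not TORCH's actual architecture: the weights are random, the tile features are synthetic, and `encode_clinical`, `SITES`, `random_params`, and the concatenation step are hypothetical choices made only for illustration. The per-tile weights returned as `attn` are the kind of quantity an attention heatmap displays.

```python
import math
import random

# Output categories as listed in the summary.
CLASSES = ["benign", "digestive system", "female reproductive system",
           "respiratory system", "blood/lymphatic system"]
# Hypothetical sampling-site vocabulary (illustration only, not from the study).
SITES = ["pleural", "peritoneal", "other"]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    # Dense matrix (list of rows) times a vector.
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def attention_pool(tiles, V, w):
    """Ungated attention-MIL pooling: a_k = softmax_k(w . tanh(V h_k)); bag = sum_k a_k h_k."""
    scores = [sum(wi * math.tanh(z) for wi, z in zip(w, matvec(V, h))) for h in tiles]
    attn = softmax(scores)
    dim = len(tiles[0])
    bag = [sum(a * h[j] for a, h in zip(attn, tiles)) for j in range(dim)]
    return bag, attn


def encode_clinical(age, sex, site):
    # Hypothetical encoding: age scaled by 100, binary sex flag, one-hot sampling site.
    return [age / 100.0, 1.0 if sex == "F" else 0.0] + [1.0 if site == s else 0.0 for s in SITES]


def predict(tiles, age, sex, site, params):
    bag, attn = attention_pool(tiles, params["V"], params["w"])
    fused = bag + encode_clinical(age, sex, site)  # fusion by simple concatenation
    probs = softmax(matvec(params["W_out"], fused))
    return dict(zip(CLASSES, probs)), attn


def random_params(feat_dim, hidden_dim, seed=0):
    # Untrained random weights, only to make the sketch executable.
    rng = random.Random(seed)

    def rand_matrix(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

    fused_dim = feat_dim + 2 + len(SITES)
    return {"V": rand_matrix(hidden_dim, feat_dim),
            "w": [rng.gauss(0.0, 0.1) for _ in range(hidden_dim)],
            "W_out": rand_matrix(len(CLASSES), fused_dim)}


if __name__ == "__main__":
    rng = random.Random(1)
    tiles = [[rng.random() for _ in range(8)] for _ in range(20)]  # 20 tiles x 8 features
    probs, attn = predict(tiles, age=63, sex="F", site="pleural", params=random_params(8, 4))
    top_tile = max(range(len(attn)), key=attn.__getitem__)
    print(max(probs, key=probs.get), top_tile)
```

Concatenation is the simplest fusion choice; the cross-modality attention variant named in the methodology would instead let one modality attend to the other, which this sketch does not attempt.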
Key Findings
TORCH performed robustly across five testing sets (n=27,337), achieving a microaveraged one-versus-rest AUROC of 0.969 for tumor origin prediction. AUROC values ranged from 0.953 to 0.979 on the internal testing sets and were 0.958 and 0.978 on the two external testing sets. In distinguishing cancer from benign samples, TORCH achieved an AUROC of 0.974, accuracy of 92.6%, sensitivity of 92.8%, and specificity of 92.4%. For tumor origin localization, it achieved a top-1 accuracy of 82.6% and a top-3 accuracy of 98.9%. TORCH showed significantly better prediction efficacy than the pathologists (diagnostic score 1.677 vs. 1.265, P<0.001), and junior pathologists improved significantly when assisted by TORCH (diagnostic score 1.326 vs. 1.101, P<0.001). CUP patients whose treatment was concordant with TORCH predictions had significantly better overall survival (median 27 vs. 17 months, P=0.006). Ablation studies showed that including clinical data significantly improved model performance. Attention heatmaps showed that TORCH focused on relevant histomorphological features (glandular tubules, papillary clusters, cell size, cytoplasm, and nuclear abnormalities).
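The headline metric is a microaveraged one-versus-rest AUROC, and localization is reported as top-1 and top-3 accuracy. As a sketch of how such metrics are commonly defined (assuming standard definitions; the study's own evaluation code is not described here), micro-averaging binarizes every sample against every class, pools all (sample, class) scores, and computes a single AUROC. AUROC is computed below through its Mann-Whitney interpretation, with ties counted as one half; this pairwise form is quadratic and meant only for illustration. The sensitivity/specificity helper assumes a binary cancer-versus-benign labeling.

```python
def auroc(labels, scores):
    """Binary AUROC as P(score_pos > score_neg), with ties counted as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def micro_ovr_auroc(y_true, prob_rows, n_classes):
    # Micro-averaging: binarize each sample one-vs-rest, pool every
    # (sample, class) pair, then compute a single AUROC over the pool.
    labels, scores = [], []
    for y, row in zip(y_true, prob_rows):
        for c in range(n_classes):
            labels.append(1 if c == y else 0)
            scores.append(row[c])
    return auroc(labels, scores)


def top_n_accuracy(y_true, prob_rows, n):
    # Fraction of samples whose true class is among the n highest-scoring classes.
    hits = 0
    for y, row in zip(y_true, prob_rows):
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if y in ranked[:n]:
            hits += 1
    return hits / len(y_true)


def sensitivity_specificity(y_true, y_pred):
    # Binary labels: 1 = cancer, 0 = benign.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)
```

Because micro-averaging pools all pairs, common classes weigh more heavily than rare ones; a macro average would instead weight each class equally.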
Discussion
These findings demonstrate the potential of TORCH as a valuable ancillary tool for predicting tumor origin in CUP patients. The model's high accuracy and its ability to improve pathologists' diagnostic performance, especially that of junior pathologists, are significant advances. The association between treatment concordance with TORCH predictions and improved overall survival underscores the model's clinical relevance. Combining clinical data with cytological images enhanced the model's predictive capability, highlighting the value of multimodal data fusion. However, several limitations must be considered: cytological images alone may be insufficient for precise localization, certain rare cancer types were excluded, and the study focused on a specific geographic population.
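The survival comparison (median 27 vs. 17 months, P=0.006) is reported without naming the statistical methods. Kaplan-Meier estimation with a two-group log-rank test is the conventional choice for comparing overall survival between concordant and discordant treatment groups, so the sketch below assumes it. It is a plain-Python illustration, not the study's analysis, and the toy cohorts in the usage block are hypothetical.

```python
import math
from itertools import groupby


def kaplan_meier(times, events):
    """Kaplan-Meier estimate. events: 1 = death observed, 0 = censored.
    Returns [(t, S(t))] at each distinct time with at least one death."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv, curve = 1.0, []
    for t, group in groupby(data, key=lambda r: r[0]):
        group = list(group)
        deaths = sum(e for _, e in group)
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= len(group)  # deaths and censorings at t leave the risk set afterwards
    return curve


def median_survival(curve):
    # First time at which estimated survival falls to 0.5 or below; None if never reached.
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value with 1 df)."""
    pooled = ([(t, e, 0) for t, e in zip(times_a, events_a)]
              + [(t, e, 1) for t, e in zip(times_b, events_b)])
    observed_minus_expected, variance = 0.0, 0.0
    for t in sorted({t for t, e, _ in pooled if e == 1}):
        n_a = sum(1 for s, _, g in pooled if s >= t and g == 0)
        n = sum(1 for s, _, _ in pooled if s >= t)
        d_a = sum(1 for s, e, g in pooled if s == t and e == 1 and g == 0)
        d = sum(1 for s, e, _ in pooled if s == t and e == 1)
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero; groups cannot be compared")
    chi2 = observed_minus_expected ** 2 / variance
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square with 1 df


if __name__ == "__main__":
    # Hypothetical toy cohorts in months (not study data).
    concordant = ([5, 12, 20, 27, 30, 36], [1, 1, 0, 1, 0, 0])
    discordant = ([3, 8, 10, 17, 19, 24], [1, 1, 1, 1, 0, 1])
    for name, (t, e) in (("concordant", concordant), ("discordant", discordant)):
        print(name, median_survival(kaplan_meier(t, e)))
    print(logrank_test(*concordant, *discordant))
```

Because the survival analysis was retrospective, a significant log-rank result of this kind indicates an association, not a causal effect of following the model's prediction.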
Conclusion
TORCH is a promising deep-learning approach for improving the diagnosis and treatment of CUP. Its performance exceeding that of human pathologists, its value in assisting junior pathologists, and its association with better patient outcomes warrant further investigation. Future research should extend the model to a wider range of cancer types, incorporate additional data modalities (genomic data, radiologic imaging), and validate the findings in prospective, randomized controlled trials.
Limitations
The study's limitations include its reliance on cytological images, which carry less information than whole-slide images; the exclusion of rare cancer types; potential geographic bias, since the cohort was drawn from a Chinese population; and a sample size that is limited compared with datasets used for natural image recognition. The retrospective nature of the survival analysis also introduces potential bias. Further refinement of the model architecture and incorporation of additional data modalities are important avenues for future research.