Fast, accurate, and racially unbiased pan-cancer tumor-only variant calling with tabular machine learning

R. T. McLaughlin, M. Asthana, et al.

This study uses machine learning to separate somatic mutations from germline variants in tumor-only sequencing data, which improves the tumor mutational burden (TMB) estimates used to predict immunotherapy response. R. Tyler McLaughlin and colleagues report state-of-the-art classification performance on two holdout test datasets.

Abstract
Accurately identifying somatic mutations is essential for precision oncology and crucial for calculating tumor-mutational burden (TMB), an important predictor of response to immunotherapy. For tumor-only variant calling, accurately distinguishing somatic mutations from germline variants is challenging and often leads to unreliable, biased, and inflated TMB estimates. This study applies machine learning to classify somatic versus germline variants in tumor-only whole-exome sequencing samples using TabNet, XGBoost, and LightGBM. Training used features derived exclusively from tumor-only pipelines with truth labels from matched-normal analysis. All models achieved state-of-the-art performance on two holdout test datasets: TCGA samples (AUC > 94%) and a metastatic melanoma dataset (AUC > 85%). Concordance between matched-normal and tumor-only TMB improves from R² = 0.006 to 0.71–0.76 with machine-learning classification, with LightGBM performing best. The models generalize across cancer subtypes and capture kits, with a 100% call rate. The study reproduces the finding that tumor-only TMB is extremely inflated for Black patients due to racially biased germline databases and shows that XGBoost and LightGBM eliminate this significant racial bias.
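The two metrics quoted in the abstract, classifier AUC and TMB concordance (R²), can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the 0.5 score threshold, the 30 Mb exome size, and the toy scores are all hypothetical, and R² is assumed here to be the squared Pearson correlation between matched-normal and tumor-only TMB.

```python
from typing import List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a somatic variant (label 1) outscores
    a germline variant (label 0); ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one variant of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def tmb(n_somatic: int, exome_mb: float) -> float:
    """Tumor mutational burden in mutations per megabase."""
    return n_somatic / exome_mb


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation (assumed definition of R^2 here)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined when one input is constant")
    return sxy ** 2 / (sxx * syy)


# Hypothetical toy data: per sample, (classifier score, matched-normal label),
# where label 1 = somatic and 0 = germline.
Sample = List[Tuple[float, int]]
samples: List[Sample] = [
    [(0.95, 1), (0.90, 1), (0.20, 0), (0.10, 0), (0.05, 0)],
    [(0.85, 1), (0.40, 0), (0.15, 0)],
    [(0.92, 1), (0.88, 1), (0.75, 1), (0.30, 0)],
]
EXOME_MB = 30.0   # assumed capture size, for illustration only
THRESHOLD = 0.5   # assumed score cutoff for calling a variant somatic

# Matched-normal TMB uses the truth labels; tumor-only TMB is computed
# once without filtering and once after keeping only predicted-somatic calls.
truth_tmb = [tmb(sum(y for _, y in s), EXOME_MB) for s in samples]
unfiltered_tmb = [tmb(len(s), EXOME_MB) for s in samples]
filtered_tmb = [
    tmb(sum(1 for score, _ in s if score >= THRESHOLD), EXOME_MB)
    for s in samples
]

all_labels = [y for s in samples for _, y in s]
all_scores = [score for s in samples for score, _ in s]
print("AUC:", roc_auc(all_labels, all_scores))
print("R^2 unfiltered:", r_squared(truth_tmb, unfiltered_tmb))
print("R^2 filtered:", r_squared(truth_tmb, filtered_tmb))
```

In this toy setup, the unfiltered tumor-only TMB counts germline variants as mutations, which is the inflation the abstract describes; filtering by classifier score is what brings the tumor-only estimate back toward the matched-normal value.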
Publisher
npj Precision Oncology
Published On
Jan 07, 2023
Authors
R. Tyler McLaughlin, Maansi Asthana, Marc Di Meo, Michele Ceccarelli, Howard J. Jacob, David L. Masica
Tags
somatic mutations
tumor mutational burden
machine learning
XGBoost
LightGBM
precision oncology
racial bias