Accurate identification of somatic mutations is crucial for precision oncology and for calculating tumor mutational burden (TMB), a key predictor of immunotherapy response. Tumor-only variant calling, which lacks a matched normal sample, struggles to distinguish somatic from germline variants, and this leads to biased TMB estimates. This study applies machine learning (TabNet, XGBoost, LightGBM) to classify variants as somatic or germline using features available from tumor-only data, with labels derived from matched-normal data. All three models achieved state-of-the-art performance (AUC > 94% on TCGA data and > 85% on melanoma data). Concordance between matched-normal and tumor-only TMB improved significantly (R² rising from 0.006 to 0.71–0.76), with LightGBM performing best. Importantly, XGBoost and LightGBM eliminated the racial bias in tumor-only TMB estimates observed in previous studies.
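The concordance figure above can be made concrete with a small sketch. This is not the study's pipeline: it is an illustration, assuming that TMB is expressed as somatic mutations per megabase and that R² is the coefficient of determination with matched-normal TMB as the reference (the summary does not say which definition the authors used). The function names, the 30 Mb territory size, and all variant counts are hypothetical.

```python
from statistics import mean


def tmb_per_megabase(somatic_variant_count, territory_bp):
    """TMB as somatic mutations per megabase of sequenced territory."""
    if territory_bp <= 0:
        raise ValueError("territory_bp must be positive")
    return somatic_variant_count / (territory_bp / 1_000_000)


def r_squared(reference, estimate):
    """Coefficient of determination of `estimate` against `reference`.

    Assumption: R^2 = 1 - SS_res / SS_tot, with matched-normal TMB as reference.
    This value can be negative when the estimate is worse than the reference mean.
    """
    if len(reference) != len(estimate) or len(reference) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    ref_mean = mean(reference)
    ss_tot = sum((r - ref_mean) ** 2 for r in reference)
    ss_res = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if ss_tot == 0:
        raise ValueError("reference values are constant; R^2 is undefined")
    return 1 - ss_res / ss_tot


# Hypothetical per-sample variant counts (not data from the study).
TERRITORY_BP = 30_000_000  # assumed sequenced territory of 30 Mb
matched_normal_counts = [30, 60, 150, 300]    # somatic calls with a matched normal
tumor_only_raw_counts = [330, 420, 390, 640]  # tumor-only calls, germline leakage included
tumor_only_ml_counts = [33, 57, 156, 294]     # tumor-only calls kept by a classifier

reference_tmb = [tmb_per_megabase(c, TERRITORY_BP) for c in matched_normal_counts]
raw_tmb = [tmb_per_megabase(c, TERRITORY_BP) for c in tumor_only_raw_counts]
ml_tmb = [tmb_per_megabase(c, TERRITORY_BP) for c in tumor_only_ml_counts]

r2_raw = r_squared(reference_tmb, raw_tmb)
r2_ml = r_squared(reference_tmb, ml_tmb)
print(f"R^2 without germline filtering: {r2_raw:.3f}")
print(f"R^2 after classifier filtering: {r2_ml:.3f}")
```

In this toy setup, germline variants that leak into the tumor-only counts inflate TMB unevenly across samples, so agreement with the matched-normal reference is poor until those variants are filtered out.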
Publisher
npj Precision Oncology
Published On
Jan 07, 2023
Authors
R. Tyler McLaughlin, Maansi Asthana, Marc Di Meo, Michele Ceccarelli, Howard J. Jacob, David L. Masica
Tags
somatic mutations
tumor mutational burden
machine learning
XGBoost
LightGBM
precision oncology
racial bias