StratoMod: predicting sequencing and variant calling errors with interpretable machine learning

N. Dwarshuis, P. Tonner, et al.

StratoMod is an interpretable machine-learning classifier developed by Nathan Dwarshuis, Peter Tonner, Nathan D. Olson, Fritz J. Sedlazeck, Justin Wagner, and Justin M. Zook. It predicts germline variant calling errors from genomic context features, supporting data-driven assessment of variant calling accuracy and identification of clinically relevant variants that a pipeline is likely to miss.

Abstract
Despite the variety in sequencing platforms, mappers, and variant callers, no single pipeline is optimal across the entire human genome, so users make tradeoffs when designing pipelines for specific applications. Currently, assessing such tradeoffs relies on intuition about performance in given genomic contexts. We present StratoMod, an interpretable machine-learning classifier that predicts germline variant calling errors in a data-driven manner. StratoMod precisely predicts recall using HiFi or Illumina data and, through its interpretability, quantifies contributions from difficult-to-map and homopolymer regions to outcomes. We use StratoMod to assess the effect of mismapping on predicted recall for linear vs graph-based references, and to identify the hard-to-map regions where graph-based methods excel and by how much. This analysis leverages a draft benchmark based on the Q100 HG002 assembly, which includes previously inaccessible difficult regions. StratoMod also predicts clinically relevant variants likely to be missed, improving over pipelines that primarily filter likely false positives. This enables precise risk-reward analyses when designing variant calling pipelines.
Publisher
Communications Biology
Published On
Oct 13, 2024
Authors
Nathan Dwarshuis, Peter Tonner, Nathan D. Olson, Fritz J. Sedlazeck, Justin Wagner, Justin M. Zook
Tags
machine learning
germline variant calling
predictive modeling
genomic context
clinically relevant variants
sequencing data