VeChat: correcting errors in long reads using variation graphs

Computer Science

X. Luo, X. Kang, et al.

VeChat, developed by Xiao Luo, Xiongbin Kang, and Alexander Schönhuth, is an open-source tool for correcting errors in long-read sequencing data. It avoids the biases introduced by consensus sequences, reduces errors in reads from both Pacific Biosciences and Oxford Nanopore Technologies platforms, and improves the haplotype awareness of downstream genome assemblies.
Abstract
Error correction is the canonical first step in long-read sequencing data analysis. Current self-correction methods, however, are affected by consensus-sequence-induced biases that mask true variants in lower-frequency haplotypes present in mixed samples. Unlike consensus sequence templates, graph-based reference systems are not affected by such biases, so they do not mistakenly mask true variants as errors. We present VeChat, an approach that implements this idea: VeChat is based on variation graphs, a popular type of data structure for pangenome reference systems. Extensive benchmarking experiments demonstrate that long reads corrected by VeChat contain 4 to 15 times (Pacific Biosciences) and 1 to 10 times (Oxford Nanopore Technologies) fewer errors than reads corrected by state-of-the-art approaches. Further, using VeChat prior to long-read assembly significantly improves the haplotype awareness of the assemblies. VeChat is an easy-to-use open-source tool and is publicly available at https://github.com/HaploKit/vechat.
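The consensus bias described in the abstract can be illustrated with a toy example. The sketch below is not VeChat's algorithm and does not build a variation graph. It assumes equal-length, already aligned reads and uses hypothetical helper names (`majority_consensus`, `supported_alleles`) and a hypothetical `min_fraction` threshold. It only shows why a single majority-vote template hides a true variant carried by a minority haplotype, while a representation that keeps every well-supported allele does not.

```python
from collections import Counter


def column_counts(reads):
    """Count bases per column of equal-length, pre-aligned reads (toy assumption)."""
    return [Counter(column) for column in zip(*reads)]


def majority_consensus(reads):
    """Collapse all reads into one template by per-column majority vote."""
    return "".join(counts.most_common(1)[0][0] for counts in column_counts(reads))


def supported_alleles(reads, min_fraction=0.2):
    """Keep every base whose read support reaches min_fraction (hypothetical threshold).

    This stands in for a structure that retains alternative alleles instead of
    collapsing them; it is far simpler than a real variation graph.
    """
    total = len(reads)
    return [
        sorted(base for base, count in counts.items() if count / total >= min_fraction)
        for counts in column_counts(reads)
    ]


# Mixed sample: 6 reads from a major haplotype, 3 from a minor haplotype that
# differs at position 2 (G -> C), and 1 major-haplotype read with a
# sequencing error at position 0 (A -> T).
reads = ["ACGTA"] * 6 + ["ACCTA"] * 3 + ["TCGTA"]

consensus = majority_consensus(reads)
alleles = supported_alleles(reads)

print(consensus)   # the minor haplotype's C at position 2 is absent from the template
print(alleles[2])  # both G and C are retained as supported alleles
print(alleles[0])  # the singleton T is below the threshold and is dropped
```

Correcting the three minor-haplotype reads against the majority template would rewrite their true `C` to `G`, whereas the allele-retaining view separates the well-supported variant from the isolated error.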
Publisher
Nature Communications
Published On
Nov 04, 2022
Authors
Xiao Luo, Xiongbin Kang, Alexander Schönhuth
Tags
VeChat
long-read sequencing
genome assembly
variation graphs
error correction
PacBio
ONT