MArVD2: a machine learning enhanced tool to discriminate between archaeal and bacterial viruses in viral datasets

Biology

D. Vik, B. Bolduc, et al.

MArVD2 is a random forest machine learning tool, developed by Dean Vik, Benjamin Bolduc, and colleagues, that distinguishes archaeal from bacterial viruses in viral datasets. In benchmarking, a model trained on viral sequences from hypersaline, marine, and hot spring environments correctly classified 85% of archaeal viruses with a false detection rate below 2%.

Abstract
Our knowledge of viral sequence space has exploded with advancing sequencing technologies and large-scale sampling and analytical efforts. Though archaea are important and abundant prokaryotes in many systems, our knowledge of archaeal viruses outside of extreme environments is limited. This largely stems from the lack of a robust, high-throughput, and systematic way to distinguish between bacterial and archaeal viruses in datasets of curated viruses. Here we upgrade our prior text-based tool (MArVD) via training and testing a random forest machine learning algorithm against a newly curated dataset of archaeal viruses. After optimization, MArVD2 presented a significant improvement over its predecessor in terms of scalability, usability, and flexibility, and will allow user-defined custom training datasets as archaeal virus discovery progresses. Benchmarking showed that a model trained with viral sequences from the hypersaline, marine, and hot spring environments correctly classified 85% of the archaeal viruses with a false detection rate below 2% using a random forest prediction threshold of 80% in a separate benchmarking dataset from the same habitats.
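
To make the benchmarking numbers concrete, the sketch below shows how a prediction threshold turns per-sequence random forest probabilities into archaeal-virus calls, and how the fraction correctly classified and a false detection rate could then be computed. It is a minimal illustration, not MArVD2's code: the function name `evaluate_at_threshold`, the toy probabilities, and the definition of false detection rate as false positives divided by all positive calls are assumptions made here and may differ from the paper's exact metric.

```python
def evaluate_at_threshold(probs, labels, threshold=0.80):
    """Apply a probability threshold and score the resulting calls.

    probs  -- predicted probability that each sequence is an archaeal virus
              (hypothetical classifier output, not real MArVD2 results)
    labels -- True if the sequence is truly an archaeal virus, else False
    Returns (recall, false_detection_rate).
    """
    tp = fp = fn = 0
    for prob, is_archaeal in zip(probs, labels):
        called_archaeal = prob >= threshold
        if called_archaeal and is_archaeal:
            tp += 1  # archaeal virus correctly called
        elif called_archaeal and not is_archaeal:
            fp += 1  # bacterial virus wrongly called archaeal
        elif is_archaeal:
            fn += 1  # archaeal virus missed at this threshold

    # Fraction of true archaeal viruses that were recovered.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Assumed definition: share of archaeal calls that are wrong.
    false_detection_rate = fp / (tp + fp) if (tp + fp) else 0.0
    return recall, false_detection_rate


# Toy data for illustration only.
probs = [0.95, 0.85, 0.60, 0.90, 0.10, 0.30]
labels = [True, True, True, False, False, False]
recall, fdr = evaluate_at_threshold(probs, labels)
print(f"recall={recall:.2f}, false detection rate={fdr:.2f}")
```

Raising the threshold generally lowers the false detection rate at the cost of missing more true archaeal viruses, which is the trade-off behind choosing a value such as 80%.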
Publisher
ISME Communications
Published On
Aug 24, 2023
Authors
Dean Vik, Benjamin Bolduc, Simon Roux, Christine L. Sun, Akbar Adjie Pratama, Mart Krupovic, Matthew B. Sullivan
Tags
archaeal viruses
machine learning
random forest
scalability
classification
benchmarking
sequencing technologies