Introduction
Understanding viral host ranges is crucial for predicting and mitigating zoonotic diseases. Current knowledge is heavily biased toward humans and domesticated animals, leaving a significant gap in understanding the associations between viruses and wild mammals. This study aims to address this gap using a novel machine-learning framework that integrates three different perspectives: viral traits, mammalian traits, and network topology. The framework leverages the existing knowledge of virus-mammal associations to predict potential or undocumented associations, thereby providing a more complete picture of viral host ranges and identifying potential risks. The importance of this research stems from the escalating global burden of viral diseases and the need for effective strategies to mitigate the risk of future zoonotic spillover events. The researchers hypothesize that by integrating these three perspectives, the predictive power of their model will be significantly enhanced compared to models using only one or two perspectives.
Literature Review
The researchers reviewed existing literature on viral host ranges, highlighting the significant bias towards humans and domesticated animals. They cited studies emphasizing the limited knowledge of viral diversity in mammals, particularly in wild species. The review also covered existing methods for predicting host-pathogen interactions, including network-based approaches that utilize topological features to identify missing links. Studies on the use of motifs in network analysis to understand complex interactions in biological systems were also discussed. The review underscores the need for a more comprehensive approach that accounts for both individual host and viral traits, along with global network properties, to improve the accuracy of predictions.
Methodology
The researchers developed a multi-perspective machine-learning framework to predict unknown virus-mammal associations. The framework consisted of three perspectives:
1. **Mammalian perspective:** This perspective used viral traits (e.g., genome type, transmission routes) to predict the probability of a virus associating with a given mammal. A suite of machine-learning models (avNNet, GBM, Random Forest, XGBoost, SVM-RW, SVM-LW, SVM-P, Naive Bayes) was trained for each mammal with at least two known viruses. SMOTE (Synthetic Minority Over-sampling Technique) was employed to address class imbalance.
2. **Viral perspective:** This perspective used mammalian traits (e.g., phylogeny, ecology, geography) to predict the probability of a mammal hosting a given virus. Machine-learning models and class-balancing techniques similar to those of the mammalian perspective were used, with models trained for each virus with at least two known mammalian hosts.
3. **Network perspective:** This perspective utilized network topology to predict associations. The researchers used counts of potential motifs (small subgraphs representing recurring patterns in the network) as features. A similar suite of machine-learning models was employed, with balanced data obtained via under-sampling.
The predictions from the three perspectives were consolidated using majority voting: an association was predicted if at least two perspectives supported it. The researchers used various performance metrics (AUC, TSS, F1-score) to evaluate the models and selected the best-performing ones for each perspective. Research effort (number of sequences and publications) on mammal and virus species was also incorporated into the models, particularly in the network perspective, to account for potential biases in data availability.
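The majority-voting step can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name, the virus and mammal labels, the probabilities, and the 0.5 cut-off that turns each perspective's probability into a yes/no vote are all assumptions made for illustration. Only the rule itself (predict an association when at least two of the three perspectives support it) comes from the description above.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]  # (virus, mammal); labels below are hypothetical


def majority_vote(
    mammalian: Dict[Pair, float],
    viral: Dict[Pair, float],
    network: Dict[Pair, float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the study
    min_votes: int = 2,      # "at least two perspectives" as described
) -> Dict[Pair, bool]:
    """Return True for each pair supported by at least `min_votes` perspectives."""
    perspectives = (mammalian, viral, network)
    all_pairs = set().union(*perspectives)
    consensus = {}
    for pair in all_pairs:
        # A perspective votes "yes" when its probability reaches the threshold;
        # a pair that a perspective did not score counts as a "no" vote.
        votes = sum(p.get(pair, 0.0) >= threshold for p in perspectives)
        consensus[pair] = votes >= min_votes
    return consensus


# Hypothetical probabilities from each perspective.
mammalian_probs = {("virus_A", "mammal_1"): 0.9, ("virus_A", "mammal_2"): 0.2,
                   ("virus_B", "mammal_1"): 0.7}
viral_probs = {("virus_A", "mammal_1"): 0.8, ("virus_A", "mammal_2"): 0.6,
               ("virus_B", "mammal_1"): 0.3}
network_probs = {("virus_A", "mammal_1"): 0.1, ("virus_A", "mammal_2"): 0.4,
                 ("virus_B", "mammal_1"): 0.4}

consensus = majority_vote(mammalian_probs, viral_probs, network_probs)
```

In this toy example only the pair supported by two perspectives is predicted; the other two pairs each receive a single vote and are rejected.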
Key Findings
The framework predicted a median of 20,832 unknown virus-mammal associations (90% CI [2,736, 97,062]), representing a 4.29-fold increase over the known associations. The increase was more pronounced for wild and semi-domesticated mammals (4.89-fold increase). Taken individually, the perspectives differed in how many unknown associations they predicted: 41,537 for the mammalian perspective, 21,352 for the viral perspective, and 76,081 for the network perspective. A separate analysis using only sequence-supported associations predicted 15,721 unknown associations. The average mammalian host range per virus was estimated to be 14.33 (90% CI [4.78, 54.53]), with RNA viruses exhibiting a wider range (21.65) than DNA viruses (7.85). The framework highlighted significant knowledge gaps for wild reservoirs of zoonotic viruses and viruses affecting domesticated animals, especially lyssaviruses, bornaviruses, and rotaviruses. The analysis of feature importance showed that phylogenetic and ecological distances between hosts were the most influential predictors in the mammalian and viral perspectives. In the network perspective, certain motif features, indicating connections between viruses with wide host ranges and mammals with diverse viral communities, were highly important. Model validation using held-out test sets, removal of known associations, and comparison with external data demonstrated strong predictive performance (AUC = 0.938).
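For readers unfamiliar with two of the metrics named in the methodology, the sketch below shows how the true skill statistic (TSS) and the F1-score follow from confusion-matrix counts, using their standard definitions. The function name and the counts are hypothetical and do not come from the study; the sketch also assumes that every denominator is non-zero.

```python
def tss_and_f1(tp: int, fp: int, tn: int, fn: int) -> tuple:
    """Compute TSS and F1 from confusion-matrix counts (standard definitions)."""
    sensitivity = tp / (tp + fn)        # true positive rate
    specificity = tn / (tn + fp)        # true negative rate
    tss = sensitivity + specificity - 1  # ranges from -1 to 1
    f1 = 2 * tp / (2 * tp + fp + fn)     # harmonic mean of precision and recall
    return tss, f1


# Hypothetical counts: 8 true positives, 2 false positives,
# 18 true negatives, 2 false negatives.
tss, f1 = tss_and_f1(tp=8, fp=2, tn=18, fn=2)
```

With these counts, sensitivity is 0.8 and specificity is 0.9, so TSS is about 0.7 and F1 is 0.8. AUC is not shown because it requires ranked probabilities rather than a single confusion matrix.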
Discussion
The findings of this study indicate a substantial underestimation of the true number of virus-mammal associations, especially in wild mammals. The multi-perspective approach proved superior to individual perspectives, enhancing predictive power and capturing both local and global variations in knowledge gaps. The identified knowledge gaps highlight the need for increased surveillance efforts, particularly for wild reservoirs of zoonotic viruses. The framework's ability to predict associations at the level of individual species makes it suitable for targeted surveillance strategies. The integration of viral and mammalian traits with network topology provides a powerful approach for understanding the complex interplay between viruses and their hosts, contributing significantly to predicting future zoonotic spillover events.
Conclusion
This study presents a novel machine-learning framework for predicting unknown virus-mammal associations. The framework's multi-perspective approach significantly improves predictive accuracy, and its results indicate that current knowledge substantially underestimates the number of virus-mammal associations. The results emphasize the need for targeted surveillance of wild and semi-domesticated mammals, particularly for understudied virus genera. Future research could focus on incorporating additional viral traits, including receptor binding information, and expanding the network to include other host species, such as birds, to further enhance predictive power and understanding.
Limitations
The study acknowledges limitations related to research biases and the challenge of distinguishing between true biological non-susceptibility and lack of observation due to research effort. The framework focuses on predicting known unknowns; it cannot predict associations involving completely novel viruses. The model's performance might be influenced by the completeness and accuracy of the underlying virus-mammal association database.