Can Deep Learning Crack the Genetic Code? A Data-Driven Approach

M. Joiret, G. Gianini, et al.

This study tests whether neural networks can learn the genetic code, the mapping from codons to amino acids, directly from transcript-protein training pairs. It finds that on the order of millions of codon-amino acid pairs are needed to reach high accuracy, particularly for rare codons such as those for tryptophan and stop. The work, by Marc Joiret, Gabriele Gianini, and colleagues, also shows that deep learning architectures can be customized efficiently for this task.

Abstract
The genetic code is textbook scientific knowledge established without AI. This study tests whether a neural network can re-discover, autonomously, the mapping between codons and amino acids and build the complete deciphering dictionary from transcript–protein training pairs. We compare Deep Learning architectures and quantitatively estimate the size of the required human transcriptomic training set to achieve the best possible accuracy in codon-to-amino-acid mapping. We investigate the effect of a codon embedding layer (semantic similarity between codons) on the training accuracy rate, and the benefit of using the unbalanced representations of amino acids in human proteins to speed deciphering of rare amino-acid codons. Deep neural networks require large datasets; deciphering the genetic code is no exception. Achieving high test accuracy and unequivocally identifying rare codons (e.g., tryptophan and stop codons) requires on the order of millions of cumulative codon–amino-acid pairs presented over tens of epochs, depending on architecture and settings. We confirm that the generic capacities and modularity of deep neural networks allow efficient customization to learn the genetic code deciphering task.
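To make the task concrete, here is a minimal sketch of the codon-to-amino-acid mapping and of how aligned transcript-protein pairs can be turned into a lookup dictionary. It is a plain counting baseline, not the neural network architectures compared in the study. The function names (`translate`, `training_pairs`, `learn_dictionary`) and the toy transcript are assumptions made for illustration. The reference table is the standard genetic code written in RNA letters.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in UCAG order (third base varies fastest).
# '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(transcript):
    """Translate an in-frame RNA string with the reference table."""
    return "".join(
        GENETIC_CODE[transcript[i:i + 3]]
        for i in range(0, len(transcript) - 2, 3)
    )


def training_pairs(transcript, protein):
    """Align codons with amino acids to form (codon, amino acid) pairs."""
    codons = [transcript[i:i + 3] for i in range(0, len(transcript) - 2, 3)]
    return list(zip(codons, protein))


def learn_dictionary(pairs):
    """Counting baseline (hypothetical helper): map each codon to the
    amino acid it was most often paired with in the training data."""
    counts = defaultdict(Counter)
    for codon, aa in pairs:
        counts[codon][aa] += 1
    return {codon: c.most_common(1)[0][0] for codon, c in counts.items()}


# Toy transcript (made up for illustration): Met, Trp, Phe, stop.
transcript = "AUGUGGUUUUAA"
pairs = training_pairs(transcript, translate(transcript))
learned = learn_dictionary(pairs)
```

Even this baseline can only recover a codon that appears at least once in the training pairs, which gives some intuition for why rare codons, such as the single tryptophan codon UGG and the stop codons, are the last to be identified.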
Publisher
Frontiers in Artificial Intelligence
Published On
Apr 27, 2023
Authors
Marc Joiret, Gabriele Gianini, Rocco Zaccagnino, Antonio José Jimeno Yepes, Giulia Lorini, Giulia Lalli, Francesca Rossi, Pierre Collignon, Luc Leybaert, Matteo Leoni
Tags
neural networks
genetic code
codons
amino acids
deep learning
training dataset
bioinformatics