This paper introduces ProtGPT2, a language model trained on protein sequences that generates novel protein sequences de novo. The generated proteins display natural amino acid propensities and are predominantly globular. Sequence similarity searches indicate that the model samples unexplored regions of protein space, and AlphaFold predictions of the generated sequences yield well-folded structures with complex topologies not found in current databases. ProtGPT2 generates sequences rapidly and is publicly available.
Publisher
Nature Communications
Published On
Jul 27, 2022
Authors
Noelia Ferruz, Steffen Schmidt, Birte Höcker
Tags
ProtGPT2
language model
protein sequences
novel proteins
AlphaFold
amino acid propensities
protein structure