ProtGPT2 is a deep unsupervised language model for protein design

Biology

N. Ferruz, S. Schmidt, et al.

Discover ProtGPT2, a language model developed by Noelia Ferruz, Steffen Schmidt, and Birte Höcker that generates novel protein sequences. These sequences preserve natural amino acid preferences while sampling previously unexplored regions of protein space, yielding well-folded structures with unique topologies. Rapid generation and public availability make this a significant advance in protein design.

Abstract
This paper introduces ProtGPT2, a language model trained on protein sequences to generate novel protein sequences. The generated proteins exhibit natural amino acid propensities and are largely globular. Sequence searches indicate the model samples unexplored regions of protein space, and AlphaFold predictions show well-folded structures with complex topologies not found in current databases. ProtGPT2 generates sequences rapidly and is publicly available.
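The abstract notes that the generated proteins exhibit natural amino acid propensities. A minimal sanity check along these lines is to compare a candidate sequence's residue frequencies against background frequencies observed in natural proteins. The sketch below is illustrative, not part of the authors' pipeline; the background frequency table is an assumption, rounded from commonly cited estimates.

```python
from collections import Counter

# Approximate background amino-acid frequencies in natural proteins
# (illustrative assumption, rounded from common estimates; sums to ~1.0).
NATURAL_FREQS = {
    "A": 0.083, "R": 0.055, "N": 0.041, "D": 0.054, "C": 0.014,
    "Q": 0.039, "E": 0.067, "G": 0.071, "H": 0.023, "I": 0.059,
    "L": 0.097, "K": 0.058, "M": 0.024, "F": 0.039, "P": 0.047,
    "S": 0.066, "T": 0.053, "W": 0.011, "Y": 0.029, "V": 0.069,
}

def aa_propensities(sequence: str) -> dict:
    """Observed amino-acid frequencies of a protein sequence."""
    counts = Counter(sequence)
    total = len(sequence)
    return {aa: counts.get(aa, 0) / total for aa in NATURAL_FREQS}

def max_deviation(sequence: str) -> float:
    """Largest absolute gap between observed and background frequencies."""
    observed = aa_propensities(sequence)
    return max(abs(observed[aa] - NATURAL_FREQS[aa]) for aa in NATURAL_FREQS)

# A homopolymer deviates far more from natural propensities
# than a sequence with a balanced residue composition.
print(max_deviation("A" * 100))              # large deviation
print(max_deviation("ARNDCQEGHILKMFPSTWYV")) # small deviation
```

A generated sequence with near-natural propensities should score a small `max_deviation`; this is only a coarse composition check, not a substitute for the sequence- and structure-level analyses (HMM searches, AlphaFold predictions) the paper performs.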
Publisher
Nature Communications
Published On
Jul 27, 2022
Authors
Noelia Ferruz, Steffen Schmidt, Birte Höcker
Tags
ProtGPT2
language model
protein sequences
novel proteins
AlphaFold
amino acid propensities
protein structure