Turnover number predictions for kinetically uncharacterized enzymes using machine and deep learning

Biology

A. Kroll, Y. Rousset, et al.

TurNuP, developed by Alexander Kroll, Yvan Rousset, Xiao-Pan Hu, Nina A. Liebrand, and Martin J. Lercher, predicts enzyme turnover numbers (kcat) and generalizes to enzymes unlike those in its training data. The model combines full-reaction fingerprints with Transformer-based protein sequence representations, improves the parameterization of metabolic models, and is accessible through a web server.

Introduction
Turnover number (kcat) denotes the maximal catalytic rate at an enzyme’s active site and is a key parameter for quantitative studies of enzymatic activity, metabolism, physiology, and cellular resource allocation. Comprehensive kcat sets are essential for enzyme- and proteome-constrained genome-scale metabolic models, but high-throughput assays do not exist and measurements are time-consuming and costly; even in E. coli, in vitro kcat is known for only about 10% of reactions. Existing strategies (sampling or fitting kcat in large models) often yield inaccurate values that poorly correspond to measured in vitro data. Recent approaches estimate in vivo kcat for well-studied organisms or use machine learning from limited feature sets, but they are constrained by data availability or generalization to novel enzymes. The research goal here is to develop an accurate, general, organism-independent computational model to predict in vitro kcat for natural reactions of wild-type enzymes using only broadly available inputs, and to demonstrate its utility for metabolic model parameterization and proteome allocation prediction.
Literature Review
Prior work includes: (1) Davidi et al. estimated enzyme kcat values in E. coli from fluxes and proteomics across 31 growth conditions, achieving r ≈ 0.62 (log10 scale), but the approach is limited to a well-studied organism and a few hundred enzymes. (2) Heckmann et al. built a kcat predictor for E. coli using expert-crafted features (active-site properties, metabolite concentrations, experimental conditions, and FBA-derived fluxes), achieving R² ≈ 0.34 on a small, E. coli-only dataset; its applicability is limited because many of these features are rarely available. (3) DLKcat (Li et al.) used CNN-based enzyme sequence features and a single substrate to predict kcat across reactions; while broadly applicable, its performance degrades for enzymes dissimilar to the training set, and it omits products and co-substrates. Collectively, these studies highlight the need for organism-independent models built on broadly available representations that capture the full reaction context and generalize to enzymes with low similarity to known training data.
Methodology
Data compilation and preprocessing: kcat measurements with the associated enzyme sequences and reactions were aggregated from BRENDA, UniProt, and Sabio-RK. Non-wild-type enzymes and non-natural reactions were removed, as were redundant entries and entries with incomplete reaction or enzyme information. Unrealistic outliers (<1e-15 s−1 or >1e5 s−1) were discarded. For enzyme–reaction pairs with multiple measurements, the geometric mean was taken after excluding very low values (<1% of the maximum for the same pair) to mitigate the effect of non-optimal assay conditions; further filters removed potentially misassigned reactions (mass imbalance). The final dataset comprises 4,271 data points covering 297 unique reactions and 827 unique enzymes; targets were log10-transformed. The data were split into 80% training and 20% test sets such that identical enzyme sequences do not appear in both, and fivefold cross-validation splits likewise ensured no sequence duplication across folds. Test subsets were created by stratifying on maximum sequence identity to the training enzymes.

Reaction representations: three types of reaction fingerprints were evaluated. (a) Structural reaction fingerprints: compute a 1,638-bit molecular fingerprint for each substrate and product, apply a bitwise OR across all substrates and across all products separately, then concatenate the two vectors into a 3,276-bit vector. (b) Difference reaction fingerprints: compute a 2,048-bit atom-pair fingerprint for each reactant, sum the substrate and product vectors separately, and subtract the product vector from the substrate vector to yield a 2,048-dimensional integer vector. (c) Differential reaction fingerprints (DRFP): directly produce a 2,048-bit binary fingerprint by hashing substructures that occur exclusively among the substrates or exclusively among the products, derived from the reaction SMILES.

Enzyme representations: the Transformer-based protein language model ESM-1b (trained on ~27 million UniRef50 sequences) provided 1,280-dimensional embeddings (sequences were truncated to 1,024 amino acids where necessary). A previously fine-tuned, task-specific ESM variant was considered, but ESM-1b embeddings were chosen as the main enzyme representation because they performed similarly and are more broadly available.

Modeling: gradient-boosted decision trees (XGBoost) were trained to predict log10(kcat). Three input configurations were evaluated: reaction-only (each fingerprint type), enzyme-only (ESM-1b), and joint (concatenated ESM-1b + DRFP). Hyperparameters (learning rate, regularization parameters α and λ, maximum depth, subsampling, number of estimators, minimum child weight, etc.) were optimized by random search with fivefold cross-validation; final models were retrained on the full training data and evaluated on the held-out test set using R², MSE, MAE, and Pearson r. Additional learners (linear regression, random forest, fully connected neural network) were trained for comparison.

Generalization analyses: the test set was stratified by maximum enzyme sequence identity to any training enzyme (<40%, 40–80%, 80–99%, 99–100%), and a simple similarity baseline (the geometric mean kcat of the most similar training enzymes) served as a reference. Reaction-level generalization was assessed by splitting test points into reactions seen versus unseen during training; for unseen reactions, similarity to the training reactions (Tanimoto similarity on structural reaction fingerprints) was related to performance. Illustrative code sketches for several of these steps follow.
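A minimal sketch of the replicate-aggregation rule described above, assuming a pandas DataFrame with illustrative column names ('enzyme', 'reaction', 'kcat'):

```python
import numpy as np
import pandas as pd

def aggregate_kcat(df: pd.DataFrame) -> pd.DataFrame:
    """Collapse replicate kcat measurements per enzyme-reaction pair."""
    # Drop unrealistic outliers, as in the curation described above
    df = df[(df["kcat"] > 1e-15) & (df["kcat"] < 1e5)]

    def geo_mean(values: pd.Series) -> float:
        # Discard values <1% of the pair's maximum (likely suboptimal assays)
        v = values[values >= 0.01 * values.max()]
        return float(np.exp(np.log(v).mean()))  # geometric mean

    agg = (df.groupby(["enzyme", "reaction"])["kcat"]
             .apply(geo_mean)
             .reset_index(name="kcat"))
    agg["log10_kcat"] = np.log10(agg["kcat"])  # regression target
    return agg
```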
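The two classical reaction fingerprints could be assembled with RDKit roughly as follows; the per-molecule fingerprint choices (RDKit's path-based fingerprint, hashed atom-pair counts) are stand-in assumptions, not necessarily the paper's exact choices:

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import rdMolDescriptors

def structural_reaction_fp(sub_smiles, prod_smiles, n_bits=1638):
    """Bitwise-OR the molecular fingerprints of all substrates and of all
    products, then concatenate -> 2 * n_bits = 3,276-bit vector."""
    def or_fold(smiles_list):
        acc = np.zeros(n_bits, dtype=np.uint8)
        for smi in smiles_list:
            mol = Chem.MolFromSmiles(smi)
            acc |= np.array(Chem.RDKFingerprint(mol, fpSize=n_bits), dtype=np.uint8)
        return acc
    return np.concatenate([or_fold(sub_smiles), or_fold(prod_smiles)])

def difference_reaction_fp(sub_smiles, prod_smiles, n_bits=2048):
    """Summed atom-pair count vectors: substrates minus products."""
    def count_vec(smiles_list):
        acc = np.zeros(n_bits, dtype=np.int64)
        for smi in smiles_list:
            mol = Chem.MolFromSmiles(smi)
            fp = rdMolDescriptors.GetHashedAtomPairFingerprint(mol, nBits=n_bits)
            for idx, cnt in fp.GetNonzeroElements().items():
                acc[idx] += cnt
        return acc
    return count_vec(sub_smiles) - count_vec(prod_smiles)

# Illustrative example: ethanol -> acetaldehyde
fp = structural_reaction_fp(["CCO"], ["CC=O"])
```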
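DRFPs are available from the open-source drfp package by Probst et al.; a minimal usage sketch, assuming the package's DrfpEncoder.encode interface (the example reaction SMILES is illustrative):

```python
# pip install drfp
from drfp import DrfpEncoder

# Reaction SMILES: substrates >> products (illustrative ester hydrolysis)
rxn = "CC(=O)OCC.O>>CC(=O)O.CCO"
fp = DrfpEncoder.encode([rxn], n_folded_length=2048)[0]  # 2,048-bit binary vector
```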
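Enzyme embeddings of the kind described above can be obtained with the fair-esm package; mean-pooling over residues into a single 1,280-dimensional vector is an assumption here, as the paper's exact pooling may differ:

```python
import torch
import esm  # pip install fair-esm

# Load the pre-trained ESM-1b model (~650M parameters)
model, alphabet = esm.pretrained.esm1b_t33_650M_UR50S()
model.eval()
batch_converter = alphabet.get_batch_converter()

seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # illustrative enzyme sequence
seq = seq[:1022]  # ESM-1b handles at most 1,024 tokens incl. BOS/EOS
_, _, tokens = batch_converter([("enzyme", seq)])

with torch.no_grad():
    out = model(tokens, repr_layers=[33])  # final (33rd) layer
rep = out["representations"][33]           # shape: (1, len(seq)+2, 1280)

# Mean-pool over residue positions, skipping BOS/EOS tokens
embedding = rep[0, 1:len(seq) + 1].mean(dim=0)  # 1,280-dim vector
```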
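A sketch of the model-fitting step: sequence-grouped fivefold CV with a random hyperparameter search over XGBoost. Feature-matrix construction and the search ranges are illustrative, not the paper's exact grid:

```python
import numpy as np
from sklearn.model_selection import GroupKFold, RandomizedSearchCV
from xgboost import XGBRegressor

def fit_turnup_like(X, y, groups):
    """X: rows of [ESM-1b embedding | DRFP] (1,280 + 2,048 columns);
    y: log10(kcat); groups: enzyme-sequence IDs, so identical sequences
    never span the train and validation folds of a CV split."""
    param_dist = {  # illustrative ranges
        "learning_rate": np.linspace(0.01, 0.3, 30),
        "max_depth": list(range(3, 12)),
        "reg_alpha": np.logspace(-3, 1, 20),
        "reg_lambda": np.logspace(-3, 1, 20),
        "subsample": np.linspace(0.5, 1.0, 11),
        "n_estimators": list(range(100, 1001, 50)),
        "min_child_weight": list(range(1, 10)),
    }
    search = RandomizedSearchCV(
        XGBRegressor(objective="reg:squarederror"),
        param_dist,
        n_iter=100,
        cv=GroupKFold(n_splits=5),  # sequence-grouped fivefold CV
        scoring="neg_mean_squared_error",
        random_state=0,
    )
    search.fit(X, y, groups=groups)
    return search.best_estimator_
```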
Additional features: the authors explored augmenting the joint model with Michaelis constants (Km; curated from BRENDA where available, otherwise predicted with a GNN+UniRep-based model, taking the geometric mean across substrates), the Codon Adaptation Index (CAI; for E. coli only), and reaction fluxes (pFBA/FVA-based, computed across multiple BiGG GEMs and mapped by reaction similarity).

Statistical testing: one-sided Wilcoxon signed-rank tests compared the absolute errors of model variants (e.g., DRFP vs the other fingerprints; joint vs single-input models), and Mann–Whitney U tests compared TurNuP with DLKcat across sequence-identity bins; nested cross-validation was used for some feature-augmented models. A sketch of these tests follows below.

Deployment: a web server (https://turnup.cs.hu-berlin.de) accepts enzyme sequences and reaction definitions (SMILES, KEGG IDs, or InChI) and returns kcat predictions together with the maximum enzyme sequence identity and reaction similarity to the training data.
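A minimal sketch of the paired and unpaired tests described above, using SciPy; the error arrays are synthetic placeholders:

```python
import numpy as np
from scipy.stats import wilcoxon, mannwhitneyu

rng = np.random.default_rng(0)
abs_err_a = rng.gamma(2.0, 0.3, size=200)                   # placeholder errors, model A
abs_err_b = abs_err_a + np.abs(rng.normal(0.1, 0.1, 200))   # model B, slightly worse

# Paired, one-sided: are A's absolute errors systematically smaller than B's?
_, p_paired = wilcoxon(abs_err_a, abs_err_b, alternative="less")

# Unpaired (e.g., TurNuP vs DLKcat errors within one sequence-identity bin)
_, p_unpaired = mannwhitneyu(abs_err_a, abs_err_b, alternative="less")
print(f"Wilcoxon p = {p_paired:.2g}; Mann-Whitney p = {p_unpaired:.2g}")
```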
Key Findings
- Reaction-only models are predictive: structural fingerprints reached R² ≈ 0.31 (MSE ≈ 0.99, r ≈ 0.56), difference fingerprints R² ≈ 0.34 (MSE ≈ 0.95, r ≈ 0.60), and DRFPs performed best with R² ≈ 0.38 (MSE ≈ 0.89, r ≈ 0.62). DRFPs significantly outperformed difference fingerprints (p = 2.61e-4); the comparison with structural fingerprints gave p = 0.064.
- Enzyme-only models using ESM-1b embeddings also performed reasonably well (test R² ≈ 0.36–0.40, MSE ≈ 0.86–0.92, r ≈ 0.60–0.64).
- The joint model (TurNuP: ESM-1b + DRFP) achieved the best accuracy: R² ≈ 0.40, MSE ≈ 0.81, r ≈ 0.67 on the test set, with an MAE on the log10 scale of ≈ 0.69 (≈4.9-fold average deviation). The improvements over the reaction-only and enzyme-only models were statistically significant (p = 0.004 and p = 1.2e-7, respectively).
- Generalization by sequence similarity: performance decreases with lower sequence identity to the training enzymes, from R² ≈ 0.67 for 99–100% identity to ≈ 0.33 for <40% identity. A simple similarity-based baseline performs poorly when no close homologs exist (R² ≈ 0.02 for <40% identity); across the full test set, TurNuP outperforms this baseline (R² ≈ 0.44 vs 0.24; N ≈ 851).
- Generalization to unseen reactions: for reactions present in training, R² ≈ 0.57 (MSE ≈ 0.51, r ≈ 0.74); for unseen reactions, R² ≈ 0.35 (MSE ≈ 1.02, r ≈ 0.60). Within unseen reactions, higher similarity to training reactions correlates with better performance.
- Comparison to DLKcat: TurNuP yields higher R² than DLKcat in every sequence-identity category, with significant differences in absolute errors in most bins. Comparisons of overall R² between the full test sets are confounded by Simpson's paradox, since the DLKcat test data are dominated by near-identical sequences.
- Utility in metabolic models: parameterizing enzyme-constrained GEMs with TurNuP-predicted kcat values improved proteome allocation predictions in 19 of 21 species–condition cases (p = 1e-4), reducing MSE by ~18% on average relative to DLKcat-based parameterization.
- Additional features: adding Km and flux did not improve performance (test R² ≈ 0.39, MSE ≈ 0.87, r ≈ 0.63); CAI alone explained negligible variance (R² ≈ 0.012 on the E. coli subset).
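To unpack the log-scale error above: an MAE of m on log10(kcat) corresponds to an average fold deviation of 10^m, so MAE ≈ 0.69 gives roughly 4.9-fold:

```python
mae_log10 = 0.69
print(f"average fold deviation: {10 ** mae_log10:.1f}-fold")  # -> 4.9-fold
```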
Discussion
TurNuP demonstrates that combining generalizable enzyme sequence embeddings from Transformer models with comprehensive reaction fingerprints enables robust, organism-independent prediction of kcat for wild-type enzymes and natural reactions. Reaction-only and enzyme-only models already capture substantial portions of the variance, indicating that both reaction chemistry and enzyme sequence encode overlapping determinants of catalytic rates. The joint model leverages complementary information and improves robustness, particularly when coverage by either modality is weak. Performance degrades as enzyme similarity to training data decreases and when reactions are dissimilar to training reactions, reflecting true generalization limits; nonetheless, TurNuP maintains useful accuracy even for <40% identity enzymes and for unseen reactions with moderate similarity. Compared to DLKcat, TurNuP generalizes better across diverse enzymes, likely due to Transformer-based protein representations and full-reaction encoding rather than a single-substrate view. Observed tendencies to overestimate very low kcat and underestimate very high kcat likely arise from regression dilution due to noisy features and targets. The data itself is noisy: independent measurements for the same enzyme–reaction pairs often differ substantially, constraining achievable accuracy. Despite this, TurNuP predictions approaching the typical variability of experimental measurements can already be valuable for guiding experiments and parameterizing genome-scale models, where they improve proteome allocation predictions. Adding estimated Km and model-derived fluxes did not improve performance, suggesting that their predictive signals may already be implicit in the sequence and reaction encodings or that measurement/estimation noise offsets benefits. More accurate, systematically measured condition-specific features (e.g., pH, temperature, metal ions), and larger, cleaner training datasets are likely to further improve predictive power. The web server provides practical access and includes similarity diagnostics to help users gauge prediction reliability.
Conclusion
TurNuP is an organism-independent kcat predictor that integrates Transformer-derived enzyme sequence embeddings with full-reaction fingerprints to achieve state-of-the-art performance and superior generalization to dissimilar enzymes and unseen reactions. It outperforms prior approaches, including DLKcat, and improves proteome allocation predictions when used to parameterize enzyme-constrained metabolic models. The model and web server enable broad application in enzymology and systems biology. Future work should incorporate larger, higher-quality datasets, condition-specific experimental metadata (e.g., pH/temperature), and potentially multi-task learning with related kinetic parameters (e.g., Km) to further enhance accuracy and interpretability.
Limitations
- Data noise and heterogeneity: substantial variability across kcat measurements for identical enzyme–reaction pairs limits achievable accuracy and induces regression dilution.
- Limited experimental metadata: the lack of standardized assay conditions (pH, temperature, ions) in databases prevents explicitly conditioning predictions on experimental context.
- Generalization bounds: performance decreases for enzymes with low sequence identity to the training data and for reactions with very low similarity to the training reactions; predictions for a small subset of entirely dissimilar reactions are weak.
- Dataset size and coverage: although broad, the final curated dataset (4,271 points) is modest for high-dimensional models, and some enzyme classes (e.g., membrane-associated enzymes) are underrepresented.
- Feature interpretability: embeddings and hashed fingerprints make it difficult to attribute importance to specific biochemical properties; additional interpretable features could aid mechanistic insight.
- Additional features tested (Km, flux, CAI) provided little to no improvement, likely due to noise or redundancy; higher-quality, condition-specific measurements may help.