Chemistry

Unraveling the energetic significance of chemical events in enzyme catalysis via machine-learning based regression approach

Z. Song, H. Zhou, et al.

Discover how Zilin Song, Hongyu Zhou, Hao Tian, Xinlei Wang, and Peng Tao leverage advanced hybrid quantum mechanical molecular mechanical techniques to unravel the intricate mechanisms of bacterial β-lactamases and their role in antibiotic resistance. This research unveils predictive energy models and rate-limiting steps, offering insights that align with empirical studies.... show more

Introduction

Bacterial resistance to β-lactam antibiotics presents a global health threat, driven in part by β-lactamases that hydrolyze β-lactam substrates. These enzymes are classified into four classes (A–D), with classes A, C, and D being serine β-lactamases and class B being zinc-based. Class A enzymes, including TEM-1, are prevalent and confer resistance to a broad substrate range. Experimental studies have highlighted residues at the catalytic site, supporting a mechanism in which Glu166 acts as a general base during acylation: Ser70 attacks the β-lactam carbonyl carbon to form a tetrahedral intermediate while proton transfers occur via a catalytic water, followed by protonation of the β-lactam nitrogen (assisted by Lys73/Ser130) leading to scissile bond cleavage. Additional residues such as Asn170 and Ser235 contribute to Michaelis complex formation via hydrogen bonding. Computational methods, particularly QM/MM and MD, have further characterized the acylation mechanism and provided potential energy surface (PES) pathways. However, single-path analyses under fixed MM potentials may not represent the full mechanism due to high configurational freedom. Chain-of-states (CoS) methods can generate minimum energy pathways (MEPs) under different MM fields, but analysis is complicated by correlated geometric degrees of freedom. Machine-learning regression can model high-dimensional relationships between structural descriptors and energies, offering a way to quantify energetic contributions of chemical events. This study aims to develop model-independent criteria to dissect energetic contributions and correlations in enzyme catalysis using TEM-1/benzylpenicillin acylation as a model.

Literature Review

Prior experimental work established key catalytic roles for Ser70, Glu166 (as a general base), Lys73, Ser130, Asn170, and Ser235 in TEM-1 β-lactamase. Structural studies suggested Glu166’s role in acylation and mapped hydrogen-bonding networks shaping substrate binding. Computational studies using QM/MM and MD validated the acylation mechanism, though reports differed on whether the tetrahedral intermediate is more or less stable than the reactant and on barrier heights. Earlier QM/MM works (e.g., AM1, B3LYP, MP2 levels) reported different barrier profiles and sometimes identified tetrahedral formation as rate limiting. Free energy simulations on related systems provided accurate barriers but are costly. Machine learning has been used in protein allostery, drug discovery, and to accelerate QM/MM, yet generalized approaches to quantify feature importance in nonlinear models remain limited. This motivates employing ML regression to build predictive PES models from structural descriptors and devising model-independent metrics for energy contribution analysis across multiple MEPs sampled under varied MM environments.

Methodology

System setup and QM/MM: The TEM-1/benzylpenicillin acyl-enzyme product was taken from PDB 1fqg, modifying Asn166 to Glu166 to reflect wild type. Protonation states followed prior studies. The system was solvated in a 77 Å cubic water box with ions to neutralize charge. Classical minimization and equilibration used CHARMM36 for protein, CGenFF for penicillin, and TIP3P for water. A 10 ns, 300 K MD run provided the initial QM/MM pathway structure. Software included CHARMM interfaced with SCC-DFTB or Q-Chem 5.0; MD with OpenMM 7.3.1. CoS pathway generation and sampling: An initial DFTB3/mio:CHARMM CoS pathway (RPATh with constraints) was computed with 50 replicas. Three representative replicas (reactant r, transition t, product p) were each subjected to 200 ns MD with QM atoms fixed and MM atoms mobile; snapshots were stored every 0.1 ps. Pairwise Cα distances underwent 2D-PCA; agglomerative clustering into six clusters per state yielded six representative conformations per state (total 18). A common MM region was defined as the union of residues within 10 Å of all representative QM regions. QM/MM geometry optimizations were performed on the 18 structures, followed by 18 constrained RPATh CoS optimizations to obtain MEPs (50 replicas each). QM levels of theory: Geometry/path optimizations used DFTB3/mio:CHARMM and B3LYP/6-31G:CHARMM. Single-point energy refinements on B3LYP-optimized paths employed B3LYP with 6-31+G*, 6-31++G**, 6-311++G** basis sets, and dispersion-corrected B3LYP-D3 with 6-31++G** and 6-311++G**. In total, seven QM levels were analyzed. Machine learning regression: Predictive PES models mapped structural descriptors to relative energies (relative to reactant). Features were relative changes of 105 interatomic distances among bonded or hydrogen-bonded atom pairs in the 92-atom QM region. Regression models included support vector regression (SVR), Gaussian process regression (GPR), and kernel ridge regression (KRR), all with RBF kernels. Hydrogen bonds were identified by Baker-Hubbard criteria (MDTraj 1.9.3). Training/validation employed leave-one-group-out cross-validation (group = pathway), with grid search hyperparameter tuning; for GPR, noise level α in a white kernel was tuned to prevent overfitting. Model performance was evaluated by RMSE across replicas of the held-out pathway. Dimensionality reduction of feature distributions used 2D t-SNE to assess sampling coverage. Energy contribution analyses: Intrinsic energy contribution measured the joint importance of feature subsets (grouped by chemical events such as proton transfers P0–P3, bond formation/cleavage B0–B1, hydrogen bonds H0–H2) by the increase in RMSE when the subset was zeroed during training. Dynamic energy contribution quantified local, reaction-progress-dependent contributions via finite differences of model outputs under small, sign-aligned perturbations of features, scaled by a weighting factor and a local diagonal correlation matrix to account for variable domains and gradient directions. Reaction coordinates were normalized with anchors at reactant (0.0), tetrahedral intermediate (1.0), and product (2.0).

Key Findings

All computed pathways exhibit two barriers separated by a tetrahedral intermediate. B3LYP-based pathways consistently place the tetrahedral intermediate at higher energy than the reactant, whereas most DFTB3 pathways predict the intermediate lower than reactant and with distinct protonation states (e.g., hydronium and deprotonated Glu166 vs. neutral water and protonated Glu166 reported elsewhere).
Geometry differences at DFTB3 include a larger Ser70 Oγ–carbonyl C distance in the intermediate (average 2.1 Å) compared to prior B3LYP reports (~1.45 Å), suggesting DFTB3 configurational changes along CoS are less reliable despite reasonable barrier magnitudes.
Barrier statistics (Table 1, average over 18 paths; kcal/mol): • DFTB3/mio:CHARMM: MC→TI 3.6(0.3), TI→AE 11.4(0.1), overall 11.4(0.1) • B3LYP/6-31G: 4.4(0.5), 3.3(0.1), overall 7.1(0.5) • B3LYP/6-31+G*: 7.9(0.7), 3.8(0.2), overall 10.9(0.4) • B3LYP/6-31++G**: 8.6(0.9), 3.8(0.7), overall 11.9(0.4) • B3LYP-D3/6-31++G**: 5.7(0.8), 3.4(0.3), overall 8.0(0.3) • B3LYP/6-311++G**: 9.1(0.3), 3.9(0.7), overall 12.7(0.3) • B3LYP-D3/6-311++G**: 6.2(0.4), 3.6(0.8), overall 9.0(0.1) Experimental overall barriers (from kcat): ~12.6–13.0 kcal/mol. The B3LYP/6-311++G** single-point energies yield overall barriers closest to experiment but with larger pathway-to-pathway deviations; CoS B3LYP/6-31++G** barriers also show strong agreement.
B3LYP/6-31++G** pathways indicate the tetrahedral intermediate is metastable, on average 0.6(0.1) kcal/mol below the preceding transition state; the tetrahedral collapse barrier averages 3.8(0.7) kcal/mol, supporting a concerted, effectively one-step acylation with coupled proton transfers and bond changes.
ML predictive PES performance: For B3LYP pathways, RMSE < 2.0 kcal/mol across SVR, GPR, and KRR; DFTB3 pathways show worse performance due to broader configurational diversity and under-sampled regions (e.g., pathway 16/p underfitted). t-SNE analyses confirm broader variable space for DFTB3, implying need for additional sampling.
Intrinsic energy contributions (model-independent ranking across QM levels and ML models): P2 (protonation of Ser130) and P3 (protonation of β-lactam nitrogen) are dominant; BO/B1 (Ser70–C=O bond formation and β-lactam scissile bond cleavage) have moderate contributions; P0/P1 and hydrogen bonds (H0–H2) contribute least. GPR provided the most stable rankings.
Dynamic energy contributions along the reaction coordinate show: during formation of the tetrahedral intermediate, BO dominates and P3 is concerted; during product formation, P2 and P3 with B1 are rate-determining. Both intrinsic and dynamic analyses identify the same critical events.
Mechanistic implications and experimental consistency: The acylation proceeds via a one-step four-proton-transfer mechanism with coupled events. Lys73 (proton source) and Ser130 (proton bridge) are essential; Glu166 proton acceptance (P1) is less critical, consistent with mutagenesis: Lys73 mutation deactivates acylation; Ser130Gly remains active via a water substitute with reduced rate; Glu166 mutations can yield PBP-like behavior while acylation remains thermodynamically favorable.

Discussion

The study addresses how individual chemical events contribute to the energetics and rate limitation of TEM-1 acylation under realistic, high-dimensional enzyme environments. By training ML regression models on ensembles of QM/MM MEPs and introducing intrinsic and dynamic contribution metrics that are model-independent, the work quantifies which proton transfers and bond changes drive barriers and where along the reaction progress they act. The findings indicate a concerted acylation wherein bond formation (Ser70 Oγ–carbonyl C) and protonation of the β-lactam nitrogen occur together, and product formation is limited by coupled proton transfers (Lys73→Ser130→β-lactam N) plus scissile bond cleavage. This revises earlier attributions that singled out tetrahedral formation as the sole rate-limiting step, and aligns with mutagenesis experiments implicating Lys73 and Ser130 as crucial. The analysis is robust across seven QM levels and three ML models, with predictive accuracies within ~2 kcal/mol for B3LYP data. The approach offers a transferable framework for dissecting energetics in complex enzymatic reactions, partially capturing entropic effects by aggregating multiple MEPs and reflecting pathway heterogeneity.

Conclusion

This work presents a machine-learning assisted framework to build predictive PES models from QM/MM reaction pathways and introduces two model-independent metrics—intrinsic and dynamic energy contributions—to quantify the energetic roles and correlations of specific chemical events during enzyme catalysis. Applied to TEM-1/benzylpenicillin acylation, the approach identifies P2/P3 proton transfers and associated bond changes as the key rate-determining processes and supports a concerted four-proton-transfer mechanism, consistent with experimental mutagenesis. The methodology is consistent across multiple QM levels and nonlinear regressors and demonstrates qualitative agreement with experimental observations. Future directions include expanding sampling to better capture configurational diversity and entropic effects, extending to more complex systems (e.g., transition-metal enzymes) with adapted descriptors, and leveraging enhanced sampling and advanced ML (e.g., neural-network PES) to further improve accuracy and interpretability.

Limitations

DFTB3/mio optimized pathways show unreliable configurational changes and under-sampled regions, degrading ML performance; broader sampling is required for comparable accuracy to B3LYP.
Chain-of-states pathways can underestimate barriers relative to free energy methods; the study focuses on potential energy barriers, not full free energies.
Entropic contributions are only partially represented via multiple MEPs; protein flexibility and other entropic effects are not fully captured.
Dynamic contribution metrics are model-dependent and not directly experimentally verifiable; their magnitudes depend on feature scaling and local gradients.
Results vary with QM level; single-point refinements improve barrier magnitudes but may increase variability across pathways.
Transfer to complex systems (e.g., transition-metal centers) may require substantial methodological development and feature redesign.

Related Publications

Explore these studies to deepen your understanding of the subject.

Medicine and Health

Machine learning-based prediction of in-hospital death for patients with takotsubo syndrome: The InterTAK-ML model

O. D. Filippo, V. L. Cammann, et al.

Engineering and Technology

Machine Learning Techniques for the Performance Enhancement of Multiple Classifiers in the Detection of Cardiovascular Disease from PPG Signals

S. W. Rabkin, A. Cataldo, et al.

Business

Applying machine learning algorithms to predict the stock price trend in the stock market - The case of Vietnam

T. Phuoc, P. T. K. Anh, et al.

Medicine and Health

Recent Advancements and Perspectives in the Diagnosis of Skin Diseases Using Machine Learning and Deep Learning: A Review

J. Zhang, F. Zhong, et al.

Listen, Learn & Level Up

Over 10,000 hours of research content in 25+ fields, available in 12+ languages.

No more digging through PDFs, just hit play and absorb the world's latest research in your language, on your time.

listen to research audio papers with researchbunny