Introduction
Computational methods are crucial for efficient compound prioritization in drug discovery, and relative binding free energy (RBFE) calculations are considered the gold standard for accurately predicting binding affinities. However, these calculations are complex, computationally expensive, and demand significant expert time. Existing workflows, such as those based on equilibrium free energy perturbation (FEP) or non-equilibrium switching (NES), often require manual setup, execution, and evaluation, which limits wider adoption. This research addresses that limitation with a fully automated end-to-end workflow that runs RBFE calculations from SMILES strings of the ligands through to the final binding free energy estimate. The workflow is built on the open-source Icolos workflow manager, which allows commercial and open-source tools to be combined flexibly. Its main value is the potential to democratize access to RBFE calculations, extending their use beyond large industrial settings where commercial software is readily available. Automation also reduces the reliance on expert knowledge and time, accelerating the drug discovery process.
Literature Review
Several methods exist for RBFE calculations, including equilibrium free energy perturbation (FEP) and non-equilibrium switching (NES). FEP, popularized by Schrödinger's FEP+, uses discrete lambda windows for stepwise ligand transformation. NES, used in this work, performs equilibrium simulations for each end state and then multiple short non-equilibrium transitions to calculate free energy changes. While both methods allow for scaling to large datasets, they require manual intervention. Previous work primarily relied on hand-modeled ligand poses, which limits the applicability of those benchmarks to fully automated workflows. Recent studies have highlighted the impact of ligand pose quality on the accuracy of RBFE predictions, emphasizing the need for reliable automated docking protocols for large-scale applications within active learning or de novo design.
Methodology
The automated workflow, implemented in Icolos, comprises ligand embedding (LigPrep or RDKit), docking (Glide or AutoDock Vina), optional MCS filtering (for Vina), perturbation map generation (Schrödinger's fep_mapper.py or LOMAP), PMX setup and atom mapping, system assembly and solvation, system equilibration, end-state simulations (NPT), transition simulations (TI), and finally ΔΔG calculation and analysis. Several docking protocols were tested: 'Vanilla' Glide and Vina, Glide with core constraints, Glide with maximum common substructure (MCS) constraints, and Vina with post-hoc MCS filtering. The PMX package, combined with GROMACS, was used for topology generation and molecular dynamics simulations. Free energy changes were calculated with the non-equilibrium switching approach, using thermodynamic integration over each transition. Simulations used the AMBER ff99SB-ILDN force field for the protein and GAFF2 parameters with AM1-BCC partial charges for the ligands. The simulation protocol, including timestep, temperature and pressure control, and long-range interaction treatment, is described in full in the paper. Four protein-ligand systems (P38α, PTP1B, TNKS2, and SYK) were used to assess the workflow, drawing on previously published data sets so that results could be compared with existing RBFE calculations and experimentally determined binding affinities. The workflow was also benchmarked on the Amazon Web Services (AWS) cloud, using a ParallelCluster and Icolos' SLURM interface to optimize cost-efficiency. Fault-tolerant execution on spot-allocated instances was integrated to further improve the scalability and cost-effectiveness of the cloud deployment.
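The last two steps of the pipeline can be illustrated with a minimal sketch: the work of a single transition is the integral of ∂H/∂λ over λ, and ΔΔG follows from the standard RBFE thermodynamic cycle as the difference between the ligand transformation in the complex and in solvent. This is not the PMX or Icolos implementation; the function names and the trapezoidal rule are assumptions made for illustration.

```python
def transition_work(dhdl, lambdas):
    """Integrate dH/dlambda over lambda with the trapezoidal rule.

    dhdl:    dH/dlambda values sampled along one transition (kJ/mol)
    lambdas: matching lambda values, increasing from 0 to 1
    """
    if len(dhdl) != len(lambdas):
        raise ValueError("dhdl and lambdas must have the same length")
    return sum(
        0.5 * (dhdl[i] + dhdl[i + 1]) * (lambdas[i + 1] - lambdas[i])
        for i in range(len(dhdl) - 1)
    )


def relative_binding_free_energy(dg_complex, dg_solvent):
    """Thermodynamic cycle: ddG_bind(A -> B) = dG_complex - dG_solvent."""
    return dg_complex - dg_solvent
```

With hypothetical inputs, `transition_work([0.0, 2.0, 4.0], [0.0, 0.5, 1.0])` gives 2.0 kJ/mol, and `relative_binding_free_energy(10.0, 8.5)` gives 1.5 kJ/mol, meaning ligand B would bind more weakly than ligand A by that amount.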
Key Findings
The workflow successfully performed RBFE calculations for 1005 alchemical perturbations across four systems. The study examined the impact of the docking protocol on RBFE prediction accuracy. Both open-source (AutoDock Vina) and commercial (Glide) docking methods yielded good correlation with experimental data, indicating that an entirely open-source workflow is feasible. While manual pose adjustments often improved accuracy, particularly for P38α and PTP1B, the automated docking protocols produced comparable results for TNKS2 and SYK. The performance of the docking protocols varied across systems; for instance, MCS-constrained Glide performed well for P38α, while MCS-filtered Vina excelled for PTP1B. Overall, core or MCS constraints in Glide did not consistently improve accuracy, and MCS filtering for Vina showed system-dependent effects. The study also showed, through specific examples, that consistent pose prediction across different docking algorithms does not guarantee accurate affinity estimation. The AWS cloud deployment demonstrated the workflow's scalability and cost-effectiveness, with a single free energy estimate requiring approximately 10 hours of wall-clock time at a cost of $12-15 per ΔΔG value. A structure-activity relationship (SAR) analysis for TNKS2, based on Glide MCS poses, showed that meaningful insights can be derived from accurate predictions, illustrating the workflow's potential in lead optimization. However, the reliability of the SAR analysis hinges on the accuracy of the predicted binding affinities: where poor pose prediction leads to inaccurate affinity estimates, the resulting SAR conclusions are unreliable.
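As a rough sense of scale, the reported per-estimate price can be multiplied out over a campaign. The sketch below does this for 1005 perturbations; the summary does not say that all 1005 were run on AWS at this price, so the total is a hypothetical projection, and the function name is illustrative.

```python
def campaign_cost(n_perturbations, cost_low, cost_high):
    """Return the (low, high) total cost in dollars for a set of ddG estimates."""
    return n_perturbations * cost_low, n_perturbations * cost_high


# Hypothetical projection: every perturbation priced at $12-15.
low, high = campaign_cost(1005, 12, 15)
print(f"${low:,} to ${high:,}")  # prints "$12,060 to $15,075"
```

Because each estimate is an independent job, the roughly 10 hours of wall-clock time per ΔΔG need not add up across a campaign if the jobs run in parallel on the cluster.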
Discussion
The findings demonstrate the successful development and validation of a fully automated, end-to-end workflow for calculating relative binding free energies from SMILES strings. The use of open-source tools significantly reduces the barrier to entry for researchers lacking access to commercial software. The results highlight the importance of appropriate docking protocols in achieving accurate RBFE predictions, demonstrating that, while manual pose refinement can improve accuracy, well-chosen automated docking strategies can provide results comparable to those obtained through expert-driven approaches. This research significantly advances the application of RBFE calculations in drug discovery, enabling more efficient and accessible computational lead optimization and structure-activity relationship analysis.
Conclusion
This work presents a framework for automated relative binding free energy calculations, bridging the gap between SMILES input and ΔΔG output. The use of Icolos provides flexibility and scalability, while the exploration of various docking methods demonstrates the viability of open-source alternatives. Future work could focus on further refining the workflow's automation, exploring enhanced sampling techniques, and integrating it into more complex pipelines for active learning and de novo drug design.
Limitations
The accuracy of RBFE predictions remains dependent on several factors beyond the scope of this study, including force field parameters, the quality of input protein structures, and the completeness of sampling. The reliance on pre-existing datasets for validation limits the generalizability of the findings to entirely novel systems, and the absolute binding free energies are reconstructed from relative ΔΔG estimates, introducing potential errors.
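The reconstruction step mentioned above can be sketched as a walk over the perturbation map: starting from a reference ligand with a known ΔG, each ligand's value is obtained by summing the ΔΔG estimates along a path. The summary does not say how the authors performed this step, so the path-sum approach, the function name, and the numbers in the usage note are assumptions; a least-squares fit over all edges is a common alternative that uses redundant cycles instead of discarding them.

```python
from collections import deque


def absolute_from_relative(edges, reference, reference_dg):
    """Propagate absolute dG values from a reference ligand by breadth-first search.

    edges:        dict mapping (a, b) to ddG for a -> b, i.e. dG_b - dG_a
    reference:    name of the ligand with a known dG
    reference_dg: that ligand's dG (same units as the ddG values)
    """
    neighbours = {}
    for (a, b), ddg in edges.items():
        neighbours.setdefault(a, []).append((b, ddg))
        neighbours.setdefault(b, []).append((a, -ddg))  # reversed edge flips sign

    dg = {reference: reference_dg}
    queue = deque([reference])
    while queue:
        node = queue.popleft()
        for other, ddg in neighbours.get(node, []):
            if other not in dg:  # first path found wins; later cycles are ignored
                dg[other] = dg[node] + ddg
                queue.append(other)
    return dg
```

With hypothetical edges `{("A", "B"): 1.0, ("B", "C"): -0.5, ("D", "A"): 2.0}` and ligand A anchored at -40.0, the result is B = -39.0, C = -39.5 and D = -42.0. Ligand C depends on two estimates in sequence, which shows how the error of each edge accumulates along the path.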