Enabling late-stage drug diversification by high-throughput experimentation with geometric deep learning

D. F. Nippa, K. Atz, et al.

This study, by a team including David F. Nippa and Kenneth Atz, presents a platform that integrates geometric deep learning with high-throughput reaction screening for late-stage functionalization in drug development. Predictive models of reaction outcome, yield, and regioselectivity are used to guide the diversification of drug candidates.

Introduction
The study addresses a key challenge in medicinal chemistry: rapidly generating structure–activity relationships (SAR) for complex, drug-like molecules via late-stage functionalization (LSF). LSF can efficiently modify existing drug scaffolds but is hindered by diverse functional groups and multiple C–H bonds with varying steric and electronic environments, making reactivity and selectivity hard to predict and requiring resource-intensive experimentation. The authors propose integrating high-throughput experimentation (HTE) with geometric deep learning (GDL), focusing on iridium-catalyzed C–H borylation as a versatile gateway to broad diversification. The central objective is to develop and validate machine learning models that predict (1) binary reaction outcomes, (2) reaction yields, and (3) regioselectivity across drug-like substrates, thereby accelerating identification of late-stage modification opportunities and improving decision-making in medicinal chemistry.
Literature Review
The authors situate their work within several strands of prior research: (1) LSF methods (including fluorination, amination, arylation, methylation, trifluoromethylation, borylation, acylation, oxidation), with C–H borylation highlighted as particularly versatile for rapid diversification via downstream transformations. (2) HTE as a means of efficiently exploring reaction conditions and assembling FAIR datasets of both successful and failed reactions. (3) Extensive use of graph neural networks (GNNs) and related machine learning approaches (transformers, fingerprints) in chemical reaction prediction tasks such as retrosynthesis, regioselectivity, and product prediction. (4) Studies using transition-state (TS) information to predict reaction outcomes and selectivity, though often limited to small molecules and datasets. (5) Hybrid models incorporating quantum descriptors (e.g., DFT-level partial charges) to improve regioselectivity predictions, and work combining GNNs with HTE for reaction optimization. The authors note gaps: limited applications of LSF on complex drug-like molecules; unclear roles of sterics vs electronics in C–H activation models; and limited generalization to larger, multi-ring, or sp3-functionalization contexts. Their study aims to fill these gaps by integrating curated literature and new HTE data, exploring 2D vs 3D graph inputs and DFT partial charges, and quantifying the contributions of steric and electronic factors to prediction performance.
Methodology
Data curation and informer library:
- Literature dataset: Systematic analysis (SACT) identified 38 publications on borylation; manual extraction yielded a curated FAIR dataset of 1,301 reactions. Filtering retained 492 reactions for yield prediction and 656 reactions for regioselectivity (criteria: duplicate product removal, B2Pin2-only reactions, yield ≥30%).
- LSF informer library: Clustering of 1,174 approved small-molecule drugs produced eight structurally diverse clusters; three representative drugs per cluster (23 total) were selected based on diversity, availability, and cost. Additionally, 12 fragments were chosen from Roche’s collection using substructure and availability criteria, plus five simple substrates.

Screening plate design and HTE:
- Plate design: Meta-analysis of the literature informed a 24-well iridium-catalyzed borylation screen: [Ir(COD)(OMe)]2 catalyst, B2Pin2 boron source, six ligands across four classes, and four solvents. Standardized conditions: 80 °C, 16 h, 0.2 M, 100 μmol scale.
- HTE execution: Automated solid dosing and solvent addition in a glovebox under N2; reaction heating/stirring in glass vials; solvent removal and automated resuspension; LCMS analysis and automated mixture deconvolution. The screen generated 956 experimental data points across the 23 drugs, 12 fragments, and 5 simple substrates. Data were captured in SURF (simple user-friendly reaction format) for FAIR sharing.
- Scale-up validations: Selected substrates (three drugs: loratadine 1, warfarin 25, nevirapine 29; four fragments: 37, 38, 39, 45) were scaled up and the products isolated; structures were confirmed by NMR/HRMS, with boronic esters hydrolyzed where needed.

Machine learning and model architectures:
- Tasks: binary reaction outcome (success/failure), reaction yield (regression), and regioselectivity (site prediction at non-quaternary carbons).
- Models: Two molecular-output models (GNN with sum pooling; GTNN with graph multiset transformer pooling) for outcome and yield; one atomic-output model (aGNN) for regioselectivity. Inputs explored: 2D, 2D with DFT partial charges (2DQM), 3D, and 3D with DFT partial charges (3DQM). 3D graphs used distance-based edges within 4 Å; 2D graphs used covalent bonds.
- Featurization: Atom one-hot encoding (12 atom types, ring/aromaticity/hybridization), optional on-the-fly DFT Mulliken charges (ωB97X-D/def2-SVP via DelFTa), and Fourier distance features for 3D. Reaction conditions (ligand, solvent, catalyst, reagent) were one-hot encoded.
- Conformers: Ten RDKit/UFF-minimized conformers per molecule; a random conformer was drawn at each training step, and test predictions were averaged over the ten conformers.
- Training: PyTorch Geometric/PyTorch on GPU; batch size 16; Adam (lr 1e-3), MSE loss; learning rate decay and early stopping. Approximately 2.0M parameters for GNN/aGNN and 3.0M for GTNN.
- Baselines: ECFP4NN (an MLP on ECFP4 fingerprints) and additional decision-tree baselines (GB, XGBoost) for comparison.
- Evaluation: Cross-validation procedures (including fourfold nested cross-validation for yield); random and substrate-based splits for outcome classification; F-score for unbalanced atomic labels in regioselectivity. Confusion matrices and AUCs were reported at multiple thresholds.

Data and code availability: SURF-formatted literature (1,301 reactions) and experimental (956 reactions) datasets and templates, plus a reference implementation (PyTorch), are available at https://github.com/ETHmodlab/lsfml (Zenodo: 8118845).
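The difference between the 2D and 3D graph inputs described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy coordinates, and the choice of an inclusive cutoff (distance ≤ 4 Å) are assumptions made for illustration only.

```python
import math
from itertools import combinations

CUTOFF_ANGSTROM = 4.0  # radius stated in the text for 3D graph edges


def distance_edges(coords, cutoff=CUTOFF_ANGSTROM):
    """Directed edges (i, j, distance) for all atom pairs within `cutoff`.

    Assumption: the cutoff is inclusive; the source only says "within 4 Å".
    """
    edges = []
    for i, j in combinations(range(len(coords)), 2):
        d = math.dist(coords[i], coords[j])
        if d <= cutoff:
            # Store both directions, as is common for message passing.
            edges.append((i, j, d))
            edges.append((j, i, d))
    return edges


def bond_edges(bonds):
    """Directed edges (i, j) built from covalent bonds only (2D graph)."""
    return [(a, b) for i, j in bonds for a, b in ((i, j), (j, i))]


# Hypothetical toy geometry (in Å) for a four-atom chain; not real data.
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 0.0, 0.0)]
bonds = [(0, 1), (1, 2), (2, 3)]

edges_3d = distance_edges(coords)
edges_2d = bond_edges(bonds)

print(len(edges_2d), "directed 2D edges;", len(edges_3d), "directed 3D edges")
```

In this toy case the 3D graph links atoms 0 and 2, which are not bonded but sit 3 Å apart. Through-space edges of this kind are how a distance-based graph can expose steric crowding that a bond-only graph does not encode.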
Key Findings
Data generation and integration:
- Curated literature dataset: 1,301 borylation reactions from 38 publications; experimental HTE dataset: 956 reactions from 23 drugs, 12 fragments, and 5 simple substrates.
- SURF format enabled FAIR data capture and seamless ML integration.

Reaction yield prediction:
- Best model (GTNN3DQM) on the experimental dataset achieved MAE 4.23 ± 0.08% with Pearson r = 0.890 ± 0.01 (n = 239 test points).
- On literature data, GTNN2DQM achieved MAE 16.11 ± 0.02% with r = 0.61 ± 0.01.

Binary reaction outcome (success/failure):
- Random split AUCs at thresholds: 1%: 94.5 ± 0.2%; 5%: 94.5 ± 0.2%; 10%: 95.6 ± 0.3%; 20%: 94.4 ± 0.2%.
- Substrate-based split (novel substrates): GTNN3DQM AUC = 67 ± 2%; accuracy >50% for 20/23 unseen drugs and >80% for 16/23.
- 3D GNNs (58–67% AUC) outperformed 2D GNNs (51–59%) and ECFP4NN (52%) on substrate-based splits.

Regioselectivity prediction (non-quaternary carbons):
- Best model aGNN3DQM: F-score 60 ± 4% on the literature test set; accuracy 90 ± 1%, PPV 62 ± 2%, TPR 59 ± 6% over 1,259 atoms.
- 3D graphs markedly improved the F-score relative to the best 2D model (60% vs 39%).
- Prospective validations on drugs (1, 25, 29) and fragments (37, 38, 39) showed approximately 70% accuracy; five of seven observed sites were correctly predicted; several unseen reactivity cases were accurately captured; one sp3 borylation (fragment 39) highlighted limitations.

Role of steric vs electronic information:
- Incorporating 3D steric information improved all tasks (yield MAE 4.2% vs 4.4%; outcome AUC 67% vs 59%; regioselectivity F-score 60% vs 39%).
- DFT partial charges did not improve performance, consistent with borylation being predominantly sterically controlled.

HTE insights (tolerance and conditions):
- Functional group tolerance: aromatic nitrogens, aryl alkoxy groups, and alcohols favored; primary amines, carbamates/carbonates, and strongly electron-withdrawing aryls (e.g., nitro) disfavored.
- Ligand performance: ligand 9 performed best (33% success), ligands 6–8 were comparable (28–30%), followed by ligand 5 (22%) and ligand 4 (17%).
- Solvents: cyclohexane (50%) > Me-THF (43%) > CPME (38%) > MeCN (29%).

Practical outcomes:
- For six unseen substrates predicted positive, main products were isolated in 5–90% yields; the platform identified numerous diversification opportunities across 23 drugs.
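As a quick consistency check on the regioselectivity metrics above, the F-score can be recomputed from the reported PPV and TPR. The sketch assumes the reported F-score is the standard F1 (harmonic mean of precision and recall), which the summary does not state explicitly. Because the reported values are means over repeated runs, the recomputed number only needs to agree approximately.

```python
def f_score(ppv, tpr):
    """F1: harmonic mean of precision (PPV) and recall (TPR)."""
    if ppv + tpr == 0:
        return 0.0
    return 2 * ppv * tpr / (ppv + tpr)


# Mean values reported for aGNN3DQM on the literature test set.
f1 = f_score(ppv=0.62, tpr=0.59)
print(f"F-score = {f1:.3f}")  # prints "F-score = 0.605"
```

The result of roughly 0.60 matches the reported 60 ± 4%. It also shows why accuracy alone (90%) is a weak indicator here: with few borylated atoms among 1,259 candidates, a model can score high accuracy while missing many true sites.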
Discussion
The integrated HTE–GDL platform directly addresses the challenge of predicting reactivity, yield, and site selectivity for late-stage borylation of complex drug-like molecules. By combining a standardized high-quality experimental dataset with a curated literature corpus in a FAIR format, the models achieved high accuracy on random splits and reasonable generalization to unseen substrates. Incorporating 3D structural information proved critical, quantifiably improving predictions across tasks and underscoring the dominant role of steric effects in iridium-catalyzed C–H borylation. The absence of consistent gains from DFT partial charges aligns with mechanistic expectations for sterically driven reactions. Prospective validations demonstrated that the platform can guide experimental planning: it correctly identified borylation opportunities across diverse drugs and fragments, with isolated yields consistent with predictions. The platform also offered interpretable trends in functional group tolerance and condition effects (ligands, solvents), supporting its practical deployment in medicinal chemistry to prioritize late-stage diversification pathways and reduce experimental burden. However, generalization is constrained by the available training data and its biases (predominantly sp2 borylations and smaller/mid-size substrates in the literature). The substrate-based AUC of 67% indicates moderate performance for wholly novel scaffolds, suggesting the need for continued HTE expansion and data standardization to broaden chemical space coverage and enhance reliability for larger or more complex molecules, including sp3 borylation sites.
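The gap between random-split and substrate-based performance comes down to how reactions are partitioned. A minimal sketch of a substrate-based split follows, assuming each reaction record carries a substrate identifier. The record layout, function name, and toy data are hypothetical and are not taken from the authors' code.

```python
import random
from collections import defaultdict


def substrate_split(records, test_fraction=0.25, seed=0):
    """Split reactions so that no substrate appears in both train and test."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["substrate"]].append(rec)

    substrates = sorted(groups)  # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(substrates)

    n_test = max(1, round(len(substrates) * test_fraction))
    test_ids, train_ids = substrates[:n_test], substrates[n_test:]

    train = [r for s in train_ids for r in groups[s]]
    test = [r for s in test_ids for r in groups[s]]
    return train, test


# Hypothetical toy records: 4 substrates, each screened under 3 conditions.
records = [
    {"substrate": s, "condition": c, "success": (i + c) % 2 == 0}
    for i, s in enumerate(["A", "B", "C", "D"])
    for c in range(3)
]

train, test = substrate_split(records)
train_subs = {r["substrate"] for r in train}
test_subs = {r["substrate"] for r in test}
print("train substrates:", sorted(train_subs), "| test substrates:", sorted(test_subs))
```

Under a random split, the same substrate can appear in training under one ligand/solvent combination and in testing under another, which makes the task considerably easier. Holding out whole substrates is the stricter test of generalization to novel scaffolds.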
Conclusion
This work introduces a practical, data-driven platform that integrates high-throughput borylation screening with geometric deep learning to enable late-stage diversification of drug molecules. The approach delivers accurate predictions of reaction yield, binary outcome, and regioselectivity, with 3D structural information providing substantial gains, especially for site selectivity. The SURF data format facilitates FAIR data capture and seamless ML integration, and prospective validations confirm the platform’s utility in identifying LSF opportunities and guiding synthesis. Future directions include: (1) expanding reaction condition panels (alternative catalysts, boron sources, broader ligand/solvent sets) to enhance optimization; (2) augmenting the LSF informer library and systematically growing the training datasets to cover larger, more complex, and sp3-rich chemotypes; and (3) continued standardization and automation to improve data quality and model generalization for broader late-stage functionalization chemistries.
Limitations
- Data bias and scope: Literature data emphasize iridium-catalyzed borylations at sp2 carbons and substrates with limited ring systems, limiting generalization to larger, more complex, or sp3-rich molecules; regioselectivity for out-of-distribution cases (e.g., sp3 borylation) is less reliable.
- Dataset heterogeneity: Literature yields are determined by varied methods (isolated yield, NMR conversion, LCMS), reducing consistency relative to standardized HTE data.
- Moderate generalization: Substrate-based AUC for binary outcomes (67 ± 2%) indicates only moderate performance on unseen scaffolds.
- Electronic descriptors: Incorporation of DFT-level partial charges did not improve performance, suggesting current models may not fully capture electronically driven cases.
- Model limitations: Regioselectivity remains challenging with unbalanced labels; F-scores around 60% indicate room for improvement, and certain multi-site or di-borylation cases remain difficult to predict precisely.