Introduction
Late-stage functionalization (LSF) is a crucial strategy in drug discovery because it allows drug candidates to be modified without a complete resynthesis. It can improve properties such as absorption, distribution, metabolism, and excretion (ADME) while keeping costs low. Minisci-type alkylations are a valuable LSF method: they incorporate alkyl building blocks into heterocyclic systems, which form the core of many drugs. They use readily accessible carboxylic acids, which broadens their applicability, especially given the increasing emphasis on sp3-rich building blocks in pharmaceuticals. However, predicting reactivity for C-H activation reactions is difficult because complex molecules contain many different C-H bonds subject to varied electronic effects, and testing reactions one at a time is resource-intensive. This research addresses the problem by integrating high-throughput experimentation (HTE) with machine learning to create a computational framework for predicting reactivity and guiding the synthesis of novel molecules.
Literature Review
The literature extensively covers experimental late-stage functionalization (LSF) methods. Alkylation reactions, particularly Minisci-type alkylations, have emerged as significant tools for incorporating alkyl groups into heterocyclic systems. Classical Minisci reactions use ammonium persulfate as the oxidant and silver nitrate as the catalyst to generate alkyl radicals from carboxylic acids at elevated temperatures. The scope of both the electron-deficient heteroarenes and the alkyl-donating coupling partners has been steadily expanded, and radical sources now include alkyl carboxylic acids, boronic acids, and sulfinates. Guidelines exist for predicting reactivity in Minisci-type transformations, but the diversity of C-H bonds and electronic effects in complex drug molecules remains a significant challenge. High-throughput experimentation (HTE) provides a systematic way to explore and optimize chemical transformations through miniaturized reactions and high-throughput analysis. Graph neural networks (GNNs), which are effective at learning from 3D molecular models, have been applied to predicting reaction outcomes, regioselectivity, and yield for various chemical reactions. This study builds on these advances to develop a combined HTE and machine learning approach for efficient LSF.
Methodology
This study employed a two-pronged approach combining high-throughput experimentation (HTE) and machine learning with graph neural networks (GNNs). First, Minisci-type reactions were miniaturized to the nanomole scale (500 nmol) in a 24-well plate. Reaction optimization identified a temperature of 40 °C and reagent loadings of 20 equivalents of carboxylic acid and 6 equivalents of oxidant. A total of 23 diverse alkyl groups, primarily sp3 ring systems, were tested in combination with various electron-deficient heterocycles. Reaction outcomes were classified as successful (mono- or di-alkylation product with ≥5% yield by LCMS) or unsuccessful. The data from this HTE screening, together with literature data and decoy data (representing unsuccessful reactions), were used to train the GNN models. The models were designed to take 3D structural information into account, and the reaction data were handled in the Simple User-Friendly Reaction Format (SURF). Six independent GNNs were trained: three for binary reaction outcome prediction and three for reaction yield prediction. The trained models were then used for in silico screening of a Roche internal library of 3180 advanced heterocyclic building blocks. Molecules were clustered to ensure structural diversity, and those predicted to be reactive were selected for experimental validation. Selected candidates underwent automated HTE screening, and successful reactions were scaled up to the milligram level for synthesis and characterization by NMR and HRMS. A total of 30 novel molecules were synthesized and characterized. The GNN models were validated on a random data split, giving a mean absolute error (MAE) of 18.7% for yield prediction and an accuracy of 81% for binary outcome prediction.
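The outcome labeling and the two validation metrics described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names, the example yields, and the example classifier outputs are all hypothetical, and only the ≥5% cutoff and the choice of metrics (MAE for yield, accuracy for binary outcome) come from the text.

```python
from statistics import mean

YIELD_THRESHOLD = 5.0  # percent; the ">=5% yield by LCMS" cutoff from the text


def is_successful(yield_percent: float) -> bool:
    """Label a reaction successful if its LCMS yield meets the cutoff."""
    return yield_percent >= YIELD_THRESHOLD


def mean_absolute_error(measured, predicted):
    """Average absolute difference between measured and predicted yields."""
    return mean(abs(m - p) for m, p in zip(measured, predicted))


def accuracy(true_labels, predicted_labels):
    """Fraction of reactions whose binary outcome was predicted correctly."""
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)


# Hypothetical values invented purely for illustration (yields in percent).
measured_yields = [0.0, 4.0, 5.0, 30.0]
predicted_yields = [10.0, 2.0, 25.0, 20.0]   # stand-in for a yield model
predicted_labels = [True, False, True, True]  # stand-in for a binary model

# Ground-truth labels follow from the measured yields and the cutoff.
true_labels = [is_successful(y) for y in measured_yields]

yield_mae = mean_absolute_error(measured_yields, predicted_yields)
outcome_accuracy = accuracy(true_labels, predicted_labels)
```

Because yields are expressed in percent, the MAE is in percentage points, so a value such as the reported 18.7% means predictions deviate from measured yields by about 18.7 points on average.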
Key Findings
The nanomole-scale HTE successfully miniaturized Minisci-type reactions, providing a platform for high-throughput screening. GNNs trained on a balanced dataset of successful and unsuccessful reactions yielded a predictive model for identifying suitable substrates for Minisci-type alkylations (81% accuracy for binary outcome prediction on a random data split). In silico screening of the 3180-compound library with these GNNs identified 18 promising candidates, with a reported success rate of 94%. HTE screening of these candidates gave a total of 276 successful reactions. Scale-up of selected reactions led to the synthesis and characterization of 30 novel molecules, incorporating diverse alkyl substituents into drug molecules (e.g., Loratadine, Nevirapine) and fragments. Analysis of the results revealed reactivity trends: cyclic ethers and alkanes generally performed well, while cyclic Boc-protected amines and amides gave lower yields. Meta-unsubstituted pyridines gave higher yields than their meta-substituted counterparts. The study further demonstrated that including decoy data (representing unsuccessful reactions) in the training dataset improves the accuracy of the predictive model. The SURF format streamlined the process and eased the transition between in silico screening and experimental validation.
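To make the point about decoy data concrete, the sketch below checks how the class balance of a training set shifts once unsuccessful decoy records are added. It is an illustration only: the record layout, the substrate names, and the counts are hypothetical, and the summary does not describe how the study's decoys were actually generated.

```python
from collections import Counter


def class_balance(records):
    """Return the fraction of records carrying each outcome label."""
    counts = Counter(r["successful"] for r in records)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Hypothetical records; substrate names and labels are invented.
measured = [
    {"substrate": "A", "successful": True},
    {"substrate": "B", "successful": True},
    {"substrate": "C", "successful": True},
    {"substrate": "D", "successful": False},
]
# Decoys stand in for reactions treated as unsuccessful.
decoys = [
    {"substrate": "E", "successful": False},
    {"substrate": "F", "successful": False},
]

before = class_balance(measured)
after = class_balance(measured + decoys)
```

In this toy case the positives drop from three quarters of the data to one half, which is the sense in which negative examples balance a dataset that would otherwise be dominated by reported successes.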
Discussion
This study demonstrated that combining HTE with GNN-based machine learning can accelerate late-stage functionalization in drug discovery. Predicting reaction outcomes before running them reduces experimental effort and cost. The high success rate in identifying suitable substrates (94%) and the synthesis of 30 novel molecules showcase the efficiency of this integrated approach. The identified reactivity trends inform the design of future experiments and can guide the selection of substrates for Minisci-type alkylations. A balanced dataset that includes both successful and unsuccessful reactions was crucial for building a reliable predictive model, which highlights the importance of incorporating negative data in machine learning for chemistry. The integrated approach is currently being employed at Roche, demonstrating its applicability and value in a real-world industrial setting.
Conclusion
This research demonstrates a novel methodology that integrates high-throughput experimentation and machine learning to accelerate late-stage C-H alkylation. The predictive accuracy of the GNN models, coupled with efficient HTE, streamlines the synthesis of novel drug analogs. The 30 novel molecules generated validate the approach and highlight its potential for broader application in drug discovery. Future research should expand the scope of the reaction conditions, including different oxidants, solvents, and catalysts, and explore diverse alkyl radical precursors and heterocyclic systems. The models' accuracy can be further improved by continuously incorporating new LSF reaction data.
Limitations
The study primarily focused on Minisci-type alkylations and a specific set of substrates, so the generalizability of the GNN models to other reaction types or substrate classes needs further investigation before their scope can be fully assessed. Although the models' accuracy is high, some uncertainty in predicting reaction outcomes will always remain, especially for complex molecules. The availability of suitable starting materials and the scalability of synthesis might also limit the applicability of this method in certain cases.