Deep Kernel Learning for Reaction Outcome Prediction and Optimization

Chemistry


S. Singh and J. M. Hernández-Lobato

Discover a deep kernel learning model developed by Sukriti Singh and José Miguel Hernández-Lobato that predicts chemical reaction outcomes accurately. The approach combines neural networks and Gaussian processes, offering not just accurate predictions but also calibrated uncertainty estimates, making it well suited to optimizing reaction conditions.

Introduction
Chemical reaction optimization is central to organic synthesis, where the aim is to maximize reaction outcomes (e.g., yield, enantiomeric excess) by selecting optimal conditions across a high-dimensional space of variables (catalyst, solvent, substrate, additive, time, temperature, concentration). Machine learning (ML) has shown promise for predicting reaction outcomes to triage low-yield reactions prior to experiments, historically leveraging hand-crafted features (physical organic descriptors, molecular fingerprints) with conventional models (e.g., random forests). Recent deep learning approaches learn molecular representations directly from SMILES strings or molecular graphs using language models and graph neural networks (GNNs). However, incorporating principled uncertainty estimates, which are critical for tasks such as Bayesian optimization (BO), remains challenging for many ML models. Gaussian processes (GPs) provide calibrated uncertainty but struggle to learn task-specific representations from raw molecular inputs; their kernels are typically fixed and not adapted to complex encodings like SMILES or graphs. Deep kernel learning (DKL) addresses this by combining neural network (NN)-based representation learning with GP uncertainty. Prior work often tailored models to either learned or nonlearned representations, motivating a unified approach. This study investigates DKL for reaction yield prediction across both representation types and demonstrates its use as a BO surrogate for reaction optimization.
Literature Review
Prior reaction yield prediction has used engineered molecular descriptors and fingerprints with conventional ML (e.g., random forests), and learned representations (SMILES, molecular graphs) with deep models such as transformers and GNNs. Learned representations often outperform engineered ones when data is abundant, while engineered features can be strong in low-data regimes. Uncertainty quantification has been addressed more naturally by Gaussian processes, which are widely used as BO surrogates, but standard GPs cannot learn representations from raw molecular encodings. Deep kernel learning integrates NNs and GPs to provide both flexible representation learning and calibrated uncertainty, with early applications in chemistry and materials modeling. Previous reaction outcome models generally aligned model choice with representation type; a model that is effective across both learned and nonlearned inputs while providing uncertainty estimates can broaden applicability in reaction development and optimization.
Methodology
Dataset and task: Buchwald–Hartwig C–N cross-coupling with 3,955 reactions formed by all combinations of 15 aryl halides, 4 ligands, 3 bases, and 23 additives; the target is isolated yield, standardized to zero mean and unit variance over the training data.
Representations: Nonlearned inputs comprise (1) molecular descriptors (120 DFT-derived descriptors capturing electronic and spatial properties; the reaction vector is formed by concatenating the features of all reactants), (2) Morgan fingerprints (radius 2, 512 bits per reactant, concatenated to a 2,048-bit reaction vector), and (3) DRFP (a 2,048-bit reaction fingerprint computed from the reaction SMILES). The learned input is a set of molecular graphs built with RDKit, with atoms as nodes and bonds as edges carrying rich feature sets (atom type, hybridization, chirality, formal charge; bond type, conjugation, ring membership, stereochemistry).
DKL architecture (nonlearned inputs): A feed-forward neural network (two fully connected layers with dropout 0.1) maps the reaction vector to an embedding; a GP with a Matérn 5/2 kernel (no ARD) takes the embedding as input. The NN weights and GP hyperparameters are trained jointly by maximizing the GP log marginal likelihood via backpropagation (see the sketches after this section).
DKL architecture (learned inputs): For each reactant, a message passing neural network (edge-network message function, GRU update) produces node embeddings, and a set2set readout performs global pooling to give a reactant graph embedding. The reaction embedding is the sum of the component embeddings (aryl halide, ligand, base, additive), ensuring permutation invariance over reactants. This reaction embedding passes through a small FFNN whose output feeds a GP with a Matérn 5/2 kernel; the GNN, FFNN, and GP hyperparameters are learned jointly by maximizing the GP log marginal likelihood.
Baselines: A standard GP with a Matérn 5/2 kernel (no ARD), trained with L-BFGS-B, using the same nonlearned representations (molecular descriptors, Morgan fingerprints, DRFP); and a stand-alone GNN identical to the GNN component used in DKL.
Training details: Implemented in PyTorch/GPyTorch; Adam optimizer with learning rate 0.001 for 400 epochs. Ten independent random 70:10:20 train/validation/test splits are used for model selection and final evaluation; metrics are averaged over the 10 runs and reported with standard errors.
Performance metrics: RMSE, MAE, and R²; uncertainty quality is assessed via negative log predictive density (NLPD). Additional analyses include varying train/test splits (80:20, 70:30, 50:50, 30:70, 20:80, 10:90, 5:95), out-of-sample additive splits, fingerprint length sensitivity, and latent space visualization with UMAP and k-means clustering.
Bayesian optimization (BO): DKL is used as the surrogate with the Morgan fingerprint input, owing to its strong low-data performance. The BO loop initializes with 5% of the data, treats the held-out 95% as the candidate pool, runs 20 iterations, evaluates the acquisition function over the held-out set, and selects its argmax at each iteration. The acquisition function is Expected Improvement (EI). BO performance is averaged over 50 random initializations when comparing DKL vs GP vs random; the acquisition comparison (EI vs a greedy mean-only strategy) is averaged over 20 runs. BO is also evaluated with GNN-DKL on graph inputs.
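A minimal featurization sketch for the Morgan-fingerprint reaction vectors described above, assuming the four reaction components are supplied as SMILES strings; the helper name reaction_vector and the specific RDKit calls are illustrative choices, not the authors' exact code.

```python
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def reaction_vector(component_smiles, radius=2, n_bits=512):
    """Concatenate per-reactant Morgan fingerprints (radius 2, 512 bits each)
    into one reaction vector; four components give a 2048-bit input."""
    parts = []
    for smi in component_smiles:  # e.g. [aryl_halide, ligand, base, additive]
        mol = Chem.MolFromSmiles(smi)
        fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits)
        arr = np.zeros((n_bits,), dtype=np.float32)
        DataStructs.ConvertToNumpyArray(fp, arr)
        parts.append(arr)
    return np.concatenate(parts)
```

The descriptor and DRFP inputs would be handled analogously, each yielding a fixed-length reaction vector that feeds the DKL model sketched next.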
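A minimal PyTorch/GPyTorch sketch of the nonlearned-input DKL model: a small feed-forward extractor feeds a Matérn 5/2 GP, and the NN weights and GP hyperparameters are trained jointly by maximizing the exact log marginal likelihood with Adam (learning rate 0.001, 400 epochs), as stated above. The hidden and embedding dimensions are assumptions, since the section does not specify them.

```python
import torch
import gpytorch

class FeatureExtractor(torch.nn.Sequential):
    """Two fully connected layers with dropout 0.1 mapping the reaction
    vector to a low-dimensional embedding (layer sizes are assumed)."""
    def __init__(self, in_dim, hidden_dim=256, out_dim=16):
        super().__init__(
            torch.nn.Linear(in_dim, hidden_dim),
            torch.nn.ReLU(),
            torch.nn.Dropout(0.1),
            torch.nn.Linear(hidden_dim, out_dim),
        )

class DKLRegression(gpytorch.models.ExactGP):
    """Exact GP regression on top of a learned embedding (deep kernel learning)."""
    def __init__(self, train_x, train_y, likelihood, feature_extractor):
        super().__init__(train_x, train_y, likelihood)
        self.feature_extractor = feature_extractor
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(
            gpytorch.kernels.MaternKernel(nu=2.5)  # Matern 5/2, no ARD
        )

    def forward(self, x):
        z = self.feature_extractor(x)  # NN embedding of the reaction vector
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(z), self.covar_module(z)
        )

def train_dkl(train_x, train_y, epochs=400, lr=0.001):
    """Jointly optimize NN weights and GP hyperparameters by maximizing
    the GP log marginal likelihood via backpropagation."""
    likelihood = gpytorch.likelihoods.GaussianLikelihood()
    model = DKLRegression(train_x, train_y, likelihood,
                          FeatureExtractor(train_x.shape[-1]))
    model.train(); likelihood.train()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = -mll(model(train_x), train_y)  # negative log marginal likelihood
        loss.backward()
        optimizer.step()
    return model, likelihood
```

For graph inputs, the FeatureExtractor would be replaced by the MPNN/set2set encoder followed by the small FFNN, with the same joint training objective.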
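A sketch of the pool-based BO loop described above (5% initialization, 20 iterations, EI acquisition evaluated over the held-out candidate set). It reuses train_dkl from the previous sketch and retrains the surrogate from scratch at every iteration, which is an assumption; the section does not specify the retraining schedule.

```python
import torch
import gpytorch
from torch.distributions import Normal

def expected_improvement(mean, std, best_f, xi=0.0):
    """Analytic Expected Improvement for maximization."""
    std = std.clamp_min(1e-9)
    z = (mean - best_f - xi) / std
    n = Normal(torch.zeros_like(z), torch.ones_like(z))
    return (mean - best_f - xi) * n.cdf(z) + std * torch.exp(n.log_prob(z))

def run_bo(pool_x, pool_y, init_frac=0.05, n_iters=20, seed=0):
    """Pool-based BO: start from a random 5% of reactions, then add the
    EI argmax from the held-out pool at each of 20 iterations."""
    g = torch.Generator().manual_seed(seed)
    perm = torch.randperm(len(pool_x), generator=g)
    n_init = max(1, int(init_frac * len(pool_x)))
    train_idx = perm[:n_init].tolist()
    candidate_idx = perm[n_init:].tolist()

    for _ in range(n_iters):
        model, likelihood = train_dkl(pool_x[train_idx], pool_y[train_idx])
        model.eval(); likelihood.eval()
        with torch.no_grad(), gpytorch.settings.fast_pred_var():
            post = likelihood(model(pool_x[candidate_idx]))
            ei = expected_improvement(post.mean, post.variance.sqrt(),
                                      best_f=pool_y[train_idx].max())
        pick = candidate_idx.pop(int(torch.argmax(ei)))
        train_idx.append(pick)  # "measure" the selected reaction's yield
    return pool_y[train_idx].max()  # best yield found during the campaign
```

Replacing expected_improvement with the posterior mean alone recovers the greedy, mean-only acquisition used as a comparison; averaging run_bo over many seeds reproduces the repeated-initialization protocol.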
Key Findings
- Predictive accuracy (RMSE, mean ± SE, lower is better):
  • Standard GP baselines: MolDesc-GP 8.58 ± 0.06; MorganFP-GP 6.39 ± 0.06; DRFP-GP 6.46 ± 0.05.
  • DKL (nonlearned representations): MolDesc-DKL 4.87 ± 0.07; MorganFP-DKL 4.86 ± 0.08; DRFP-DKL 4.87 ± 0.11.
  • DKL with graphs (GNN-DKL): 4.80 ± 0.19.
  • Stand-alone GNN: 4.89 ± 0.19.
  DKL substantially improves over standard GPs across representations and achieves accuracy comparable to a stand-alone GNN while providing uncertainty estimates.
- R² trends mirror RMSE: DKL consistently outperforms standard GPs and is comparable to GNNs.
- Uncertainty quality (NLPD, mean ± SE, lower is better, 80:20 split): Graph: GNN 3.750 ± 0.006, DKL −0.209 ± 0.075. Molecular descriptors: GP 0.235 ± 0.003, DKL −0.275 ± 0.003. Morgan FP: GP −0.114 ± 0.003, DKL −0.100 ± 0.003. DRFP: GP −0.289 ± 0.003, DKL −0.271 ± 0.009. Overall, DKL provides competitive to improved uncertainty estimation, notably outperforming GNNs on graphs.
- Error–uncertainty correlation: the Spearman ρ between absolute error and predictive variance for MorganFP-DKL is 0.30, comparable to a GNN ensemble.
- Data efficiency: across train/test splits from 80:20 to 5:95, DKL methods maintain their advantage over standard GPs. With only 5% training data, GNN-DKL RMSE is similar to DRFP-GP, while fingerprint-based DKL (MorganFP-DKL, DRFP-DKL) still outperforms standard GPs.
- Feature analysis: UMAP and k-means on MorganFP-DKL embeddings yield clusters with distinct yield distributions (median yields 21.9, 3.4, 60.6, and 19.0), including a clearly high-yield cluster, whereas clusters from raw Morgan fingerprints reflect reactant similarity (e.g., base identity) and have yield distributions close to the overall test median, indicating that DKL learns task-relevant features (see the sketch after this list).
- Bayesian optimization: using EI, the MorganFP-DKL surrogate outperforms random search and slightly exceeds MorganFP-GP over 20 iterations (averaged over 50 runs). Acquisition functions that exploit uncertainty (EI) outperform greedy mean-only strategies (averaged over 20 runs). BO with MorganFP-DKL slightly outperforms GNN-DKL with graph inputs.
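The latent-space and error-uncertainty analyses could be reproduced roughly as follows. Clustering the 2-D UMAP projection into four k-means clusters is an assumption about the exact procedure, and the function name is illustrative.

```python
import numpy as np
import umap  # umap-learn
from sklearn.cluster import KMeans
from scipy.stats import spearmanr

def latent_space_analysis(embeddings, yields, abs_errors, pred_vars,
                          n_clusters=4, seed=0):
    """Project DKL embeddings with UMAP, cluster with k-means, and summarize
    per-cluster yields; also report the error-uncertainty rank correlation."""
    coords = umap.UMAP(random_state=seed).fit_transform(embeddings)
    labels = KMeans(n_clusters=n_clusters, random_state=seed,
                    n_init=10).fit_predict(coords)
    medians = [float(np.median(yields[labels == k])) for k in range(n_clusters)]
    rho, _ = spearmanr(abs_errors, pred_vars)  # error vs predictive variance
    return medians, rho
```

Running the same procedure on raw Morgan fingerprints instead of DKL embeddings gives the reactant-similarity clusters used as the comparison in the finding above.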
Discussion
The study addresses the challenge of accurate reaction yield prediction with calibrated uncertainty across diverse molecular representations. By embedding a neural feature extractor within a GP kernel (DKL), the approach retains GP-calibrated uncertainty while enabling task-specific representation learning from both engineered features and learned molecular graphs. Empirically, DKL markedly improves over standard GPs for all tested nonlearned representations and matches state-of-the-art GNN performance on graphs, thus unifying modeling across input types. The uncertainty estimates enable principled exploration–exploitation trade-offs in BO, yielding improved optimization trajectories over random baselines and mean-only acquisition. Latent space analysis indicates DKL transforms high-dimensional fingerprints into embeddings aligned with yield structure (e.g., separating high- and low-yield clusters), explaining the gains over fixed-input GPs. Practically, fingerprint-based DKL models offer competitive accuracy and uncertainty with fewer learnable parameters than full GNNs, making them attractive in low-data or resource-constrained settings. These results support DKL as a robust surrogate for reaction optimization workflows, enabling on-the-fly representation learning from unstructured or high-dimensional inputs while preserving uncertainty calibration.
Conclusion
This work introduces a deep kernel learning framework for reaction outcome prediction that combines neural network representation learning with Gaussian process uncertainty quantification. On a 3,955-reaction Buchwald–Hartwig dataset, DKL achieves strong and consistent accuracy across engineered descriptors, fingerprints, and molecular graphs, outperforming standard GPs and matching GNNs while providing predictive uncertainties. The uncertainty estimates enable effective Bayesian optimization, with DKL surrogates outperforming random baselines and slightly exceeding standard GP surrogates; acquisitions using uncertainty (EI) outperform greedy strategies. The approach broadens GP applicability to learned and nonlearned representations and is suitable for integration into reaction discovery pipelines. Potential future directions include validating on diverse reaction classes and objectives, exploring multi-objective and constrained BO, and refining deep kernels and acquisition strategies for discrete, high-dimensional reaction spaces.
Limitations
- Evaluation centers on a single, well-studied reaction dataset (Buchwald–Hartwig cross-coupling); generalization to other reaction types and experimental settings is not empirically demonstrated here.
- Out-of-sample tests focused on additive splits; broader extrapolation (e.g., unseen substrates, ligands, or simultaneously unseen components) remains to be comprehensively assessed.
- While DKL provides uncertainty estimates and competitive NLPD, uncertainty calibration may vary with representation and setup; further calibration studies across datasets would strengthen the conclusions.
- BO studies used discrete candidate pools and single-objective optimization over 20 iterations; performance in continuous, higher-dimensional condition spaces and multi-objective scenarios requires further investigation.