Introduction
The activation of organic substrates via single-electron transfer (SET) using photoredox catalysts is a powerful tool in organic synthesis. Metallophotocatalysis combines photoredox catalysis with transition-metal catalysis, enabling challenging organic reactions. Organic photoredox catalysts (OPCs) offer potential advantages over transition-metal-based photocatalysts, including lower cost, lower toxicity, and high chemical diversity. While iridium-based photocatalysts are known for their versatility and high performance, recent advances highlight the effectiveness of OPCs in specific metallophotoredox reactions. However, predicting the catalytic activities of OPCs from first principles is difficult because many interacting properties influence catalyst activity. Traditional discovery relies on a mix of design, trial and error, and serendipity. High-throughput synthesis and testing are also used, but these methods can be inefficient, especially for complex systems, and optimizing multicomponent systems remains labor-intensive. This research aimed to develop a data-driven approach to streamline the discovery of OPCs for the decarboxylative sp³-sp² cross-coupling of amino acids with aryl halides, a reaction that uses a synergistic combination of photoredox and nickel catalysis. The goal was to replace commonly used iridium photocatalysts with more cost-effective and less toxic OPCs.
Literature Review
The literature extensively covers photoredox catalysis and its applications in organic chemistry, highlighting the use of various photoredox catalysts, including transition-metal complexes and organic molecules such as cyanoarenes and acridinium salts. Several studies detail the use of high-throughput screening and computational methods for catalyst discovery in photoredox and other catalytic systems. The challenges of optimizing multicomponent systems, especially metallophotocatalysis reactions, which require the synergistic interaction of multiple components, are frequently discussed. Methods such as factorial design of experiments (DOE) are mentioned, but their limitations in high-dimensional search spaces are noted. Studies focusing on the structure-activity relationships of photoredox catalysts, and particularly efforts to predict catalytic activity from first principles using computational methods such as density functional theory (DFT) calculations, are also cited. The use of molecular descriptors to encode the chemical space of potential catalysts for machine learning (ML) models has also been demonstrated in previous works. These methods help to narrow down the search space for promising catalysts.
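The limitation of factorial DOE in high-dimensional spaces can be made concrete with a short calculation. The sketch below is illustrative only: the function name and the level counts are hypothetical and are not taken from the study. It shows that a full factorial design needs one run per combination of factor levels, so the run count is the product of the level counts and grows exponentially with the number of factors.

```python
from itertools import product
from math import prod

def full_factorial_size(levels):
    """Number of runs in a full factorial design: the product of level counts."""
    return prod(levels)

# Hypothetical level counts, chosen only to show the growth rate.
small = full_factorial_size([3, 3, 3])   # 3 factors at 3 levels each
large = full_factorial_size([3] * 10)    # 10 factors at 3 levels each

# Enumerating the small design explicitly gives the same count.
design = list(product(range(3), repeat=3))

print(small, large, len(design))
```

Three factors at three levels need 27 runs, while ten such factors need 59,049, which is why exhaustive designs become impractical as dimensions are added.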
Methodology
This study employed a two-stage sequential closed-loop Bayesian optimization (BO) workflow, integrating predictive machine learning (ML) with experiments. The first stage focused on the targeted synthesis of organic photoredox catalysts (OPCs). A virtual library of 560 cyanopyridine (CNP) molecules was designed using the Hantzsch pyridine synthesis, a metal-free, highly atom-efficient multicomponent reaction. The molecules were encoded using 16 molecular descriptors capturing thermodynamic, optoelectronic, and excited-state properties. A batched, constrained, discrete BO algorithm, using a Gaussian process (GP)-based surrogate model, was employed to select CNPs for synthesis and testing. The algorithm iteratively updated its model based on experimental results, selecting a subset of CNPs for synthesis in each step and balancing exploration and exploitation of the chemical space. The selection process also incorporated chemical intuition, considering factors such as the availability of starting materials. In the second stage, reaction conditions were optimized for 18 selected CNPs. Three key variables were considered: CNP photocatalyst, nickel catalyst concentration, and Ni-coordinating pyridyl ligand. This created a search space of 4,500 possible conditions. A similar BO workflow, using a custom radial basis function (RBF) kernel, guided the selection of conditions for experimental testing. The RBF kernel combined different distance metrics for CNPs, pyridyl ligands, and Ni concentrations. UMAP dimensionality reduction was used to visualize the chemical space of both the CNP molecules and the reaction conditions. SHAP (Shapley additive explanations) analysis was used to interpret the ML models, providing both global and local explanations of the models' predictions. The activity of the best-performing CNP was benchmarked against established catalysts (4CzIPN and an iridium catalyst).
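The closed-loop idea described above can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not the study's implementation: the names composite_rbf, gp_posterior, bo_loop, measure, and feasible are hypothetical, the candidate grid and length scales are invented, the yields come from a synthetic toy function rather than experiments, and the upper-confidence-bound acquisition with naive top-k batching is an assumption, since the summary does not name the acquisition function. The kernel applies an RBF to a scaled sum of squared distances over the CNP descriptors, the ligand descriptors, and the Ni concentration.

```python
import math
import random

def composite_rbf(a, b, length_scales=(0.5, 1.0, 0.5)):
    """RBF kernel on a scaled sum of squared distances over three inputs."""
    d_cnp = math.dist(a[0], b[0])   # Euclidean distance between CNP descriptors
    d_lig = math.dist(a[1], b[1])   # Euclidean distance between ligand descriptors
    d_ni = abs(a[2] - b[2])         # absolute difference in Ni concentration
    dists = (d_cnp, d_lig, d_ni)
    return math.exp(-0.5 * sum((d / l) ** 2 for d, l in zip(dists, length_scales)))

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L

def chol_solve(L, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_posterior(X, y, candidates, kernel, noise=1e-6):
    """GP posterior (mean, std) at each candidate, with a constant prior mean."""
    K = [[kernel(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    L = cholesky(K)
    mean_y = sum(y) / len(y)
    alpha = chol_solve(L, [v - mean_y for v in y])
    out = []
    for c in candidates:
        k_star = [kernel(c, x) for x in X]
        mu = mean_y + sum(k * a for k, a in zip(k_star, alpha))
        v = chol_solve(L, k_star)
        var = max(kernel(c, c) - sum(k * w for k, w in zip(k_star, v)), 0.0)
        out.append((mu, math.sqrt(var)))
    return out

def bo_loop(candidates, measure, kernel, feasible, n_init=3, batch=2, rounds=3,
            beta=2.0, seed=0):
    """Batched, constrained, discrete BO: rank untested candidates by mean + beta * std."""
    rng = random.Random(seed)
    pool = [c for c in candidates if feasible(c)]   # constraint: drop infeasible points
    tested = rng.sample(pool, n_init)               # random initial design
    yields = [measure(c) for c in tested]
    for _ in range(rounds):
        untested = [c for c in pool if c not in tested]
        if not untested:
            break
        post = gp_posterior(tested, yields, untested, kernel)
        ranked = sorted(zip(untested, post),
                        key=lambda t: t[1][0] + beta * t[1][1], reverse=True)
        for c, _ in ranked[:batch]:                 # naive top-k batch selection
            tested.append(c)
            yields.append(measure(c))               # stands in for a wet-lab experiment
    return tested, yields

# Toy grid (hypothetical): 4 CNP descriptors x 2 ligand descriptors x 3 Ni levels.
candidates = [((c,), (l,), ni) for c in (0.0, 0.5, 1.0, 1.5)
              for l in (0.0, 1.0) for ni in (0.5, 1.0, 2.0)]

def feasible(c):
    """Hypothetical constraint, e.g. a CNP whose starting materials are unavailable."""
    return c[0] != (0.5,)

def measure(c):
    """Synthetic yield in (0, 1], peaked at an arbitrary point; not experimental data."""
    return math.exp(-((c[0][0] - 1.0) ** 2 + (c[1][0] - 1.0) ** 2 + (c[2] - 1.0) ** 2))

tested, yields = bo_loop(candidates, measure, composite_rbf, feasible)
print(len(tested), round(max(yields), 3))
```

Taking the top-ranked candidates in one pass is the simplest way to form a batch; it ignores the fact that points chosen together may be redundant, which more careful batch strategies address. A production workflow would more likely rely on an established GP library than on the hand-written linear algebra used here to keep the sketch dependency-free.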
Key Findings
The first BO workflow led to the synthesis and testing of 55 CNPs, achieving a maximum reaction yield of 67%. The second BO workflow, focusing on reaction condition optimization, explored 107 of 4,500 possible conditions, reaching a maximum yield of 88%. This yield is comparable to that achieved by iridium catalysts. SHAP analysis revealed that light absorption, electron affinity, and excited-state charge separation were crucial molecular features for photocatalytic activity. The best-performing catalyst, CNP-127, exhibited activity comparable to iridium catalysts and superior to 4CzIPN, particularly at lower nickel loadings. The BO approach significantly outperformed random sampling in identifying high-performing catalysts and conditions.
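To put the experimental budgets in perspective, the fraction of each search space that was explored follows directly from the counts above. The sketch below also computes one possible random-sampling baseline: the chance that uniform sampling without replacement would include the single best candidate. This baseline and the function name p_random_hit are assumptions for illustration; the summary does not state how the comparison with random sampling was performed.

```python
from math import comb

def p_random_hit(n_total, n_sampled, n_top):
    """Probability that a uniform sample without replacement of size n_sampled
    contains at least one of the n_top best items out of n_total."""
    return 1.0 - comb(n_total - n_top, n_sampled) / comb(n_total, n_sampled)

# Counts taken from the findings above.
cnp_coverage = 55 / 560       # fraction of the CNP library synthesized and tested
cond_coverage = 107 / 4500    # fraction of reaction conditions tested

# With a single best item, the hit probability equals the sampled fraction.
p_best_cnp = p_random_hit(560, 55, 1)
p_best_cond = p_random_hit(4500, 107, 1)

print(f"{cnp_coverage:.1%} {cond_coverage:.1%} {p_best_cnp:.1%} {p_best_cond:.1%}")
```

Under this assumption, the second stage tested about 2.4% of the condition space, so a random search of the same size would have had roughly a 2.4% chance of including the single best condition.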
Discussion
The results demonstrate the effectiveness of a sequential closed-loop BO strategy for the discovery and optimization of metallophotocatalysts. The use of molecular descriptors and ML models allowed for efficient exploration of a large chemical space, significantly reducing the number of experiments required. The SHAP analysis provided valuable insights into the key molecular features influencing photocatalytic activity, which can guide future catalyst design. The comparable performance of the identified OPCs to iridium catalysts highlights the potential of organic photocatalysts as sustainable and cost-effective alternatives. The ability to achieve high yields at lower nickel loadings is particularly important for improving sustainability and reducing potential metal contamination.
Conclusion
This study successfully employed a data-driven approach using Bayesian optimization and machine learning to discover and optimize organic photoredox catalysts for a challenging cross-coupling reaction. The identified catalysts show performance comparable to iridium catalysts, highlighting the potential of this method for accelerating the discovery of novel metallophotocatalysts. Future work could focus on automating the synthesis and testing of candidate molecules, further refining the BO approach, and exploring new classes of organic photocatalysts.
Limitations
The Hantzsch pyridine synthesis, while versatile, might not be suitable for all combinations of functional groups. The study's reliance on computational modeling introduces uncertainties related to the accuracy of DFT and TD-DFT calculations. The benchmark comparison with iridium and 4CzIPN catalysts was not conducted under fully optimized conditions for all catalysts.