Sequential closed-loop Bayesian optimization as a guide for organic molecular metallophotocatalyst formulation discovery
X. Li, Y. Che, et al.
Xiaobo Li, Yu Che, and co-workers report a two-step, data-driven approach to the targeted synthesis of organic photoredox catalysts (OPCs) and the optimization of reaction conditions for metallophotocatalysis. Bayesian optimization identified OPC formulations competitive with iridium catalysts after testing about 9.8% of the candidate catalysts (55 of 560) and about 2.4% of the possible reaction conditions (107 of 4,500).
Introduction
The study addresses the challenge of discovering and optimizing organic photoredox catalysts (OPCs) for complex metallophotocatalytic reactions, where performance depends on intertwined photophysical, redox, and kinetic factors and cannot be reliably predicted a priori. Traditional discovery has relied on design heuristics, trial-and-error, or factorial design of experiments, which become inefficient as dimensionality grows and prior constraints are limited. The authors aim to use a data-driven, closed-loop Bayesian optimization (BO) framework to (1) target synthesis and selection of promising OPCs from a large virtual library and (2) optimize multicomponent reaction conditions in metallophotocatalysis. The target transformation is a dual photoredox/Ni-catalyzed decarboxylative C(sp3)-C(sp2) cross-coupling of amino acids with aryl halides. The purpose is to efficiently navigate vast chemical and formulation spaces, achieve high yields with organic catalysts competitive with iridium systems, and gain mechanistic insight into descriptors governing activity. This is significant for accelerating catalyst discovery, reducing cost/toxicity relative to precious-metal photocatalysts, and systematically handling multivariate complexity beyond simple selection rules.
Literature Review
Photoredox catalysis enables powerful single-electron transfer (SET) activation in synthesis, and its merger with transition-metal catalysis (metallophotocatalysis) broadens reaction scope. Historically, catalyst discovery often involved trial-and-error, high-throughput screening, or simplified property-guided selection. Iridium photocatalysts are versatile benchmarks, while organic photocatalysts such as cyanoarene and CzIPN derivatives have shown efficacy in specific metallophotoredox reactions. However, DOE approaches can struggle in high-dimensional, poorly constrained spaces, and simple photophysical or redox criteria alone fail to capture the multivariate determinants of activity (light absorption, excited-state lifetimes, charge separation, reorganization, and interplay with metal catalysts). Recent advances in BO and autonomous experimentation have demonstrated improved efficiency in reaction and catalyst optimization, motivating its application here to OPC design and metallophotocatalyst formulation.
Methodology
The workflow comprises two sequential, closed-loop Bayesian optimization (BO) campaigns integrating computation, machine learning (ML), and experiment.
1) Virtual library design and encoding:
- Constructed a virtual library of 560 cyanopyridine (CNP) OPC candidates via Hantzsch pyridine synthesis by combining 20 β-keto nitrile Ra groups (electron-donating, electron-withdrawing, or halogen-containing) with 28 aromatic aldehyde Rb groups (polyaromatic hydrocarbons, phenylamines, carbazoles).
- Computed 16 molecular descriptors capturing thermodynamic, optoelectronic, and excited-state properties (e.g., optical gap/oscillator strength, adiabatic IP/EA, excited-state EA*, charge separation/overlap metrics, singlet-triplet gaps, reorganization energies). Calculations used DFT/TD-DFT (CAM-B3LYP/6-31G; LANL2DZ for Br/I; PCM/SMD for DMF), with excitation analyses performed in Multiwfn. Total computational effort was ~145,200 core-hours for the 560 CNPs.
2) BO for targeted OPC synthesis (candidate selection):
- Initial selection of 6 diverse CNPs using Kennard–Stone (KS) on the encoded property space; synthesized and tested under fixed conditions (4 mol% CNP; 10 mol% NiCl2·glyme; 15 mol% dtbbpy; 1.5 equiv Cs2CO3; DMF; blue LED; yields averaged over triplicate runs).
- Surrogate model: Gaussian process (GP) with a Matérn kernel; acquisition function: upper confidence bound (UCB), α_UCB(x) = μ(x) + βσ(x), where μ(x) and σ(x) are the GP posterior mean and standard deviation. Parallel batched BO: 12 UCB instances per step, each with β drawn at random from an exponential distribution to span the exploration–exploitation trade-off; 7 BO steps followed the initial selection. From each 12-suggestion batch, 6–8 CNPs were synthesized, with human-in-the-loop considerations (e.g., precursor availability). In total, 55 of 560 CNPs were synthesized/tested across 8 steps (0–7). A separate structural-diversity baseline set of 15 CNPs (from KS on Morgan fingerprint space) was also synthesized/tested.
- Model interpretation: SHAP analyses (global and local) on the best GP model trained on the 55 CNPs to identify influential descriptors.
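The batched UCB selection in step 2 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the GP posterior mean and standard deviation for each untested candidate are already available as plain lists, and the function name `ucb_portfolio_batch`, the exponential rate, and the rule that each instance takes its best candidate not already chosen are all hypothetical choices.

```python
import random

def ucb_portfolio_batch(mu, sigma, batch_size=12, rate=1.0, seed=0):
    """Pick distinct candidate indices, one per UCB instance.

    mu, sigma: posterior mean and standard deviation per untested candidate
    (assumed to come from an already-fitted surrogate model).
    Each instance draws its own beta from an exponential distribution, so
    small betas favour exploitation (high mu) and large betas favour
    exploration (high sigma).
    """
    if len(mu) != len(sigma):
        raise ValueError("mu and sigma must have the same length")
    rng = random.Random(seed)
    chosen = []
    for _ in range(min(batch_size, len(mu))):
        beta = rng.expovariate(rate)  # hypothetical rate; not given in the source
        # UCB acquisition: alpha(x) = mu(x) + beta * sigma(x)
        scores = [m + beta * s for m, s in zip(mu, sigma)]
        # Take the best-scoring candidate that no earlier instance selected.
        best = max(
            (i for i in range(len(mu)) if i not in chosen),
            key=lambda i: scores[i],
        )
        chosen.append(best)
    return chosen
```

The sketch stops at the list of suggestions. In the study, only 6–8 of each 12 suggestions were synthesized after human review, and the surrogate was retrained on the new yields before the next step.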
3) BO for reaction condition optimization (formulation search):
- Variables: 18 carbazole-containing CNPs (spanning 0–71% initial performance), 25 pyridyl ligands for Ni, and 10 Ni loadings (1–10 mol%), creating 4,500 condition sets.
- Encoding of a condition set: concatenation of (i) CNP reduction potential Ered [CNP/CNP•−] (experimental), (ii) Morgan fingerprints (CNP), (iii) Morgan fingerprints (ligand), and (iv) Ni concentration. Pairwise distance between condition sets computed as the normalized sum of scalar differences in Ered and Ni loading plus Tanimoto distances of CNP and ligand fingerprints; embedded via UMAP for visualization.
- Surrogate GP with a customized RBF kernel over distances in Ered, CNP fingerprint, ligand fingerprint, and Ni concentration; hyperparameters tuned during training.
- Campaign design: Step 0 measured 19 diverse conditions; then 11 BO steps with 8 suggestions per step (portfolio of UCB β values), totaling 88 BO-tested condition sets (107 including step 0). Reaction testing under standardized protocols (21 h irradiation for this campaign) with blue LEDs; GC-MS quantification.
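The distance between condition sets in step 3 can be illustrated with a short Python sketch. Several details here are assumptions rather than facts from the study: fingerprints are stored as sets of on-bit indices, the scalar differences are scaled by user-supplied ranges, the four components are weighted equally, and a single RBF kernel with one length scale is applied to the combined distance (the study's customized kernel may treat the components separately). All function and key names are hypothetical.

```python
import math

def tanimoto_distance(fp_a, fp_b):
    """1 - Tanimoto similarity for fingerprints stored as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(fp_a & fp_b) / union

def condition_distance(a, b, ered_range, ni_range):
    """Normalized sum of the four component distances between two condition sets.

    Each component lies in [0, 1] when ered_range and ni_range are the full
    spans of E_red and Ni loading across the search space.
    """
    d_ered = abs(a["ered"] - b["ered"]) / ered_range
    d_ni = abs(a["ni"] - b["ni"]) / ni_range
    d_cnp = tanimoto_distance(a["cnp_fp"], b["cnp_fp"])
    d_lig = tanimoto_distance(a["ligand_fp"], b["ligand_fp"])
    return (d_ered + d_ni + d_cnp + d_lig) / 4.0  # equal weights: an assumption

def rbf_kernel(distance, length_scale=0.5):
    """RBF covariance as a function of a precomputed distance."""
    return math.exp(-distance ** 2 / (2.0 * length_scale ** 2))
```

Working from a precomputed distance lets one kernel handle both the scalar variables (reduction potential, Ni loading) and the binary fingerprints, for which Euclidean distance is less natural than Tanimoto distance.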
4) Benchmarking:
- Compared best CNP (CNP-127) against 4CzIPN and Ir[dF(CF3)ppy]2(dtbbpy)PF6 (Ir-cat) across select ligands and light sources; evaluated Ni-loading dependence.
5) Experimental details:
- Photoredox reaction (candidate search): 0.10 mmol aryl halide, 0.15 mmol Boc-Pro-OH, 0.15 mmol Cs2CO3, 0.004 mmol CNP (4 mol%), 0.01 mmol NiCl2·glyme (10 mol%), 0.015 mmol dtbbpy (15 mol%), 4 mL DMF, N2 degas 5 min, blue LED 3 h; yields by GC-MS (biphenyl internal standard).
- Reaction optimization campaign: same substrate/base/CNP mol%, varied NiCl2·glyme and ligand per design, 4 mL DMF, N2 degas 5 min, blue LED 21 h; yields by GC-MS.
6) Computational/ML tools:
- Gaussian 16; Multiwfn; UMAP for dimensionality reduction; SHAP (v0.35) for model explanations; BO implementation with parallel UCB portfolio sampling.
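The Kennard–Stone (KS) selection used to pick the initial 6 CNPs and the 15-member diversity baseline can be sketched as below. This is a generic textbook version of the algorithm, not the authors' code: it assumes each candidate is a tuple of already-scaled numeric descriptors and uses Euclidean distance, whereas the baseline set in the study was chosen in Morgan fingerprint space. The function name `kennard_stone` is hypothetical.

```python
import math

def kennard_stone(points, n_select):
    """Return indices of n_select points chosen by the Kennard-Stone algorithm."""
    n = len(points)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and len(points)")
    # Full pairwise Euclidean distance matrix (fine for small libraries).
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Seed with the two most distant points.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda ij: dist[ij[0]][ij[1]],
    )
    selected = [first, second]
    while len(selected) < n_select:
        remaining = [i for i in range(n) if i not in selected]
        # Max-min rule: add the point whose nearest selected neighbour is farthest.
        nxt = max(remaining, key=lambda i: min(dist[i][s] for s in selected))
        selected.append(nxt)
    return selected
```

Because every new pick maximizes its distance to the points already chosen, the selected set spreads across the descriptor space, which gives the first surrogate model a diverse set of training points.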
Key Findings
- Efficient OPC discovery with BO: From a virtual library of 560 CNPs, 55 were synthesized/tested via BO-guided selection, raising the best yield from 39% (initial 6) to 67% by step 6, using CNP-127 (Ra05 + Rb16). Overall 75 CNPs were eventually tested including diversity and local-neighbour explorations, with CNP-127 remaining top among tested candidates.
- Structure–activity trends: BO consistently favored ED-type Ra with carbazole Rb (CZ) from step 2 onwards; PA (phenylamine) Rb groups were deselected after step 5. CZ Rb paired with donating Ra gave optimal yields; PA Rb generally poor; PAH Rb usually low except select pairs (e.g., Ra09–Rb21, Ra12–Rb21, Ra14–Rb09) with modest yields.
- Baseline comparison for candidate selection: A structurally diverse control set of 15 CNPs (outside BO suggestions) achieved a maximum yield of 32% and many near-zero yields, corroborating BO’s guidance.
- SHAP insights (candidate model): The optical gap (light absorption) was the most influential feature; electron affinity (EA) the second, with lower EA (more reducing) contributing positively. Excited-state charge separation metrics (D index high; S index low) also strongly favored performance. For CNP-127, strong reducing ability, high S1 oscillator strength, and good charge separation drove predictions.
- Reaction condition optimization: In a 4,500-point space (18 CNPs × 25 ligands × 10 Ni loadings), BO evaluated 88 conditions after 19 step-0 measurements (107 total), improving the maximum yield from 71% (step 0) to 88% by step 6. The top conditions used CNP-127 with ligand L2 (4,4'-dimethyl-2,2'-bipyridine) at 2–5 mol% Ni.
- BO vs random in condition search: BO achieved 88% max yield vs 75% for 44 randomly chosen conditions; 44% of BO-tested conditions exceeded 67% yield vs 4.5% for random.
- Benchmarking: CNP-127 matched or exceeded Ir-cat performance at low Ni loadings and outperformed 4CzIPN under tested conditions. Notably, CNP-127's yield increased as Ni loading decreased from 10 mol% to ~2–5 mol%, attributed to its strong reducing power (half-wave reduction potential E1/2 [cat/cat•−] ≈ −1.85 V vs Fc+/0; cf. 4CzIPN −1.68 V; Ir-cat −1.72 V).
- Overall efficiency: High-performing OPCs comparable to iridium catalysts were identified while exploring ~9.8% of candidate molecules (55/560) and ~2.4% of reaction conditions (107/4,500).
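The search-space sizes and explored fractions quoted above follow directly from the counts given in the methodology. A short Python check, using only numbers stated in this summary (the helper name `explored_fraction` is illustrative):

```python
def explored_fraction(tested, total):
    """Percentage of a search space that was experimentally tested."""
    return 100.0 * tested / total

library_size = 20 * 28           # Ra groups x Rb groups
condition_space = 18 * 25 * 10   # CNPs x ligands x Ni loadings
bo_conditions = 11 * 8           # BO steps x suggestions per step
total_conditions = 19 + bo_conditions  # step-0 measurements + BO-tested sets

print(library_size, round(explored_fraction(55, library_size), 1))
# prints: 560 9.8
print(condition_space, total_conditions,
      round(explored_fraction(total_conditions, condition_space), 1))
# prints: 4500 107 2.4
```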
Discussion
The study demonstrates that closed-loop BO can efficiently navigate complex, multivariate spaces to discover and optimize organic metallophotocatalyst systems. In the absence of reliable a priori selection rules, encoding molecules with rich descriptors and iteratively updating GP surrogates enabled targeted synthesis of only a small subset of candidates while converging on high-performing OPCs (CNP-127) and favorable Ra/Rb chemistries (ED + CZ). SHAP analyses lend interpretability, highlighting that strong light absorption (small optical gap), adequate reducing power (EA), and favorable excited-state charge separation underpin performance in the tested reaction, aligning with mechanistic considerations of SET events bridging photoredox and nickel cycles.
Transitioning to reaction formulation, BO captured synergistic effects among photocatalyst identity, ligand coordination environment, and Ni loading. Re-optimizing conditions with CNP-127 and L2 at moderate Ni loadings raised yields to 88%, illustrating that catalyst ranking can depend on formulation and that systematic condition search is crucial. BO substantially outperformed random sampling in both maximum yield and proportion of high-yielding conditions, confirming its value for prioritizing experiments in large spaces.
Benchmarking showed that an organic catalyst (CNP-127) can rival iridium-based systems and outperform a widely used organic catalyst (4CzIPN) under certain conditions, especially at lower Ni loadings. This highlights the potential of data-driven discovery to deliver sustainable, cost-effective alternatives without exhaustive experimental campaigns. The interpretability and human-in-the-loop aspects support chemical intuition, suggest design principles (e.g., ED–CZ architectures and strong reducing power), and guide future exploration beyond this specific scaffold and reaction.
Conclusion
A two-stage, closed-loop BO framework identified high-performing organic photoredox catalysts for Ni-mediated decarboxylative C(sp3)–C(sp2) cross-coupling and optimized reaction conditions. From a 560-member virtual CNP library, just 55 synthesized candidates yielded a top-performing OPC (CNP-127; 67% under fixed conditions). Subsequent BO-guided formulation optimization across 4,500 possible conditions required only 107 experiments to reach 88% yield with CNP-127 and ligand L2 at moderate Ni loadings. SHAP analysis provided mechanistic-aligned insights, emphasizing optical absorption, reducing power, and excited-state charge separation as key features. The results show that BO can discover OPC formulations competitive with iridium catalysts while sampling a small fraction of the space.
Future directions include: expanding chemical scopes and reaction classes; integrating more comprehensive descriptors (kinetics, diffusion, solvent effects); improving full automation (including synthesis feasibility checks and autonomous make–test decisions); and developing multi-objective BO to balance yield with sustainability, cost, and robustness.
Limitations
- Not all 560 CNPs were synthesized; the global optimum cannot be guaranteed without exhaustive testing. Results suggest but do not prove global optimality within the library.
- Hantzsch synthesis, while general, is not universally feasible across all Ra/Rb pairs; human-in-the-loop decisions filtered BO suggestions for synthetic accessibility, introducing potential bias.
- Redox potentials alone were insufficient predictors; although SHAP provided interpretability, complex interactions remain and models may not generalize beyond the encoded space or reaction.
- Benchmarking against Ir-cat and 4CzIPN was not exhaustively optimized for those catalysts (ligands/loadings), so comparative performance may vary with further optimization.
- Findings are reaction- and scaffold-specific; transferability to other transformations or catalyst classes requires validation.
- Full automation was not implemented; manual steps (synthesis decisions, experiment execution) may limit throughput and introduce human biases.