Accelerating the discovery of active and selective CO2RR catalysts using a high-throughput virtual screening strategy

D. H. Mok, H. Lee, et al.

D. H. Mok, H. Lee, G. Zhang, C. Li, Kun Jiang, and Seoin Back developed a high-throughput virtual screening workflow that uses machine learning to identify promising catalysts for the CO2 reduction reaction.

Introduction

The study addresses the challenge of rapidly discovering active and selective electrocatalysts for the CO2 reduction reaction (CO2RR) across a vast chemical space. Traditional inverse design and high-throughput virtual screening (HTVS) approaches are limited by their reliance on expensive density functional theory (DFT) calculations and explicit surface structure modeling. The authors propose combining a structure-free, active motif-based machine learning framework (DSTAR) for binding energy prediction with a potential-dependent selectivity map, enabling large-scale, data-driven identification of catalysts with the desired CO2RR activity and selectivity. The work aims to predict activity and selectivity trends for pure metals and binary alloys at given potentials and to validate promising candidates experimentally, thereby accelerating catalyst discovery.

Literature Review

Prior research has shown that generative models can encode high-dimensional chemical spaces into low-dimensional latent spaces for materials design, while HTVS can identify candidates with desired properties if a sufficiently large materials pool and reliable property predictors exist. The authors previously developed DSTAR, a DFT- and structure-free motif representation that predicts binding energies using elemental descriptors of nearest neighbors, enabling enumeration of active motifs without explicit slab generation. Selectivity mapping for CO2RR based on thermodynamic boundary conditions and scaling relations between key intermediates was introduced by Tang et al., typically using ΔE_CO and ΔE_OH (and a scaling-derived ΔE_H). State-of-the-art graph-based ML models (e.g., LS-CGCNN) can achieve higher accuracy but are constrained by the need for precise geometric inputs. Classical literature classifies metals into formate-, CO-, H2-, and C1+ (further reduced) selective groups; however, discrepancies exist (e.g., Ag, Ga, Zn) likely due to kinetic and local environment effects not captured by purely thermodynamic maps.
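To make the structure-free idea concrete, the following Python sketch builds a motif fingerprint from elemental properties alone, grouped by neighbor environment (first neighbors, same-layer second neighbors, and sublayer neighbors). The choice of descriptors (atomic number and Pauling electronegativity), the per-group averaging, and the function name `motif_fingerprint` are assumptions made for illustration; they are not the feature set used by the authors.

```python
from statistics import mean

# Well-known elemental properties: (atomic number, Pauling electronegativity).
ELEMENT_PROPS = {
    "Cu": (29, 1.90),
    "Al": (13, 1.61),
    "Pd": (46, 2.20),
}

def motif_fingerprint(fnn, snn_same, snn_sub):
    """Return a flat feature vector: mean descriptors for each neighbor group.

    Hypothetical helper: no atomic coordinates are needed, only the
    element symbols found in each local environment.
    """
    features = []
    for group in (fnn, snn_same, snn_sub):
        props = [ELEMENT_PROPS[element] for element in group]
        # Average each descriptor over the atoms in this group.
        group_features = [mean(p[i] for p in props) for i in range(2)]
        features.extend(group_features)
    return features

# Example motif: a Cu-Cu-Al hollow site, Cu neighbors in the same layer,
# Al atoms in the sublayer (an invented configuration, for illustration only).
fp = motif_fingerprint(["Cu", "Cu", "Al"], ["Cu"] * 6, ["Al"] * 3)
print(len(fp))  # 6 features: 2 descriptors x 3 environments
```

Because the input is only a set of element labels per environment, such a vector can be written down for any hypothetical alloy without relaxing a slab, which is what makes large-scale enumeration cheap.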

Methodology

Active motif enumeration and representation: Using the DSTAR framework, active motifs are described by the elemental identities of three local environments relative to the adsorption site: first nearest neighbors (FNN), second nearest neighbors in the same layer (SNN_same), and sublayer neighbors (SNN_sub). This eliminates the need for slab generation, binding site identification, and iterative optimization.
Dataset construction and substitution: From CO2-related data in the GASpy dataset (89 crystal structures), 5634 unique bimetallic and 408 monometallic active motifs were collected. These motifs were numerically substituted with 30 elements to generate 30 monometallic and 435 bimetallic combinations, expanding bulk structures from 1,089 (GASpy) to 279,690 and producing a total of 2,463,030 active motifs. DSTAR can be extended beyond binaries, but this work focuses on pure metals and binary alloys due to data availability.
ML binding energy prediction: Three adsorption energies were predicted: ΔE_CO, ΔE_OH, and ΔE_H, using DSTAR-based ML models with fivefold cross-validation. Reported test MAEs: 0.118 eV (ΔE_CO), 0.227 eV (ΔE_OH), and 0.107 eV (ΔE_H). Although slightly less accurate than crystal graph neural networks, DSTAR’s simplicity enables far broader chemical space exploration.
3D potential-dependent selectivity map: Seven reaction steps were considered, and six thermodynamic boundary conditions (BC1–BC6) were constructed using scaling relations that express intermediate binding energies in terms of ΔE_CO and ΔE_OH. Unlike earlier work that derived ΔE_H from scaling, this study uses the directly ML-predicted ΔE_H to avoid compounding uncertainty (MAE of 0.107 eV for direct prediction vs. 0.218 eV via scaling). The map places ΔE_CO, ΔE_H, and ΔE_OH on the x, y, and z axes and is evaluated at set potentials (e.g., U = −1.0 and −1.4 V_RHE). It partitions the space into regions selective for formate, CO, C1+ (products reduced beyond CO, requiring more than 2 e⁻), or H2; points outside these regions correspond to CO2RR and HER being thermodynamically unfavorable at that potential.
Validation of selectivity map: Binding energies calculated on FCC(111) and (211) facets of representative pure metals were placed on the map to confirm known trends: late transition metals (Rh, Ir, Pt) are H2-selective because of strong CO* and H* binding; Pd becomes CO-selective as PdH_x weakens CO* binding; coinage metals (Au, Ag) favor CO because of weak CO* and OH* binding; and p-block elements (e.g., Pb) favor formate.
Productivity metric for HTVS: A unified, potential-dependent productivity metric combines activity and selectivity into a single value for each product. It accounts for ML uncertainty through a probability term that reflects the prediction error range, and it mitigates discontinuities at the boundary conditions by averaging over many motifs. Productivity heatmaps for the 30 pure metals and 435 binary alloys were generated at specified potentials (e.g., U = −1.4 V_RHE), with additional maps at other potentials (e.g., −1.0 V_RHE) to capture potential-dependent selectivity shifts.
Composition and coordination analysis: Leveraging DSTAR’s motif-level descriptors, composition and coordination number (CN) effects on productivity were dissected, exemplified with Cu–Al alloys by masking subsets of motifs (e.g., high-CN facet vs low-CN edge/corner sites) to elucidate trends.
Experimental validation: Selected predictions were tested experimentally for Cu–Pd and Cu–Ga binary alloys to validate predicted selectivity toward C1+ and formate, respectively.
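The enumeration step described above can be checked with simple counting. The sketch below assumes that each monometallic template is substituted once per element and each bimetallic template once per unordered element pair; that assumption is not stated explicitly in this summary, but it reproduces the reported total of 2,463,030 motifs. The template layout and the helper names `count_motifs` and `substitute` are hypothetical.

```python
from itertools import combinations
from math import comb

N_ELEMENTS = 30        # elements used for substitution (from the text)
MONO_TEMPLATES = 408   # unique monometallic motifs collected (from the text)
BI_TEMPLATES = 5634    # unique bimetallic motifs collected (from the text)

def count_motifs(n_elements, mono_templates, bi_templates):
    """Count systems and motifs, assuming one substitution per template
    for every element (monometallic) or unordered pair (bimetallic)."""
    n_mono = n_elements
    n_bi = comb(n_elements, 2)
    total = mono_templates * n_mono + bi_templates * n_bi
    return n_mono, n_bi, total

def substitute(template, mapping):
    """Replace placeholder labels in a motif template with element symbols."""
    return tuple(tuple(mapping[label] for label in group) for group in template)

n_mono, n_bi, total = count_motifs(N_ELEMENTS, MONO_TEMPLATES, BI_TEMPLATES)
print(n_mono, n_bi, total)  # 30 435 2463030

# Hypothetical template: (FNN, SNN_same, SNN_sub) with placeholders A and B.
template = (("A", "A", "B"), ("A", "B"), ("B",))
elements = ["Cu", "Al", "Pd"]  # small stand-in for the 30-element pool
motifs = [
    substitute(template, {"A": a, "B": b})
    for a, b in combinations(elements, 2)
]
print(len(motifs))  # 3 unordered pairs from 3 elements
```

The counts follow from 30 choose 2 = 435 pairs: 408 × 30 + 5634 × 435 = 2,463,030.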

Key Findings

Scale of screening: 2,463,030 active motifs were generated from 30 elements across 30 monometallic and 435 bimetallic systems; potential-dependent activity and selectivity for CO2RR were evaluated for all 465 systems without explicit DFT surface modeling.
ML performance: Fivefold cross-validated MAEs for adsorption energies: ΔE_CO = 0.118 eV, ΔE_OH = 0.227 eV, ΔE_H = 0.107 eV. Certain elements exhibited larger errors; those were excluded from screening.
3D selectivity map validation: The map reproduced known catalyst classes: late transition metals (Rh, Ir, Pt) locate in H2-selective region; PdH_x shifts to CO-selective region due to weakened CO* binding; coinage metals (Au, Ag) favor CO; p-block elements (e.g., Pb) favor formate. Potential dependence aligns with literature (e.g., Cu shifts from formate to C1+ at more negative potentials).
Productivity-based HTVS: A productivity metric unifies activity and selectivity with uncertainty quantification. Heatmaps at U = −1.4 V_RHE and −1.0 V_RHE reveal potential-dependent trends; top 20 candidates for each product were identified (listed in supplementary materials).
Composition/CN effects: In Cu–Al alloys, C1+ productivity increases with decreasing Al content and decreasing CN, suggesting that low-CN (edge/corner) and Cu-rich environments favor deeper reductions.
Experimental validation: Cu–Pd and Cu–Ga binary alloys exhibited high selectivity for C1+ and formate, respectively, consistent with HTVS predictions.
Comparisons with literature and discrepancies: Overall agreement with classical classifications (Hori et al.) was observed. Discrepancies (e.g., Ag predicted to be formate-selective at potentials > −1.3 V_RHE, Ga and Zn formate-selective) are likely due to kinetics and local field effects not captured in the thermodynamic framework.
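The composition and coordination analysis reported in these findings amounts to masking subsets of motifs and re-aggregating a per-motif score. The sketch below shows that pattern only. The records, the field names (`cn`, `cu_frac`, `productivity`), and the coordination-number threshold are arbitrary placeholders chosen for illustration; none of the numbers come from the paper.

```python
from statistics import mean

# Toy records, NOT data from the study: coordination number of the site,
# Cu fraction of the motif, and a made-up productivity score.
toy_motifs = [
    {"cn": 7, "cu_frac": 0.75, "productivity": 0.5},
    {"cn": 6, "cu_frac": 0.5, "productivity": 0.25},
    {"cn": 9, "cu_frac": 0.75, "productivity": 0.25},
    {"cn": 9, "cu_frac": 0.25, "productivity": 0.125},
]

def masked_mean(records, predicate):
    """Average productivity over the motifs selected by `predicate`."""
    selected = [r["productivity"] for r in records if predicate(r)]
    return mean(selected) if selected else None

LOW_CN_MAX = 7  # hypothetical cutoff separating edge/corner from facet sites

low_cn = masked_mean(toy_motifs, lambda r: r["cn"] <= LOW_CN_MAX)
high_cn = masked_mean(toy_motifs, lambda r: r["cn"] > LOW_CN_MAX)
cu_rich = masked_mean(toy_motifs, lambda r: r["cu_frac"] > 0.5)
print(low_cn, high_cn, cu_rich)
```

Because every motif already carries its own composition and coordination labels, such masks can be applied after screening without any new calculations.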

Discussion

Integrating DSTAR-based ML with a 3D potential-dependent selectivity map addresses the central goal of rapidly identifying selective CO2RR catalysts across a large chemical space. By replacing explicit surface structure modeling with active motif descriptors and using directly predicted ΔE_H, the approach balances accuracy with scalability, enabling comprehensive HTVS. The 3D selectivity framework captures product selectivity boundaries and reproduces known catalytic behaviors while enabling potential-dependent predictions. The productivity metric offers a practical scalar that encodes both activity and selectivity, integrates prediction uncertainties, and reduces artifacts from boundary discontinuities, which makes candidate ranking more robust. Composition and coordination analyses provide actionable design guidance on alloying and nanostructuring (e.g., favoring low-CN, Cu-rich motifs for C1+ on Cu–Al). Experimental results for Cu–Pd and Cu–Ga support the predictive power of the screening strategy. Remaining mismatches with some literature systems underscore the need to incorporate kinetics, local environment, and constant-potential effects to further refine predictions.
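One way to see how a probability term can soften hard selectivity boundaries is sketched below. This is not the authors' productivity formula, which this summary does not give. It assumes Gaussian prediction errors, converts an MAE to a standard deviation under that assumption (σ = MAE·√(π/2)), and uses a made-up energy window; the function names are hypothetical.

```python
from math import erf, pi, sqrt

def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

def prob_in_window(pred, sigma, lo, hi):
    """Probability that the true value lies in [lo, hi], assuming the
    prediction error is Gaussian with standard deviation sigma."""
    return normal_cdf(hi, pred, sigma) - normal_cdf(lo, pred, sigma)

def mean_window_probability(predictions, sigma, lo, hi):
    """Average the window probability over many motifs of one catalyst."""
    probs = [prob_in_window(p, sigma, lo, hi) for p in predictions]
    return sum(probs) / len(probs)

MAE_CO = 0.118                    # reported test MAE for dE_CO, in eV
SIGMA_CO = MAE_CO * sqrt(pi / 2)  # Gaussian assumption: sigma = MAE * sqrt(pi/2)

# Hypothetical window of dE_CO values (eV) standing in for one map region.
LO, HI = -0.6, -0.2

# Invented predictions for three motifs of one catalyst (eV).
preds = [-0.65, -0.40, -0.15]
score = mean_window_probability(preds, SIGMA_CO, LO, HI)
print(round(score, 3))
```

With a hard cutoff, a motif predicted just outside the window would count as zero and one just inside as one; with the probability term both contribute fractional weights, so small prediction errors near a boundary no longer flip the ranking.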

Conclusion

This work establishes a scalable HTVS workflow for CO2RR catalyst discovery by combining DSTAR-based adsorption energy prediction with a potential-dependent, 3D selectivity map and a unified productivity metric. The approach enables screening of millions of active motifs across hundreds of alloy systems without explicit DFT surface modeling, reproduces known selectivity trends, reveals composition and coordination effects, and is experimentally validated on Cu–Pd (C1+) and Cu–Ga (formate) alloys. The framework accelerates the identification of active and selective CO2RR catalysts and offers interpretable design insights. Future directions include integrating constant-potential DFT and kinetic descriptors (e.g., proton transfer barriers), expanding to multicomponent compositions beyond binaries, and incorporating local field and mass-transport effects to further improve predictive fidelity.

Limitations

Thermodynamic framework: The selectivity map is based on thermodynamic boundary conditions and does not explicitly include kinetics, local field effects, transport phenomena, or constant-potential corrections, contributing to discrepancies (e.g., Ag, Ga, Zn).
ML uncertainty and element coverage: Some elements show elevated prediction errors; these were excluded from screening, limiting coverage. DSTAR accuracy is slightly lower than complex graph-based models, trading accuracy for scalability.
Scope of compositions: The study focuses on pure metals and binary alloys due to data availability, although DSTAR could handle more complex compositions.
Surface structure approximation: The motif-based representation omits explicit geometric/structural details that may influence adsorption energetics and reaction pathways on specific facets and under operando conditions.
