Chemistry

Enabling high throughput deep reinforcement learning with first principles to investigate catalytic reaction mechanisms

T. Lan, H. Wang, et al.

HDRL-FP is a high-throughput deep reinforcement learning framework, coupled with first-principles (DFT) calculations, for exploring catalytic reaction mechanisms. Tian Lan, Huan Wang, and Qi An apply it to hydrogen and nitrogen migration in ammonia synthesis on Fe(111) and identify a transition state shared by the Langmuir–Hinshelwood and Eley–Rideal hydrogenation pathways, with a lower free-energy barrier than previously estimated.


Introduction

The study addresses how to autonomously discover and evaluate catalytic reaction pathways with generalizability and low computational cost. Catalytic reactions often involve complex, multi-step mechanisms with transient intermediates and heterogeneous, dynamic surfaces, making experimental elucidation and brute-force computational enumeration impractical. While reinforcement learning (RL) can, in principle, navigate reaction networks, standard RL struggles with non-stationarity, correlation in experience, and highly nonconvex, noisy, high-dimensional potential energy landscapes typical of catalysis. Prior RL applications in chemistry frequently rely on reaction-specific state encodings, heuristic actions, and tailored reward shaping, which limit transferability and hinder discovery beyond predefined networks. The purpose of this work is to develop a reaction-agnostic RL framework, HDRL-FP, that uses only atomic positions mapped to first-principles (DFT) potential energy landscapes, and to demonstrate its effectiveness and scalability by resolving key hydrogenation and diffusion steps relevant to the Haber–Bosch process on Fe(111).

Literature Review

Recent advances in AI and ML have impacted chemistry, yet many methods for reactivity are mechanism-agnostic and depend on human-designed features. Prior RL studies for catalytic mechanisms used semi-empirical, reaction-specific environment designs with bespoke state vectors (e.g., site occupancy encodings), heuristic action sets, and reward transformations. Such designs constrain exploration to predefined reaction networks, limit transferability across reactions, and often preclude explicit atomic motion modeling. The authors’ previous RL work used a 23-element encoded state vector and predefined actions corresponding to specific surface reactions, which lacked generality for other complex mechanisms. In contrast, the present work aims to remove these constraints by basing states and actions on atomic coordinates and simple movements, coupled directly to first-principles energy evaluations, and supported by high-throughput parallel RL to overcome exploration and stability challenges.

Methodology

HDRL-FP formulates catalytic pathway exploration as a Markov decision process whose states, actions, transitions, and rewards are derived directly from atomic configurations and first-principles energetics.

States and actions: A state consists of the normalized Cartesian coordinates of the migrating atom(s) within an orthogonal supercell, together with each atom's normalized Euclidean distance to its target position in the product. For multiple migrating atoms, the state vector concatenates each atom's normalized coordinates and distance to target. Actions are single-step moves in one of six directions on a 3D grid (forward/backward/left/right/up/down), and periodic boundary conditions keep the motion continuous across cell edges. In multi-atom cases, a two-head policy selects both the atom to move and the direction, so for n movable atoms the policy outputs n + 6 values instead of 6n joint actions.

Rewards: Each action is rewarded according to the DFT-derived potential-energy change, r = −ΔE/E0, where ΔE (in eV) is the energy change caused by the move and E0 rescales the rewards, so uphill moves are penalized. Unphysical moves (e.g., collisions or overly close approaches between atoms) receive an additional penalty. The environment is reaction-agnostic: it is built solely from atomic positions mapped to a precomputed potential energy landscape (PEL).

High-throughput architecture: The framework is designed to run thousands of concurrent environment instances entirely on a single GPU using a block/thread layout. Each instance holds references to the shared PEL and to the shared policy and value networks in global GPU memory. Rollouts, action inference, and training all run in place on the GPU to avoid CPU–GPU transfer overhead: environment definitions, the PEL, and model parameters are copied from host to device once, and rollout storage and training stay on the device thereafter. Finished environments reset automatically without synchronizing with the others. Once an experience batch of 500 × N steps has been collected (N being the number of environments), a synchronized policy update is performed, so all actors start their next rollouts from the updated policy.

Learning: The agent is an actor–critic trained with proximal policy optimization (PPO) and the Adam optimizer. The policy and value networks, shared by all environments, are fully connected with two hidden layers of 50 ReLU neurons each. The policy outputs a six-way softmax over directions, plus an atom-choice head in multi-atom cases; the value network outputs a scalar estimate of the expected return. The discount factor is γ = 0.99, the rollout horizon is K = 500 steps, and updates are made once the experience buffer is full.

First-principles setup: DFT calculations use VASP (v5.4) with PBE-D3, a 500 eV plane-wave cutoff, Methfessel–Paxton smearing of 0.2 eV, spin polarization for Fe, and convergence criteria of 10⁻⁵ eV (electronic) and 10⁻³ eV/Å (ionic). The Fe-bcc(111) surface is modeled as a (2×2) slab of six layers, with the top three relaxed and the bottom three fixed, 15 Å of vacuum, and a Γ-centered 4×4×1 k-point mesh. Free energies are computed in the harmonic approximation (finite-displacement phonons), with gas-phase thermochemistry consistent with the literature. Transition states are located by climbing-image NEB (4 images), refined with the dimer method, and verified by the presence of a single imaginary frequency.

Application cases: (1) the hydrogenation step 2N·NH2 + 2H → 2N·NH3 + H on Fe(111) via the Langmuir–Hinshelwood (LH) and Eley–Rideal (ER) mechanisms; (2) N atom diffusion between bridge sites; and (3) N2 molecule diffusion between top Fe sites, a two-atom case with six degrees of freedom on a grid with 0.9 Å spacing.
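The state, action, and reward definitions above can be illustrated with a small Python/NumPy sketch. It is not the authors' GPU implementation: the class name `GridMigrationEnv`, the `energy_fn` callable standing in for the precomputed PEL, the distance normalization (half the cell diagonal), and the clash rule (`min_sep`, `clash_penalty`, with the rejected atom staying in place) are all illustrative assumptions. Positions are integer grid indices, and an action is an (atom, direction) pair, mirroring the two-head policy.

```python
import numpy as np

# Six unit moves on the grid: +x, -x, +y, -y, +z, -z.
MOVES = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0],
                  [0, -1, 0], [0, 0, 1], [0, 0, -1]])


class GridMigrationEnv:
    """Hypothetical grid environment for migrating atoms on a PEL (illustrative only)."""

    def __init__(self, energy_fn, grid_shape, start, target,
                 e0=1.0, min_sep=1.0, clash_penalty=1.0):
        self.energy_fn = energy_fn                      # (n_atoms, 3) grid indices -> energy in eV
        self.shape = np.asarray(grid_shape, dtype=int)  # grid points along x, y, z
        self.start = np.atleast_2d(np.asarray(start, dtype=int))
        self.target = np.atleast_2d(np.asarray(target, dtype=int))
        self.e0 = e0                                    # reward scale E0
        self.min_sep = min_sep                          # assumed minimum separation, grid units
        self.clash_penalty = clash_penalty              # assumed penalty for unphysical moves
        self.reset()

    def _min_image(self, delta):
        # Minimum-image displacement under periodic boundary conditions.
        return delta - self.shape * np.round(delta / self.shape)

    def reset(self):
        self.pos = self.start.copy()
        return self.state()

    def state(self):
        coords = self.pos / self.shape                  # normalized coordinates in [0, 1)
        dist = np.linalg.norm(self._min_image(self.pos - self.target), axis=1)
        dist = dist / (np.linalg.norm(self.shape) / 2)  # assumed normalization
        # Per atom: (x, y, z, distance to target), concatenated over atoms.
        return np.concatenate([coords, dist[:, None]], axis=1).ravel()

    def step(self, atom, direction):
        new = self.pos.copy()
        new[atom] = (new[atom] + MOVES[direction]) % self.shape  # wrap across cell edges
        others = np.delete(new, atom, axis=0)
        if len(others) and np.min(np.linalg.norm(
                self._min_image(others - new[atom]), axis=1)) < self.min_sep:
            return self.state(), -self.clash_penalty, False      # reject and penalize
        delta_e = self.energy_fn(new) - self.energy_fn(self.pos)
        self.pos = new
        done = bool(np.all(self.pos == self.target))
        return self.state(), -delta_e / self.e0, done            # r = -dE / E0


# Usage: one atom on a 4x4x4 grid whose energy rises by 0.1 eV per step in +z.
env = GridMigrationEnv(lambda p: 0.1 * p[0, 2], (4, 4, 4), start=[0, 0, 0], target=[1, 0, 0])
_, r_up, _ = env.step(0, 4)      # uphill move is penalized
env.reset()
_, r_x, done = env.step(0, 0)    # flat move reaches the target
```

Keeping the energy lookup behind a callable lets the same step logic serve a one-atom grid or a joint multi-atom landscape such as the six-degree-of-freedom N2 case.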

Key Findings

- Scalability and convergence: On a single NVIDIA A100 GPU, HDRL-FP ran up to 500 concurrent environments and reached an end-to-end training throughput of ~0.23 million experience steps/s. Throughput scales nearly linearly with the number of environments until GPU memory becomes the limit. For LH hydrogenation, the global optimum was reached in ~120, 70, and 45 minutes with 20, 100, and 500 environments, respectively, whereas a run with only four environments failed to converge even after a day. For ER, convergence was robust with 100 and 500 environments, but runs with 20 or fewer environments did not converge satisfactorily.
- Reaction pathways and mechanisms (NH2 → NH3 on Fe(111)): HDRL-FP identified hydrogen migration paths for both the LH and ER mechanisms. NEB calculations seeded with the RL paths showed that both mechanisms share the same transition-state structure, with a potential-energy barrier of ~1.40 eV. Free-energy corrections at T = 673 K and P = 20 atm reduce the barrier to ~1.16 eV, ~0.24 eV lower than prior estimates.
- Updated free-energy landscape for Haber–Bosch on Fe(111): A newly identified configuration lowers the relative free energy of 2N·NH2·2H from 0.07 eV to −0.04 eV and lowers the associated transition-state energy by ~0.35 eV. As a result, NH2 hydrogenation, previously assumed to be the rate-determining step (RDS), is no longer rate-determining; the 3N·NH2·2H hydrogenation step becomes the RDS instead, with a free-energy barrier of ~1.47 eV at 673 K. The predicted reaction rate at 673 K rises to ~137.7 s⁻¹ from a previous estimate of ~58.2 s⁻¹.
- Diffusion studies and generalizability: For N atom diffusion between bridge sites on Fe(111), both HDRL-FP and NEB show two barriers, the higher one occurring as N passes between the top and sublayer Fe atoms. Relaxing the landscape smooths the energy profile but leaves the path topology similar to the unrelaxed case. For two-atom N2 diffusion between top Fe sites, RL followed by NEB gives a barrier of ~0.52 eV, lower than that for N atom diffusion, demonstrating that the framework handles multi-atom processes.
- Overall, HDRL-FP found physically plausible pathways with lower barriers than NEB started from direct initial guesses, improved reaction rate predictions, and provided mechanistic insight (a shared TS for LH and ER).
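To see why a 0.24 eV reduction in a free-energy barrier matters at 673 K, the standard Eyring (transition-state theory) expression k = (k_B T / h) exp(−ΔG‡ / k_B T) can be evaluated for a single elementary step. This is only an illustration under stated assumptions: it takes the prior NH2 hydrogenation barrier as ~1.40 eV (1.16 eV + 0.24 eV) and ignores coverages and the rest of the network. The function names are hypothetical. The overall rates reported above (~58.2 and ~137.7 s⁻¹) come from the full free-energy landscape, whose rate-determining step also changed, so they are not expected to match this single-step ratio.

```python
import math

K_B_EV = 8.617333262e-5    # Boltzmann constant, eV/K
K_B_J = 1.380649e-23       # Boltzmann constant, J/K
H_J = 6.62607015e-34       # Planck constant, J*s


def eyring_rate(dg_barrier_ev, temperature_k):
    """Transition-state-theory rate constant k = (kB*T/h) * exp(-dG/(kB*T)), in s^-1."""
    prefactor = K_B_J * temperature_k / H_J
    return prefactor * math.exp(-dg_barrier_ev / (K_B_EV * temperature_k))


def rate_enhancement(old_barrier_ev, new_barrier_ev, temperature_k):
    """Ratio k_new / k_old for one elementary step; the kB*T/h prefactor cancels."""
    return math.exp((old_barrier_ev - new_barrier_ev) / (K_B_EV * temperature_k))


# NH2 hydrogenation at 673 K: assumed prior ~1.40 eV vs ~1.16 eV in this study.
print(f"{rate_enhancement(1.40, 1.16, 673.0):.0f}x faster for this step alone")
```

Because k_B T is only about 0.058 eV at 673 K, each 0.1 eV of barrier changes a single-step rate constant by roughly a factor of five to six, which is why modest barrier revisions can reorder which step limits the overall rate.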

Discussion

HDRL-FP directly addresses the central challenge of discovering catalytic reaction mechanisms without bespoke, reaction-specific RL encodings. By representing states as atomic coordinates and actions as elementary displacements, and tying rewards to DFT-derived energetics, the method generalizes across diverse reactions and surfaces. Massive parallelism on a single GPU mitigates RL’s non-stationarity and correlation issues, stabilizes training, and accelerates convergence to globally optimal pathways. Mechanistically, identifying a shared transition state for LH and ER in NH2 hydrogenation implies similar activation energies and rates for both, reshaping understanding of this key step in ammonia synthesis under relevant conditions. The reconstructed free-energy landscape revises the rate-determining step to 3N·NH2·2H hydrogenation, leading to higher predicted turnover rates and suggesting that previous bottlenecks were overestimated. Diffusion studies for N and N2 further confirm the framework’s generality, including multi-atom dynamics. These insights can inform catalyst optimization, operating condition refinement, and targeted experimental validation, potentially reducing energy usage and CO2 emissions in Haber–Bosch processes.
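The parallel-rollout idea can be sketched with NumPy arrays standing in for GPU threads: N copies of a single-atom grid environment advance in lockstep, finished copies reset on their own, and the result is a horizon × N experience batch (the methodology uses a 500-step horizon). The function name `collect_batch`, the array-valued PEL, the placeholder `random_policy`, the synthetic landscape, and the single-atom setting are illustrative assumptions; the authors' implementation runs on the GPU with a block/thread layout and a learned PPO policy.

```python
import numpy as np

# Six unit moves on the grid: +x, -x, +y, -y, +z, -z.
MOVES = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0],
                  [0, -1, 0], [0, 0, 1], [0, 0, -1]])


def collect_batch(pel, start, target, n_envs, horizon, policy, rng, e0=1.0):
    """Advance n_envs single-atom grid environments in lockstep for `horizon` steps.

    pel: energies (eV) on a 3D grid; start/target: integer grid indices.
    Returns actions, rewards, dones, each of shape (horizon, n_envs).
    """
    shape = np.array(pel.shape)
    start = np.asarray(start)
    target = np.asarray(target)
    pos = np.tile(start, (n_envs, 1))              # all environments begin at the reactant
    actions = np.zeros((horizon, n_envs), dtype=int)
    rewards = np.zeros((horizon, n_envs))
    dones = np.zeros((horizon, n_envs), dtype=bool)
    for t in range(horizon):
        a = policy(pos, rng)                       # one direction per environment
        new = (pos + MOVES[a]) % shape             # periodic boundary conditions
        e_old = pel[pos[:, 0], pos[:, 1], pos[:, 2]]
        e_new = pel[new[:, 0], new[:, 1], new[:, 2]]
        done = np.all(new == target, axis=1)
        actions[t], rewards[t], dones[t] = a, -(e_new - e_old) / e0, done
        # Auto-reset only the finished environments; the others keep going.
        pos = np.where(done[:, None], start, new)
    return actions, rewards, dones


def random_policy(pos, rng):
    # Placeholder for the learned policy network.
    return rng.integers(0, 6, size=len(pos))


rng = np.random.default_rng(0)
pel = rng.normal(0.0, 0.1, size=(8, 8, 8))         # synthetic landscape, not DFT data
actions, rewards, dones = collect_batch(pel, [0, 0, 0], [4, 4, 2], n_envs=16,
                                        horizon=500, policy=random_policy, rng=rng)
print(rewards.shape)
```

Pooling experience from many independent trajectories in each batch is what reduces the sample correlation mentioned above; the single synchronized update per batch keeps every environment on the same policy version.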

Conclusion

The work introduces HDRL-FP, a reaction-agnostic, high-throughput deep RL framework tightly coupled to first-principles energetics for autonomous discovery of catalytic reaction pathways. Designed to run thousands of parallel environments entirely on a single GPU, HDRL-FP converges rapidly and stably and uncovers plausible, lower-barrier mechanisms. Applied to Haber–Bosch chemistry on Fe(111), it reveals a shared TS for LH and ER hydrogenation of NH2, lowers the free-energy barrier of that step to ~1.16 eV at 673 K and 20 atm, revises the rate-determining step to 3N·NH2·2H hydrogenation (~1.47 eV), and raises the predicted reaction rate to ~137.7 s⁻¹. Additional N and N2 diffusion studies support its generalizability, including to multi-atom dynamics. Future directions include integrating more flexible relaxation schemes with constraints, accounting for anharmonic effects at elevated temperatures (e.g., via molecular dynamics), extending to broader multi-atom reaction networks, and applying the method to other catalysts and surfaces to guide design and process optimization.

Limitations

- Approximate environment representation: During RL exploration, all atoms except the migrating ones are often held fixed, yielding an unrelaxed PEL that approximates, but does not guarantee, the exact transition states and barriers; NEB/dimer refinement is required.
- Thermodynamic approximations: Free-energy corrections rely on the harmonic approximation; at high temperatures, anharmonic effects may be significant and require molecular dynamics or more advanced treatments.
- Convergence dependence on parallelism: Stable, fast convergence depends on running many concurrent environments; runs with few replicas can fail to converge.
- Constraint design for relaxation: In relaxed environments, keeping the reactant and product configurations intact requires constraints; the optimal constraint strategy remains an open design choice.
- Resource limits: Throughput scales with the number of environments until GPU memory becomes the limit; very large systems or finer grids may require more memory or multiple GPUs.
- ER initial conditions: Some mechanisms (e.g., ER) still require assumptions about initial positions (e.g., the starting distance of gas-phase H), which may bias exploration unless they are sampled systematically.
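The harmonic approximation noted above reduces each configuration's vibrational free energy to a sum over its real normal-mode frequencies. The sketch below assumes frequencies in cm⁻¹ from a finite-displacement phonon calculation; the function names are hypothetical, the correction covers surface vibrations only (gas-phase translational, rotational, and pressure terms are omitted), and the frequencies in the final call are made up for illustration.

```python
import math

K_B_EV = 8.617333262e-5        # Boltzmann constant, eV/K
CM1_TO_EV = 1.239841984e-4     # energy of a 1 cm^-1 photon, eV


def harmonic_free_energy(freqs_cm1, temperature_k):
    """Vibrational free energy in the harmonic approximation, in eV (requires T > 0).

    F_vib = sum_i [ h*nu_i / 2 + kB*T * ln(1 - exp(-h*nu_i / (kB*T))) ]
    Pass only real frequencies; the caller drops the imaginary mode of a transition state.
    """
    kt = K_B_EV * temperature_k
    total = 0.0
    for nu in freqs_cm1:
        e = nu * CM1_TO_EV                       # h*nu in eV
        total += 0.5 * e + kt * math.log(1.0 - math.exp(-e / kt))
    return total


def free_energy_barrier(de_potential, freqs_ts, freqs_is, temperature_k):
    """Potential-energy barrier plus the vibrational F difference (TS minus initial state)."""
    return de_potential + (harmonic_free_energy(freqs_ts, temperature_k)
                           - harmonic_free_energy(freqs_is, temperature_k))


# Illustrative call with made-up frequencies (cm^-1), not values from the study.
print(free_energy_barrier(1.40, [1500.0, 800.0, 400.0], [3300.0, 1500.0, 800.0], 673.0))
```

Soft modes dominate the thermal term: as a frequency approaches zero, the logarithm becomes large and negative, which is one reason the harmonic treatment grows unreliable for loosely bound, anharmonic motions at high temperature.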
