Physics
Precise atom manipulation through deep reinforcement learning
I. Chen, M. Aapro, et al.
Discover how a team of researchers, including I-Ju Chen and Markus Aapro, uses deep reinforcement learning to control atom manipulation with atomic precision. The approach handles manipulation parameters that are unknown a priori, improves the precision of atom placement, and points toward automated nanofabrication and atomic-scale experimentation.
~3 min • Beginner • English
Introduction
Atom manipulation with a scanning tunneling microscope (STM) is a unique technique for building atomically precise structures, enabling studies of exotic quantum states and the miniaturization of computational devices to the scale of individual atoms. Applications range from artificial structures on metal surfaces to superconductors, 2D materials, semiconductors, and topological insulators, where they give access to topological and many-body phenomena. However, arranging adatoms with atomic precision requires tuning tip–adatom interactions (via tip position, bias, and tunneling conductance) that are not known a priori and vary with each adatom/surface combination and with the tip apex condition. Incorrect manipulation parameters can cause loss of control, tip crashes, or unintended rearrangements, and spontaneous tip apex changes can invalidate previously valid parameters, typically requiring expert intervention. Deep reinforcement learning (DRL) provides a data-driven, trial-and-error approach to control in dynamic environments and offers the potential to autonomously discover effective manipulation strategies without explicit modeling of tip–adatom interactions. This study investigates whether state-of-the-art DRL can learn precise and efficient atom manipulation policies directly on a real STM setup and remain robust to tip condition changes, thereby enabling autonomous atomic assembly.
Literature Review
Prior STM atom-manipulation work since the 1990s has demonstrated quantum corrals, engineered quantum states, and atomic-scale devices including logic gates, memory, and Boltzmann machines. Manipulation has been performed on metal surfaces and extended to superconductors, 2D materials, semiconductors, and topological insulators. Machine learning has been integrated into scanning probe microscopy for various tasks, and DRL with discrete actions has been used for automating tip preparation and vertical manipulation of molecules. In DRL broadly, recent algorithms have achieved superhuman performance in games and simulations, with improvements in data efficiency and stability enabling real-world automation. These advances motivate applying continuous-control DRL to the STM atom manipulation problem, where parameters are unknown and may drift due to tip apex changes.
Methodology
The atom manipulation task is formulated as a Markov decision process. Each episode begins with an adatom at a starting position and a randomly sampled target position between approximately one lattice constant (0.288 nm) and ~2 nm away; episodes are capped at N = 5 manipulations. The state s_t is a four-dimensional vector containing the XY coordinates of the target (x_target) and of the current adatom position (x_adatom), extracted from STM images. The continuous action a_t is a six-dimensional vector specifying the bias V (5–15 mV), the tunneling conductance G (3–6 µA/V), and the XY coordinates of the tip start (x_tip,start) and end (x_tip,end) positions for a lateral manipulation. The policy π is modeled as a multivariate Gaussian whose parameters are output by a neural network (the actor).

After each action is executed on the STM, a combined classifier judges from the tunneling current trace whether the adatom likely moved. It consists of (i) a 1D CNN (two convolutional layers, kernel size 64, stride 2; max pooling with kernel 4, stride 2; dropout 0.1; sigmoid output; Adam optimizer, learning rate 1e-3, batch size 64), trained initially on ~10,000 traces (~80% accuracy, true-positive and true-negative rates) and updated continuously during DRL, and (ii) an empirical thresholding formula based on the standard deviation of the current trace that detects spikes. STM scans to update the adatom position are performed if either predictor is positive, at random with probability ~20–40%, and at episode termination, which reduces the scan frequency (scans followed ~90% of manipulations during training). The reward depends on the manipulation error E = ||x_adatom − x_target|| (Eq. (1) in the paper); the main training excludes auxiliary directional terms to avoid overshooting.

The learning algorithm is Soft Actor-Critic (SAC) with two Q-function critics and an entropy-regularized objective that encourages exploratory yet high-reward actions. A replay memory stores tuples (s_t, a_t, r_t, s_{t+1}). Hindsight Experience Replay (HER) augments training by relabeling goals with achieved states using the 'future' strategy (up to three goals per transition). Emphasizing Recent Experience (ERE) sampling prioritizes recent experience during gradient updates to adapt quickly to environmental changes (C_k = max(η^k N, C_min) with η = 0.994 and C_min = 500). SAC hyperparameters: Adam optimizer, learning rate 3×10^−4, two hidden layers with 256 ReLU units, minibatch size 64, replay buffer size 10^6, discount γ = 0.9, target smoothing τ = 0.005.

Experimental setup: an Ag(111) crystal was cleaned by Ne sputtering (1 kV, 5×10^−5 mbar) and annealing in UHV (p < 10^−8 mbar). Manipulation was performed at ~5 K in a Createc LT-STM/AFM with Createc electronics and software (v4.4). Ag adatoms were deposited by gentle tip indentation. Baseline parameters (V = 10 mV, G = 6 µA/V; up/down/left/right movements) were verified after significant tip changes, reshaping the tip until manipulation was stable. Additional experiments trained DRL agents to manipulate Co adatoms on Ag(111) in two regimes: standard (bias 5–15 mV, conductance 3–6 µA/V) and high-bias (bias 1.5–3 V, conductance 8–24 nA/V), noting that high bias/current risks tip and substrate damage.

For autonomous assembly, the Hungarian algorithm (SciPy linear_sum_assignment; cost: Euclidean distances) assigns adatoms to targets so as to minimize total movement, and an any-angle RRT path planner (PythonRobotics implementation) plans collision-free paths between start and final positions; the DRL agent executes each short-range (<2 nm) manipulation. Code sketches illustrating several of these components follow.
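As an illustration of the actor described above, here is a minimal PyTorch sketch of a squashed-Gaussian SAC policy over the 4-dimensional state and 6-dimensional action. The hidden-layer shape matches the stated hyperparameters (two layers of 256 ReLU units); all names and the tanh squashing to a normalized action box are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class GaussianPolicy(nn.Module):
    """Hypothetical SAC actor: maps the 4-D state (target XY, adatom XY)
    to a squashed Gaussian over the 6-D action (bias, conductance,
    tip start XY, tip end XY), mirroring the setup described above."""
    def __init__(self, state_dim=4, action_dim=6, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mu = nn.Linear(hidden, action_dim)
        self.log_std = nn.Linear(hidden, action_dim)

    def forward(self, state):
        h = self.net(state)
        mu, log_std = self.mu(h), self.log_std(h).clamp(-20, 2)
        dist = torch.distributions.Normal(mu, log_std.exp())
        raw = dist.rsample()                 # reparameterized sample
        action = torch.tanh(raw)             # squash to [-1, 1] per dimension
        # tanh-corrected log-probability, needed for SAC's entropy term
        log_prob = (dist.log_prob(raw)
                    - torch.log1p(-action.pow(2) + 1e-6)).sum(-1)
        return action, log_prob
```

The normalized actions would then be rescaled to the physical ranges quoted above (5–15 mV bias, 3–6 µA/V conductance, tip coordinates).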
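The two SAC critics are trained toward an entropy-regularized, clipped double-Q bootstrap target, which can be written compactly as below. The callables, the temperature alpha, and the done flag are assumptions; γ = 0.9 follows the stated hyperparameters.

```python
import torch

@torch.no_grad()
def sac_td_target(reward, next_state, actor, q1_target, q2_target,
                  alpha, gamma=0.9, done=0.0):
    """Clipped double-Q, entropy-regularized TD target used to train
    both SAC critics (gamma = 0.9 per the paper's hyperparameters)."""
    next_action, next_log_prob = actor(next_state)
    q_min = torch.min(q1_target(next_state, next_action),
                      q2_target(next_state, next_action))
    return reward + gamma * (1.0 - done) * (q_min - alpha * next_log_prob)
```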
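The 1D CNN movement classifier can be sketched as follows. The convolution, pooling, and dropout settings and the training recipe (Adam, learning rate 1e-3, batch size 64, sigmoid output) follow the description above; the channel widths, the pooling placement, and the global-average head are assumptions.

```python
import torch
import torch.nn as nn

class TraceClassifier(nn.Module):
    """1-D CNN that flags adatom movement from a tunneling current trace.
    Conv/pool/dropout settings follow the text; channel widths and the
    global-average head are illustrative assumptions."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=64, stride=2), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=64, stride=2), nn.ReLU(),
            nn.MaxPool1d(kernel_size=4, stride=2),
            nn.Dropout(0.1),
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(32, 1), nn.Sigmoid(),   # P(adatom moved)
        )

    def forward(self, x):                     # x: (batch, 1, trace_length)
        return self.head(self.features(x))

# Training sketch matching the stated recipe: Adam, lr = 1e-3, batch 64
model = TraceClassifier()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCELoss()
```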
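The second movement predictor, spike detection by thresholding on the current-trace standard deviation, might look like the following; the paper's exact empirical formula is not reproduced in this summary, so the rule and the factor k here are placeholders.

```python
import numpy as np

def spike_detected(trace, k=5.0):
    """Placeholder for the empirical thresholding rule: flag a likely
    atom movement when the current trace contains a sample deviating
    from its mean by more than k standard deviations."""
    trace = np.asarray(trace, dtype=float)
    return bool(np.any(np.abs(trace - trace.mean()) > k * trace.std()))
```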
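ERE sampling reduces to a few lines once the replay buffer is kept in chronological order; this sketch uses the constants quoted above (η = 0.994, C_min = 500) and an assumed list-like buffer.

```python
import random

def ere_minibatch(buffer, k, batch_size=64, eta=0.994, c_min=500):
    """Emphasizing Recent Experience: for the k-th gradient update,
    sample uniformly from only the most recent
    c_k = max(eta**k * N, C_min) transitions (N = current buffer size),
    so later updates focus increasingly on fresh data."""
    n = len(buffer)
    c_k = min(n, max(int(eta ** k * n), c_min))
    recent = buffer[-c_k:]          # buffer is ordered oldest -> newest
    return random.sample(recent, min(batch_size, c_k))
```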
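HER's 'future' relabeling strategy, up to three substitute goals per transition, can be sketched as below. The transition layout and the error-based reward −E are illustrative stand-ins for the paper's Eq. (1).

```python
import random
import numpy as np

def her_future_relabel(episode, num_goals=3):
    """HER with the 'future' strategy: for each transition, draw up to
    `num_goals` substitute goals from adatom positions actually reached
    later in the episode, and recompute the reward from the placement
    error E = ||x_adatom - x_goal||. Transition layout is assumed:
    (state, action, reward, next_state, achieved_xy)."""
    augmented = []
    for t, (state, action, _, next_state, achieved_xy) in enumerate(episode):
        future_positions = [tr[4] for tr in episode[t:]]
        for goal_xy in random.sample(future_positions,
                                     min(num_goals, len(future_positions))):
            error = np.linalg.norm(np.asarray(achieved_xy)
                                   - np.asarray(goal_xy))
            augmented.append((state, action, -error, next_state, goal_xy))
    return augmented
```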
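The assignment step for autonomous assembly uses the standard SciPy machinery named in the text (linear_sum_assignment on a Euclidean-distance cost matrix); only the example coordinates below are invented.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def assign_adatoms(adatom_xy, target_xy):
    """Minimize the summed Euclidean travel distance between current
    adatom positions and target sites via the Hungarian algorithm."""
    cost = cdist(adatom_xy, target_xy)            # pairwise distances (nm)
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows.tolist(), cols.tolist())), cost[rows, cols].sum()

# Illustrative example with made-up coordinates (nm)
adatoms = np.array([[0.0, 0.0], [1.2, 0.3], [0.4, 1.1]])
targets = np.array([[0.3, 0.2], [1.0, 1.0], [1.5, 0.1]])
pairs, total_distance = assign_adatoms(adatoms, targets)
```

Each assigned move longer than the agent's short-range limit is then broken into collision-free waypoints by the RRT planner before the DRL agent executes it.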
Key Findings
- Training performance: After ~2000 training episodes (~6000 manipulations), the agent achieves a 100% success rate over 100 evaluation episodes, with decreasing mean episode length indicating efficiency gains. Tip crashes that change the apex degrade performance transiently, but the agent recovers within a few hundred additional episodes.
- Precision: The best run yielded a 100% mean success rate and a mean placement error of 0.089 nm over 100 episodes, well below the 0.288 nm lattice constant. Error-distribution analysis combined with site-geometry modeling estimates the probability of placing an atom at the nearest adsorption site at 61% (if both fcc and hcp sites are reachable) to 93% (if only fcc sites are reachable).
- Robustness vs baseline: Under three distinct tip conditions, a fixed-parameter baseline (V = 10 mV, G = 6 µA/V, tip moved to the target position plus 0.1 nm) achieved 100%, 92%, and 68% success rates over 100 episodes, respectively. The continually trained DRL agent maintained relatively good performance from the start and converged to >95% success after further training in all three cases, demonstrating adaptability to tip changes.
- Adsorption statistics: Movement distributions show occupancy of both fcc and hcp hollow sites, with six peaks at ~0.166 nm from the origin, consistent with the fcc–hcp hollow-site separation of 0.288 nm/√3 ≈ 0.166 nm; the lattice orientation inferred from manipulation data agrees with atomically resolved scans.
- Autonomous assembly: Combining DRL with assignment and RRT path planning, the system constructed a 42-atom kagome lattice with atomic precision. A total of 66 manipulations was performed; one manipulation plus scan takes roughly one minute, giving a build time of about one hour (excluding adatom deposition).
Discussion
The study demonstrates that a DRL agent can learn precise and efficient control policies for STM-based atom manipulation directly from interaction data, addressing the challenge of unknown and drifting manipulation parameters that typically require expert tuning. By optimizing a continuous action space over bias, conductance, and tip trajectories, and by leveraging SAC with HER and ERE, the agent achieves high precision and success rates while adapting to tip apex changes. Integration with assignment and path planning algorithms enables end-to-end autonomous assembly of complex structures, as shown by the 42-atom kagome lattice. Statistical analyses of movement data further provide insights into adsorption site occupancy and lattice orientation without dedicated atomic-resolution imaging. These results indicate that DRL can robustly handle the stochastic, nonlinear dynamics at the atomic scale and is suitable for scaling up nanofabrication and for automating complex experiments where explicit models are difficult to obtain.
Conclusion
By formalizing STM atom manipulation as a reinforcement learning problem and combining state-of-the-art methods (SAC with HER and ERE) with real-time classification of atom movement, the DRL agent autonomously learns to manipulate adatoms with atomic precision and high data efficiency. The approach is more adaptive to tip condition changes than fixed parameter baselines and, when coupled with path planning, enables fully autonomous assembly of artificial atomic lattices. This work marks a significant step toward AI-driven automation in nanofabrication and suggests applicability to other surface/adsorbate systems where stable manipulation parameters are not known and to the assembly and operation of atomic-scale devices.
Limitations
- Environment dependence and adaptation: Significant tip apex changes still cause temporary performance degradation and require additional training episodes for recovery.
- Sensing and scanning overhead: Despite reduced scan frequency, STM scans remain the most time-consuming part of training; classification of movements relies on a CNN and empirical thresholding with ~80% accuracy on initial test data, potentially leading to missed or false movement detections.
- Site assignment uncertainty: Exact adsorption site positions were not determined during placement; nearest-site placement rates are estimated probabilistically based on site geometry and error distributions, not measured directly.
- Path planning suboptimality: The any-angle RRT planner may not find optimal or near-optimal paths; one or two dimers may be present in the final structure, likely pre-existing but highlighting the limitations of collision avoidance and path planning.
- Parameter regimes and risks: High-bias manipulation regimes (for Co) can risk tip/substrate damage; generalization beyond tested materials (Ag/Ag(111), Co/Ag(111)) requires further validation.
- Low-temperature, UHV requirement: Experiments were conducted at ~5 K in UHV; performance in other environmental conditions is not assessed.