Chemistry
Molecular identification with atomic force microscopy and conditional generative adversarial networks
J. Carracedo-Cosmé and R. Pérez
Discover how Jaime Carracedo-Cosmé and Rubén Pérez push the boundaries of molecular imaging with a conditional generative adversarial network. Their method extracts chemical information from high-resolution atomic force microscopy images and delivers precise molecular identification as ball-and-stick depictions.
~3 min • Beginner • English
Introduction
High-resolution AFM, particularly FM-AFM with CO-functionalized tips, reveals intramolecular structure via Pauli repulsion contrast. Despite successes in identifying specific classes of molecules, general molecular identification from AFM images without prior knowledge remains challenging due to intertwined effects of bonding topology, chemical composition, 3D geometry, tip mechanics, and imaging parameters. Prior AI/DL attempts showed promise but faced limitations: CNNs captured quasi-planar structures mainly composed of C/H and struggled with broader chemical discrimination and highly 3D structures; RNN-based captioning to IUPAC names was constrained by language formulation and yielded limited accuracy. This work reframes identification as an image-to-image translation problem: mapping stacks of constant-height HR-AFM images acquired at multiple tip-sample distances into ball-and-stick molecular depictions that encode atom types (color/size) and bonds (sticks), aiming at full structure and composition determination for arbitrary organic molecules within the chemical space represented in training.
Literature Review
The paper reviews AFM contrast mechanisms (Pauli repulsion, CO tilting, electrostatics, charge distribution) and advances in simulation explaining image features. It surveys AI applications in microscopy, including CNNs for molecular structure inference from AFM (effective for planar C/H systems but less conclusive for O/Cl and 3D structures), CNN-based electrostatic field prediction, and GNNs for molecular graph extraction. The authors' prior work comprises accurate chemical classification on simulated AFM images of 60 planar molecules, a VAE strategy that injects experimental image features to improve classification on experimental data, and an image-captioning approach (CNN+RNN) that predicts IUPAC names with high chemical-group detection but limited full-name accuracy. These efforts motivate a shift to CGAN-based image translation, which avoids linguistic constraints and can handle multi-molecule fragments or structures without an IUPAC description.
Methodology
- Task formulation: Translate a stack of 10 constant-height HR-AFM images spanning ~100 pm in tip-sample distance into a color ball-and-stick depiction encoding atom types (via color/size) and bonds (via sticks/lengths).
- Data: QUAM-AFM dataset of 686,000 organic molecules with simulated AFM stacks and matched ball-and-stick targets covering the chemical species relevant to organic chemistry. The reported split is 581,000/24,000/81,000 molecules for train/validation/test, with a large test set for robust assessment. For each input during training, one of 24 combinations of AFM simulation parameters (e.g., oscillation amplitude, CO-tip lateral stiffness) is randomly chosen to promote invariance to experimental settings.
- Generator (modified pix2pix-style CGAN): Input is a 10-image grayscale stack. An initial dropout layer (rate 0.5) is followed by two 3D convolution layers (64 kernels each; sizes (3,3,3) stride (1,1,1) and (4,4,4) stride (2,2,2), with padding); the output is reshaped to (128,128,64) and activated with Leaky ReLU. A U-Net-like 2D encoder-decoder follows, with seven downsampling blocks (2D conv kernels 4×4, stride 2; channels progressing 128, 256, 512, 512, 512, 512; batch norm; Leaky ReLU α=0.2) and skip connections to the corresponding decoder blocks; the decoder uses dropout on selected layers and ReLU activations, with tanh on the final layer. The output predicts per-pixel RGB values of the ball-and-stick image (see the generator sketch after this list).
- Discriminator: Patch-based discriminator taking the AFM stack (as a multi-channel image) concatenated with either the generated or the ground-truth output. A sequence of 2D conv layers (4×4 kernels, stride 2) with channels 64, 128, 256, 512, 512, Leaky ReLU activations (α=0.2), and batch norm is followed by a final 2D conv (4×4) with sigmoid to produce a patch realism map (see the discriminator sketch after this list).
- Data augmentation (Image Data Generator): Spatial transforms (zoom, rotations, shifts, flips, small shakes) are applied identically to AFM inputs and targets to preserve pixel alignment; shear is not applied to targets, as it represents an experimental distortion not present in the ground truth (see the paired-augmentation sketch after this list). Tuning the augmentation parameters was found critical for performance.
- Training: The objective combines an adversarial loss with an L1/MAE reconstruction loss (λ=100) to encourage sharp, accurate translations (see the training-step sketch after this list). Training runs for 100,000 iterations with periodic validation (e.g., 300 predictions every 10,000 iterations) to select the best model. Variability in AFM simulation parameters plus augmentation aims to reduce overfitting and improve generalization across substrates, tip conditions, and imaging modes.
- Evaluation: Quantitative and qualitative tests on simulated data; human-verified comparisons for 3015 molecules randomly selected from the 81,000-molecule test set with randomized AFM parameter combinations; analysis of accuracy versus molecular corrugation (height differences). Experimental tests use published constant-height AFM stacks (often fewer than 10 images); missing images are created by interpolation and light denoising (medianBlur, kernel size 3) to assemble 10-image inputs (see the preprocessing sketch after this list). Tests also include different AFM operation modes (FM, Q-controlled AM-AFM amplitude/phase).
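
The bullets above compress a lot of architecture, so a minimal PyTorch sketch of the generator may make the data flow concrete: a 3D-convolution stem fuses the 10-image height stack, and a pix2pix-style 2D U-Net decodes it into an RGB ball-and-stick depiction. The class names, the 256×256 input size, the reduced number of U-Net levels, and the exact padding choices are illustrative assumptions, not the authors' implementation.

```python
# Illustrative generator: 3D stem over the height stack, then a 2D U-Net (pix2pix-style).
import torch
import torch.nn as nn

class Down(nn.Module):
    """4x4, stride-2 conv block (BatchNorm + LeakyReLU), as in pix2pix encoders."""
    def __init__(self, c_in, c_out, norm=True):
        super().__init__()
        layers = [nn.Conv2d(c_in, c_out, 4, stride=2, padding=1)]
        if norm:
            layers.append(nn.BatchNorm2d(c_out))
        layers.append(nn.LeakyReLU(0.2))
        self.block = nn.Sequential(*layers)
    def forward(self, x):
        return self.block(x)

class Up(nn.Module):
    """4x4, stride-2 transposed conv block (ReLU, optional dropout) with skip concat."""
    def __init__(self, c_in, c_out, dropout=False):
        super().__init__()
        layers = [nn.ConvTranspose2d(c_in, c_out, 4, stride=2, padding=1),
                  nn.BatchNorm2d(c_out), nn.ReLU()]
        if dropout:
            layers.append(nn.Dropout(0.5))
        self.block = nn.Sequential(*layers)
    def forward(self, x, skip):
        return torch.cat([self.block(x), skip], dim=1)

class Generator(nn.Module):
    def __init__(self, n_slices=10):
        super().__init__()
        # 3D stem: initial dropout, then two 3D convs that halve depth and spatial size.
        self.stem = nn.Sequential(
            nn.Dropout(0.5),
            nn.Conv3d(1, 64, kernel_size=3, stride=1, padding=1),
            nn.Conv3d(64, 64, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
        )
        # 2D U-Net (4 levels here for brevity; the paper describes 7 downsampling blocks).
        self.d1 = Down(64 * (n_slices // 2), 128, norm=False)
        self.d2 = Down(128, 256)
        self.d3 = Down(256, 512)
        self.d4 = Down(512, 512)
        self.u1 = Up(512, 512, dropout=True)
        self.u2 = Up(512 + 512, 256, dropout=True)
        self.u3 = Up(256 + 256, 128)
        self.final = nn.Sequential(
            nn.ConvTranspose2d(128 + 128, 3, 4, stride=2, padding=1), nn.Tanh())
    def forward(self, stack):           # stack: (B, 1, 10, H, W)
        x = self.stem(stack)            # (B, 64, 5, H/2, W/2)
        b, c, d, h, w = x.shape
        x = x.reshape(b, c * d, h, w)   # fuse depth into channels
        s1 = self.d1(x); s2 = self.d2(s1); s3 = self.d3(s2); x = self.d4(s3)
        x = self.u1(x, s3); x = self.u2(x, s2); x = self.u3(x, s1)
        return self.final(x)            # (B, 3, H/2, W/2), RGB in [-1, 1] via tanh

# Smoke test: a 10-slice, 256x256 stack maps to a 128x128 RGB depiction.
g = Generator()
y = g(torch.randn(1, 1, 10, 256, 256))   # -> torch.Size([1, 3, 128, 128])
```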
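
A matching sketch of the patch-based discriminator, again in PyTorch: the AFM stack (treated as a 10-channel image at the depiction resolution) is concatenated with a real or generated ball-and-stick image and reduced to a grid of per-patch realism scores. The channel progression follows the description above; strides, padding, and the placement of batch norm are assumptions.

```python
# Illustrative PatchGAN-style discriminator for (AFM stack, depiction) pairs.
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    def __init__(self, afm_channels=10, img_channels=3):
        super().__init__()
        chans = [64, 128, 256, 512, 512]
        layers, c_in = [], afm_channels + img_channels
        for i, c_out in enumerate(chans):
            layers.append(nn.Conv2d(c_in, c_out, 4, stride=2, padding=1))
            if i > 0:                      # assumption: no batch norm on the first block
                layers.append(nn.BatchNorm2d(c_out))
            layers.append(nn.LeakyReLU(0.2))
            c_in = c_out
        # Final 4x4 conv + sigmoid produces a map of per-patch "real" probabilities.
        layers += [nn.Conv2d(c_in, 1, 4, padding=1), nn.Sigmoid()]
        self.net = nn.Sequential(*layers)
    def forward(self, afm_stack, ball_and_stick):
        # afm_stack: (B, 10, H, W); ball_and_stick: (B, 3, H, W)
        return self.net(torch.cat([afm_stack, ball_and_stick], dim=1))

d = PatchDiscriminator()
patches = d(torch.randn(1, 10, 128, 128), torch.randn(1, 3, 128, 128))
# patches.shape -> torch.Size([1, 1, 3, 3]); each entry scores one image patch.
```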
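
The paired augmentation described above could be implemented with Keras' ImageDataGenerator (the "Image Data Generator" named in the bullet) roughly as follows: one set of random transform parameters is drawn and applied to both the AFM stack and its target so pixels stay aligned, with the shear component zeroed for the target. Parameter ranges and array shapes are illustrative, not the authors' tuned values.

```python
# Paired spatial augmentation sketch: identical transforms for input and target,
# except shear, which is applied to the AFM input only.
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(rotation_range=180, zoom_range=0.1,
                               width_shift_range=0.05, height_shift_range=0.05,
                               shear_range=5.0, horizontal_flip=True,
                               vertical_flip=True, fill_mode="nearest")

afm_stack = np.random.rand(128, 128, 10)   # (H, W, 10 height slices)
target    = np.random.rand(128, 128, 3)    # (H, W, RGB ball-and-stick depiction)

params = augmenter.get_random_transform(afm_stack.shape)   # one random draw
aug_afm = augmenter.apply_transform(afm_stack, params)     # shear applied to input

params_no_shear = dict(params, shear=0.0)                  # drop shear for the target
aug_target = augmenter.apply_transform(target, params_no_shear)
```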
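
A sketch of one training step with the combined objective (adversarial term plus λ=100 times the L1 reconstruction error), reusing the Generator and PatchDiscriminator sketches above. The Adam settings and the binary cross-entropy adversarial loss follow the standard pix2pix recipe and are assumptions here, not details confirmed by the paper.

```python
# Illustrative CGAN training step: adversarial loss + lambda * L1 reconstruction.
import torch
import torch.nn as nn

bce, l1 = nn.BCELoss(), nn.L1Loss()
LAMBDA = 100.0

G, D = Generator(), PatchDiscriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))

def train_step(stack_3d, stack_2d, target):
    """stack_3d: (B,1,10,H,W) for G; stack_2d: (B,10,h,w) at the depiction
    resolution for D; target: (B,3,h,w) ground-truth ball-and-stick image."""
    fake = G(stack_3d)

    # Discriminator: real pairs -> 1, generated pairs -> 0.
    opt_d.zero_grad()
    d_real = D(stack_2d, target)
    d_fake = D(stack_2d, fake.detach())
    loss_d = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
    loss_d.backward()
    opt_d.step()

    # Generator: fool D while staying close to the ground truth in L1 (MAE).
    opt_g.zero_grad()
    d_fake = D(stack_2d, fake)
    loss_g = bce(d_fake, torch.ones_like(d_fake)) + LAMBDA * l1(fake, target)
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()
```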
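
Finally, a sketch of how an experimental stack with fewer than 10 height slices might be expanded into the required 10-image input: linear interpolation of each pixel's contrast-versus-height curve followed by light denoising with OpenCV's medianBlur (kernel size 3), as described above. The helper name, the linear interpolation scheme, and the example heights are assumptions.

```python
# Expand a sparse experimental stack to 10 slices, then denoise each slice.
import cv2
import numpy as np

def build_input_stack(images, heights, n_out=10):
    """images: list of (H, W) float32 frames at the given tip-sample heights (pm)."""
    stack = np.stack(images, axis=0).astype(np.float32)        # (n_in, H, W)
    target_heights = np.linspace(min(heights), max(heights), n_out)

    # Interpolate each pixel's contrast-vs-height curve onto the 10 target heights.
    n_in, h, w = stack.shape
    flat = stack.reshape(n_in, -1)
    interp = np.empty((n_out, h * w), dtype=np.float32)
    for px in range(h * w):
        interp[:, px] = np.interp(target_heights, heights, flat[:, px])
    interp = interp.reshape(n_out, h, w)

    # Light denoising; medianBlur accepts single-channel float32 images for ksize 3.
    return np.stack([cv2.medianBlur(frame, 3) for frame in interp], axis=0)

# Example: 4 experimental frames spanning ~100 pm expanded to a 10-image stack.
frames  = [np.random.rand(128, 128).astype(np.float32) for _ in range(4)]
stack10 = build_input_stack(frames, heights=[0, 30, 60, 100])   # (10, 128, 128)
```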
Key Findings
- Simulated AFM tests: On 3015 randomly selected molecules (stacks span 24 AFM parameter combinations), the CGAN frequently produced perfect ball-and-stick predictions, recovering both bonding topology and atom types, even in complex molecules with strong local charge effects. Accuracy trends with corrugation: total accuracy and composition (atom-type) accuracy decrease approximately linearly with maximum intramolecular height difference; structural accuracy is roughly linear up to ~150 pm corrugation, then degrades more rapidly beyond this range due to limited visibility of lower-lying atoms by the CO tip.
- Error modes: Common confusions include O vs F when bound to aromatic rings and swapping N–H groups for O in planar porphyrin-like environments due to similar AFM contrast arising from electronegativity and charge distributions. In strongly 3D gas-phase structures, the topmost regions are accurately identified while lower regions are missed/blurry, reflecting AFM’s intrinsic probing limitation rather than model failure.
- Role of training on gas-phase structures: Despite differences from adsorbed configurations, training on diverse gas-phase conformations helps the model learn local, environment-dependent height-contrast relationships, enabling generalization across substrates and adsorption configurations.
- Experimental AFM tests: Despite limited stacks (often 3–6 images) and tip asymmetries, the CGAN recovered substantial structural and compositional information. Examples include accurate identification of iodine in 2-iodo-derivatives from amplitude images in Q-controlled AM-AFM and improved O vs F discrimination in experimental images compared to simulations. Some challenging cases (e.g., dibenzothiophene) showed partial failures due to noise and atypical deformations, but prior VAE-based incorporation of experimental features suggests a path to improve CGAN performance on such cases.
- Cross-mode robustness: The model provided reasonable predictions on amplitude, phase, and FM images, despite these modes and parameter ranges differing from those used in simulations, illustrating robustness to operation mode variability when sufficient contrast evolution with height is present.
Discussion
The study demonstrates that mapping stacks of HR-AFM images to ball-and-stick depictions via CGANs can recover complete molecular structure and composition for a broad set of organic molecules. By learning local correspondences between AFM contrast evolution and atomic environments, the model generalizes beyond specific adsorption configurations and even across operation modes. The dependence of accuracy on molecular corrugation reflects a physical limit of CO-tip AFM in accessing lower-lying regions; nonetheless, the model reliably identifies topmost regions and atom types. Misclassifications (e.g., O vs F, N–H vs O) arise where AFM contrast is intrinsically similar; experimental conditions can sometimes disambiguate such cases. Training on gas-phase structures, augmented with diverse AFM parameters and spatial transforms, confers robustness and avoids over-specialization to a particular substrate. Experimental validations, despite reduced input information and noise, show the approach can surpass human interpretability in some instances. Integration with strategies like Bayesian inference plus DFT, or augmenting training with VAE-generated experimental-like images, could extend accurate identification to highly corrugated or noisy scenarios.
Conclusion
The paper introduces a CGAN-based image-to-image translation framework that converts stacks of constant-height HR-AFM images into ball-and-stick molecular depictions, enabling end-to-end identification of structure and chemical composition. Trained on the large QUAM-AFM dataset spanning relevant organic chemistries and AFM parameter variability, the model achieves high accuracy on simulated data and demonstrates strong generalization to experimental images and different AFM operation modes. Accuracy declines with strong intramolecular corrugation, reflecting AFM’s inherent limitations, but the model remains effective for quasi-planar adsorbed molecules common in experiments. Future work includes incorporating experimental-image characteristics into training (e.g., via VAE augmentation), optimizing AFM imaging protocols for the model (more height slices, appropriate parameter ranges), and integrating Bayesian/DFT methods to handle highly 3D structures and to further improve atom-type disambiguation (e.g., O vs F, N–H vs O).
Limitations
- Reduced performance for molecules with large intramolecular height differences (>150 pm), due to CO-tip lateral relaxation limiting access to lower-lying regions; structural completeness may suffer even when top regions are correctly identified.
- Occasional atom-type confusions in environments with intrinsically similar AFM contrast (e.g., O vs F on aromatic rings; N–H vs O in planar porphyrinic motifs).
- Experimental validation constrained by limited available stacks (often fewer than 10 images), noise, and tip asymmetries; interpolated images do not add true information and may hinder performance.
- Model predictions are limited to chemical species present in the training dataset (as provided by QUAM-AFM).
- Some inconsistencies between simulated and experimental contrast (e.g., unusual deformations) can degrade predictions; requires targeted augmentation with experimental-like features to improve robustness.
- Gas-phase training structures may not match adsorption geometries; while this aids generalization to local contrasts, exact reconstruction of highly substrate-influenced geometries can remain challenging.