A robust synthetic data generation framework for machine learning in high-resolution transmission electron microscopy (HRTEM)

Engineering and Technology

L. R. DaCosta, K. Sytwu, et al.

Construction Zone is a Python package for generating complex nanoscale atomic structures, which makes it practical to build large, diverse synthetic datasets for training machine learning models that analyze HRTEM images. In this work, Luis Rangel DaCosta, Katherine Sytwu, C. K. Groschner, and M. C. Scott report state-of-the-art nanoparticle image segmentation using models trained solely on simulated data.

Abstract
Machine learning techniques are attractive options for developing highly accurate analysis tools for nanomaterials characterization, including high-resolution transmission electron microscopy (HRTEM). However, successfully implementing such machine learning tools can be difficult due to the challenges in procuring sufficiently large, high-quality training datasets from experiments. In this work, we introduce Construction Zone, a Python package for rapid generation of complex nanoscale atomic structures, which enables fast, systematic sampling of realistic nanomaterial structures and can be used as a random structure generator for large, diverse synthetic datasets. Using Construction Zone, we develop an end-to-end machine learning workflow for training neural network models to analyze experimental atomic resolution HRTEM images on the task of nanoparticle image segmentation purely with simulated databases. Further, we study the data curation process to understand how various aspects of the curated simulated data (including simulation fidelity, the distribution of atomic structures, and the distribution of imaging conditions) affect model performance across three benchmark experimental HRTEM image datasets. Using our workflow, we are able to achieve state-of-the-art segmentation performance on these experimental benchmarks and, further, we discuss robust strategies for consistently achieving high performance with machine learning in experimental settings using purely synthetic data. Construction Zone and its documentation are available at https://github.com/lerandc/construction_zone.
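To make the idea of a random structure generator concrete, here is a minimal sketch in plain NumPy. It does not use the Construction Zone API and is not the authors' implementation. The function names `fcc_sphere` and `sample_dataset`, the FCC lattice, the lattice constant of 4.08 Å (roughly that of gold), and the radius range are all assumptions chosen for illustration. The sketch cuts a sphere out of an FCC lattice, applies a random rotation, and repeats with randomly drawn radii to produce a small, varied set of structures.

```python
import numpy as np


def fcc_sphere(radius, lattice_constant=4.08, rng=None):
    """Return (N, 3) atom positions of a randomly oriented spherical FCC particle.

    Hypothetical helper for illustration only; units are angstroms.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Four-atom FCC basis in fractional coordinates of the cubic cell.
    basis = np.array([[0.0, 0.0, 0.0],
                      [0.5, 0.5, 0.0],
                      [0.5, 0.0, 0.5],
                      [0.0, 0.5, 0.5]])

    # Tile enough unit cells to cover the sphere in every direction.
    n = int(np.ceil(radius / lattice_constant)) + 1
    cells = np.arange(-n, n + 1)
    grid = np.stack(np.meshgrid(cells, cells, cells, indexing="ij"), axis=-1)
    grid = grid.reshape(-1, 3)
    positions = (grid[:, None, :] + basis[None, :, :]).reshape(-1, 3)
    positions = positions * lattice_constant

    # Keep only atoms inside the sphere centered on the origin.
    positions = positions[np.linalg.norm(positions, axis=1) <= radius]

    # Random rotation from the QR decomposition of a Gaussian matrix.
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))   # sign fix so rotations are uniformly sampled
    if np.linalg.det(q) < 0:      # force a proper rotation (det = +1)
        q[:, 0] = -q[:, 0]
    return positions @ q.T


def sample_dataset(n_structures, radius_range=(10.0, 25.0), seed=0):
    """Draw several particles with random radii and orientations."""
    rng = np.random.default_rng(seed)
    return [fcc_sphere(rng.uniform(*radius_range), rng=rng)
            for _ in range(n_structures)]


if __name__ == "__main__":
    for atoms in sample_dataset(3):
        print(atoms.shape)
```

In a workflow like the one the abstract describes, structures of this kind would be passed to an HRTEM image simulator, with the known atom positions providing segmentation labels. That step is outside the scope of this sketch.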
Publisher
npj Computational Materials
Published On
Jul 29, 2024
Authors
Luis Rangel DaCosta, Katherine Sytwu, C. K. Groschner, M. C. Scott
Tags
Construction Zone
nanoscale atomic structures
machine learning
HRTEM images
image segmentation
simulation fidelity
synthetic datasets