Deep-learning-based image segmentation integrated with optical microscopy for automatically searching for two-dimensional materials

S. Masubuchi, E. Watanabe, et al.

This research by Satoru Masubuchi and colleagues presents a deep-learning-based image segmentation algorithm that integrates with an autonomous robotic microscope system to automate the search for and cataloging of 2D materials. Built on the Mask-RCNN neural network and optical microscopy, the approach removes the need for manual parameter tuning and substantially improves the efficiency of 2D material research.

Introduction
The study addresses the challenge of automatically detecting and cataloging exfoliated two-dimensional (2D) materials on SiO₂/Si substrates using optical microscopy without manual parameter tuning. Conventional rule-based image processing relies on handcrafted features and requires expert retuning when microscopy conditions (illumination, color balance) change, which is time-consuming and can cause material degradation. Leveraging recent advances in deep learning for image understanding, the authors propose a Mask-RCNN-based instance segmentation approach trained on annotated optical microscope images of multiple 2D materials to achieve robust, generalizable detection and layer-thickness categorization. The purpose is to integrate this capability into a motorized microscope system to enable autonomous, high-throughput searching and database creation of 2D flakes, thereby accelerating research and assembly of van der Waals heterostructures. The importance lies in improving robustness, eliminating parameter tuning, and enabling real-time operation across varying optical conditions and setups.
Literature Review
The paper situates its work within deep-learning progress in object detection, semantic and instance segmentation, and image generation, highlighting the superiority of region-based approaches (e.g., Mask-RCNN) over fully convolutional methods given sufficient annotated data. Prior applications of image recognition in scientific domains include medical and biological imaging. In the 2D materials field, earlier autonomous systems used rule-based methods relying on handcrafted features (color contrast, edges, entropy) for flake detection, which are sensitive to imaging conditions and require frequent expert retuning. Previous data-driven approaches for classifying graphene thickness exist, but robust, general-purpose detection across materials and conditions remained challenging. Transfer learning from large datasets like MS-COCO is recognized as a way to achieve strong performance with comparatively small domain-specific datasets.
Methodology
System architecture: The system comprises (i) an autofocus optical microscope with a motorized XY stage; (ii) a custom C++/Python software pipeline with a client–server architecture that acquires images, sends them to a GPU inference server, receives the results, displays them, and logs them to a database; and (iii) trained deep-learning models for detecting 2D materials (graphene, hBN, MoS₂, WTe₂). Inference on a 1024×1024 px image runs in ~200 ms on an NVIDIA Tesla V100; with I/O overhead, the end-to-end rate is ~1 fps. The system automatically scans substrates and records each detected flake's position, shape, label, and confidence (a minimal client-side sketch of this loop appears at the end of this section).

Model: Mask-RCNN with a ResNet-101 backbone extracts features; a region proposal network (RPN) and ROI Align propose and pool candidate regions; network heads perform classification and bounding-box regression, and a mask branch outputs per-instance segmentation masks. The model is implemented in Keras/TensorFlow using the Matterport Mask-RCNN codebase.

Dataset and annotation: ~2100 optical microscope images of exfoliated graphene, hBN, MoS₂, and WTe₂ on SiO₂/Si were collected with the automated microscope. Images were annotated manually with a web tool (Labelbox), aided by a semi-automatic workflow: a preliminary model trained on ~80 graphene images generated draft labels that annotators then corrected, reducing labeling time to 20–30 s per image. Labels encode material identity and layer-thickness class: mono (1L), few (2–10L), and thick (10–40L). Data were converted to COCO-format JSON and split 80/20 into training and test sets.

Training procedure: Transfer learning initialized all layers except the heads with MS-COCO pretrained weights; the heads were randomly initialized. Optimization used SGD with momentum 0.9 and weight decay 1e-4. Training consisted of 4 stages × 30 epochs (each epoch = 500 steps): stage 1 trained the heads only (lr = 1e-3); stage 2 additionally trained the backbone from ResNet stage 4 upward (lr = 1e-3); stages 3–4 trained the full network with lr = 1e-4 and then 1e-5. Total training time was ~12 h on four Tesla V100 GPUs (32 GB). Online data augmentation included color-channel scaling, rotations, horizontal/vertical flips, and shifts. Images were resized to 1024×1024 with aspect-ratio preservation and zero padding. The loss combined classification, bounding-box regression, and mask losses per the Mask-RCNN formulation (written out at the end of this section). To promote generalization and efficient learning, a two-step strategy was used: (1) pretrain on a mixed dataset of all materials for segmentation and classification (material and thickness), then (2) transfer-learn on each material subset for refined layer-thickness classification, using the epoch-120 mixed-material weights as the starting point.

Evaluations and experiments: Inference examples demonstrated accurate detection and masking under contamination (tape residue, particles, corrugation) and robust layer-thickness categorization. Robustness to illumination changes was compared against a rule-based detector: the deep-learning model maintained detections across substantial illumination variation, whereas the rule-based method failed under modest changes. In a practical scanning test, WTe₂ was exfoliated on a 1×1 cm² SiO₂/Si substrate; using a 50× objective, the system scanned the substrate in ~1 h and found ~25 flakes. Performance metrics were computed by manual review of >2300 images, with TP/FP/FN defined at the image level (at least one correctly detected flake per image). Transfer-learning analyses compared initialization from MS-COCO alone versus mixed 2D+COCO; learning curves and qualitative results showed faster convergence and lower test loss with 2D+COCO pretraining.
Generalization tests used images from three different microscope setups (Asahikogaku AZ10-T/E, Keyence VHX-900, VHX-5000); despite differences in white balance, magnification, resolution, and illumination, the model trained on the original setup successfully detected graphene without additional retraining. Cross-material generalization was demonstrated by detecting WSe₂ and MoSe₂ using a model trained on WTe₂.
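For reference, the combined loss mentioned under the training procedure follows the standard Mask R-CNN formulation, summing the three branch losses over each sampled region of interest:

$$ L = L_{\mathrm{cls}} + L_{\mathrm{box}} + L_{\mathrm{mask}} $$

where $L_{\mathrm{cls}}$ is the softmax cross-entropy classification loss, $L_{\mathrm{box}}$ is the smooth-$L_1$ bounding-box regression loss, and $L_{\mathrm{mask}}$ is the average per-pixel binary cross-entropy computed on the mask of the ground-truth class.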
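The staged schedule can be expressed compactly against the public Matterport Mask-RCNN API that the paper reports using. The sketch below is illustrative, not the authors' released code: the file paths, the class count, and the dataset objects are assumptions.

```python
# Sketch of the 4-stage transfer-learning schedule using the Matterport
# Mask-RCNN API (github.com/matterport/Mask_RCNN). Paths, class counts,
# and dataset objects are hypothetical placeholders.
import imgaug.augmenters as iaa
from mrcnn.config import Config
from mrcnn import model as modellib

class FlakeConfig(Config):
    NAME = "2dmaterials"
    NUM_CLASSES = 1 + 12          # background + (4 materials x 3 thicknesses), assumed
    IMAGE_MIN_DIM = 1024          # aspect-preserving resize with zero padding
    IMAGE_MAX_DIM = 1024
    LEARNING_MOMENTUM = 0.9       # SGD momentum, as reported
    WEIGHT_DECAY = 1e-4           # weight decay, as reported
    STEPS_PER_EPOCH = 500         # one epoch = 500 steps, as reported

# Online augmentation: color-channel scaling, rotations, flips, shifts
augmentation = iaa.Sequential([
    iaa.Multiply((0.8, 1.2), per_channel=True),
    iaa.Fliplr(0.5), iaa.Flipud(0.5),
    iaa.Affine(rotate=(-180, 180), translate_percent=(-0.1, 0.1)),
])

config = FlakeConfig()
model = modellib.MaskRCNN(mode="training", config=config, model_dir="logs/")
# Initialize from MS-COCO weights, re-initializing the class-specific heads
model.load_weights("mask_rcnn_coco.h5", by_name=True,
                   exclude=["mrcnn_class_logits", "mrcnn_bbox_fc",
                            "mrcnn_bbox", "mrcnn_mask"])

# dataset_train / dataset_test: mrcnn.utils.Dataset subclasses loading the
# COCO-format annotations, split 80/20 (definitions omitted in this sketch)
schedule = [("heads", 1e-3, 30), ("4+", 1e-3, 60),
            ("all", 1e-4, 90), ("all", 1e-5, 120)]
for layers, lr, until_epoch in schedule:   # epochs are cumulative in this API
    model.train(dataset_train, dataset_test, learning_rate=lr,
                epochs=until_epoch, layers=layers, augmentation=augmentation)
```

The cumulative epoch counts (30, 60, 90, 120) reproduce the reported 4 stages × 30 epochs, and `layers="4+"` corresponds to unfreezing the backbone from ResNet stage 4 upward.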
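On the microscope side, the client–server split described above could look roughly like the following. The endpoint URL, JSON schema, and the stage/camera helpers are assumptions for illustration; the paper's released drivers define their own interfaces.

```python
# Hypothetical microscope-side client loop: acquire an image at each stage
# position, POST it to a GPU inference server, and log detections.
# SERVER_URL, the response fields, and the stage/camera objects are
# illustrative assumptions, not the authors' API.
import json
import sqlite3
import requests

SERVER_URL = "http://gpu-server:8500/segment"   # assumed endpoint

def scan_substrate(stage, camera, positions, db_path="flakes.db"):
    db = sqlite3.connect(db_path)
    db.execute("CREATE TABLE IF NOT EXISTS flakes "
               "(x REAL, y REAL, label TEXT, confidence REAL, mask TEXT)")
    for (x, y) in positions:
        stage.move_to(x, y)                     # motorized XY stage
        image_png = camera.grab_png()           # autofocused capture, PNG bytes
        # Send the frame to the inference server (~200 ms model time,
        # ~1 s end-to-end including I/O, per the paper)
        resp = requests.post(SERVER_URL, files={"image": image_png})
        for det in resp.json()["detections"]:
            db.execute("INSERT INTO flakes VALUES (?, ?, ?, ?, ?)",
                       (x, y, det["label"], det["confidence"],
                        json.dumps(det["mask_polygon"])))
    db.commit()
    db.close()
```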
Key Findings
- Real-time inference: ~200 ms per 1024×1024 image on an NVIDIA Tesla V100; ~1 fps end-to-end including I/O.
- Robustness: deep-learning detections remained stable across large illumination changes, unlike rule-based methods, which failed under modest intensity decreases.
- Automated search throughput: for WTe₂ on 1×1 cm² SiO₂/Si, ~25 flakes were found in ~1 hour using a 50× objective.
- Detection performance (image-level metrics over >2300 images; a worked example follows this list):
  - WTe₂: precision ≈ 0.53, recall ≈ 0.93; false negatives were predominantly small fractured flakes unusable for device assembly, implying low risk of missing usable flakes.
  - Graphene: precision ≈ 0.95, recall ≈ 0.97.
- Segmentation accuracy: mAP@IoU50% ≈ 0.49 (graphene) and ≈ 0.52 (WTe₂) on the annotated dataset, sufficient for practical searches.
- Transfer-learning benefit: pretraining on mixed 2D materials + MS-COCO yielded faster convergence and lower test loss than MS-COCO-only initialization; qualitative examples show improved detection and reduced misclassification.
- Cross-setup generalization: successful graphene detection on images from three different microscope systems without retraining.
- Cross-material generalization: models trained on WTe₂ detected WSe₂ and MoSe₂ flakes despite no explicit training on those materials.
- Open resources: source code, trained weights, training dataset, and microscope drivers are publicly available (https://github.com/tdmms/).
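As a concrete reading of the image-level metrics above, precision and recall follow their standard definitions. The counts below are hypothetical, chosen only to reproduce numbers of the reported magnitude for WTe₂:

```python
# Worked example of the image-level metrics: an image counts as a true
# positive if at least one flake in it is correctly detected. The counts
# are hypothetical, picked to illustrate precision ~0.53 and recall ~0.93.
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    precision = tp / (tp + fp)   # fraction of flagged images that are correct
    recall = tp / (tp + fn)      # fraction of true-flake images that are found
    return precision, recall

p, r = precision_recall(tp=530, fp=470, fn=40)
print(f"precision = {p:.2f}, recall = {r:.2f}")  # precision = 0.53, recall = 0.93
```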
Discussion
The findings demonstrate that a Mask-RCNN-based segmentation approach can robustly and efficiently detect exfoliated 2D materials in optical images, overcoming the sensitivity and parameter-tuning burdens of rule-based methods. High recall, especially for WTe₂, indicates that the system reliably surfaces usable flakes for downstream robotic assembly, in line with the goal of autonomous, high-throughput workflows. Stability under varying illumination and generalization across different microscope setups underscore that the model has learned instrument-agnostic features, making it a practical detector deployable across laboratories. Transfer learning from mixed 2D materials captures common flake features in the backbone, improving accuracy and training efficiency when adapting to specific materials. The client–server design further broadens accessibility by enabling cloud inference without local GPUs. Collectively, these results answer the research question of creating a generalized, real-time, automated 2D-material search tool and highlight its relevance for accelerating the fabrication of van der Waals heterostructures while reducing the manual labor and delays that can degrade sensitive materials.
Conclusion
This work integrates a deep-learning instance segmentation model (Mask-RCNN) with a motorized optical microscope to automatically search, detect, segment, and catalog exfoliated 2D materials (graphene, hBN, MoS₂, WTe₂) on SiO₂/Si substrates in real time. The system achieves robust performance across illumination changes and different microscope setups, high recall for sparsely distributed flakes, and practical scanning throughput. Transfer learning across multiple 2D materials improves convergence and accuracy, and the approach generalizes to untrained materials (WSe₂, MoSe₂). The authors release code, trained weights, datasets, and drivers to facilitate adoption. Future directions include integrating automated thickness verification via computational post-processing (e.g., color contrast analysis), extending to additional 2D materials using the released pretrained weights (requiring as few as ~80 annotated images), enhancing precision for challenging materials, and scaling cloud-based deployment across diverse microscope platforms.
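The color contrast analysis mentioned above as a route to automated thickness verification is commonly done by comparing a flake's intensity against the bare substrate. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation; the threshold values are illustrative only.

```python
# Minimal sketch of color-contrast thickness verification for a detected
# flake, assuming a single-channel image and a known patch of bare
# substrate. Threshold values are illustrative, not from the paper.
import numpy as np

def optical_contrast(image: np.ndarray, flake_mask: np.ndarray,
                     substrate_mask: np.ndarray) -> float:
    """Contrast C = (I_substrate - I_flake) / I_substrate."""
    i_flake = image[flake_mask].mean()
    i_sub = image[substrate_mask].mean()
    return (i_sub - i_flake) / i_sub

def thickness_class(contrast: float) -> str:
    # Hypothetical cut-offs; real values depend on the illumination
    # wavelength and the SiO2 layer thickness
    if contrast < 0.10:
        return "mono (1L)"
    elif contrast < 0.40:
        return "few (2-10L)"
    return "thick (10-40L)"
```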
Limitations
- Precision for WTe₂ is moderate (~0.53) at the image level, implying non-trivial false positives that may require brief human screening.
- Training data are limited to certain materials and substrates (primarily SiO₂/Si); performance on other substrates or imaging modalities may require adaptation.
- Layer-thickness categories are coarse (mono, few, thick) and rely on post hoc verification for precise thickness.
- While generalization across microscope setups is shown for graphene, broader validation across more instruments and materials would strengthen the claims.
- Overfitting risks exist without augmentation; careful training practices and continued dataset growth are needed.