Creation of crystal structure reproducing X-ray diffraction pattern without using database


J. Lee, J. Oba, et al.

Evolv&Morph is an automated method for creating crystal structures that reproduce X-ray diffraction patterns without relying on databases. Developed by Joohwi Lee, Junpei Oba, Nobuko Ohba, and Seiji Kajita of Toyota Central R&D Labs, it combines an evolutionary algorithm with crystal morphing guided by Bayesian optimization.

Introduction
The study addresses the challenge of determining crystal structures directly from measured X-ray diffraction (XRD) patterns when no similar patterns are found in databases. Conventional workflows rely on large databases (e.g., PDF, ICSD) and database-driven identification or machine-learning models trained on them, which can fail for novel or complex materials with unknown structures. Rietveld refinement can adjust candidate structures toward experimental patterns but depends strongly on an initial structure and expert tuning, limiting success when the starting guess is poor. The research goal is to develop a database-independent inverse design method that automatically creates crystal structures whose simulated XRD reproduces a target pattern. The proposed approach, Evolv&Morph, combines an evolutionary algorithm and crystal morphing guided by Bayesian optimization to maximize similarity between simulated and target XRD, enabling automated structure creation and expanding applicability to unknown structures.
Literature Review
The authors review the prevalent use of material databases for XRD-based phase identification, namely the Powder Diffraction File (PDF) maintained by ICDD and the Inorganic Crystal Structure Database (ICSD), and note that XRD patterns can be simulated from structures stored in these databases. Recent advances include machine learning models that classify crystal systems and space groups from XRD, prototype-based searches aided by first-principles calculations to solve structures absent from databases, and composition-to-XRD prediction via deep learning (e.g., DeepXRD). Rietveld refinement is commonly used to reduce discrepancies between measured and simulated patterns, and the BBO-Rietveld approach automates parameter optimization to increase refinement success. However, these methods remain dependent on initial database candidates, which limits applicability when the measured XRD corresponds to an unknown structure. The authors therefore motivate inverse design approaches, which aim to generate structures directly from a target property (here, XRD similarity).
Methodology
Overview: Evolv&Morph combines an evolutionary algorithm, which generates diverse candidate crystal structures, with crystal morphing guided by Bayesian optimization, which interpolates between promising candidates to further increase XRD similarity to a target pattern. Optional refinement (Rietveld and symmetrization) can further tune the best structures.
Evolutionary algorithm: Implemented with USPEX. The first generation is created with randomly selected space groups (numbers 3–230). Minimum bond lengths during random generation are 1.95 Å (same elements) and 1.5 Å (different elements). From the second generation onward, new structures are produced by genetic operators in the following proportions: crossover 50%, random symmetry creation 20%, and mutation 30% (permutation 10%, which exchanges occupied sites; softmutation 20%, which moves atoms along soft-mode eigenvectors). Each generation has 50 structures. Selection retains high-scoring survivors and removes low-scoring candidates, using the cosine similarity of XRD patterns, S_cos, as the fitness. The run terminates when the best-ranked structure has not changed for 10 generations or when generation 20 is reached. Structural relaxations promote thermodynamic stability and help avoid unphysical structures.
First-principles calculations: Each generated structure is relaxed using VASP with the PAW method and the PBEsol exchange-correlation functional. Computational settings prioritize efficiency: a plane-wave cutoff of 300 eV and a k-point spacing of 0.12 (in units of 2π/Å). Ionic relaxation proceeds until forces on atoms fall below 0.03 eV/Å or 30 ionic iterations are completed. To avoid losing time on unphysical cases, calculations still unfinished after 10 minutes of wall time on 8 parallel cores are terminated.
Crystal morphing with Bayesian optimization: Crystal morphing interpolates between two input structures (I and II) to create intermediate structures at specified distances. Distances are defined by a metric based on the SOAP descriptor, which is invariant to translation, rotation, and unit-cell choice.
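The fitness and stopping rule of the evolutionary algorithm described above can be illustrated with a minimal sketch. It is not the USPEX implementation: the function names, the assumption that both patterns are sampled on the same 2θ grid, and the reading of "has not changed for 10 generations" as "the same champion appears in the last 11 recorded generations" are all assumptions made for illustration.

```python
import numpy as np


def cosine_similarity(pattern_a, pattern_b):
    """Cosine similarity of two XRD intensity vectors.

    Assumption: both patterns are sampled on the same 2-theta grid,
    so the vectors can be compared element by element.
    """
    a = np.asarray(pattern_a, dtype=float)
    b = np.asarray(pattern_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("patterns must share the same 2-theta grid")
    norm = np.linalg.norm(a) * np.linalg.norm(b)
    if norm == 0.0:
        return 0.0  # an all-zero pattern matches nothing
    return float(np.dot(a, b) / norm)


def should_stop(best_ids, patience=10, max_generations=20):
    """Decide whether the evolutionary run should terminate.

    best_ids holds one identifier per completed generation, naming the
    best-ranked structure of that generation (hypothetical bookkeeping).
    """
    n = len(best_ids)
    if n >= max_generations:
        return True
    # Unchanged for `patience` generations: the last patience + 1
    # entries all name the same structure (an interpretation).
    recent = best_ids[-(patience + 1):]
    return n > patience and len(set(recent)) == 1
```

Because cosine similarity is invariant to a uniform rescaling of either pattern, the overall intensity scale of the simulated and target patterns does not need to match before scoring.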
SOAP parameters: Gaussian width σ = 0.5 Å; maximum radial basis size 10; maximum spherical harmonics order 6. The squared SOAP distance d^2(x_i, x_ii) is computed from differences in SOAP power spectra. Interpolation adjusts reciprocal lattice vectors and internal atomic coordinates via steepest descent with a step size of 0.02 and up to 15 iterations. For multi-element systems, elements are distinguished by sign in the real-space density (two-element case) or by decomposing into pairwise systems and summing distances (three-element case).
Search strategies: Two strategies are used to maximize S_cos along morphing paths. (1) Greedy optimization starts from the S_cos champions taken from five independent EA trials; the top two are morphed and, if a higher-S_cos intermediate is found, it is added while the inputs are removed; otherwise, the second-best input is dropped and the process repeats. (2) All-pairs investigation explores morphing between all pairs in the input list to expand the search space. One cycle of greedy optimization followed by all-pairs investigation is used.
Bayesian optimization: GPyOpt guides sampling along each morphing line, with initial evaluations at 0%, 25%, 50%, 75%, and 100% distances plus two random points, followed by four additional iterations with four-point parallel batches.
Refinement: The best created structures can be further tuned via Rietveld refinement, which reduces residual differences between simulated and target XRD patterns, and by symmetrization, which enforces a consistent space group (sensitive to tolerance).
Auxiliary tools: The tools referenced include USPEX (main), BBO-Rietveld, GSAS-II, GPyOpt, SPGLIB, and PHONOPY.
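The greedy strategy above can be sketched as a short loop. This is an illustration under stated assumptions rather than the authors' code: `greedy_morphing` and `morph_best` are hypothetical names, `morph_best` stands in for the Bayesian-optimized search along one morphing line, and "a higher-S_cos intermediate" is read as "an intermediate scoring strictly above both inputs". The toy structures and score are placeholders with no crystallographic meaning.

```python
def greedy_morphing(champions, morph_best):
    """Greedy search over morphing paths.

    champions:  list of (structure, s_cos) pairs, e.g. one per EA trial.
    morph_best: hypothetical callable (a, b) -> (structure, s_cos) that
                returns the best point found on the morphing line a-b.
    """
    if not champions:
        raise ValueError("need at least one champion")
    by_score = lambda item: item[1]
    pool = sorted(champions, key=by_score, reverse=True)
    while len(pool) >= 2:
        best, second = pool[0], pool[1]
        candidate = morph_best(best[0], second[0])
        if candidate[1] > best[1]:
            # Improved intermediate: add it and remove both inputs.
            pool = pool[2:] + [candidate]
        else:
            # No improvement: drop the second-best input and retry.
            pool = [best] + pool[2:]
        pool.sort(key=by_score, reverse=True)
    return pool[0]


# Toy stand-ins (hypothetical): a "structure" is a number, its score
# peaks at 0.5, and "morphing" only evaluates the midpoint.
def toy_score(x):
    return 1.0 - abs(x - 0.5)


def toy_morph_best(a, b):
    mid = 0.5 * (a + b)
    return mid, toy_score(mid)


champions = [(x, toy_score(x)) for x in (0.2, 0.9, 0.0)]
best_structure, best_score = greedy_morphing(champions, toy_morph_best)
```

Each pass shrinks the pool by one entry, so the loop always terminates. The strict comparison means that an endpoint returned unchanged from a morphing line (the 0% or 100% sample) is not counted as an improvement.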
Key Findings
- Evolv&Morph successfully created crystal structures reproducing target XRD patterns across 16 material systems: 12 with simulated target patterns and 4 with experimental powder XRD targets.
- Achieved cosine similarity S_cos ≈ 99% for the 12 simulated targets and >96% for the 4 experimentally measured powder XRD targets (after background removal).
- The approach is automated and does not rely on crystal structure databases, enabling identification when no similar database pattern exists.
- Crystal morphing expanded the search space beyond standard evolutionary generation, improving best S_cos values and providing better inputs for post-refinement.
- The framework demonstrates potential for inverse design, where the optimization target can be replaced with other functional property scores beyond XRD similarity.
Discussion
The findings demonstrate that database-independent generation of crystal structures that reproduce a given XRD pattern is feasible. By maximizing S_cos between simulated and target XRD patterns, the evolutionary algorithm rapidly explores diverse structures, while crystal morphing interpolates between high-scoring candidates to discover improved intermediates that may be inaccessible by direct genetic operations. The combination addresses the principal challenge of structure determination from novel XRD data when database matches are absent and Rietveld refinement lacks a suitable starting model. High S_cos scores (~99% for simulated targets and >96% for experimental powder data) indicate that the produced structures capture the essential crystallographic features of the targets, thereby enabling subsequent refinement steps. The approach’s significance lies in enabling automated, expert-light workflows for structure solution and in serving as a general inverse design framework. By substituting the XRD similarity metric with other property targets, Evolv&Morph can be adapted to search for materials with desired functionalities, expanding its relevance beyond structure determination. The discussion also notes computational costs from first-principles relaxations as the dominant time bottleneck, suggesting that faster and sufficiently accurate ML interatomic potentials could substantially increase scalability and applicability to larger, more complex systems.
Conclusion
The study introduces Evolv&Morph, a database-independent inverse design workflow that combines an evolutionary algorithm with crystal morphing guided by Bayesian optimization to generate crystal structures whose simulated XRD closely matches a target pattern. Across 16 systems, the method achieves ~99% S_cos for simulated targets and >96% for experimental powder data (post background removal), demonstrating robust performance without reliance on initial database structures. Crystal morphing effectively broadens the search space and enhances best candidates for subsequent refinement. The framework also generalizes to materials design problems by exchanging the XRD similarity objective for other property targets. Future directions include replacing or augmenting first-principles relaxations with reliable machine-learning interatomic potentials to reduce computational cost, iterating morphing cycles for broader coverage, and applying the method to larger and more complex multicomponent systems with in situ integration of automated refinement and symmetry analysis.
Limitations
- Computational cost: First-principles (VASP) relaxations dominate runtime, limiting scalability to larger systems; faster, accurate ML interatomic potentials are proposed as a remedy.
- Search breadth: The greedy morphing strategy can restrict exploration if intermediates underperform; the all-pairs strategy mitigates this, but the study used only one cycle, potentially limiting coverage.
- Experimental preprocessing: The S_cos achieved for experimental powder data (>96%) required background removal; performance may depend on data quality and preprocessing.
- Termination criteria: The stopping rule and modest generation counts (max 20 generations) may miss rare solutions in extremely complex search spaces.
- Refinement sensitivity: Refinement and symmetry assignment can be sensitive to tolerance choices, and final accuracy may still benefit from expert validation in challenging cases.