AIPHAD, an active learning web application for visual understanding of phase diagrams
R. Tamura, H. Morito, et al.
Phase diagrams map phases and their transformations as functions of thermodynamic variables such as temperature, pressure, and composition, and are central to materials science and condensed-matter physics. Constructing multidimensional phase diagrams is resource-intensive due to the large experimental or simulation search spaces. Data-driven and machine learning approaches offer a route to accelerate phase diagram determination by leveraging existing data to predict regions and boundaries, thereby reducing the number of required experiments. This work introduces an active-learning-based, visualization-focused framework and tools (AIPHAD) to efficiently determine and understand complex phase diagrams, addressing the need for rapid, informed exploration in multicomponent systems.
Data-driven methods have been increasingly applied to phase diagrams and related materials problems. Prior work includes machine learning predictions of phase formation in high-entropy alloys, quasicrystal stability, coexisting phases in ternary sections, and binary phase boundaries. In condensed-matter physics, data-driven analysis has been used to map simulation-based phase diagrams for strongly correlated fermions and topological systems. Active learning strategies have been proposed to sample phase space efficiently, and thermodynamic constraints (e.g., the Gibbs phase rule) have been integrated to improve efficiency further. These advances motivate combining semi-supervised learning with uncertainty sampling to reduce the number of experiments while maintaining high fidelity in phase boundary determination.
The AIPHAD toolbox implements the Phase Diagram Construction (PDC) algorithm using active learning with semi-supervised learning for phase estimation and uncertainty sampling to propose informative experiments.
Initial setting and discretization:
- Define the phase diagram search space and discretize it into N candidate points X = {x_i}, where each x_i is a d-dimensional coordinate (e.g., composition and temperature).
- Prepare an initial labeled dataset of M points with phase category labels from L = {1,...,C}. Categories include single phases and coexisting-phase regions, treated uniformly as distinct labels. Unlabeled points are the remaining candidates.
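As an illustration of the discretization step, the following minimal sketch enumerates a ternary composition grid in integer percent and builds a label vector with -1 marking unlabeled points. The function name `ternary_grid` and the label id used for the stoichiometric Fe2TiSn composition (50, 25, 25) are hypothetical choices for this example, not part of AIPHAD.

```python
def ternary_grid(step_percent=5):
    """Return all (a, b, c) integer-percent compositions with a + b + c = 100."""
    if 100 % step_percent != 0:
        raise ValueError("step_percent must divide 100")
    points = []
    for a in range(0, 101, step_percent):
        for b in range(0, 101 - a, step_percent):
            points.append((a, b, 100 - a - b))
    return points


# Candidate set X: a 5% grid over one isothermal ternary section.
grid = ternary_grid(5)

# Label vector y: -1 marks an unlabeled candidate.
y = [-1] * len(grid)

# Hypothetical label id 3 for a measured point at the 2:1:1 stoichiometry.
y[grid.index((50, 25, 25))] = 3

print(len(grid), y.count(-1))
```

A 5% step gives 21 levels per component, so the number of grid points is 21 + 20 + ... + 1 = 231.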
Phase estimation using semi-supervised learning:
- Label Propagation (LP): Build a fully connected graph over all points with RBF kernel edge weights w_ij = exp(−gamma ||x_i − x_j||^2) (default gamma = 20, as in scikit-learn). Define a column-stochastic transition matrix T by normalizing the weights in each column. Initialize an N-by-C probability matrix P: each labeled point gets a one-hot row at its given label, and each unlabeled point gets a zero row. Iterate: (i) propagate P <- T^T P; (ii) normalize the probability vectors of unlabeled points to sum to 1 and reset labeled points to their one-hot labels. Repeat until convergence. The final row p_i gives the class membership probabilities of point i, and labeled points keep their original labels.
- Label Spreading (LS): Similar to LP, but sets w_ii = 0 and lets the labels of labeled points change, which accommodates label noise. Starting from the initial matrix P0, iterate P <- alpha T P + (1 − alpha) P0 with 0 < alpha < 1 (default alpha = 0.2) until convergence, normalizing each row to sum to 1. The maximum-probability label per point is the predicted phase.
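The label propagation iteration described above can be sketched in a few lines of NumPy. This is an illustrative reimplementation under stated assumptions, not the AIPHAD source: the function name `label_propagation` is hypothetical, labels are taken as 0..C-1 (rather than 1..C) with -1 for unlabeled points, and the dense N-by-N weight matrix is only practical for small candidate sets.

```python
import numpy as np


def label_propagation(X, y, n_classes, gamma=20.0, max_iter=1000, tol=1e-6):
    """Return an (N, C) probability matrix from the LP iteration.

    X: (N, d) candidate coordinates.
    y: (N,) integer labels in 0..C-1, with -1 for unlabeled points.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    labeled = y >= 0

    # RBF edge weights w_ij = exp(-gamma * ||x_i - x_j||^2).
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-gamma * sq_dist)

    # Column-stochastic transition matrix: each column sums to 1.
    T = W / W.sum(axis=0, keepdims=True)

    # One-hot rows for labeled points, zero rows for unlabeled points.
    P0 = np.zeros((len(X), n_classes))
    P0[labeled, y[labeled]] = 1.0

    P = P0.copy()
    for _ in range(max_iter):
        P_new = T.T @ P                                   # (i) propagate
        row_sums = P_new.sum(axis=1, keepdims=True)
        P_new = P_new / np.where(row_sums > 0, row_sums, 1.0)  # (ii) normalize
        P_new[labeled] = P0[labeled]                      # clamp known labels
        converged = np.abs(P_new - P).max() < tol
        P = P_new
        if converged:
            break
    return P
```

For example, on six points along a line with only the two end points labeled, `label_propagation([[0.0], [0.1], [0.2], [0.8], [0.9], [1.0]], [0, -1, -1, -1, -1, 1], n_classes=2)` assigns each unlabeled point to the class of the nearer end point.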
Uncertainty sampling and proposal generation:
- Compute an uncertainty score from the probability vector p_i of each unlabeled point. Three commonly used scores are supported: Least Confidence (uncertainty is high when the maximum class probability is low), Margin Sampling (uncertainty is high when the gap between the top two class probabilities is small), and Entropy (uncertainty is high when the probability distribution is close to uniform). The most uncertain point x* maximizes the chosen score and is proposed for experiment. For batch proposals, two strategies are available: (i) Only Uncertainty ranking, which selects the top-K points by uncertainty, and (ii) Neighbor Exclusion, which also ranks by uncertainty but skips points among the k nearest neighbors of an already selected point, so that the proposals are spread out.
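The three scores and the two batch strategies can be sketched as follows. This is a minimal illustration, assuming common textbook forms of the scores (1 − max probability, 1 − top-two gap, Shannon entropy); the exact scaling used inside AIPHAD may differ. The function names, the `ne_k` argument, and the assumption that candidate coordinates are distinct are choices made for this example.

```python
import numpy as np


def uncertainty_scores(P, method="LC"):
    """Per-point uncertainty from an (N, C) probability matrix."""
    P = np.asarray(P, dtype=float)
    if method == "LC":   # least confidence
        return 1.0 - P.max(axis=1)
    if method == "MS":   # margin sampling: small top-two gap -> high score
        top2 = np.sort(P, axis=1)[:, -2:]
        return 1.0 - (top2[:, 1] - top2[:, 0])
    if method == "EA":   # entropy, with 0 * log(0) treated as 0
        safe = np.where(P > 0, P, 1.0)
        return -(P * np.log(safe)).sum(axis=1)
    raise ValueError(f"unknown method: {method}")


def propose(X, scores, unlabeled_idx, n_proposals, ne_k=0):
    """Pick up to n_proposals unlabeled points by descending uncertainty.

    ne_k = 0 reproduces plain top-K ranking; ne_k > 0 additionally skips
    the ne_k nearest neighbors of every point already chosen.
    """
    X = np.asarray(X, dtype=float)
    scores = np.asarray(scores, dtype=float)
    unlabeled_idx = np.asarray(unlabeled_idx)
    order = unlabeled_idx[np.argsort(-scores[unlabeled_idx], kind="stable")]

    chosen, excluded = [], set()
    for i in order:
        i = int(i)
        if len(chosen) == n_proposals:
            break
        if i in excluded:
            continue
        chosen.append(i)
        if ne_k > 0:
            dist = np.linalg.norm(X - X[i], axis=1)
            nearest = [int(j) for j in np.argsort(dist, kind="stable") if j != i]
            excluded.update(nearest[:ne_k])
    return chosen
```

With neighbor exclusion, a cluster of highly uncertain points around a single boundary contributes one proposal instead of filling the whole batch, at the cost of possibly returning fewer than `n_proposals` points on small grids.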
Thermodynamic considerations:
- The workflow can incorporate thermodynamic knowledge to accelerate exploration: (1) leverage coexisting-phase information (e.g., tie-lines/triangles) to generate many labeled points from a single experiment, and (2) apply the Gibbs phase rule to exclude regions (e.g., interiors of three-phase triangles in ternary systems) from further search, reducing candidates.
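To make the two ideas above concrete: in an isothermal, isobaric ternary section, three coexisting phases leave zero degrees of freedom (F = C − P = 3 − 3 = 0), so every composition inside a tie-triangle shows the same three phases. One experiment that identifies the triangle therefore labels all grid points inside it, and those points drop out of the candidate pool. The sketch below assumes integer-percent compositions and a non-degenerate triangle; the helper names and the label id are hypothetical.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o) for 2D points."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def in_triangle(p, v1, v2, v3):
    """True if p lies inside or on an edge of the triangle v1-v2-v3."""
    d1, d2, d3 = cross(v1, v2, p), cross(v2, v3, p), cross(v3, v1, p)
    has_neg = min(d1, d2, d3) < 0
    has_pos = max(d1, d2, d3) > 0
    return not (has_neg and has_pos)


def label_tie_triangle(grid, y, vertices, label):
    """Return a copy of y with `label` set on unlabeled points in the triangle.

    grid: list of (a, b, c) integer-percent compositions.
    vertices: three (a, b, c) compositions of the coexisting phases.
    Only (a, b) is used, since c = 100 - a - b is redundant.
    """
    v1, v2, v3 = [(v[0], v[1]) for v in vertices]
    new_y = list(y)
    for i, (a, b, _c) in enumerate(grid):
        if new_y[i] == -1 and in_triangle((a, b), v1, v2, v3):
            new_y[i] = label
    return new_y


# Usage on a 5% grid with a small, purely illustrative triangle.
grid = [(a, b, 100 - a - b)
        for a in range(0, 101, 5)
        for b in range(0, 101 - a, 5)]
y = [-1] * len(grid)
y = label_tie_triangle(grid, y, [(0, 0, 100), (10, 0, 90), (0, 10, 90)], label=5)
print(y.count(5), "points labeled;", y.count(-1), "candidates remain")
```

Because the coordinates are integers, the inside test is exact and points on the triangle edges are included.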
Software usage:
- Web application: Users define search space, input labeled phase information, select estimation (LP or LS) and sampling method (LC, MS, EA, or random), set the number of proposals and optional neighbor exclusion, run calculations to view proposed points, uncertainty maps, and estimated diagrams, and inspect per-phase probability rankings for unlabeled points.
- Python package: Install via pip. Use pdc_sampler(estimation, sampling, proposal) with arrays X (all candidates) and y (labels; -1 for unlabeled). Call fit(X, y) to estimate probabilities and us() for uncertainty sampling; retrieve proposed indices, coordinates, and uncertainty scores. Batch suggestions can use multi_method = "OU" or "NE" with NE_k; hyperparameters gamma and alpha are configurable. The package also exposes unlabeled indices, uncertainty scores, and per-label probability distributions for targeted searches.
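The data layout described for the Python package (an array of all candidates plus a label vector with -1 for unlabeled points) is the same convention scikit-learn uses for semi-supervised estimators. The sketch below does not call the AIPHAD package; it uses scikit-learn's `LabelSpreading` as a stand-in to show one estimate-then-propose cycle with gamma = 20, alpha = 0.2, and least-confidence sampling. The four initial labels are hypothetical placeholders, not the measured data.

```python
import numpy as np
from sklearn.semi_supervised import LabelSpreading

# All candidates: a 5% ternary grid, expressed as atomic fractions.
step = 5
X = np.array([(a, b, 100 - a - b)
              for a in range(0, 101, step)
              for b in range(0, 101 - a, step)], dtype=float) / 100.0

# Labels: -1 for unlabeled, otherwise a phase category id.
y = np.full(len(X), -1)
initial = {  # hypothetical initial experiments
    (1.0, 0.0, 0.0): 0,
    (0.0, 1.0, 0.0): 1,
    (0.0, 0.0, 1.0): 2,
    (0.5, 0.25, 0.25): 3,
}
for comp, label in initial.items():
    idx = np.flatnonzero(np.all(np.isclose(X, comp), axis=1))[0]
    y[idx] = label

# Phase estimation with label spreading (stand-in for the package's LS).
model = LabelSpreading(kernel="rbf", gamma=20, alpha=0.2)
model.fit(X, y)
P = model.label_distributions_          # (N, C) class probabilities

# Least-confidence uncertainty on unlabeled points, then one proposal.
unlabeled = np.flatnonzero(y == -1)
lc = 1.0 - P[unlabeled].max(axis=1)
proposal = int(unlabeled[np.argmax(lc)])
print("next experiment at", X[proposal], "uncertainty", lc.max())
```

In a closed loop, the proposed point would be measured, its entry in `y` replaced by the observed phase label, and the fit repeated.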
Application to the Fe–Ti–Sn system:
- Demonstration system: Fe–Ti–Sn ternary system targeting the Fe2TiSn Heusler phase (thermoelectric relevance).
- Isothermal section at 900 °C: The ternary section was discretized into 231 composition points (5% increments). Seven initial experiments (including a Heusler-stoichiometric composition) identified four phase regions: Ti-rich, Sn-rich, Fe-rich, and Heusler. Using LS with Least Confidence uncertainty, AIPHAD proposed experiments near predicted boundaries over multiple closed-loop cycles, refining phase boundaries. Under the selected conditions, no new phases were detected beyond the four initial regions, but boundary delineation improved with few added experiments.
- Ternary phase diagram vs temperature: A 3D prism diagram over 700–1000 °C (initially in 100 °C steps, with compositions in 5% steps) was explored using LS with Least Confidence. An initial batch of 14 proposals (Only Uncertainty ranking) led to experiments at 700 and 800 °C that revealed three additional regions: FeSn, FeSn2, and a mixed, unreacted Fe + Ti region. The Heusler phase did not form at 700 °C, likely because the heat-treatment time was insufficient. The study then focused on 800–1000 °C in 50 °C steps and proposed 14 more experiments; no further new phases were found, but the existing boundaries were clarified. A comprehensive metastable phase diagram was constructed from all data.
- Targeted phase search: Using AIPHAD’s probability-based targeting along the 900 °C isothermal section, six candidates with high Heusler probability were proposed. Experiments confirmed the Heusler phase at four of these six points, delineating the Heusler-stable region efficiently.
- Overall: AIPHAD reduced experimental load, provided clear uncertainty maps and boundary predictions (with LS yielding sharper boundary emphasis than LP), and enabled efficient targeted discovery of a specific phase region in a multicomponent system.
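The targeted phase search mentioned above amounts to ranking unlabeled points by the estimated probability of one chosen phase instead of by uncertainty. A minimal sketch, assuming an (N, C) probability matrix `P` from LP or LS and a label vector with -1 for unlabeled points; the function name is hypothetical.

```python
import numpy as np


def rank_by_phase_probability(P, y, target_label, n_candidates):
    """Indices of the unlabeled points most likely to be `target_label`."""
    P = np.asarray(P, dtype=float)
    y = np.asarray(y)
    unlabeled = np.flatnonzero(y == -1)
    # Sort unlabeled points by descending probability of the target phase.
    order = np.argsort(-P[unlabeled, target_label], kind="stable")
    return unlabeled[order[:n_candidates]].tolist()


# Toy example with two classes; class 1 plays the role of the target phase.
P = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.4, 0.6]]
y = [1, -1, -1, -1]
print(rank_by_phase_probability(P, y, target_label=1, n_candidates=2))
```

Point 0 is skipped because it is already labeled, so the two candidates returned are the unlabeled points with the highest class-1 probability.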
The study addresses the challenge of resource-intensive phase diagram determination by implementing an active-learning loop that iteratively estimates phase regions, quantifies uncertainty, and proposes informative experiments. Semi-supervised learning (LP/LS) capitalizes on sparse labeled data to infer probabilities for unlabeled points across discretized composition–temperature spaces. Uncertainty sampling directs experiments to high-information regions, particularly near phase boundaries and invariant features crucial for ternary diagrams. Incorporating thermodynamic constraints further reduces the search space. Applied to Fe–Ti–Sn, AIPHAD rapidly refined metastable phase boundaries, discovered additional low-temperature phases (FeSn, FeSn2, Fe + Ti) during the extended temperature study, and efficiently localized the Heusler-stable region (4/6 targeted confirmations). The LS method provided more consistent identification of uncertain boundaries than LP, supporting its selection for ternary construction. These findings validate that the PDC algorithm within AIPHAD can construct useful phase diagrams from minimal prior data and guide targeted exploration, complementing traditional materials workflows and enabling integration with autonomous experimentation platforms.
AIPHAD, comprising a web application and Python package implementing the PDC algorithm with label propagation/spreading and uncertainty sampling, enables efficient phase diagram construction and visualization with reduced experimental effort. In the Fe–Ti–Sn ternary case, AIPHAD started from sparse initial data, refined phase boundaries at 900 °C, identified additional phases at lower temperatures, and efficiently delineated the Heusler-stable region via targeted proposals. The tool is accessible, open-source, and integrates with NIMS-OS for closed-loop autonomous experimentation. Future directions include broader applications to equilibrium and metastable diagrams across materials classes and deeper integration with thermodynamic modeling (e.g., CALPHAD) to further enhance efficiency and predictive accuracy.
The experimental phase diagrams constructed in this study are metastable because short heat-treatment durations were used; equilibrium states at each point are not guaranteed, and the resulting diagram shapes differ from reported equilibrium diagrams. At 700 °C, the Heusler phase did not form, likely due to insufficient treatment time, underscoring time-dependence as a constraint. While label spreading can improve robustness to noisy labels, it may alter initial labels; careful interpretation is needed. Results also depend on discretization choices and hyperparameters (e.g., gamma, alpha) and on the availability and accuracy of initial labeled data.