Chemistry
Targeted materials discovery using Bayesian algorithm execution
S. R. Chitturi, A. Ramdas, et al.
The study addresses the challenge of efficiently exploring large, discrete, multi-parameter design spaces in materials science to identify candidates meeting precise property specifications. Traditional Bayesian optimization targets single-objective maxima (or Pareto sets for multi-objective tasks), and mapping approaches like Uncertainty Sampling focus on learning the entire response surface, both of which can be misaligned with practical goals that require identifying arbitrary property-constrained subsets (subset estimation). The authors propose automatically creating custom, goal-aligned acquisition strategies from user-defined algorithms that specify target subsets. The work is motivated by limitations in experiment throughput and the need for short time-horizon decision making in materials systems where automation is limited or measurements are expensive. The core research question is how to translate arbitrary user-specified goals on multiple measured properties into effective, parameter-free sequential sampling policies that outperform generic acquisition functions.
The paper situates its contribution within sequential experimental design and Bayesian optimization. For single-objective optimization, classical acquisitions include UCB, PI, and EI. For multi-objective optimization, methods like EHVI, NEHVI, and ParEGO target Pareto fronts. Mapping tasks (full-function estimation) often use Uncertainty Sampling and have been applied in X-ray scattering and microscopy to efficiently resolve response surfaces. Level-set estimation has also been studied as a special case of subset identification. However, designing task-specific acquisition functions for arbitrary subset goals is challenging and limits accessibility. Prior materials-focused BO studies span chemistry, perovskites, catalysis, photonics, and process optimization. The authors note the gap in materials-oriented methods that directly target user-defined subsets beyond optimization/mapping, highlighting the need for a general framework that aligns acquisition with complex experimental goals.
Framework: Bayesian Algorithm Execution (BAX) for multi-property, discrete design spaces. Users specify an algorithm A that, if the true function f were known, would return the ground-truth target subset T_f of design points satisfying the goal. Because f is unknown, surrogate probabilistic models (independent Gaussian Processes per property) are trained on collected data to approximate f. The algorithm is executed on either the GP posterior mean or samples from the GP posterior to produce predicted target subsets used to build goal-aware acquisition functions.
Models: Independent single-property Gaussian Process (GP) surrogates with zero-mean prior and squared exponential (RBF) kernel. Posterior mean f_t and standard deviation σ_t are computed after each iteration. Posterior function samples are drawn from p(f|D_t) to represent plausible functions consistent with data. For multi-property settings, separate GPs are used and aggregates across properties are computed as averages.
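The per-property surrogate described above can be sketched with a minimal zero-mean GP posterior under an RBF kernel. This is an illustrative NumPy implementation, not the authors' code; the function names (`rbf`, `gp_posterior`) and the unit lengthscale/variance defaults are assumptions for the sketch.

```python
import numpy as np

def rbf(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel on 1-D inputs."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior(x_train, y_train, x_test, noise=1e-2):
    """Zero-mean GP posterior mean and std at x_test given noisy observations."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf(x_train, x_test)
    Kss = rbf(x_test, x_test)
    alpha = np.linalg.solve(K, y_train)
    mean = Ks.T @ alpha
    cov = Kss - Ks.T @ np.linalg.solve(K, Ks)
    return mean, np.sqrt(np.clip(np.diag(cov), 0.0, None))
```

In the multi-property setting, one such posterior is maintained per property, and quantities like the standard deviation are averaged across properties.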
Acquisition functions:
- MeanBAX: Execute the user algorithm on the GP posterior mean to obtain a predicted target set T. Acquisition equals the average GP marginal standard deviation across properties for x in T, and 0 elsewhere. Pathologies: if T is empty or T ⊆ D (all predicted targets already measured), fall back to Uncertainty Sampling over the entire domain.
- InfoBAX: Execute the user algorithm on n GP posterior function samples to obtain predicted target sets. For each sample, condition a GP on the union of real data and the algorithm’s predicted outputs (used only internally) and compute the average reduction in predictive entropy. Acquisition equals (average predictive entropy) minus (average entropy under the updated models reflecting algorithm outputs), averaged over properties and posterior samples. This selects points where uncertainty matters for the algorithm’s output.
- SwitchBAX: Dynamically switches between MeanBAX and InfoBAX. When MeanBAX's pathological conditions are triggered (no predicted targets, or all predicted targets already measured), switch to InfoBAX; otherwise use MeanBAX.
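A minimal sketch of the MeanBAX acquisition, including the pathological-case check that also serves as the SwitchBAX trigger. The function name and the returned `(acquisition, fell_back)` convention are assumptions for illustration; in SwitchBAX the `fell_back` flag would route selection to InfoBAX rather than plain Uncertainty Sampling.

```python
import numpy as np

def meanbax_acquisition(pred_target_idx, measured_idx, std_per_property):
    """MeanBAX: average GP marginal std on predicted targets, zero elsewhere.

    std_per_property: (n_properties, n_designs) array of posterior stds.
    Returns (acquisition, fell_back); fell_back is True in the pathological
    cases (empty prediction, or all predicted targets already measured),
    where the acquisition falls back to Uncertainty Sampling over the domain.
    """
    avg_std = std_per_property.mean(axis=0)
    unmeasured_targets = set(pred_target_idx) - set(measured_idx)
    if not unmeasured_targets:
        return avg_std, True
    acq = np.zeros_like(avg_std)
    idx = np.fromiter(unmeasured_targets, dtype=int)
    acq[idx] = avg_std[idx]
    return acq, False
```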
Pipeline:
- 1) Define goal via algorithm A(f,X) (e.g., level bands, intersections, unions). 2) Fit multi-property GPs to D_t. 3) Execute A on GP posterior mean or samples to get predicted target subsets. 4) Build goal-aware acquisition from algorithm outputs. 5) Measure the design point with highest acquisition; iterate.
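The five steps above can be sketched as a generic loop over a discrete design space. All of the callables (`f_measure`, `algorithm`, `fit_gps`, `acquire`) are placeholders for the components described in this summary, and the function name `run_bax_loop` is a hypothetical one; this is a structural sketch, not the authors' implementation.

```python
import numpy as np

def run_bax_loop(X, f_measure, algorithm, fit_gps, acquire,
                 n_init=10, n_steps=50, rng=None):
    """Generic BAX loop over a discrete design space X of shape (n_designs, d).

    algorithm(mean) -> indices of the predicted target subset (step 3);
    fit_gps(X_obs, Y_obs, X) -> (mean, std) over all of X (step 2);
    acquire(target_idx, measured, std) -> acquisition values (step 4).
    """
    if rng is None:
        rng = np.random.default_rng(0)
    measured = list(rng.choice(len(X), size=n_init, replace=False))
    for _ in range(n_steps):
        Y = np.array([f_measure(i) for i in measured])
        mean, std = fit_gps(X[measured], Y, X)   # fit surrogates
        target_idx = algorithm(mean)             # execute user algorithm
        acq = acquire(target_idx, measured, std) # goal-aware acquisition
        acq[measured] = -np.inf                  # no re-measurement
        measured.append(int(np.argmax(acq)))     # measure best point (step 5)
    return measured
```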
Metrics:
- Number Obtained: count of measured points that are ground-truth targets |D ∩ T|.
- Posterior Jaccard Index: set overlap between the predicted target set T̂ and the true target set T, |T ∩ T̂| / |T ∪ T̂|; used for retrospective benchmarking.
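Both metrics reduce to simple set operations; a direct sketch (function names are assumptions):

```python
def number_obtained(measured, true_targets):
    """|D ∩ T|: count of measured points that are ground-truth targets."""
    return len(set(measured) & set(true_targets))

def posterior_jaccard(predicted_targets, true_targets):
    """|T ∩ T̂| / |T ∪ T̂| between predicted and true target sets."""
    p, t = set(predicted_targets), set(true_targets)
    union = p | t
    return len(p & t) / len(union) if union else 1.0
```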
Algorithms for goals:
- Library (nanoparticles): filter for polydispersity < 5% and radii in specified buckets [6.5, 10, 15, 17.5, 20, 30] ± 0.5 nm; union across buckets.
- Multiband (magnetics): intersection of Kerr rotation in [0.3, 0.4] mrad and coercivity in [2.0, 3.0] mT.
- Wishlist (magnetics): union of multiple multiband regions: [[2.0,3.0],[0.2,0.3]] or [[4.0,6.0],[0.2,0.4]] or [[9.0,10.0],[0.0,0.1]] or [[3.0,4.0],[0.7,0.8]].
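The Multiband (intersection) and Wishlist (union of bands) goals above can be expressed as a few lines of boolean masking over predicted property arrays, which is what makes the user-algorithm interface lightweight. A hedged sketch (function names assumed, bands passed as (low, high) pairs):

```python
import numpy as np

def multiband(y1, y2, band1, band2):
    """Intersection goal: both properties lie inside their respective bands."""
    return ((y1 >= band1[0]) & (y1 <= band1[1]) &
            (y2 >= band2[0]) & (y2 <= band2[1]))

def wishlist(y1, y2, bands):
    """Union goal: any of several multiband regions is satisfied."""
    mask = np.zeros_like(y1, dtype=bool)
    for b1, b2 in bands:
        mask |= multiband(y1, y2, b1, b2)
    return mask
```

Executed on the GP posterior mean (or on posterior samples), the indices where the mask is True form the predicted target subset.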
Datasets and settings:
- Nanoparticle synthesis: 1997 discrete settings; properties are radius (y1) and polydispersity (y2) from polynomial models; measurement noise added (typically 1% or 5% of the normalized property scale). 10 random initial points; 20 repeats; up to 300 acquisitions.
- Magnetic materials (Fe-Co-Ni ternary): 921 compositions; properties: Kerr rotation (mrad) and coercivity (mT). Up to 500 acquisitions.
GP details: RBF kernel k(x,x') = α_m exp(-0.5 (x-x')^T L^{-1} (x-x')), with diagonal lengthscale matrix L. Hyperparameters (lengthscales, kernel variance) fit via five-fold cross-validation; re-fit every 10 points. Likelihood variances σ_i^2 typically set to match assumed noise (e.g., 0.01), or to experimental noise in noise sweeps (0.0, 0.01, 0.05, 0.1). Design variables min-max scaled to (0,1); properties scaled to (-1,1). InfoBAX and SwitchBAX used 15 posterior samples. EHVI implemented via Trieste; no re-measurement of previously sampled points.
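The ARD kernel above, with a diagonal lengthscale matrix L = diag(ℓ_d²), can be written directly. This is a generic NumPy rendering of the stated formula, not the authors' code; the function name `rbf_ard` is assumed.

```python
import numpy as np

def rbf_ard(X1, X2, lengthscales, variance=1.0):
    """k(x,x') = variance * exp(-0.5 (x-x')^T L^{-1} (x-x')),
    with L = diag(lengthscales**2), one lengthscale per input dimension."""
    Z1 = X1 / lengthscales
    Z2 = X2 / lengthscales
    d2 = (np.sum(Z1**2, axis=1)[:, None]
          + np.sum(Z2**2, axis=1)[None, :]
          - 2.0 * Z1 @ Z2.T)
    return variance * np.exp(-0.5 * np.clip(d2, 0.0, None))
```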
Baselines: Random Sampling (uniform without replacement), Uncertainty Sampling (average GP std across properties), and EHVI (for multi-objective BO targeting Pareto fronts).
Results:
- Across both nanoparticle and magnetic materials datasets, goal-aware BAX strategies (MeanBAX, InfoBAX, SwitchBAX) outperform goal-agnostic baselines (RS, US) and EHVI for the targeted subset estimation tasks when evaluated by Number Obtained. They also generally achieve higher Posterior Jaccard Index, especially on the magnetic datasets.
- MeanBAX vs InfoBAX: MeanBAX tends to perform best in the short term for Number Obtained (exploitative behavior driven by posterior mean), while InfoBAX achieves superior long-term performance and better coverage of the full target subset (more explorative by leveraging posterior samples and information gain).
- SwitchBAX: By dynamically switching between MeanBAX and InfoBAX, SwitchBAX performs robustly across both small-data and medium-data regimes, maintaining strong performance on both metrics and avoiding MeanBAX’s pathological cases.
- Nanoparticle Library task: BAX strategies significantly outperform RS and US in Number Obtained at both 1% and 5% noise. EHVI attains reasonable Posterior Jaccard Index but misses many target points with low polydispersity yet high size due to goal misalignment, leading to lower Number Obtained than BAX. Under higher noise (5%), all methods slow down in obtaining targets; MeanBAX exhibits higher variance across initializations, whereas InfoBAX and SwitchBAX remain relatively robust.
- Magnetic Multiband and Wishlist tasks: BAX methods outperform RS, US, and EHVI in Number Obtained and show notably higher Posterior Jaccard Index than US. EHVI performs poorly for disjoint target regions due to misalignment with Pareto optimization goals. The Wishlist task, with disjoint targets in design space, remains challenging; BAX still yields better predictive overlap and targeted sampling.
- Supplementary comparison on Pareto front optimization indicates BAX can perform comparably to specialized EHVI on that specific goal for the given dataset, underscoring versatility.
The findings demonstrate that aligning acquisition functions with user-defined goals via algorithm execution can markedly improve experimental efficiency for subset estimation tasks in materials discovery. MeanBAX’s exploitative nature quickly accrues target points when the GP posterior mean approximates the true function well, benefiting short experimental budgets and low-automation settings. InfoBAX’s exploration, driven by posterior sample diversity and information-theoretic selection, more effectively reduces uncertainty where it affects the algorithm’s outputs, providing higher long-term coverage and robustness to noise or model underfit. SwitchBAX adapts between these behaviors to avoid MeanBAX failure modes and maintain strong performance throughout data collection. Compared to US and RS, which spread effort across the entire domain, BAX focuses model accuracy and data acquisition where it matters for the specified goal, yielding higher overlap with true target sets and more successful experiments. EHVI, while effective for Pareto front tasks, can be misaligned when goals are disjoint or not optimization-centric, reducing efficiency relative to BAX. Overall, embedding the experimental objective directly into the acquisition process yields more informative measurements and accelerates discovery.
This work introduces a practical, multi-property Bayesian Algorithm Execution framework that converts simple user-defined algorithms describing experimental goals into parameter-free, goal-aware acquisition strategies (MeanBAX, InfoBAX, SwitchBAX). Tailored to discrete design spaces and short time-horizon decisions, the methods outperform standard baselines for targeted subset estimation across nanoparticle synthesis and magnetic materials datasets. MeanBAX excels early, InfoBAX in the long run, and SwitchBAX robustly across regimes. By allowing scientists to specify complex goals via concise algorithms, the approach removes the burden of custom acquisition design and enables efficient, targeted exploration. Potential future directions include extending to additional surrogate models and continuous design spaces, adaptive handling of unknown noise levels, richer multi-property dependencies beyond independent GPs, and further validation in closed-loop laboratory settings for diverse materials systems.
Limitations:
- The framework and demonstrations focus on discrete, fully enumerated design spaces; extensions to continuous or mixed-variable spaces may require additional adaptations.
- Independent GP models are used per property, not modeling cross-property correlations; joint multi-output GPs could improve performance where properties are correlated.
- Likelihood noise levels are assumed known for some experiments (or set a priori); while the authors note these can be fit, mismatches may impact performance.
- Posterior Jaccard Index requires knowledge of the ground-truth target subset and is thus only applicable for retrospective benchmarking on known datasets.
- MeanBAX can exhibit pathological behavior when no targets are predicted or predicted targets are already measured (mitigated via SwitchBAX).