
An integrated high-throughput robotic platform and active learning approach for accelerated discovery of optimal electrolyte formulations
J. Noh, H. A. Doan, et al.
Juran Noh and colleagues present a workflow that couples high-throughput experimentation with active learning to optimize electrolyte formulations for redox flow batteries. Using this workflow, the authors identify solvent formulations in which the solubility of a model redox-active molecule exceeds 6.20 M, a step toward higher-energy-density storage.
Introduction
Designing materials with targeted properties is central to clean-energy technologies, but conventional trial-and-error development is slow and costly. Data-driven methods can accelerate discovery, yet practical impact in materials research is limited by a lack of large, high-fidelity experimental datasets. Redox flow batteries (RFBs), particularly nonaqueous systems (NRFBs), are promising for grid-scale, long-duration energy storage because they offer wide operational voltage windows and potentially higher energy density. A key bottleneck is the solubility of redox-active organic molecules (ROMs), which limits achievable concentration and thus energy density. Generating standardized, application-relevant solubility data for ROMs in organic solvents is challenging due to dependence on solute/solvent identity, composition, temperature, and equilibration protocols. Traditional solubility methods include the ‘excess solvent’ (fast but kinetic) and ‘excess solute’ (accurate but slow) approaches, both of which pose throughput challenges for building robust datasets. To address these issues, the authors propose a closed-loop workflow combining high-throughput experimentation (HTE) and active learning via Bayesian optimization (BO) to efficiently generate reliable solubility data and rapidly identify solvent systems with high solubility for a model ROM, 2,1,3-benzothiadiazole (BTZ).
Literature Review
Prior work highlights the importance of materials informatics and automation for accelerating energy materials discovery, but emphasizes the need for standardized, high-fidelity datasets. Solubility measurement practices vary significantly; the ‘excess solvent’ approach enables automation but captures kinetic solubility, whereas the ‘excess solute’ (shake-flask) method yields thermodynamic solubility with higher accuracy at the cost of time and instrumentation. Existing HTE methods largely target aqueous systems, complicating transfer to nonaqueous solvents where binary mixtures often enhance solubility via synergistic effects. Bayesian optimization and active learning have been effective in guiding experimental design in chemistry and battery electrolytes. For NRFBs, solubility and other solution properties critically impact performance, underscoring the need for multi-property optimization in future studies. The authors build on this body of work by integrating an automated HTE platform with BO to target BTZ solubility in unary and binary organic solvents, leveraging physics-based and quantum-chemistry-informed descriptors to improve ML predictions.
Methodology
System and closed-loop workflow: The platform integrates a high-throughput experimentation (HTE) module with a Bayesian optimization (BO) module. HTE automates sample preparation, equilibration, and quantitative NMR (qNMR) analysis of saturated solutions. BO trains a surrogate model on measured data, ranks candidate solvents via an acquisition function, and iteratively selects top candidates for experimental evaluation.
Solvent and candidate space: The study curated 22 single organic solvents spanning a range of physicochemical properties (e.g., dielectric constants, boiling points, densities). Binary mixtures were enumerated from these solvents across multiple composition ratios, yielding a candidate space of 2101 binary formulations. Initial training data comprised all 22 single solvents and 36 randomly selected 1:1 binary mixtures; an additional 40 binaries were later measured for testing. Overall, 218 measurements (<10% of candidates) were executed in the active-learning campaign.
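The candidate space can be sketched with a short enumeration. This is a minimal sketch under stated assumptions: the solvent labels are placeholders, and the composition grid (mole fractions 0.1 to 0.9 in steps of 0.1) is an assumption that is not stated in this summary. Under that assumption, 231 solvent pairs at nine ratios plus the 22 single solvents happen to total 2101, matching the reported count; whether the reported figure includes the single solvents is not specified here.

```python
from itertools import combinations

# Placeholder labels; the study's actual 22 solvents are not listed in this summary.
solvents = [f"S{i:02d}" for i in range(1, 23)]

# Assumed composition grid: mole fraction of the first solvent, 0.1 to 0.9.
fractions = [round(0.1 * k, 1) for k in range(1, 10)]

# Each candidate is (solvent names, mole fractions).
unary = [((s,), (1.0,)) for s in solvents]
binary = [
    ((a, b), (x, round(1.0 - x, 1)))
    for a, b in combinations(solvents, 2)  # unordered pairs, no self-pairs
    for x in fractions
]
candidates = unary + binary

print(len(binary))      # 231 pairs * 9 ratios = 2079
print(len(candidates))  # 2079 + 22 = 2101
```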
Automated preparation of saturated solutions (excess solute): A Big Kahuna (Unchained Labs) robotic platform, programmed via Library Studio, prepared up to 40 formulations plus two controls (2.0 M and saturated BTZ in acetonitrile) per 48-well microplate. All dispensing occurred in an argon-filled glovebox. After powder and solvent dispensing, vials were capped, vortexed at 1000 RPM, and stirred at 500 RPM for 1–3 h to ensure excess solid. Samples equilibrated on-deck at 20 °C for 8 h. An on-line vision system verified undissolved solids. After equilibration, supernatants were used for analysis.
Quantitative 1H NMR solubility measurement: The workflow used 1,4-dinitrobenzene (DNB) as internal standard in DMSO-d6. Automation transferred 30 µL of each equilibrated supernatant to NMR tubes, followed by 10 µL of internal-standard solution; tubes were capped and mixed. Spectra were acquired on a Bruker 400 MHz instrument with autosampler. Concentrations were quantified by integrating BTZ and DNB peaks relative to known internal-standard concentration and volumes. Two ACN standards (1.0 and 2.0 M) validated accuracy (measured 0.98 and 1.98 M). Batch reproducibility was monitored using control samples (2 M and saturated BTZ in ACN) in every run.
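The internal-standard calculation described above can be sketched as follows. This is a generic qNMR relation, not the authors' processing script: the function name, the internal-standard concentration (0.50 M), the integrals, and the choice of four protons per integrated signal are all illustrative assumptions. Only the 30 µL and 10 µL transfer volumes come from the text.

```python
def qnmr_concentration(i_analyte, n_h_analyte, i_std, n_h_std,
                       c_std_molar, v_std_ul, v_sample_ul):
    """Analyte concentration (M) in the original aliquot.

    The per-proton integral ratio gives the mole ratio analyte/standard;
    multiplying by the moles of standard added and dividing by the
    aliquot volume converts that ratio into a concentration.
    """
    mol_ratio = (i_analyte / n_h_analyte) / (i_std / n_h_std)
    n_std_umol = c_std_molar * v_std_ul  # mol/L * uL = umol
    return mol_ratio * n_std_umol / v_sample_ul  # umol / uL = mol/L

# Hypothetical integrals and standard concentration, for illustration only.
c_btz = qnmr_concentration(
    i_analyte=12.0, n_h_analyte=4,   # assumed: all 4 aromatic H of BTZ integrated
    i_std=1.0, n_h_std=4,            # assumed: all 4 equivalent H of DNB integrated
    c_std_molar=0.50,                # hypothetical internal-standard concentration
    v_std_ul=10.0, v_sample_ul=30.0  # volumes from the described workflow
)
print(f"{c_btz:.2f} M")  # 2.00 M
```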
High-throughput viscosity: For selected saturated solutions, 100 µL aliquots were analyzed using a VROC initium one plus viscometer. Viscosities of saturated solutions were below ~2.5 cP and showed minimal sensitivity to BTZ concentration within the tested range.
Feature engineering for ML: To predict BTZ solubility in unary and binary solvents, 11 descriptors were assembled, including solvent-level physicochemical properties (e.g., molecular weight, topological polar surface area, heavy-atom count, logP) and solute-related quantum-chemical properties computed for solvated BTZ (e.g., solvation free energy, dipole moment, polarizability, HOMO/LUMO energies, max/min partial charges). For binary mixtures, descriptor values were computed as molar-fraction-weighted combinations of constituent solvent descriptors.
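The mole-fraction weighting for binary mixtures can be illustrated with a small helper. This is a sketch, not the authors' code: the function name and dictionary layout are assumptions, and only two of the 11 descriptors are shown (molecular weight and heavy-atom count for 1,4-dioxane and DMSO).

```python
def mix_descriptors(desc_a, desc_b, x_a):
    """Mole-fraction-weighted descriptors for a binary mixture A:B."""
    if not 0.0 <= x_a <= 1.0:
        raise ValueError("x_a must be a mole fraction between 0 and 1")
    if desc_a.keys() != desc_b.keys():
        raise ValueError("both solvents need the same descriptor names")
    return {k: x_a * desc_a[k] + (1.0 - x_a) * desc_b[k] for k in desc_a}

# Two descriptors only, for illustration.
dioxane = {"mol_weight": 88.11, "heavy_atoms": 6}
dmso = {"mol_weight": 78.13, "heavy_atoms": 4}

mixture = mix_descriptors(dioxane, dmso, x_a=0.80)
print(mixture)
```

A linear rule like this cannot represent non-ideal mixing, which is why the summary lists it among the limitations.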
Quantum chemistry calculations: Density functional theory (DFT) calculations were performed with Gaussian 16 at the B3LYP/6-31+G(d,p) level. Solvation effects for BTZ in each of the 22 unary solvents were modeled using PCM (SCRF), and solvation free energies were obtained as the difference between the solvated and gas-phase Gibbs free energies.
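As a worked illustration of the free-energy difference, the helper below converts two Gibbs free energies in hartree into a solvation free energy in kcal/mol. The energies are placeholder numbers, not values from the study, and the function name is hypothetical.

```python
HARTREE_TO_KCAL_PER_MOL = 627.5095  # standard unit conversion factor

def solvation_free_energy(g_solvated_hartree, g_gas_hartree):
    """Solvation free energy (kcal/mol) = G(solvated) - G(gas phase)."""
    return (g_solvated_hartree - g_gas_hartree) * HARTREE_TO_KCAL_PER_MOL

# Placeholder energies for illustration; not taken from the paper.
dg_solv = solvation_free_energy(g_solvated_hartree=-740.008, g_gas_hartree=-740.000)
print(f"{dg_solv:.2f} kcal/mol")  # about -5.02 kcal/mol
```

A more negative value indicates stronger stabilization of the solute by the solvent model.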
Surrogate model and BO strategy: A Gaussian Process Regression (GPR) surrogate with a Matérn (ν=1.5) kernel was trained on measured solubilities. Hyperparameters (length scale, noise) were optimized via maximum likelihood. Uncertainty-aware predictions were fed to an Expected Improvement (EI) acquisition function (ε=1e−3) to balance exploration and exploitation. In each BO iteration, top-ranked candidates were selected for HTE measurement; new data were appended to the training set and the model retrained, closing the loop.
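The two ingredients named above, the Matérn ν=1.5 kernel and the Expected Improvement acquisition, can be written in a few lines of standard-library Python. This is a sketch of the textbook formulas, not the authors' implementation: the candidate names and their predicted means and uncertainties are made up, and the incumbent value uses the 5.47 M single-solvent result reported elsewhere in this summary.

```python
import math

def matern32(r, length_scale=1.0):
    """Unit-variance Matern kernel with nu = 1.5 at distance r >= 0."""
    s = math.sqrt(3.0) * r / length_scale
    return (1.0 + s) * math.exp(-s)

def expected_improvement(mu, sigma, best, xi=1e-3):
    """EI for maximization, given a Gaussian prediction N(mu, sigma^2)."""
    imp = mu - best - xi
    if sigma <= 0.0:
        return max(imp, 0.0)  # no uncertainty: improvement is deterministic
    z = imp / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return imp * cdf + sigma * pdf

# Hypothetical surrogate predictions: name -> (mean solubility in M, std in M).
predictions = {"A": (5.40, 0.10), "B": (5.00, 0.80), "C": (5.60, 0.05)}
best = 5.47  # best measured value so far

ranked = sorted(predictions,
                key=lambda k: expected_improvement(*predictions[k], best),
                reverse=True)
for name in ranked:
    print(name, round(expected_improvement(*predictions[name], best), 4))
```

In this toy ranking, a candidate with a lower predicted mean but large uncertainty can score as well as one with a slightly higher mean and little uncertainty, which is the exploration and exploitation balance the text refers to.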
Benchmarking and evaluation: The team benchmarked BO against random selection using a 98-solvent subset to assess sample efficiency in finding the highest-solubility candidate. Model performance on the training/test sets was evaluated using R², RMSE, and mean error (ME).
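The evaluation quantities can be sketched as follows. Two assumptions apply: this summary does not define ME, so it is computed here as the mean absolute error, and the random-selection baseline is modeled as drawing candidates without replacement until the best one is reached. The solubility values are made up for illustration.

```python
import math
import random

def regression_metrics(y_true, y_pred):
    """Return (R^2, RMSE, mean absolute error) for paired values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return r2, rmse, mae

def random_search_evaluations(n_candidates, rng):
    """Random draws (without replacement) needed to reach the best candidate."""
    order = list(range(n_candidates))
    rng.shuffle(order)
    best_index = 0  # which index is "best" is arbitrary under a uniform shuffle
    return order.index(best_index) + 1

# Made-up solubilities (M), for illustration only.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
r2, rmse, mae = regression_metrics(measured, predicted)
print(f"R2={r2:.2f} RMSE={rmse:.3f} M MAE={mae:.2f} M")

# Random baseline on a 98-candidate set, averaged over repeated trials.
rng = random.Random(0)
trials = [random_search_evaluations(98, rng) for _ in range(5000)]
mean_draws = sum(trials) / len(trials)
print(f"random baseline: {mean_draws:.1f} draws on average")
```

For N candidates, random selection needs (N + 1) / 2 evaluations on average to reach the best one, which is 49.5 for N = 98; a guided strategy is more sample-efficient to the extent that it needs fewer.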
Key Findings
- The integrated ML-guided HTE platform rapidly identified binary solvent formulations with high solubility of BTZ while experimentally assessing fewer than 10% of candidates (218 measurements out of 2101 formulations).
- Multiple solvent mixtures surpassed a solubility threshold of 6.20 M; in total, 18 binary systems exceeded this value.
- 1,4-Dioxane (DOX) plays a central role: it was the best single solvent for BTZ (5.47 M) and appeared frequently in top-performing binary mixtures.
- Bayesian optimization consistently outperformed random selection in the benchmarking task for identifying the highest-solubility solvent in a 98-solvent dataset, requiring fewer experimental evaluations on average.
- The GPR surrogate achieved reasonable predictive accuracy on measured data (R² ≈ 0.81, RMSE ≈ 0.48 M, ME ≈ 0.29 M), enabling effective BO-driven selection.
- A top-performing composition identified by BO was DOX:DMSO at 0.80:0.20 with BTZ solubility of about 6.50 M, substantially outperforming the best mixture in the initial set (e.g., DOX:DMF 0.60:0.40 at ~2.65 M).
- Synergistic mixing effects were pronounced: for the top binaries, solubility exceeded that of both pure components. Notably, DOX combined with glutaronitrile (GTN; a low-solubility single solvent at ~1.86 M) yielded an unexpectedly high BTZ solubility of approximately 6.84 M.
- High-throughput viscosity measurements on saturated solutions indicated low viscosities (< ~2.5 cP), favorable for electrolyte handling.
Discussion
The study demonstrates that combining automated, accurate thermodynamic solubility measurements (via an excess-solute, qNMR-based HTE workflow) with an uncertainty-aware active-learning algorithm improves data efficiency and accelerates discovery of high-solubility electrolyte formulations. The approach exploits non-intuitive synergistic effects in binary solvents, particularly those involving 1,4-dioxane, yielding many mixtures that outperform their pure constituents. The GPR surrogate, informed by physicochemical and quantum-chemical descriptors, attains sufficient fidelity to guide Bayesian optimization, which reduces the number of experiments required to find top candidates compared to random selection. These findings address the bottleneck of limited, standardized solubility data in NRFB research and provide a template for closed-loop, data-driven screening of electrolytes. Beyond solubility, the framework is extensible to multi-property optimization (e.g., viscosity, ionic conductivity, chemical stability) and to more complex mixtures (e.g., supporting salts and additives) that better reflect practical electrolyte formulations.
Conclusion
An integrated ML-guided high-throughput robotic platform was developed for accelerated discovery of optimal electrolyte formulations, exemplified by identifying many binary solvent systems with BTZ solubility above 6.20 M while testing fewer than 10% of candidates. The strategy combines reliable, standardized thermodynamic solubility measurements with a Gaussian-process-based Bayesian optimization loop that efficiently prioritizes experiments. The work also yields a curated solubility database across diverse organic solvents, supporting further model development. Future directions include expanding to multicomponent systems (e.g., inclusion of salts and other organic species) and optimizing multiple performance-relevant properties such as viscosity, conductivity, and chemical stability to more comprehensively design high-performance NRFB electrolytes.
Limitations
- Scope limited primarily to unary and binary solvent systems; practical NRFB electrolytes are multicomponent (e.g., include supporting salts and other additives), which may alter solubility and other properties.
- The model and optimization focus on solubility; other critical electrolyte properties (viscosity, ionic conductivity, chemical stability) were not jointly optimized and may impose trade-offs.
- Solubility is temperature- and protocol-dependent; while the workflow standardizes equilibration and measurement at 20 °C with qNMR, comparability to literature data can vary, underscoring the need for rigorous reporting of conditions.
- Descriptor construction for binary mixtures uses molar-fraction-weighted combinations, which may not fully capture non-ideal interactions in all systems.
- Although automated, the excess-solute equilibrium approach requires hours of equilibration, limiting absolute throughput relative to purely kinetic screening methods.