Laying the experimental foundation for corrosion inhibitor discovery through machine learning

C. Özkan, L. Sahlmann, et al.

Research by Can Özkan and colleagues that builds an extensive, time-resolved electrochemical library of corrosion inhibitor candidates for AA2024-T3 and uses it to train machine learning models, laying the groundwork for faster inhibitor discovery.

Introduction

The study addresses the critical need for chromate-free corrosion inhibitors for aerospace aluminum alloys, focusing on AA2024-T3. It situates corrosion inhibition research within the shift toward data-driven discovery, noting advances in mechanistic understanding, high-throughput screening, and computational modeling (finite element method, FEM; density functional theory, DFT; molecular dynamics, MD). A major barrier to applying machine learning (ML) in corrosion is the lack of high-quality, systematically shared, time-dependent, multidimensional datasets; high-throughput methods often yield single-parameter outputs and limited mechanistic insight. The research question is how to build a robust, multidimensional, time-resolved electrochemical dataset and how to apply best practices to train predictive ML models for corrosion inhibitor discovery. The aims are to produce comprehensive electrochemical data under open-circuit and polarized conditions over 24 h, to quantify performance with suitable metrics, and to show how experimental features (e.g., pH, electrochemical potentials) augment structural and DFT descriptors in quantitative structure–property relationship (QSPR) models. Meeting these aims would accelerate inhibitor discovery while preserving mechanistic fidelity and improving predictive power.

Literature Review

The paper reviews several complementary research streams. Mechanistic studies have elucidated inhibition mechanisms on AA2024-T3 for inorganic inhibitors (chromates, rare earths, molybdate, cobalt ions, Mg-based pigments, lithium salts) and diverse organic inhibitors (imidazoles, triazoles/thiazoles, quinolines, carbamates, thiosemicarbazones), including the roles of exposure time and irreversibility. High-throughput screening methods (optical assays, fluorometry, multi-electrode electrochemistry, surface Cu enrichment, hydrogen evolution, weight loss, multi-channel spectroscopy) enable rapid data generation but often sacrifice mechanistic detail. Advances in computational modeling (FEM for strain, pit repassivation pH, and geometry effects; DFT/MD for electronic structure and adsorption mechanisms) are increasingly integrated with experiments. Data-driven QSPR studies have combined DFT-derived molecular descriptors with experimental efficiencies to classify or predict inhibitors for Al and Mg alloys, with mixed findings on the predictive value of in vacuo DFT descriptors alone. A recent review highlights the lack of high-quality, time-dependent, multidimensional corrosion datasets and the heterogeneity of existing databases (e.g., CORDATA), underscoring the need for standardized, rich datasets to enable reliable ML models.

Methodology

Substrate and preparation: AA2024-T3 sheets (2 mm thick) were cut into 20 mm × 20 mm samples. Surfaces were ground under water on SiC papers (320, 800, 1200, 2000, and 4000 grit) and then polished with 3 µm and 1 µm diamond suspensions to a mirror finish. Samples were ultrasonically cleaned in isopropanol for 15 min and dried.

Electrolytes and inhibitors: The base electrolyte was 0.1 M NaCl (pH ~5.9) prepared in Milli-Q water. Inhibitor solutions contained 1 mM of each candidate, without pH adjustment or added solubilizers. In total, 78 small organic molecules (aromatic and aliphatic, bearing thiol, amino, carboxyl, and hydroxyl functionalities) were tested. Most dissolved at 1 mM; several showed poor solubility (e.g., thiosalicylic acid, 2-mercaptobenzothiazole, α-benzoin oxime, 2,2′-dithiodibenzoic acid, 4-mercaptobenzoic acid, 2-(2-hydroxyphenyl)benzothiazole, quercetin hydrate, berberine chloride hydrate, 2-(2-hydroxyphenyl)benzoxazole). Bulk pH was measured before and after electrochemical testing.

Electrochemical setup: A three-electrode flat corrosion cell (300 mL) was used, with AA2024-T3 as the working electrode (exposed area 0.785 cm², 1 cm diameter), a Pt mesh counter electrode, and an Ag/AgCl (sat. KCl) reference electrode. Measurements were controlled by BioLogic VSP-300 potentiostats (EC-Lab v11.33) at room temperature, with the cell open to air.

Measurement protocol over 24 h: After 10 min of open-circuit potential (OCP) stabilization, linear polarization resistance (LPR) scans of ±10 mV vs OCP at 0.5 mV s⁻¹ were performed every 10 min for 24 h. Rp was obtained by linear fitting in the near-OCP region. Electrochemical impedance spectroscopy (EIS) was performed at ~2 h and 24 h with a 10 mV AC amplitude over 10 kHz to 10 mHz (10 points per decade, 3 repetitions per frequency point). After the 24 h EIS, potentiodynamic polarization (PDP) was recorded in a single sweep from −250 mV to +250 mV vs OCP at 0.5 mV s⁻¹.
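The LPR step above can be sketched as a straight-line fit of potential against current near OCP, with the slope multiplied by the exposed area to give Rp in Ω·cm². This is a minimal sketch, not the authors' code: the function name `polarization_resistance`, the ±5 mV fitting window, and the synthetic sweep are assumptions; the study only states that Rp came from a linear fit in the near-OCP region.

```python
import numpy as np

ELECTRODE_AREA_CM2 = 0.785  # exposed area of the 1 cm diameter working electrode


def polarization_resistance(potential_v, current_a, ocp_v, window_v=0.005,
                            area_cm2=ELECTRODE_AREA_CM2):
    """Area-normalized Rp (ohm·cm²) from one LPR sweep (hypothetical helper).

    Fits E = R*I + c to the points within +/- window_v of the OCP and returns
    the slope R (ohm) multiplied by the electrode area. The 5 mV window is an
    assumed default, not a value reported in the study.
    """
    e = np.asarray(potential_v, dtype=float)
    i = np.asarray(current_a, dtype=float)
    near_ocp = np.abs(e - ocp_v) <= window_v
    if np.count_nonzero(near_ocp) < 2:
        raise ValueError("need at least two points inside the fitting window")
    slope_ohm, _intercept = np.polyfit(i[near_ocp], e[near_ocp], 1)
    return slope_ohm * area_cm2


# Synthetic, noise-free sweep from OCP - 10 mV to OCP + 10 mV in 0.5 mV steps
ocp = -0.600  # V vs Ag/AgCl (illustrative)
potential = np.linspace(ocp - 0.010, ocp + 0.010, 41)
true_rp = 1.0e5  # ohm·cm², i.e. 100 kΩ·cm² (illustrative)
current = (potential - ocp) / (true_rp / ELECTRODE_AREA_CM2)  # amperes
print(f"Rp = {polarization_resistance(potential, current, ocp) / 1e3:.1f} kΩ·cm²")
```

With the protocol above, each sample yields roughly 144 such sweeps (one every 10 min for 24 h), and the resulting Rp series is what the time-weighted average aggregates.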
Corrosion potential (Ecorr) and corrosion current density (jcorr) were estimated by Tafel extrapolation of the linear regions of the anodic and cathodic branches. The low-frequency EIS modulus at 10⁻² Hz (|Z|10⁻² Hz) was treated as an estimate of corrosion resistance, acknowledging that it includes contributions from film resistance, charge transfer, and diffusion.

Time-weighted corrosion resistance: To capture the time evolution, a time-weighted average Rp, denoted ⟨Rp⟩, was computed by trapezoidal integration over the 24 h LPR series, yielding a single aggregate descriptor.

Performance metrics: The traditional inhibition efficiency (IE) was computed from Rp (or jcorr) by comparing inhibited with uninhibited values. The study instead emphasizes the inhibition power (IP), defined in dB as 10·log10(Rp,inh/Rp,blank) or 10·log10(jcorr,blank/jcorr,inh). Because IE is bounded above by 1 and strong inhibitors crowd toward that limit, IE compresses differences among top performers; the logarithmic IP removes this 1 − x bias and expands resolution at high performance. With IE defined in the usual way as 1 − Rp,blank/Rp,inh, the two are related by IP = −10·log10(1 − IE), so IE = 90% corresponds to 10 dB and IE = 99% to 20 dB.

Descriptor generation for ML: Structural molecular descriptors (MDs) were generated with RDKit (208 descriptors). DFT calculations (Turbomole) provided 7 electronic descriptors (e.g., HOMO and LUMO energies, dipole moment). One experimental descriptor, the average bulk pH (the mean of the pre- and post-test measurements), was added, giving a pool of 216 features (208 + 7 + 1).

Feature selection and model training: Recursive feature elimination (RFE) with a random forest (RF) estimator was used to select 5 or 10 features from four descriptor sets: structural only; structural + DFT; structural + pH; and structural + DFT + pH. Preprocessing removed low-variance features (variance < 0.1) and highly correlated features (|r| > 0.8), and the remaining features were scaled with MinMaxScaler. RF regression models (scikit-learn default hyperparameters) were trained to predict IE or IP. The ML dataset comprised the 59 molecules (of 78) that fully dissolved at 1 mM; the initial train/test split held out ~17% of them (10 molecules) for testing. Robustness was assessed with 6-fold cross-validation (KFold), reporting the mean and standard deviation of RMSE and R². Leave-one-out CV results are provided in the Supplementary Tables.
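The time-weighted ⟨Rp⟩ and the two performance metrics can be written as a few small functions. This is a sketch under stated assumptions: the function names are hypothetical, IE uses the conventional form 1 − Rp,blank/Rp,inh (equivalently 1 − jcorr,inh/jcorr,blank), and the toy series is invented; only the IP formula and the use of trapezoidal integration come from the study.

```python
import math
from typing import Sequence


def time_weighted_mean(times_h: Sequence[float], values: Sequence[float]) -> float:
    """Trapezoidal integral of a time series divided by its duration (e.g. <Rp>)."""
    if len(times_h) != len(values) or len(times_h) < 2:
        raise ValueError("need at least two paired (time, value) points")
    integral = 0.0
    for k in range(1, len(times_h)):
        dt = times_h[k] - times_h[k - 1]
        integral += 0.5 * dt * (values[k] + values[k - 1])
    return integral / (times_h[-1] - times_h[0])


def protection_ratio_rp(rp_inh: float, rp_blank: float) -> float:
    """Rp,inh / Rp,blank: > 1 means inhibition, < 1 means acceleration."""
    return rp_inh / rp_blank


def protection_ratio_jcorr(j_inh: float, j_blank: float) -> float:
    """jcorr,blank / jcorr,inh, oriented so that > 1 again means inhibition."""
    return j_blank / j_inh


def inhibition_efficiency(ratio: float) -> float:
    """Conventional IE = 1 - 1/ratio (assumed form); saturates near 1."""
    return 1.0 - 1.0 / ratio


def inhibition_power_db(ratio: float) -> float:
    """IP = 10*log10(ratio) in dB, equal to -10*log10(1 - IE)."""
    return 10.0 * math.log10(ratio)


# <Rp> of a toy 3-point series (hours, kΩ·cm²): (15 + 20) / 2 h
print(time_weighted_mean([0.0, 1.0, 2.0], [10.0, 20.0, 20.0]))  # 17.5

# Ratio of the Table 1 PDP means: ammonium pyrrolidinedithiocarbamate vs blank
r = protection_ratio_jcorr(38.0, 604.0)
print(round(inhibition_efficiency(r), 3), round(inhibition_power_db(r), 1))  # 0.937 12.0
```

The jcorr example uses ratios of reported means, so it only approximates any per-replicate values the study computed. It also illustrates the compression the study describes: IE is already 0.937, while IP keeps a wide dynamic range (10 dB at IE = 90%, 20 dB at 99%, 30 dB at 99.9%), and accelerators simply get negative IP.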
Replication: All electrochemical experiments were performed at least in triplicate per inhibitor.
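The feature-selection and training procedure described above can be approximated with scikit-learn. This is a minimal sketch, not the authors' code: the `CorrelationFilter` helper, the greedy order in which correlated features are dropped, the shuffled `KFold`, placing preprocessing and RFE inside the cross-validated pipeline, and the synthetic 59 × 12 data are all assumptions.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE, VarianceThreshold
from sklearn.model_selection import KFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler


class CorrelationFilter(BaseEstimator, TransformerMixin):
    """Hypothetical helper: greedily keep features whose |Pearson r| with every
    previously kept feature is at most `threshold` (0.8 in the study)."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        corr = np.abs(np.atleast_2d(np.corrcoef(X, rowvar=False)))
        keep = []
        for j in range(X.shape[1]):
            # VarianceThreshold runs first, so constant (NaN-correlation) columns are gone
            if all(corr[j, k] <= self.threshold for k in keep):
                keep.append(j)
        self.keep_ = np.array(keep, dtype=int)
        return self

    def transform(self, X):
        return np.asarray(X, dtype=float)[:, self.keep_]


def build_pipeline(n_features=5, random_state=0):
    """Preprocess, select n_features by RFE with an RF, then fit a default RF."""
    return Pipeline([
        ("variance", VarianceThreshold(threshold=0.1)),
        ("decorrelate", CorrelationFilter(threshold=0.8)),
        ("scale", MinMaxScaler()),
        ("rfe", RFE(RandomForestRegressor(random_state=random_state),
                    n_features_to_select=n_features)),
        ("model", RandomForestRegressor(random_state=random_state)),
    ])


def cv_scores(X, y, n_features=5, n_splits=6, random_state=0):
    """Mean and std of RMSE and R² over KFold splits (shuffling is an assumption)."""
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    res = cross_validate(build_pipeline(n_features, random_state), X, y, cv=cv,
                         scoring=("neg_root_mean_squared_error", "r2"))
    rmse = -res["test_neg_root_mean_squared_error"]
    r2 = res["test_r2"]
    return rmse.mean(), rmse.std(), r2.mean(), r2.std()


# Synthetic stand-in for the 59-molecule table: 12 descriptors, 2 informative
rng = np.random.default_rng(0)
X = rng.normal(size=(59, 12))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(scale=0.3, size=59)
rmse_mean, rmse_std, r2_mean, r2_std = cv_scores(X, y)
print(f"RMSE {rmse_mean:.2f} ± {rmse_std:.2f}, R² {r2_mean:.2f} ± {r2_std:.2f}")
```

Keeping the filters and RFE inside the pipeline means each fold selects features from its own training data only, which avoids leaking test information into feature selection; this summary does not say whether the study did the same. The 10-feature variant is `cv_scores(X, y, n_features=10)`.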

Key Findings

- Electrochemical performance spans orders of magnitude across inhibitors. In PDP after 24 h, jcorr varied by up to two orders of magnitude, and the best inhibitors reduced jcorr more than 10-fold relative to the blank (0.1 M NaCl). Example values (mean ± SE) from Table 1: uninhibited, jcorr 604 ± 108 nA cm⁻², Ecorr −620 ± 12 mV, breakdown potential Ebr −486 ± 2 mV; benzotriazole, jcorr 216 ± 38 nA cm⁻², Z24h 107 ± 48 kΩ·cm²; 2-mercaptobenzimidazole, jcorr 79 ± 18 nA cm⁻², Z24h 265 ± 80 kΩ·cm², Rp24h 253 ± 85 kΩ·cm²; sodium mercaptoacetate, jcorr 57 ± 13 nA cm⁻², Z24h 203 ± 64 kΩ·cm², Rp24h 561 ± 191 kΩ·cm²; ammonium pyrrolidinedithiocarbamate, jcorr 38 ± 4 nA cm⁻², Z24h 480 ± 106 kΩ·cm², Rp24h 335 ± 73 kΩ·cm².
- Time dependence is critical. LPR time series showed that many inhibitors evolve over the first ~6 h and then stabilize; some (e.g., 2-mercaptobenzothiazole) kept developing beyond 18 h. EIS at 2 h often correlated poorly with the other measures because the system was not yet stationary, whereas EIS at 24 h correlated strongly with the time-weighted ⟨Rp⟩.
- Many compounds accelerated corrosion: nearly half of the candidates acted as accelerators. In a notable case, 2,5-dimercapto-1,3,4-thiadiazole acidified the solution to pH ~3 (vs ~6 for the blank), disrupting the passive film and causing active corrosion, with Z24h and Rp24h collapsing to ~3 kΩ·cm².
- Inhibition power (IP) outperforms inhibition efficiency (IE) for analysis and ML. Pearson correlations between techniques were consistently higher with IP than with IE. With IE, all technique-to-technique correlations were below 0.9 except LPR vs EIS at 24 h, and clustering at high efficiencies (>90%) masked differences among the top performers. IP removed this bias, giving more linear, uniformly distributed relationships and better discrimination among high-performing inhibitors. Pearson-test p-values ranged from ~10⁻¹³⁴ to 10⁻⁵¹.
- Ranking and chemical trends: The time-weighted, LPR-based IP enabled a quantitative ranking. NS-containing molecules consistently performed best, and none acted as accelerators; O-only molecules were accelerators in ~80% of cases (exceptions: sodium acetate and vanillin). The presence or absence of aromatic rings made no clear difference in performance (the aliphatic candidates often contained multiple π-bonds). Mixed NO or SO systems behaved highly variably (e.g., 4-mercaptobenzoic acid vs thiobenzoic acid).
- Electrochemical potentials: Across all inhibitors, Ecorr varied widely (μ ≈ −576.9 mV, σ ≈ 72.3 mV vs Ag/AgCl), whereas Ebr was narrowly distributed (μ ≈ −492.1 mV, σ ≈ 17.6 mV), indicating that Ebr is an intrinsic substrate property related to pit activation. The passive range (Ebr − Ecorr) was modulated by the inhibitors but showed no significant linear correlation with IP because of high scatter; it remains a promising target or descriptor for localized corrosion behavior.
- pH effects: High IP (>10 dB, i.e., IE > 90%) generally occurred when the bulk pH stayed near neutral (~6). Solutions outside the Al stability window (pH 4.5–8.5) tended to show lower IP. Although IP showed no linear correlation with bulk pH (mean or ΔpH), pH explained outliers and provided mechanistic context absent from purely computational descriptors.
- Machine learning/QSPR results: When they were available, RFE consistently selected bulk pH and several DFT features; HOMO was selected most often, with LUMO and dipole also common, despite near-zero simple correlations with IE or IP. In a representative train/test split (Table 2), IP models with 5 features reached RMSE as low as 0.15 with the structural + DFT and structural + DFT + pH sets, with R² up to 0.55; for IE with 10 features, adding pH or pH + DFT to the structural descriptors reduced RMSE to 0.18 and raised R² to ~0.49–0.51. In 6-fold CV (Table 3), the best average performance came from structural + DFT + pH: for IE, RMSE ≈ 0.14 ± 0.02 and R² ≈ 0.35 ± 0.21; for IP, RMSE ≈ 0.18 (± 0.03 to 0.04) and R² ≈ 0.41 (± 0.11 to 0.13). The high fold-to-fold variance and sensitivity to outliers indicate that more training data are needed.
- Technique comparability: PDP gave systematically lower IP than LPR and EIS, likely because high overpotentials modify the surface and because Tafel analysis on AA2024-T3 is subjective: the cathodic reaction is limited by oxygen diffusion rather than activation control, and the anodic processes are localized.
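The experimental descriptors discussed in these findings (mean bulk pH, its position relative to the Al stability window, and the passive range) are simple to derive for each inhibitor. The helper names below are hypothetical; the 4.5 to 8.5 window and the blank potentials come from this summary, while the pH pairs in the usage lines are illustrative.

```python
AL_STABLE_PH = (4.5, 8.5)  # Al stability window quoted in the summary


def mean_bulk_ph(ph_before: float, ph_after: float) -> float:
    """Average of the pre- and post-test bulk pH (the study's pH descriptor)."""
    return 0.5 * (ph_before + ph_after)


def outside_al_window(ph: float, window: tuple = AL_STABLE_PH) -> bool:
    """True when the bulk pH falls outside the Al stability window."""
    low, high = window
    return not (low <= ph <= high)


def passive_range_mv(e_corr_mv: float, e_br_mv: float) -> float:
    """Passive range Ebr - Ecorr in mV; larger means more margin before breakdown."""
    return e_br_mv - e_corr_mv


print(passive_range_mv(-620.0, -486.0))  # 134.0, from the Table 1 blank means
print(outside_al_window(mean_bulk_ph(5.9, 6.1)), outside_al_window(3.0))  # False True
```

Flags like these could sit alongside the RDKit and DFT descriptors; the findings suggest they help mainly by explaining outliers rather than through a linear trend with IP.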

Discussion

The study demonstrates that a comprehensive, time-resolved electrochemical dataset spanning LPR, EIS, and PDP provides the mechanistic breadth needed to train more reliable ML models for inhibitor discovery on AA2024-T3. Adopting inhibition power (IP) instead of inhibition efficiency (IE) avoids the nonlinear compression of IE at high performance, yielding better inter-technique correlations and more balanced target distributions, both of which are advantageous for model training. Time dependence is crucial: measurements taken before ~6 h can be non-stationary and therefore less reliable (e.g., EIS at 2 h). The results confirm that NS-containing molecules are robust inhibitors, whereas O-only compounds often accelerate corrosion, likely through their interactions with the passive film and intermetallic particles. The electrochemical potentials show that Ebr is largely substrate-intrinsic, while Ecorr and the passive range are modulated by the inhibitors; their simple linear relationships to IP are weak, which suggests more complex interactions. Incorporating experimental descriptors such as bulk pH alongside structural and DFT features improves QSPR model performance and robustness, even when their simple correlations with the targets are low. This underscores the value of mechanistic augmentation: experimental descriptors capture environmental effects and system-level behavior that purely structural or in vacuo electronic descriptors miss. Collectively, these findings meet the initial objective by establishing best practices for data collection (measure for longer than 6 h, report IP, and use the time-weighted ⟨Rp⟩) and for modeling (RFE over structural + DFT + pH descriptors), laying the groundwork for active-learning-driven inhibitor discovery.

Conclusion

This work establishes an experimental foundation for ML-guided corrosion inhibitor discovery by creating a multidimensional, time-resolved electrochemical dataset for 78 small organic molecules on AA2024-T3 and by demonstrating best practices for analysis and modeling. Key contributions include: (i) validating inhibition power as a superior metric to inhibition efficiency for ranking inhibitors and correlating measurements; (ii) showing that the time-weighted LPR resistance correlates strongly with 24 h EIS and effectively captures time-dependent behavior; (iii) identifying chemical trends, with NS-containing inhibitors consistently performing best and O-only compounds often accelerating corrosion; and (iv) demonstrating that augmenting structural descriptors with experimental pH and DFT features improves QSPR model accuracy and robustness. Future research should expand high-quality datasets to improve generalization, develop faster screening protocols that capture high-resolution electrochemical information over shorter durations, and advance ML models (e.g., active learning) that integrate mechanistic descriptors (electrochemical potentials, passive range, pH) with molecular features to better link inhibitor structure to protective performance.

Limitations

- Dataset size and solubility: Of the 78 tested molecules, only the 59 that fully dissolved at 1 mM were used for ML, limiting training-set size and chemical diversity. Several candidates had solubility issues, which may bias the dataset.
- Time invariance: Early measurements (<6 h) may violate the linearity, causality, and time-invariance assumptions (notably EIS at 2 h), reducing cross-technique correlations and reliability.
- PDP limitations: High overpotentials alter the surface chemistry, and Tafel analysis on AA2024-T3 is challenging because of diffusion-limited cathodic and localized anodic reactions, which introduces uncertainty in jcorr.
- pH measurement: Only bulk pH was measured; it may not reflect the local interfacial pH or pH gradients, which limits interpretability.
- Model robustness: Cross-validation showed high variance and sensitivity to outliers; the models need more training data to generalize well. The IE and IP target distributions were imbalanced, necessitating careful choice of metric (IP) and possibly resampling strategies.
- Heterogeneity of inhibitor effects: Simple linear correlations between mechanistic parameters (e.g., passive range) and performance were weak, indicating complex, possibly nonlinear, or interaction effects that current descriptors and models do not fully capture.

