Data-driven pitting evolution prediction for corrosion-resistant alloys by time-series analysis

Engineering and Technology

X. Jiang, Y. Yan, et al.

Xue Jiang, Yu Yan, and Yanjing Su use a data-driven approach based on Long Short-Term Memory (LSTM) neural networks to predict the free corrosion potential of cobalt-based alloys and a duplex stainless steel. The method forecasts the evolution of corrosion behavior over time more accurately than traditional machine learning techniques.

Introduction
Corrosion initiation and propagation are random, dynamic processes governed by the coupling between internal alloy characteristics and the external environment. These processes, which include passive film rupture and dissolution, metal dissolution kinetics, and ion diffusion, collectively shape the local chemistry and pit morphology over time, and understanding them guides alloy design. Accurately predicting pitting propagation in corrosion-resistant alloys requires using past corrosion behavior to forecast future states, which remains a long-standing challenge. Time-series learning methods such as long short-term memory (LSTM) networks can explicitly exploit temporal dependencies: their memory mechanism (input, forget, and output gates) addresses long-term dependencies in sequences. Using cobalt-based alloys and a duplex stainless steel as case studies, the authors built a time-series model that predicts the free corrosion potential (Ecorr) from long-term immersion data, and compared it with traditional machine learning approaches to assess whether exploiting temporal dependencies improves the prediction of future pitting evolution.
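To make the gate mechanism concrete, here is a minimal NumPy sketch of a single step of a standard LSTM cell. The weight layout (input, forget, candidate, and output blocks stacked in one matrix), the function name `lstm_step`, and the toy Ecorr values are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM cell.

    x_t: (n_in,), h_prev and c_prev: (n_hidden,)
    W: (4*n_hidden, n_in), U: (4*n_hidden, n_hidden), b: (4*n_hidden,)
    Gate blocks are stacked as input, forget, candidate, output (assumed layout).
    """
    n = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:n])            # input gate: how much new information to write
    f = sigmoid(z[n:2 * n])        # forget gate: how much old memory to keep
    g = np.tanh(z[2 * n:3 * n])    # candidate memory content
    o = sigmoid(z[3 * n:4 * n])    # output gate: how much memory to expose
    c_t = f * c_prev + i * g       # updated cell state (long-term memory)
    h_t = o * np.tanh(c_t)         # new hidden state (short-term output)
    return h_t, c_t


rng = np.random.default_rng(0)
n_in, n_hidden = 1, 4
W = rng.normal(scale=0.1, size=(4 * n_hidden, n_in))
U = rng.normal(scale=0.1, size=(4 * n_hidden, n_hidden))
b = np.zeros(4 * n_hidden)

# Hypothetical normalized Ecorr values for 5 consecutive days
seq = np.array([[0.52], [0.50], [0.49], [0.47], [0.46]])
h = np.zeros(n_hidden)
c = np.zeros(n_hidden)
for x_t in seq:
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h.shape)  # (4,)
```

The forget gate decides how much of the previous cell state survives each day, which is what lets information from earlier days persist across the window.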
Literature Review
Machine learning has been applied to corrosion problems since the 1980s, including artificial neural networks, random forests, and other algorithms for predicting uniform corrosion rates, identifying influential factors, and modeling localized corrosion behavior across different compositions, processing conditions, temperatures, and environments. Reviews (e.g., Coelho et al.) have assessed predictive power and regression approaches in electrochemical corrosion. Random forest models have been used to predict time-dependent corrosion rates under inhibitor dosing schedules. However, traditional statistical and ML methods (SVM, RF, gradient boosting) generally assume independent and identically distributed (i.i.d.) samples and do not exploit the temporal relationships inherent in service data. Time-series models such as LSTM can learn mappings from sequences of past observations to future outputs, and their gated architecture mitigates vanishing gradients, offering a way to model pitting as an evolving, time-dependent process.
Methodology
Data collection and materials: Four alloys were studied: three cobalt-based alloys (Cast Stellite 6, Cast Stellite 12, and Cast Stellite 706) and one duplex stainless steel (Zeron 100). Nominal compositions and hardness values are given in Table 1 of the paper. Samples were cast, mounted in epoxy with lead wires attached, and polished with 80–1200 grit SiC paper followed by 6 µm diamond paste. Vickers hardness (HV) was measured up to 50 times per sample.
Immersion experiments: Long-term immersion tests were performed in 3.5 wt.% NaCl at 18 °C for 150 days, with Ecorr measured daily. Visible pitting was first observed around day 30 for all alloys, so the first 30 days were removed to focus on the propagation stage.
Dataset: The final dataset comprised 480 entries (4 alloys × 120 days, covering days 30–150). Each entry contains the alloy composition, HV, the HV standard deviation, and the immersion time, together with the daily Ecorr as the target. The dataset is shared via the Materials Genome Engineering Database (https://www.mgedata.cn/search/#/153870/1064).
Traditional ML models: Support Vector Regression (SVR), k-Nearest Neighbor Regression (KNR), Gradient Boosting Regression (GBR), Random Forest Regression (RFR), and AdaBoost Regression (AdaBR) were implemented with scikit-learn. The pooled dataset was split 80/20 into training (384 entries) and testing (96 entries) sets. Hyperparameters were optimized by grid search with five-fold cross-validation on the training set. Performance was measured by mean squared error (MSE) and mean relative error (MRE).
Time-series model (LSTM): The per-day data were transformed into sequences using a sliding lookback window: a lookback of L uses the previous L days' data to predict the next day's Ecorr. Lookbacks of 3, 5, 8, and 15 were evaluated for each alloy. For each lookback, the sequences were split 80/20 in chronological order into training and testing sets to assess generalization. The LSTM was implemented in TensorFlow/Keras.
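The sliding-window transformation and the chronological 80/20 split can be sketched as follows. This is a minimal NumPy version that assumes a univariate input (each window holds only past Ecorr values; the paper's windows may carry additional per-day features), and the helper names `make_windows` and `chronological_split` are hypothetical. The MRE formula used here, mean(|predicted - measured| / |measured|), is also an assumption, since the summary does not define it.

```python
import numpy as np


def make_windows(series, lookback):
    """Turn a 1-D daily series into (X, y) pairs: the previous `lookback`
    values form one input row and the next day's value is the target."""
    series = np.asarray(series, dtype=float)
    X = np.array([series[i:i + lookback] for i in range(len(series) - lookback)])
    y = series[lookback:]
    return X, y


def chronological_split(X, y, train_frac=0.8):
    """Split without shuffling: the earliest windows train, the latest windows test."""
    n_train = int(round(len(X) * train_frac))
    return X[:n_train], X[n_train:], y[:n_train], y[n_train:]


def mse(y_true, y_pred):
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))


def mre(y_true, y_pred):
    """Mean relative error, assumed to be mean(|pred - true| / |true|)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_pred - y_true) / np.abs(y_true)))


# Hypothetical 120-day Ecorr series for one alloy (mV), for illustration only
days = np.arange(120)
ecorr = -250.0 - 0.5 * days

for lookback in (3, 5, 8, 15):
    X, y = make_windows(ecorr, lookback)
    X_tr, X_te, y_tr, y_te = chronological_split(X, y)
    print(lookback, X.shape, X_tr.shape, X_te.shape)
```

Splitting by order rather than at random keeps every test window later in time than every training window, so the test score reflects forecasting rather than interpolation between neighboring days.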
For Cast Stellite 6, the example architecture consists of an input layer with 5 units (lookback = 5), three hidden layers of 128 units each with dropout between layers, and a fully connected output layer with one unit and ReLU activation. MSE served as the loss function and Adam as the optimizer; the tuned hyperparameters included the number of layers and units, batch size, dropout rate, and lookback length.
Additional validation: After the models were built on the initial 120-day propagation data, additional 70-day immersion tests (days 151–220) were conducted for all four alloys, with daily Ecorr measurements. The pre-built GBR and LSTM models were then used to predict Ecorr over these 70 unseen days to evaluate their ability to forecast future behavior.
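The Cast Stellite 6 architecture can be sketched as an inference-only NumPy forward pass, together with a parameter count. Assumptions: all three hidden layers are LSTM layers, each window holds one feature (normalized Ecorr) per day, and only the last hidden state feeds the output unit; dropout is omitted because it acts only during training. In Keras the corresponding stack would plausibly be `LSTM(128, return_sequences=True)`, `Dropout`, `LSTM(128, return_sequences=True)`, `Dropout`, `LSTM(128)`, `Dense(1, activation="relu")`, compiled with `loss="mse"` and the Adam optimizer, but this exact arrangement is not confirmed by the source.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_lstm(n_in, n_hidden, rng):
    """Random weights for one LSTM layer; gate blocks ordered input, forget, candidate, output."""
    return {
        "W": rng.normal(scale=0.05, size=(4 * n_hidden, n_in)),
        "U": rng.normal(scale=0.05, size=(4 * n_hidden, n_hidden)),
        "b": np.zeros(4 * n_hidden),
    }


def lstm_layer(seq, p):
    """Run one LSTM layer over seq (T, n_in) and return every hidden state (T, n_hidden)."""
    n = p["U"].shape[1]
    h, c = np.zeros(n), np.zeros(n)
    outputs = []
    for x_t in seq:
        z = p["W"] @ x_t + p["U"] @ h + p["b"]
        i, f = sigmoid(z[:n]), sigmoid(z[n:2 * n])
        g, o = np.tanh(z[2 * n:3 * n]), sigmoid(z[3 * n:])
        c = f * c + i * g
        h = o * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs)


def lstm_params(n_in, n_hidden):
    """Trainable parameters of one LSTM layer with biases (the count Keras reports)."""
    return 4 * (n_hidden * (n_in + n_hidden) + n_hidden)


rng = np.random.default_rng(42)
lookback, n_features, units = 5, 1, 128   # n_features=1 assumes Ecorr is the only per-day input
layers = [init_lstm(n_features, units, rng),
          init_lstm(units, units, rng),
          init_lstm(units, units, rng)]
w_out, b_out = rng.normal(scale=0.05, size=units), 0.0


def predict_next(window):
    """Forward pass for one window of shape (lookback, n_features); no dropout at inference."""
    x = window
    for p in layers:
        x = lstm_layer(x, p)                 # each stacked layer passes its full sequence upward
    return max(0.0, float(w_out @ x[-1] + b_out))   # Dense(1) + ReLU on the last hidden state


window = rng.uniform(0.4, 0.6, size=(lookback, n_features))  # hypothetical normalized Ecorr values
print(predict_next(window))

total = lstm_params(n_features, units) + 2 * lstm_params(units, units) + (units + 1)
print(total)
```

The parameter formula, 4 × (units × (inputs + units) + units), counts the four gate blocks of each LSTM layer; under these assumptions the two 128-to-128 layers hold most of the model's roughly 330,000 weights.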
Key Findings
- Traditional ML performance: Among SVR, KNR, GBR, RFR, and AdaBR, RFR had the lowest mean MSE in cross-validation, with KNR and GBR also competitive; GBR showed lower variance across folds, indicating more stable performance. On the hold-out test set, GBR achieved an MRE of 3.3% and generalized better than RFR (smaller train–test gap).
- Feature importance: Permutation importance (repeated 50 times) and SHAP analyses of the GBR model consistently identified Fe, C, Si, and immersion time as the most important predictors of Ecorr, although their ranking differed slightly between the two methods. SHAP indicated that higher Fe content tends to increase Ecorr, whereas longer immersion time decreases it.
- LSTM lookback selection: A lookback of 5 minimized the testing MSE, avoiding the overfitting seen with larger windows (8 or 15) while capturing temporal dependencies better than a smaller window (3).
- Future 70-day prediction: Over the subsequent 70 days of immersion, the GBR model produced nearly constant Ecorr predictions that failed to follow the experimental trends for all four alloys. In contrast, the LSTM model tracked the temporal evolution of Ecorr well for Cast Stellite 12, Cast Stellite 706, and Zeron 100; larger absolute deviations occurred for Cast Stellite 6 because of a sharp Ecorr increase near day 150. Overall, the LSTM captured the inherent time-series dependencies and gave more realistic long-term predictions of pitting evolution than traditional ML.
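Permutation importance measures how much the prediction error grows when one feature column is randomly shuffled, which breaks that feature's link to the target. The sketch below reimplements the idea in NumPy with 50 repeats, matching the repeat count above, but it uses synthetic data and an ordinary least-squares model as a stand-in for the paper's GBR; the column labels in the comments are hypothetical. With scikit-learn, `sklearn.inspection.permutation_importance(estimator, X, y, n_repeats=50)` provides a library implementation of the same idea for a fitted estimator.

```python
import numpy as np


def permutation_importance(predict, X, y, n_repeats=50, seed=0):
    """Mean and std of the increase in MSE when each feature column is shuffled."""
    rng = np.random.default_rng(seed)
    base = np.mean((predict(X) - y) ** 2)
    means, stds = [], []
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])   # break the link between feature j and y
            increases.append(np.mean((predict(Xp) - y) ** 2) - base)
        means.append(np.mean(increases))
        stds.append(np.std(increases))
    return np.array(means), np.array(stds)


# Hypothetical data: columns stand in for [Fe, C, Si, immersion time, noise]; not the paper's data
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 5))
y = 2.0 * X[:, 0] + 1.0 * X[:, 1] + 0.5 * X[:, 2] - 1.5 * X[:, 3] + 0.1 * rng.normal(size=400)

# Stand-in model: ordinary least squares with an intercept (the paper used GBR)
A = np.column_stack([X, np.ones(len(X))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)


def predict(Z):
    return np.column_stack([Z, np.ones(len(Z))]) @ coef


imp_mean, imp_std = permutation_importance(predict, X, y, n_repeats=50)
print(np.argsort(imp_mean)[::-1])  # feature indices, most important first
```

Because the score compares the same fitted model on intact and shuffled inputs, the method works with any regressor, which is why it can be applied unchanged to a GBR model.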
Discussion
The study demonstrates that pitting propagation exhibits strong temporal dependencies that are not adequately modeled by traditional ML methods treating each data point as i.i.d. Even though GBR fit well on the training data, it failed to extrapolate the dynamic trend during the subsequent 70-day immersion. The LSTM, by leveraging prior sequences via its gated memory, better preserved and utilized the inherited local corrosion states, capturing the cumulative effects of time on Ecorr evolution. Feature importance analyses reinforce the role of alloy chemistry (Fe, C, Si) alongside immersion time in driving Ecorr behavior and suggest physically plausible effects (e.g., Fe content correlating with increased Ecorr and longer exposure leading to decreased Ecorr). The larger deviation for Cast Stellite 6 around the train–test boundary (near day 150) highlights sensitivity to abrupt regime changes, a known challenge for sequence models trained on preceding trends. Nonetheless, across alloys, the LSTM's alignment with observed trajectories supports time-series modeling as a more appropriate framework for forecasting localized corrosion evolution over service time.
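The source attributes GBR's flat forecasts to treating each data point as i.i.d. A related, well-known property of tree ensembles may also contribute: a model built from threshold splits predicts a constant value for any feature value beyond its largest training split, so once immersion time runs past the training range (day 150), the time-dependent part of the prediction freezes. Whether this drove the paper's result is an assumption. The minimal sketch below shows the effect using boosted decision stumps on a hypothetical, steadily decreasing Ecorr trend, with immersion time as the only feature; the functions `fit_stump` and `boost` are illustrative, not scikit-learn's GBR.

```python
import numpy as np


def fit_stump(x, r):
    """Best single threshold split on 1-D feature x minimizing squared error of residuals r."""
    order = np.argsort(x)
    xs, rs = x[order], r[order]
    best = (np.inf, None, 0.0, 0.0)
    for k in range(1, len(xs)):
        if xs[k] == xs[k - 1]:
            continue
        left, right = rs[:k], rs[k:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best[0]:
            best = (sse, 0.5 * (xs[k - 1] + xs[k]), left.mean(), right.mean())
    _, thr, left_value, right_value = best
    return thr, left_value, right_value


def boost(x, y, n_rounds=100, lr=0.1):
    """Minimal gradient boosting with stumps under squared loss."""
    f0 = y.mean()
    pred = np.full_like(y, f0)
    stumps = []
    for _ in range(n_rounds):
        thr, lv, rv = fit_stump(x, y - pred)   # fit the current residuals
        pred += lr * np.where(x <= thr, lv, rv)
        stumps.append((thr, lv, rv))

    def predict(xq):
        out = np.full(len(xq), f0)
        for thr, lv, rv in stumps:
            out += lr * np.where(xq <= thr, lv, rv)
        return out

    return predict


# Hypothetical trend: Ecorr keeps drifting down with immersion time (not the paper's data)
t_train = np.arange(30, 151, dtype=float)
y_train = -250.0 - 0.5 * (t_train - 30)
model = boost(t_train, y_train)

t_future = np.arange(151, 221, dtype=float)
pred_future = model(t_future)
print(pred_future.min(), pred_future.max())  # identical: every future day lies right of all splits
```

A sequence model that takes the most recent Ecorr values as input does not rely on immersion time falling inside the training range in this way, which is consistent with the contrast drawn in the paragraph above.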
Conclusion
Using long-term immersion data for cobalt-based alloys and a duplex stainless steel, the authors developed a time-series LSTM model that predicts daily free corrosion potential (Ecorr) during the pitting propagation stage. Compared with traditional ML models (e.g., GBR), which generalize well within the observed data but fail to reproduce future temporal evolution, the LSTM accurately captured sequence dependencies and better matched Ecorr trends over an additional 70 days of immersion. Feature importance analyses identified Fe, C, Si, and immersion time as key drivers of Ecorr. This work establishes a data-driven, time-series approach for forecasting localized corrosion evolution and suggests its broader applicability to other material service and lifetime behavior predictions.
Limitations
The model training focused on the propagation stage, excluding the first 30 days of initiation where Ecorr fluctuated strongly, so initiation behavior is not modeled. The study considered four specific alloys and a single environment (3.5 wt.% NaCl at 18 °C), which may limit generalizability to other alloys or conditions. Time-series lookback selection involved a tradeoff between capturing long-term dependencies and overfitting; larger windows exhibited overfitting in tests. The LSTM showed larger absolute deviations for Cast Stellite 6 near a sharp Ecorr change around the train–test boundary.