Earth Sciences
Predicting fault slip via transfer learning
K. Wang, C. W. Johnson, et al.
The study addresses the challenge of predicting instantaneous and future fault-slip characteristics when training data are sparse, as is typical for natural faults where earthquake cycles span decades to centuries. While machine learning has been successful in laboratory settings and for slow-slip processes in Earth, applications to seismogenic fault slip are hindered by limited, noisy geophysical data. The research hypothesis is that transfer learning from numerical simulations can provide a viable pathway to predict fault friction in laboratory experiments and potentially in Earth by leveraging simulation-trained feature representations and fine-tuning with limited real data. The purpose is to develop and evaluate a convolutional encoder-decoder (CED) model trained on simulated acoustic emission proxies and transferred to laboratory acoustic emissions to predict fault friction throughout the slip cycle, even with minimal lab data. This approach could bridge the gap between abundant simulation data and sparse observational datasets, improving earthquake hazard assessment.
Prior work has demonstrated that continuous seismic or acoustic signals contain information about fault state and slip in laboratory experiments, enabling prediction of friction, time-to-failure, and other properties using various ML approaches. Extensions to Earth have succeeded primarily for slow-slip phenomena where tremor provides relatively strong signals. However, for fast/seismogenic slip the emitted signals are weak and often obscured by noise, and data-driven models have not consistently extracted predictive patterns. Transfer learning has shown promise in geophysics for seismic imaging, subsurface feature classification, and fault detection by pretraining on large or synthetic datasets and fine-tuning on limited target data. Despite this, transfer learning from physics-based numerical simulations to predict quantitative fault-slip characteristics in lab or Earth settings had not been evaluated prior to this work.
Data sources: (1) Numerical simulations using a combined finite–discrete element method (FDEM) model (HOSS) of a shear apparatus analogous to a bi-axial experiment. The model simulates granular gouge confined between plates; kinetic-energy time series (E) serve as a proxy for acoustic emission (AE), and the target output is the bulk friction coefficient μ. (2) Laboratory bi-axial shear experiments (p4677 at 2.5 MPa normal stress; p4581 with 3–8 MPa steps) with continuous AE and mechanical measurements (shear/normal stress, friction). AE is recorded via embedded piezo sensors.
Preprocessing: Continuous time series (E or AE as input; μ as output) are transformed into time–frequency scalograms using the continuous wavelet transform (CWT) with the real Ricker (Mexican hat, DOG with m = 2) wavelet. The sampling frequency is 1000 Hz; scalograms are computed with sliding windows of 2 s length and 0.2 s step (window lengths of 0.4–5 s were tested; performance was insensitive). Each scalogram is 128×2000. Signals are z-score normalized using training-segment statistics specific to each dataset/task. For FDEM, the E input and μ output statistics are 3.28e-4 ± 5.00e-4 and 0.423 ± 0.0252, respectively. For lab p4677 (0–60 s, six cycles), the AE and μ statistics are 8.932 ± 14.900 and 0.657 ± 0.0382. Limited sub-cycle training used distinct statistics for the pre-failure and post-failure subsets. For p4581 predictions, normalization used statistics from the first 20% of the 3 MPa segment.
Data splits: FDEM data were split 60/20/20% (train/val/test). Lab p4677 was split 20/20/60% for transfer learning (six cycles used for training/validation). Lab p4581 was used only for testing across 3–8 MPa loads. For limited-data transfer, latent-space training used only the pre-failure or post-failure portion of a single cycle, with a 90/10% train/val split and reduced window (0.4 s) and step (0.1 s).
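The preprocessing described above can be sketched in a few lines of numpy. The wavelet support length, the scale set, and the synthetic stand-in signal are illustrative assumptions, not the authors' exact implementation:

```python
import numpy as np

def ricker(points, a):
    """Real Ricker (Mexican hat) wavelet: 2nd derivative of a Gaussian (DOG, m = 2)."""
    t = np.arange(points) - (points - 1) / 2.0
    norm = 2.0 / (np.sqrt(3.0 * a) * np.pi ** 0.25)
    return norm * (1.0 - (t / a) ** 2) * np.exp(-0.5 * (t / a) ** 2)

def cwt_scalogram(signal, scales):
    """CWT by direct convolution with the wavelet at each scale."""
    out = np.empty((len(scales), len(signal)))
    for i, a in enumerate(scales):
        wavelet = ricker(min(10 * int(a), len(signal)), a)  # assumed support: 10 scales
        out[i] = np.convolve(signal, wavelet, mode="same")
    return out

fs = 1000                      # sampling frequency (Hz), as in the study
win, step = 2.0, 0.2           # window length and stride (s)
rng = np.random.default_rng(0)
ae = rng.standard_normal(10 * fs)          # stand-in for a 10 s AE record

# z-score normalize with training-segment statistics (here: the record itself)
ae = (ae - ae.mean()) / ae.std()

scales = np.arange(1, 129)     # 128 scales -> 128 scalogram rows
windows = [ae[i:i + int(win * fs)]
           for i in range(0, len(ae) - int(win * fs) + 1, int(step * fs))]
scal = cwt_scalogram(windows[0], scales)   # one window shown; loop over all in practice
print(scal.shape, len(windows))            # each scalogram is 128 x 2000
```

Each 2 s window at 1000 Hz yields 2000 samples, so the scalogram matches the 128×2000 shape quoted above.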
Data augmentation (FDEM): Additional input E signals were created by randomizing Fourier phases while preserving amplitude spectra; this was repeated three times, yielding 292 training scalogram pairs (73 base + augmentations) and 19 validation pairs.
Model architecture: A U-Net-like convolutional encoder–decoder (CED) operating on 2D scalograms, with skip connections between symmetric encoder and decoder blocks. Encoder: a preprocessing block (two 3×11 convolutions with ReLU, 25× temporal downscaling) followed by four DownSampling2D blocks (each with three 3×3 convolutions, batch norm, ReLU, and a skip connection). Latent space: two convolutional layers + ReLU. Decoder: four UpSampling2D blocks using transpose convolutions, followed by postprocessing (transpose convolutions and a final 1×1 convolution with linear output) to reconstruct the output scalogram. Total trainable parameters: 363,696; latent space: 73,984 (~20% of the total).
Training and loss: Adam optimizer (lr = 1e-3), batch size 8, early stopping when the validation reconstruction loss plateaus for 100 epochs and the training reconstruction loss falls below 0.1. Hierarchical loss regularization computes MSE losses at multiple encoder–decoder depths (linked by skip connections) plus a reconstruction loss and L2 regularization (λ = 1e-5): L_total = Σ(hierarchical losses) + L_reconstr + λ·L2. After initial training, skip connections are deactivated for prediction so that information propagates strictly through encoder → latent → decoder. Hardware: NVIDIA Tesla P100 GPU.
Variability assessment: Multiple runs with/without augmentation and different random initializations were used to characterize performance variance; augmented training reduced variance and improved accuracy.
Transfer learning protocol: Step 1: Fully train the CED on FDEM data (input E scalograms, output μ scalograms).
Step 2: Create a new model initialized from Step 1, freeze the encoder and decoder weights, and train only the latent space on limited lab p4677 AE→μ data (20% of the record, six cycles) or, in the limited-data setting, on pre-failure or post-failure segments from a single cycle. Step 3: Evaluate on unseen lab p4677 test data and on the fully independent experiment p4581 across multiple normal stresses. An analogous transfer setup was used to predict time-to-failure (TTF) from AE by fine-tuning the latent space to map AE to TTF labels (failure defined when dμ/dt < −10 s⁻¹).
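Step 2 is the key mechanism: only the latent weights receive gradient updates. A toy numpy sketch (linear layers standing in for the convolutional blocks; all shapes and the training data are hypothetical) illustrates the freeze-and-fine-tune mechanics together with the MAPE metric reported in the results:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-ins: frozen "pretrained" encoder/decoder and a trainable latent map.
W_enc = rng.standard_normal((16, 8)) / 4.0   # frozen after simulation pretraining
W_dec = rng.standard_normal((8, 1)) / 4.0    # frozen after simulation pretraining
W_lat = rng.standard_normal((8, 8)) / 4.0    # the only weights updated in Step 2
A_lab = rng.standard_normal((8, 8)) / 4.0    # "lab" mapping the latent must learn

def forward(x):
    return x @ W_enc @ W_lat @ W_dec

def mape(pred, target):
    """Mean absolute percentage error, the metric reported in the study."""
    return 100.0 * np.mean(np.abs((pred - target) / target))

# Limited "lab" data: same encoder/decoder features, different latent relationship
x = rng.standard_normal((64, 16))
y = x @ W_enc @ A_lab @ W_dec

err_before = mape(forward(x), y)
for _ in range(3000):                        # plain gradient descent on the MSE
    residual = forward(x) - y
    # Gradient of the MSE w.r.t. W_lat (up to a constant factor);
    # W_enc and W_dec never receive an update.
    grad = (x @ W_enc).T @ (residual @ W_dec.T) / len(x)
    W_lat -= 0.2 * grad
err_after = mape(forward(x), y)
print(f"MAPE: {err_before:.2f}% before fine-tuning, {err_after:.2f}% after")
```

Because the frozen blocks hold ~80% of the parameters, only the ~20% in the latent space must be estimated from the small lab dataset, which is what makes the limited-data transfer feasible.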
- Baseline (simulation-only): Training/validation/testing entirely on FDEM produced μ predictions with MAPE = 4.237%, capturing general slip trends and many failures but with modest accuracy.
- Baseline (lab-only): Training and testing solely on lab p4677 (first 20% AE used for training) yielded MAPE = 1.137%, accurately capturing friction variations and failure timing/magnitudes.
- Direct transfer (no cross-training): Applying the FDEM-trained model to lab p4677 (AE→μ) without any lab training gave MAPE = 4.232%, with underprediction of maximum friction drops (event moments) but reasonable timing and scale.
- Transfer with latent-space cross-training: Freezing encoder/decoder and training only the latent space on limited lab p4677 data improved performance to MAPE = 1.650%, approaching the lab-only benchmark.
- Generalization to independent experiment (p4581, varying normal loads): Using the cross-trained model, MAPEs for μ prediction were: 3 MPa: 1.960%; 4 MPa: 2.195%; 5 MPa: 2.670%; 6 MPa: 3.089%; 7 MPa: 3.671%; 8 MPa: 4.555%. Timing of slip events and inter-event stress buildup were well captured; errors increased with higher normal load primarily due to underestimation of failure magnitudes.
- Extremely limited data transfer (training only on parts of a single cycle from p4677): For p4581 predictions, post-failure-trained vs pre-failure-trained latent spaces yielded the following MAPEs for μ: 3 MPa: 3.419% vs 4.158%; 5 MPa: 3.944% vs 4.710%; 7 MPa: 4.772% vs 5.979%. Post-failure training performed slightly better, suggesting richer state information.
- Time-to-failure (TTF) prediction via transfer learning: Cross-trained model predicted TTF on p4581 with MAPEs: 3 MPa: 2.980%; 5 MPa: 3.738%; 7 MPa: 3.710%, demonstrating feasibility of predicting other state variables via the same transfer framework.
- Robustness: Model performance was stable across different sliding window sizes; data augmentation reduced run-to-run variability and improved average accuracy.
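The phase-randomization augmentation referenced above can be sketched with numpy FFTs. Keeping the DC and Nyquist bins real is our assumption, made so the surrogate is strictly real-valued and amplitude-preserving:

```python
import numpy as np

def phase_randomize(signal, rng):
    """Surrogate with the same amplitude spectrum but randomized Fourier phases."""
    spectrum = np.fft.rfft(signal)
    phases = rng.uniform(0.0, 2.0 * np.pi, len(spectrum))
    phases[0] = phases[-1] = 0.0            # keep DC and Nyquist terms real (assumed)
    surrogate = np.abs(spectrum) * np.exp(1j * phases)
    return np.fft.irfft(surrogate, n=len(signal))

rng = np.random.default_rng(42)
e = rng.standard_normal(4096)               # stand-in for a simulated E time series
augmented = [phase_randomize(e, rng) for _ in range(3)]   # three surrogates, as in the study

# Amplitude spectra are preserved even though the waveforms differ
orig_amp = np.abs(np.fft.rfft(e))
aug_amp = np.abs(np.fft.rfft(augmented[0]))
print(np.allclose(orig_amp, aug_amp))
```

Each surrogate has the same spectral content as the original E signal but a different waveform, which is why augmentation enlarges the training set without changing its frequency statistics.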
The findings demonstrate that a deep CED trained on physics-based FDEM simulations learns transferable time–frequency features relating AE-like inputs to fault friction. Even without exposure to lab data, the simulation-trained model captures slip timing and general trends on laboratory experiments. Crucially, fine-tuning only the latent space with a small fraction of lab data substantially improves accuracy to near lab-only levels, indicating that most representational capacity in the encoder/decoder is reusable across domains while the latent mapping adapts to the distributional differences between them. The cross-trained model generalizes to an independent experiment with different normal stresses, retaining accurate event timing but increasingly underestimating failure magnitudes at higher loads. Training the latent space on post-failure segments outperforms pre-failure training, implying that certain portions of the cycle contain more informative states for adaptation. The approach also extends to predicting time-to-failure from AE using the same transfer strategy. These results suggest a practical pathway for Earth applications where data are sparse: pretrain on suites of numerical simulations spanning diverse behaviors, then fine-tune on limited geophysical observations from the target fault. Such models could be evaluated against independent geodetic or seismic observations to assess predictive skill for fault slip evolution and potentially inform earthquake hazard assessment.
This work introduces and validates a transfer learning framework that maps AE-like signals to fault friction using a convolutional encoder–decoder trained on FDEM simulations and fine-tuned on limited laboratory data. Key contributions include: (1) demonstrating effective transfer from simulations to lab for instantaneous friction prediction; (2) showing that limited latent-space training (about 20% of parameters) with small lab datasets substantially improves accuracy and generalizes to independent experiments with varying loads; and (3) extending the framework to predict time-to-failure. The study establishes a foundation for applying simulation-pretrained models to sparse Earth datasets. Future research directions include: generating simulation ensembles that better span the range of frictional failure magnitudes to mitigate underprediction; incorporating more realistic wave propagation in simulations; exploring domain adaptation techniques beyond latent-space fine-tuning; integrating multimodal geophysical inputs (e.g., seismic and geodetic); and piloting Earth-scale applications where models are pretrained on many simulated earthquake cycles and cross-trained with limited continuous seismic data, then validated against independent displacement measurements.
- Underprediction of frictional failure magnitudes (event moments), especially at higher normal loads, despite good timing predictions.
- Distributional mismatch between simulation and lab friction values (confirmed by KS test) necessitates adaptation; latent-space fine-tuning only partially resolves this.
- Simulations did not model elastic wave propagation; kinetic energy was used as an approximate AE proxy, introducing potential discrepancies.
- Material properties and geometry differ substantially between FDEM and lab setups, which may limit fidelity of transferred features.
- Generalization assessed on two lab experiments; broader validation across more conditions and materials is needed.
- Performance varies with random initialization/training stochasticity; although augmentation reduces variance, residual variability remains.
- Predictions degrade with increasing normal stress; normalization and domain shift handling could be improved.
- Limited-data training captures timing but not full failure magnitude spectrum; additional targeted simulation or adaptation may be required.
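The distributional mismatch noted above can be quantified with a two-sample Kolmogorov–Smirnov test, as the authors did. A scipy sketch using Gaussian stand-ins built from the normalization statistics quoted in the Methods (the Gaussian shape is an illustrative assumption, not the true friction distributions):

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(7)

# Stand-ins for simulated vs. laboratory friction samples, drawn from normals
# with the mean +/- std quoted for FDEM mu (0.423 +/- 0.0252) and lab p4677 mu
# (0.657 +/- 0.0382); sample sizes are arbitrary.
mu_sim = rng.normal(0.423, 0.0252, 5000)
mu_lab = rng.normal(0.657, 0.0382, 5000)

stat, p = ks_2samp(mu_sim, mu_lab)
print(f"KS statistic = {stat:.3f}, p-value = {p:.3g}")
```

A KS statistic near 1 with a vanishing p-value confirms the two friction distributions barely overlap, which is why some adaptation step (here, latent-space fine-tuning) is unavoidable.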