Deep learning forecast of rainfall-induced shallow landslides
A. C. Mondini, F. Guzzetti, et al.
Rain is the leading global trigger of landslides, yet quantifying how much rain is needed to initiate a landslide has remained challenging. A significant portion of Earth’s landmasses and population is exposed to landslide risk, which is expected to increase under projected climate and environmental changes. Two predominant approaches exist for short-term forecasting of rainfall-induced shallow landslides: empirical rainfall thresholds and physically based hydrological–slope stability models. Thresholds define rainfall conditions likely to trigger failures, while physical models simulate infiltration and mechanical stability. However, physical models are generally data-intensive and limited to small areas, while empirical thresholds often neglect the complete rainfall history and may apply a single threshold across diverse terrains. This study proposes a probabilistic deep-learning approach that models both occurrence and non-occurrence of landslides using only hourly rainfall histories, under the hypothesis that landslide timing and location are primarily controlled by rainfall. The probability of occurrence P(F|R,S) is approximated as P(F|R)×c, where c restricts the forecast to areas where landslides can occur. The system is developed and tested on a two-decade dataset from Italy to assess feasibility for large-area operational forecasting.
The paper reviews two main strands: (i) empirical rainfall thresholds, widely used from local to global scales to delineate conditions (e.g., duration D and cumulative rainfall E) associated with landslide initiation, and (ii) physically based hydrological–slope instability models that explicitly represent infiltration and stability but require detailed terrain, hydrological, and environmental inputs often unavailable or unreliable over large areas. Prior work has attempted threshold regionalization and combination with susceptibility, but thresholds typically use only landslide-triggering events and neglect non-triggering events and detailed rainfall histories. Physical models, with few exceptions, are difficult to operationalize at large scales due to data demands. The authors position their approach as addressing these gaps by leveraging both triggering and non-triggering rainfall events and by explicitly incorporating rainfall history via antecedent and triggering periods within a data-driven framework.
Study area and data: Italy, where rainfall-induced shallow landslides are common. Landslide catalog: 2486 events (Feb 2002–Dec 2020 plus 26 Nov 2022 Casamicciola Terme), with temporal accuracy to one hour and minimum geographic accuracy of 10 km. Rainfall data: hourly measurements from 2096 automatic rain gauges (average spacing ~12 km; >300 million records). Rainfall events reconstruction: Using CTRL-T, 780,766 rainfall events were identified with season-dependent dry gaps (48 h in dry season; 96 h in wet season); 2472 events (0.3%) had at least one landslide, 778,294 (99.7%) did not.
Event representation and hypothesis: A rainfall event R is split into antecedent (R_a) and triggering (R_o) periods, with maximum R_o length set to 24 h; no limit for R_a. Introduce lags ℓ1…ℓ24 representing possible R_o lengths. For events with landslides, only rainfall up to the landslide time R_y is considered relevant; post-landslide rainfall is ignored. For events without landslides, all periods are assumed insufficient for initiation.
Variables: For each event and lag ℓ, compute antecedent duration and cumulative rainfall (D_a, E_a) and triggering duration (D_o=ℓ) and cumulative rainfall (E_o). Modeling uses three predictors (D_a, E_a, E_o); D_o is constant per lag.
Dataset construction: Two subsets are built: Y (data points associated with landslides) and X (data points not associated with landslides). For no-landslide events, for each lag ℓ, compute (D_a, E_a, E_o) at the event end, then iteratively move the event end backward hour by hour, recomputing the variables and adding them to X until D_a=1 h (Steps 1–3). For landslide events, for each lag ℓ, compute the variables up to the landslide time R_f and add them to Y (Step 4); then iteratively move the landslide time backward by one hour, recomputing the variables and adding them to X (no landslide occurred in those earlier periods) until D_a=1 h (Steps 5–6). Each landslide is associated with the nearest appropriate rain gauge based on distance, elevation difference, and morphology.
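The backward-stepping construction described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names (lag_features, build_points) are hypothetical, and it assumes that the last ℓ hours of a period form the triggering period, that all earlier hours of the event form the antecedent period, and that D_a is simply the number of antecedent hours.

```python
from typing import List, Optional, Tuple

Point = Tuple[int, float, float]  # (D_a in hours, E_a, E_o)


def lag_features(rain: List[float], end: int, lag: int) -> Point:
    """Features for the period made of the first `end` hours of an event.

    Assumption: the last `lag` hours are the triggering period and every
    earlier hour of the event belongs to the antecedent period.
    """
    split = end - lag  # number of antecedent hours, i.e. D_a
    if split < 1:
        raise ValueError("antecedent period must last at least 1 h")
    e_a = float(sum(rain[:split]))      # antecedent cumulative rainfall
    e_o = float(sum(rain[split:end]))   # triggering cumulative rainfall
    return split, e_a, e_o


def build_points(
    rain: List[float], lag: int, landslide_end: Optional[int] = None
) -> Tuple[List[Point], List[Point]]:
    """Return (Y, X) for one rainfall event and one lag.

    `landslide_end` is the number of event hours elapsed at the landslide
    time; None means the event has no landslide.
    """
    y: List[Point] = []
    x: List[Point] = []
    end = landslide_end if landslide_end is not None else len(rain)
    first = True
    while end - lag >= 1:  # stop once D_a would drop below 1 h
        point = lag_features(rain, end, lag)
        if first and landslide_end is not None:
            y.append(point)  # conditions at the landslide time
        else:
            x.append(point)  # earlier conditions: no landslide occurred
        first = False
        end -= 1  # move the end of the period back by one hour
    return y, x
```

For a six-hour event [1, 0, 2, 3, 0, 4] with ℓ = 2 and a landslide after five hours, this yields one point in Y and two points in X; the hour after the landslide is never used.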
Train/validation/test splitting and bagging: Excluding a small demonstration set Z (15 with landslides, 14 without), for each lag ℓ, perform 100 random splits of the remaining data. For each split, create a balanced training/validation subset T^V (80% of with-landslide points plus an equal number of without-landslide points) and a test subset W (the remaining 20% of with-landslide points and all remaining without-landslide points). T^V is further split into T (64%) and V (~16%). The severe class imbalance is retained in the W test sets (with/without ratio ≈ 1/100,000; elsewhere described as ~450 with-landslide and ~40 million without-landslide points).
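One random split of this kind might look like the sketch below. It works on index arrays only, and the function name balanced_split and its parameters are illustrative assumptions. It also assumes that the 64% and ~16% figures are fractions of the with-landslide points (0.8 × 0.8 and 0.8 × 0.2), which is one reading of the description.

```python
import numpy as np


def balanced_split(n_pos, n_neg, rng, tv_frac=0.8, val_frac=0.2):
    """Split point indices into balanced T and V sets and an imbalanced W set.

    n_pos / n_neg: number of with-landslide / without-landslide points.
    Returns a dict mapping "T", "V", "W" to (positive_idx, negative_idx).
    """
    pos = rng.permutation(n_pos)
    neg = rng.permutation(n_neg)
    k = int(round(tv_frac * n_pos))    # positives used for T^V
    n_val = int(round(val_frac * k))   # positives (and negatives) in V
    tv_pos, w_pos = pos[:k], pos[k:]
    tv_neg, w_neg = neg[:k], neg[k:]   # equal number of negatives in T^V
    return {
        "T": (tv_pos[n_val:], tv_neg[n_val:]),
        "V": (tv_pos[:n_val], tv_neg[:n_val]),
        "W": (w_pos, w_neg),           # all remaining negatives stay here
    }


# 100 independent splits, one per ensemble member (sizes are illustrative)
rng = np.random.default_rng(42)
splits = [balanced_split(100, 10_000, rng) for _ in range(100)]
```

Because only as many without-landslide points as with-landslide points enter T^V, nearly all negatives end up in W, which is why the test sets keep the extreme imbalance.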
Neural network architecture and training: For each lag ℓ, define a fully connected neural net with 3 inputs (D_a, E_a, E_o), two hidden layers with 4 neurons each (tanh activations), and a single sigmoid output neuron. Regularization: L2 weight decay (δ=0.001) and dropout (γ=0.25). Weights initialized N(0,0.1), biases 0. Loss: binary cross-entropy (Bernoulli likelihood). Optimizer: Adam with inverse time decay (initial learning rate ℓ_e, decay rate 0.05 every 100 steps, β1=0.9, β2=0.999, ε=1e−7). Training with batch size 32, up to 200,000 epochs with early stopping (patience 1000 steps; min decay rate 0.00001). For each lag, 100 independently initialized and trained models form a bagging ensemble.
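To make the size of this network concrete, the following NumPy sketch implements only the forward pass and the loss for a 3-4-4-1 architecture. It is an illustration rather than the authors' implementation: training, dropout, weight decay, and the Adam schedule are omitted, the helper names are hypothetical, and N(0,0.1) is assumed to mean a standard deviation of 0.1.

```python
import numpy as np

LAYER_SIZES = (3, 4, 4, 1)  # inputs (D_a, E_a, E_o), two hidden layers, output


def init_params(rng, sizes=LAYER_SIZES, std=0.1):
    """Gaussian weights (std is an assumed reading of N(0,0.1)), zero biases."""
    return [
        (rng.normal(0.0, std, size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]


def predict_proba(params, x):
    """Forward pass at inference time (no dropout): tanh hidden layers, sigmoid output."""
    h = np.asarray(x, dtype=float)
    for w, b in params[:-1]:
        h = np.tanh(h @ w + b)
    w, b = params[-1]
    z = h @ w + b
    return (1.0 / (1.0 + np.exp(-z)))[:, 0]


def bce_loss(p, y, eps=1e-7):
    """Binary cross-entropy, i.e. the negative Bernoulli log-likelihood."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    y = np.asarray(y, dtype=float)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


def count_params(params):
    """Total number of trainable weights and biases."""
    return sum(w.size + b.size for w, b in params)
```

With these layer sizes each model has only 41 trainable parameters (16 + 20 + 5), so training 100 models for each of the 24 lags remains computationally light.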
Threshold selection and evaluation: For testing on highly imbalanced W sets, probability thresholds are chosen per model via the ROC trade-off between sensitivity (true-positive rate, TPR) and specificity (true-negative rate, equal to 1 − FPR). Metrics reported include area under the ROC curve (AROC) and Balanced Accuracy (BA). To demonstrate operational use, a voting scheme combines the 100 model decisions per lag (majority vote with vote variance σ² indicating ensemble agreement), then aggregates across 24 lags to a final forecast for an event.
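The summary does not state which ROC criterion is used to pick each threshold. As one plausible stand-in, the sketch below selects the threshold that maximizes Youden's J = TPR + TNR − 1, which is equivalent to maximizing Balanced Accuracy, and then combines binary model decisions by majority vote. Both function names are hypothetical, and treating σ² as the variance of the 0/1 votes is an assumption.

```python
import numpy as np


def youden_threshold(scores, labels):
    """Threshold maximizing TPR + TNR - 1 (assumed criterion, not from the source)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    best_t, best_j = 0.5, -1.0
    for t in np.unique(scores):          # candidate thresholds
        pred = scores >= t
        tpr = np.mean(pred[labels])      # sensitivity
        tnr = np.mean(~pred[~labels])    # specificity
        j = tpr + tnr - 1.0
        if j > best_j:
            best_t, best_j = float(t), float(j)
    return best_t


def ensemble_vote(decisions):
    """Majority vote over binary model decisions, plus the variance of the votes."""
    d = np.asarray(decisions, dtype=float)
    return bool(d.mean() > 0.5), float(d.var())
```

Under this reading, 91 of 100 models voting "landslide" gives a positive forecast with vote variance 0.91 × 0.09 ≈ 0.082, while an evenly split ensemble has the maximum variance of 0.25. The aggregation across the 24 lags is not sketched because its rule is not described here.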
Demonstration set Z: 29 events (15 with landslides, including Casamicciola Terme 26/11/2022, and 14 without) from diverse Italian settings used solely to illustrate operational outputs and ensemble vote diagnostics.
- Training/validation performance: With a default 0.5 probability cut-off, accuracies ranged ~84.8% to ~78.3% for both training and validation, with similar performance indicating little overfitting and good generalization.
- Test performance (per-lag ensembles): Median AROC across lags was high (~0.92 to ~0.88), with all single models far from random (0.50). Balanced Accuracy medians ~0.80–0.82 with low variability and low skewness. Performance was more uniform for longer triggering periods (ℓ ≥ 12) than shorter (ℓ ≤ 6).
- Demonstration (set Z): 14/15 rainfall events with landslides were correctly identified (true-positive rate = 93.3%), typically with strong ensemble agreement (>91%). For no-landslide events, 12/14 were correctly identified (true-negative rate = 85.7%), with larger uncertainty than for the true positives. Overall Cohen’s kappa κ = 0.79 and F1 score = 0.90. Misclassified cases showed higher vote variance, signaling uncertainty.
- Substantive insight: Results indicate it is feasible to anticipate rainfall-induced shallow landslides over large areas using only rainfall histories; timing and location are primarily controlled by precipitation. This enables potential operational forecasting based on rainfall measurements and quantitative meteorological forecasts without detailed terrain data.
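The demonstration-set figures above follow directly from the confusion matrix implied by the counts (14 true positives, 1 false negative, 12 true negatives, 2 false positives). A short check, using a hypothetical helper function and the standard definitions of each metric:

```python
def confusion_metrics(tp, fn, tn, fp):
    """Standard binary-classification metrics from confusion-matrix counts."""
    n = tp + fn + tn + fp
    tpr = tp / (tp + fn)                   # sensitivity
    tnr = tn / (tn + fp)                   # specificity
    p_obs = (tp + tn) / n                  # observed agreement (accuracy)
    # chance agreement from the row and column marginals
    p_exp = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n**2
    return {
        "tpr": tpr,
        "tnr": tnr,
        "balanced_accuracy": (tpr + tnr) / 2,
        "kappa": (p_obs - p_exp) / (1 - p_exp),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


# Demonstration set Z: 15 events with landslides, 14 without
metrics = confusion_metrics(tp=14, fn=1, tn=12, fp=2)
```

Rounded, this gives a true-positive rate of 0.933, a true-negative rate of 0.857, κ ≈ 0.79, and F1 ≈ 0.90, matching the reported values.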
The study addresses the long-standing challenge of linking rainfall to landslide initiation by leveraging full rainfall histories and both triggering and non-triggering events within a probabilistic deep-learning framework. Unlike empirical thresholds (which often ignore non-triggering events and detailed temporal dynamics) and physically based models (which require extensive local data and are hard to scale), the proposed approach achieves high discriminative power (AROC ≈ 0.88–0.92; BA ≈ 0.80–0.82) across a national-scale dataset. Demonstration results show strong predictive capability and interpretable ensemble uncertainty indicators for individual events. Geomorphologically, the findings support that rainfall is the dominant driver of shallow landslide initiation at the landscape scale considered, while fine-scale terrain conditions determine initiation points at scales below the model’s resolution. Operationally, the approach suggests that landslide nowcasting/forecasting can be integrated into early warning systems using rainfall observations and forecasts alone, with near-real-time update capability as new rainfall data arrive.
This work introduces and validates a rainfall-only, deep-learning probabilistic framework to forecast populations of rainfall-induced shallow landslides over large areas. Using hourly rainfall histories and a lag-based representation of antecedent and triggering periods, ensembles of simple neural networks deliver high performance on highly imbalanced datasets and produce actionable outputs (majority votes with uncertainty). The approach bypasses the need for detailed terrain data, enabling potential operational deployment in geographical landslide early warning systems and integration into standard weather forecasting workflows, provided reliable quantitative precipitation forecasts are available. Future research may explore inclusion of additional predictors (e.g., morphology, lithology, soil moisture) where accurate and temporally consistent, extensions to other regions with similar climatic regimes (e.g., Mediterranean), handling non-stationarity via periodic recalibration, and alternative classifiers (e.g., SVMs, convolutional LSTMs) and data segmentation strategies tailored to class imbalance.
- Data and catalog completeness: The landslide catalog, although extensive and temporally/geographically accurate, is unsystematic and of uncertain completeness, and landslide timing is sometimes uncertain. This limits the reconstruction of triggering conditions and contributes to false negatives.
- Stationarity assumption: The approach assumes stationarity of rainfall and landslide records; climate, environmental, or geological changes over time can degrade performance, necessitating reassessment and recalibration or architectural redesign.
- Modeling choices: The 24 h maximum triggering window is an arbitrary but reasonable choice; longer windows may not improve performance and could add uncertainty. The simple network architecture and hyperparameters, while effective, may not be optimal for all lags.
- Measurement uncertainties: Rainfall measurement errors and representativeness, and the gauge–landslide association (single-gauge assignment) introduce epistemic uncertainty in rainfall histories.
- Class imbalance and thresholding: Severe imbalance in test data requires careful threshold selection; alternative cost-sensitive training or evaluation metrics could alter performance trade-offs.
- Generalizability: Best performance is expected in regions with similar meteorological and climatic regimes to Italy; performance elsewhere may require retraining with local data.
- Uncertainty quantification: Ensemble vote variance provides a practical confidence indicator but does not capture all sources of uncertainty; high-variance cases warrant caution.