Earth Sciences
Global land subsidence mapping reveals widespread loss of aquifer storage capacity
M. F. Hasan, R. Smith, et al.
Excessive groundwater pumping can cause aquifer depletion, loss of storage capacity, arsenic contamination, saltwater intrusion, and infrastructure damage, yet many intensively pumped regions are poorly monitored. Quantifying permanent groundwater storage loss is challenging due to sparse and non-uniform in situ networks and the lack of globally consistent observations. Land subsidence, a visible consequence of pressure loss and compaction in aquifers, can provide a first-order estimate of permanent storage loss in unconsolidated confined systems. While InSAR offers accurate, fine-scale subsidence measurements, its computational demands have limited analyses to local or regional scales. Process-based global groundwater models are hindered by data scarcity for key geomechanical and hydrogeologic parameters and the temporal head changes that drive nonlinear subsidence. No global map of subsidence magnitude and associated storage loss existed. This study addresses these gaps by developing a machine learning framework that predicts pumping-induced subsidence classes globally at ~2 km resolution using remote sensing and model-based hydrologic, land use, climatic, and geologic proxies trained on extensive InSAR and GNSS observations. The goals are to map subsidence magnitude, provide a first-order estimate of aquifer storage loss due to consolidation, and identify the key drivers and hotspots to inform groundwater management under climate change and growing water demand.
Prior work has used InSAR to monitor subsidence at local to regional scales with ~1 cm accuracy and ~100 m resolution, demonstrating its utility for tracking groundwater-related deformation. However, computational cost and atmospheric noise challenges have limited global application. GRACE TWS has been widely used to infer large-scale water storage changes, but its coarse effective resolution (~100,000 km²) blends surface and subsurface signals and can mask localized groundwater declines; about 36% of subsidence training pixels lie in regions with positive GRACE TWS trends, underscoring mismatch in scales. Process-based models have provided global groundwater assessments, yet a global subsidence model has been lacking due to missing geomechanical inputs and head histories. Prior studies have predicted storage-capacity loss from subsidence at regional scales and produced global subsidence susceptibility maps, but none quantified global subsidence magnitude and associated permanent storage loss. This study leverages globally available proxies (hydrologic fluxes, irrigation, population, geology) and observed deformation (InSAR, GNSS) within a data-driven framework to advance beyond susceptibility toward magnitude mapping and storage-loss estimation.
- Overall approach: A supervised machine learning classification model (random forest) predicts three subsidence magnitude classes: <1 cm/year (nominal), 1–5 cm/year, and >5 cm/year, at ~2 km (0.02°) global resolution. Inputs are global remotely sensed and model-based predictors representing hydrologic balance, land use/anthropogenic pressure, and geologic susceptibility. The response (training labels) comes from InSAR- and GNSS-derived subsidence observations.
- Input hydrologic and land use datasets: Precipitation, evapotranspiration, and soil moisture were included to represent water balance where groundwater withdrawals are likely when P < ET. A global irrigation area dataset (~1 km) and a gridded population dataset (~1 km) were incorporated to proxy pumping intensity in croplands and urban areas. Gaussian filtering was applied to irrigation and population to smooth local noise and capture depletion effects beyond pumping centers; both were normalized to 0–1 (higher values indicate higher density).
- Input geologic proxies: To represent fine-grained compressible sediments that drive inelastic subsidence, a normalized clay indicator was created by multiplying 250 m percent clay content at 200 cm depth with average unconsolidated material thickness (~1 km resolution), then normalizing. A binary confining layer indicator was developed from SRTM DEM by inferring lacustrine/marine depositional environments likely to host extensive clay layers; this proxy was validated against major US aquifers from the USGS Ground Water Atlas.
- Assembly of training data: Vertical land motion rates (cm/year) were compiled for 47 global regions with groundwater-related subsidence using three sources: (1) InSAR time series processed by the authors (e.g., Quetta Valley, Qazvin, San Luis Valley; also Hebei and Hefei, China identified by preliminary runs), (2) preprocessed agency datasets (California DWR; Arizona DWR; EGMS for 15 European regions), and (3) georeferenced subsidence rates from published groundwater studies (including China, Indonesia, Iran, Turkey, USA, Vietnam). A GNSS-based global coastal subsidence dataset was also included. InSAR processing used SBAS time series, with LOS velocities decomposed to vertical via ascending and descending orbits. All subsidence data were classified into <1, 1–5, and >5 cm/year classes and resampled to 0.02°.
- Model and training: A random forest classifier (scikit-learn) was trained on the original training dataset created by collocating subsidence class labels with predictor values per pixel. The dataset was split 70/30 into train/test. Class imbalance (84.5% <1, 10.5% 1–5, 5% >5 cm/year) was addressed using class_weight='balanced' to penalize misclassification of minority classes. Hyperparameters were tuned via random-search 10-fold cross-validation, yielding n_estimators=300, max_depth=14, max_features=7, min_samples_split=7, min_samples_leaf=15.
- Prediction and post-processing: The tuned model generated global class predictions and per-class probabilities. A land use filter removed predictions where both normalized irrigated area density <0.06 and normalized population density <0.009 (values below the range associated with observed subsidence in training), eliminating ~7% of 1–5 and >5 cm/year pixels as likely noise.
- Driver analysis: Partial Dependence Plots evaluated learned relationships. Two-way PDPs showed increasing subsidence probability with increasing normalized irrigated area density and normalized clay indicator, and higher subsidence probability under lower precipitation and soil moisture, consistent with arid/semi-arid water deficits and fine-grained geology.
- Storage loss estimation: Permanent groundwater storage loss due to consolidation was estimated by summing pixel-wise subsidence thickness times pixel area over all pixels in the 1–5 and >5 cm/year classes. Average subsidence values of 3 cm/year and 10 cm/year were assumed for the 1–5 and >5 cm/year classes, respectively, yielding a first-order global loss of ~17 km³/year. Bounds were computed using 2 and 7 cm/year (lower: ~11.5 km³/year) and 4 and 13 cm/year (upper: ~22 km³/year).
- Performance evaluation: On the test set, class F1-scores were 0.96 (<1 cm/yr), 0.68 (1–5 cm/yr), and 0.86 (>5 cm/yr), with macro F1=0.83. A Leave-One-Area-Out test (47 iterations, excluding coastal datasets) assessed generalization to unseen regions using probability outputs; 10 regions were categorized as not satisfactory (model could not detect subsidence without training on that region), while the remainder were satisfactory, indicating robustness across diverse climates, hydrology, and geology.
- Global subsidence mapping: The model predicts extensive groundwater-withdrawal-induced subsidence globally at ~2 km resolution, with significant signals in East Asia (e.g., North China Plain including Beijing, Shanghai, Tianjin, Xi’an, Wuhan), South and Central Asia (Bangladesh, Myanmar, India, Pakistan, Indonesia, Iran, Turkey; also Afghanistan, Turkmenistan, Uzbekistan, Azerbaijan, Syria), parts of Europe (notably 1–5 cm/year in Spain’s Albacete, Ciudad Real, Alto Guadalentín; Po Delta, Italy), North America (California, Arizona, Texas, central Mexico; urban Houston, Mexico City), Africa (Nile Delta and several coastal cities mostly <1 cm/yr; 1–5 cm/yr in irrigated areas of Morocco, Algeria, Tunisia), Australia (1–5 cm/yr in parts of the Murray–Darling Basin), and South America (localized >1 cm/yr in Argentina and Peru; >5 cm/yr in Oruro and Cochabamba, Bolivia).
- Land use association: Approximately 73% of predicted subsidence occurs over croplands (~60.5%) and urban/built-up areas (~12.5%); ~19% over vegetated, uncultivated lands; ~8% over other land covers. Subsidence likelihood increases with irrigation and population density and is highest (~75%) in arid and semi-arid climates.
- Country rankings: Highest percentage of country area subsiding >1 cm/year (top 10): Taiwan (8%), Bangladesh (7%), Uzbekistan (4%), Israel (3%), Syria (3%), Myanmar (2%), Turkmenistan (2%), Philippines (2%), China (2%), Vietnam (1%). Countries with largest permanent aquifer storage loss due to consolidation: China, United States, Iran, Uzbekistan, Mexico, Myanmar, Bangladesh, Turkmenistan, Pakistan, India; China, the United States, and Iran account for the majority of global loss.
- Global storage loss: First-order estimate of permanent groundwater storage loss due to consolidation is ~17 km³/year globally (lower bound ~11.5; upper bound ~22 km³/year).
- Drivers: PDPs confirm physically consistent relationships—higher subsidence probability where normalized clay indicator and irrigated area density are high, and where precipitation and soil moisture are low (indicating water deficit and high pumping demand) given suitable geology.
- Model performance: Test-set F1-scores of 0.96 (<1 cm/yr), 0.68 (1–5 cm/yr), 0.86 (>5 cm/yr); macro F1=0.83. LOAO test satisfactory for the majority of regions; 10 regions required region-specific training.
- Validation and discrepancies: General agreement with observed datasets (InSAR, EGMS, GNSS), though the model overestimates in some areas (e.g., parts of Spain’s La Mancha; High Plains Aquifer) likely due to uncertainties in irrigation estimates and geologic proxies.
The study achieves the first high-resolution global mapping of pumping-induced land subsidence magnitude and provides a first-order quantification of associated permanent aquifer storage loss, directly addressing the lack of global-scale, locally relevant subsidence and storage-loss information. The results highlight that subsidence is widespread across both arid/semi-arid and humid regions, closely linked to irrigated agriculture and urban groundwater dependence. By identifying key drivers—fine-grained confining sediments, intense irrigation/population, and hydrologic water deficits—the model contextualizes subsidence risk and informs where management interventions (e.g., reduced pumping, managed aquifer recharge, surface water substitution) may be most needed. Country-level statistics enable comparison of groundwater stress in terms of both affected area and storage loss volumes, revealing disproportionate impacts in certain nations and small island/coastal regions vulnerable to relative sea-level rise. While some overestimation occurs due to uncertainties in geologic and land use proxies, the map aligns with known hotspots and reveals previously undocumented stressed regions, guiding targeted monitoring (e.g., InSAR campaigns) and policy responses. The findings underscore the urgency of sustainable groundwater management to prevent irreversible storage-capacity loss and associated hazards (infrastructure damage, coastal flooding, saltwater intrusion).
This work presents a global, ~2 km resolution machine-learning model that maps the magnitude of groundwater-withdrawal-induced land subsidence and estimates associated permanent aquifer storage loss. Trained on extensive InSAR/GNSS observations and leveraging hydrologic, land use, and geologic proxies, the model identifies widespread subsidence concentrated over croplands and urban areas and estimates a global storage loss of roughly 17 km³/year (11.5–22 km³/year bounds). It quantifies key drivers and delivers country-level assessments to prioritize management. Future research should refine geologic proxies (e.g., deeper clay characterization), improve irrigation and groundwater-use representations, incorporate additional in situ and remote sensing observations where feasible, enhance detection of low-magnitude coastal subsidence, and expand regional InSAR analyses in newly identified hotspots. Developing dense, standardized monitoring networks and integrating dynamic head-change information with data-driven models will further improve predictions and support sustainable groundwater governance.
- Scope limitation: The model estimates subsidence due to aquifer-system compaction from groundwater pumping; it does not capture subsidence from other sources (e.g., tectonics, hydrocarbon extraction), so total subsidence in some regions may differ.
- Low-magnitude subsidence: The <1 cm/year class is treated as nominal; detecting and mapping small deformations (particularly critical in coastal zones) are beyond the model’s intended sensitivity.
- Input data uncertainties: Geologic proxies (confining layer from DEM-based depositional inference; normalized clay indicator limited by shallow (200 cm) clay data and averaged unconsolidated thickness) introduce uncertainty and may misrepresent subsurface conditions outside validated regions. Irrigation and population datasets, even after smoothing, carry errors that can lead to overestimation in some areas.
- Training data representativeness: Despite 47 regions, global heterogeneity means some unique combinations of climate, hydrology, and geology are underrepresented; LOAO tests indicate 10 regions where the model underperformed without region-specific training.
- Resolution mismatch with other datasets: GRACE TWS was excluded due to coarse resolution and mixed signals relative to localized subsidence; lack of global, high-resolution groundwater head-change data limits explicit mechanistic modeling.
- InSAR-related constraints: Processing and interpretation challenges (e.g., atmospheric noise) limit availability and uniformity of training data; spatial gaps remain in observational coverage.
- Post-processing filter: The land use filter thresholds may remove some true positives in sparsely populated/irrigated settings and retain some false positives elsewhere.
Related Publications
Explore these studies to deepen your understanding of the subject.

