Environmental Studies and Forestry
Improving air quality assessment using physics-inspired deep graph learning
L. Li, J. Wang, et al.
Air pollution adversely impacts health, climate, and ecosystems, and concentrations vary sharply over short distances due to heterogeneous emissions, transport, and chemistry. Mechanistic models (dispersion/photochemical/plume) face uncertainties from missing inputs and complex parameterizations, while purely statistical or machine-learning methods can yield physically inconsistent results and overoptimistic validation when using random spatiotemporal splits. There is a need for reliable, fine-resolution (1 km, daily) estimates with strong generalization to unobserved locations. The study aims to integrate physics into deep graph learning to explicitly encode local advection and diffusion dynamics and encourage mass conservation/continuity, improving extrapolation performance and physical plausibility of air quality assessments across China.
Prior work combines ML with chemistry-climate models to emulate components, reduce bias, or represent subgrid processes. Physics-informed neural networks introduce PDE residuals to guide learning toward physically consistent solutions. Graph neural networks (rooted in spectral graph theory and locality) have modeled complex interactions in irregular data domains. Existing air quality studies using tree-based ML showed high accuracy but may violate continuity and struggle with extrapolation. Mechanistic systems like MERRA2-GMI or WRF-Chem capture large-scale processes but often miss fine-scale gradients due to coarse inputs/resolution and uncertain parameters. Few studies embed fluid physics into deep learning for high-resolution, multi-pollutant air quality mapping; this work addresses that gap by coupling graph convolutions with PDE-based soft constraints and residual deep networks.
Problem framing: A 2-D Eulerian formulation models ground-level pollutant evolution using horizontal advection and diffusion with source/sink terms (emissions, chemistry, deposition). Vertical processes are not explicitly modeled but approximated via proxy variables (e.g., PBL height, vertical meteorology). Model architecture (Deep Graph hybrid Modeling, DGM):
- Inputs: atmospheric/surface grids, emissions proxies (e.g., MODIS AOD for PM, OMI NO2, MERRA2-GMI reanalysis pollutant fields, traffic from OpenStreetMap), meteorology (including vertical profiles), land use/NDVI (dry deposition), precipitation (wet deposition), elevation, time indices; total 56 variables.
- Local multilevel Graph Convolutions (GC): Construct for each spatiotemporal target node a k-NN local graph (k=12 optimal) over a 1 km grid and daily time step. Spatial and temporal distances are standardized and fused; inverse distance used as aggregation weights. Multiple GC layers simulate multiscale advection/diffusion: feature dimensions per GC layer are 128 → 64 → 32 → 1. Laplacian operators approximate diffusion (second-order derivative), and directional differences approximate advection along edges. The GCN is inductive, allowing generalization to unseen nodes.
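The local graph construction described above can be sketched in a few lines. This is a minimal numpy illustration, not the paper's PyTorch Geometric implementation: the function name, the way spatial and temporal distances are fused, and the standardization scheme are simplifying assumptions for exposition.

```python
import numpy as np

def knn_graph_weights(coords, times, target_idx, k=12):
    """Build a local k-NN graph around one target node by fusing
    standardized spatial and temporal distances, and return
    inverse-distance aggregation weights (illustrative sketch)."""
    # Separate spatial and temporal distances to the target node.
    d_space = np.linalg.norm(coords - coords[target_idx], axis=1)
    d_time = np.abs(times - times[target_idx])
    # Standardize each distance so neither dominates, then fuse.
    d_space = d_space / (d_space.std() + 1e-8)
    d_time = d_time / (d_time.std() + 1e-8)
    d = np.sqrt(d_space ** 2 + d_time ** 2)
    d[target_idx] = np.inf            # exclude the node itself
    nbrs = np.argsort(d)[:k]          # k nearest spatiotemporal neighbors
    w = 1.0 / (d[nbrs] + 1e-8)        # inverse distance as edge weight
    return nbrs, w / w.sum()          # normalized aggregation weights
```

A GC layer would then aggregate neighbor features with these weights; weighted differences along edges approximate advection, and Laplacian-style sums of differences approximate diffusion.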
- Fusion with Full Residual Deep Network (FRDN) and attention: Concatenate GC outputs with original features, then pass through a residual encoder–decoder to model local sources/sinks, chemical transformation, and deposition. Residual skip connections reduce vanishing gradients and GC over-smoothing; attention layers weight important inputs. Encoder/decoder sizes: 512→320→256→128→96→64→32→16 (decoder symmetric). Outputs are inverse-normalized and exponentiated to original units.
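The residual skip connections in the FRDN can be illustrated with a single unit. This is a minimal numpy sketch under simplifying assumptions (no attention, no normalization, toy layer sizes); the paper's network stacks such units in the 512→…→16 encoder-decoder described above.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """One residual unit: the input is added back to the transformed
    output (skip connection), so raw features and gradients bypass the
    nonlinear transform. This is what mitigates vanishing gradients and
    GC over-smoothing in deep stacks."""
    h = relu(x @ w1)          # hidden transform
    return relu(h @ w2 + x)   # skip connection adds the input back
```

With zero weights the block reduces to the identity on non-negative inputs, which is the property that keeps deep stacks trainable.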
- Physics-informed loss: L = e1 + α e2 + β e3, where e1 is the MSE on labeled data, e2 is the residual of the continuity PDE (mass conservation), and e3 regularizes parameters. Sensitivity analysis identified α = 0.5 and β = 0.2 as optimal. Governing equation (simplified): ∂C/∂t = −∇·(VC) + ρ∇²C + R + E − F; advection and diffusion are decomposed via the Reynolds framework, and the GC layers approximate the differential operators on discrete graphs.
Study area and data: Mainland China; daily 1 × 1 km grids, 2015–2018. Observations: 1,913,012 samples from 1,604 stations for six pollutants (daily mean CO, NO2, PM2.5, PM10, and SO2; daily maximum O3 and 8-h average O3).
Training/testing protocol: Site-based independent testing withholds 288 stations (~18%) entirely, at all times. From the remaining data, 78% is used for training and 22% for regular testing, stratified by province and month. Semi-supervised training constructs local graphs for all samples (training, regular test, and prediction) to enforce the e2 continuity constraint; only training samples contribute to e1. To reduce randomness, 100 models are trained per pollutant and their performance averaged.
Optimization and implementation: Targets are log-transformed and normalized; minibatch size 2048; Adam optimizer; initial learning rate 0.001 (adjusted during training); 200 epochs. Implemented with PyTorch Geometric. Hardware: 128 GB RAM, 16 CPUs, three NVIDIA 1080Ti GPUs; training/testing per pollutant takes ~3–4 days, and national daily predictions (2015–2018) take ~10 days with parallelization.
Baselines: GraphSAGE, FRDN only, Random Forest, XGBoost, GAM, ordinary kriging, and regression kriging; hyperparameters were tuned via grid search using identical covariates (except for ordinary kriging). A sensitivity analysis also evaluated models without the MERRA2-GMI pollutant inputs.
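The three-term loss described above can be sketched as follows. This is a numpy illustration under stated assumptions: the function name is hypothetical, and the continuity residual is assumed to be precomputed elsewhere (e.g., from graph-based approximations of ∂C/∂t, advection, and diffusion), since that computation depends on the graph operators.

```python
import numpy as np

def physics_informed_loss(pred, obs, mask, cont_residual, params,
                          alpha=0.5, beta=0.2):
    """L = e1 + alpha*e2 + beta*e3 (sketch of the paper's loss).
    e1: MSE on labeled samples only (mask selects them);
    e2: mean squared continuity/PDE residual over ALL graph nodes,
        which is what lets unlabeled samples shape the solution
        (semi-supervised training);
    e3: L2 regularization of model parameters.
    alpha=0.5 and beta=0.2 are the optima found by sensitivity analysis."""
    e1 = np.mean((pred[mask] - obs[mask]) ** 2)
    e2 = np.mean(cont_residual ** 2)
    e3 = np.mean(params ** 2)
    return e1 + alpha * e2 + beta * e3
```

Because e2 is evaluated on every node of the local graphs, minimizing L pushes predictions at unmonitored locations toward mass-conserving fields even though e1 sees only station data.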
- Generalization and accuracy: In site-based independent tests (unseen locations), DGM improved explained variance (R2) by 11–22% and reduced RMSE by 12–35% versus baseline ML methods. PM2.5 and PM10 achieved site-based mean R2 of 0.85–0.87; NO2 and 8-h average O3 (O3A8), 0.66–0.78. For SO2, Random Forest had slightly higher overall test R2 (0.74 vs 0.73) but worse site-based R2 (0.58 vs 0.63) and higher RMSE (13.11 vs 12.37 µg/m³).
- Contribution of components: Compared to the FRDN-only network, DGM improved test R2 by 6–11% and site-based R2 by 7–21%; GC contributed 6–19% R2 gains, and adding PDE residuals provided an additional 1–4% R2 improvement. PDE residual component e2 converged to small RMSE values (1.62E−5 to 0.134), indicating good mass-conservation behavior.
- Temporal dynamics: DGM better captured time series at independent sites, with lower site-based RMSE distributions and monthly mean errors than baselines.
- Against reanalysis and mechanistic models: Correlations of DGM point and 1-km grid estimates with ground measurements were 0.76–0.92 (points) and 0.66–0.83 (grids), far exceeding MERRA2-GMI (0.005–0.53). Compared to high-resolution WRF-Chem over Asia (2015), DGM achieved higher station-wise correlations (0.77–0.92 vs 0.29–0.51) and mean temporal evolution correlation (0.99 vs 0.71–0.90).
- Event reconstructions: The model reproduced fine-scale spatiotemporal patterns for representative events: 2015 North China sandstorm (PM10), 2016 East China haze (PM2.5), 2017 Beijing ozone episode (O3A8), 2018 Shanghai winter NO2 haze, aligning with wind/GPH fields and HYSPLIT back-trajectories; grid–measurement correlations 0.80–0.92.
- Role of physics and inputs: GC-driven local transport modeling yielded the largest gains for inert pollutants (PM2.5, PM10: +14–15% R2), with smaller but significant gains for reactive species (NO2, O3: +7–13%; CO, SO2: +4–6%). MERRA2-GMI pollutant inputs modestly aided reactive species but contributed only 0–4% to independent test R2 overall; DGM remained robust without them.
- National AQI and trends (2015–2018): Population-weighted means exceeded raw means for NO2 (+9 µg/m³) and PM2.5 (+13.5 µg/m³). Population-weighted daily AQI declined by ~6 points; PM2.5 and SO2 decreased 18–30%, PM10 and CO decreased 12–13%, and NO2 decreased ~4%, while 8-h average O3 increased ~8%. Seasonal patterns: better air quality in summer (AQI 35–41) than in winter (47–58). The largest AQI declines occurred in Beijing–Tianjin–Tangshan, Jilin, and the Sichuan Basin.
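Why population-weighted means exceed raw spatial means is easy to see with a toy grid: dense urban cells with elevated NO2 and PM2.5 dominate the weighted average. The numbers below are illustrative, not the study's data.

```python
import numpy as np

def population_weighted_mean(conc, pop):
    """Population-weighted mean concentration over grid cells: each
    cell's value is weighted by its share of total population, so
    exposure in dense, polluted urban cells dominates."""
    return float(np.sum(conc * pop) / np.sum(pop))

# Toy 4-cell grid: one polluted urban cell holds most of the population.
conc = np.array([40.0, 10.0, 10.0, 10.0])   # µg/m³ per cell
pop = np.array([900.0, 50.0, 25.0, 25.0])   # persons per cell
raw = float(conc.mean())                     # 17.5 (unweighted)
weighted = population_weighted_mean(conc, pop)  # 37.0 (weighted)
```

The gap (here +19.5 µg/m³ in the toy example) is the same mechanism behind the +9 and +13.5 µg/m³ differences reported for NO2 and PM2.5.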
Embedding fluid physics into a deep graph framework addressed key limitations of purely statistical and purely mechanistic approaches by explicitly modeling local advection/diffusion and softly enforcing mass conservation/continuity. This yielded better extrapolation to unmonitored locations, more stable temporal dynamics, and spatially continuous fields that reflect physical transport, particularly for inert pollutants with strong regional transport. The approach outperformed coarse reanalysis and was competitive with or superior to high-resolution mechanistic modeling for capturing observed variability. The national 1-km daily surfaces enabled more reliable population-weighted statistics and trend analyses, revealing broad declines in most pollutants associated with clean air actions, contrasted by rising ozone likely linked to meteorology and precursor chemistry. The framework’s inductive GCN design and physics-informed loss promote generalization, and concatenation with residual/attention layers mitigates GC over-smoothing while capturing local sources, chemistry, and deposition.
The study introduces a physics-inspired hybrid deep graph model that integrates multilevel graph convolutions with a residual deep network and a PDE-based soft constraint to produce fine-scale (1 km, daily) multi-pollutant air quality estimates with improved generalization and physical consistency. Across China (2015–2018), the method outperformed diverse machine learning baselines and reanalysis products, accurately reconstructing major pollution episodes and national AQI trends. Contributions include: (1) a meshfree, inductive GCN that simulates local transport; (2) physics-informed loss enforcing continuity/mass conservation; (3) a coupled architecture capturing sources, chemistry, and deposition; and (4) robust extrapolation at unobserved sites. Future directions include explicitly incorporating vertical processes as measurements become available, integrating improved or dynamic emissions inventories, coupling with scenario-based climate and emissions trajectories for forecasting, and extending to other regions and pollutants with transfer learning.
The model explicitly simulates only horizontal (2-D) advection and diffusion due to lack of vertical concentration measurements; vertical processes are approximated via proxies (e.g., PBL height, vertical meteorology). Emission inventories and reanalysis inputs can be uncertain or coarse, potentially affecting inputs to the model. Graph convolutions risk over-smoothing (partially mitigated by residual and attention layers). Training and national prediction are computationally intensive (multi-day runtimes with GPUs).