
Regional aerosol forecasts based on deep learning and numerical weather prediction
Y. Qiu, J. Feng, et al.
This summary covers the study by Yulu Qiu and colleagues on the Pollution-Predicting Net (PPN), a deep learning framework for PM2.5 forecasting over the Beijing-Tianjin-Hebei region. The model combines numerical weather prediction inputs with preceding PM2.5 observations and uses a weighted loss to improve forecasts at both low and high concentrations. In the reported comparisons, PPN outperforms traditional machine learning baselines and WRF-Chem with data assimilation.
Introduction
Fine particulate matter (PM2.5) is a major pollutant in China, posing health risks and requiring accurate forecasts for public warnings and emission control. Traditional chemistry transport models (CTMs) such as WRF-Chem, CMAQ, and GEOS-Chem simulate physical and chemical processes but suffer from uncertainties in emissions, meteorology, and simplified chemistry, and are computationally expensive. Data assimilation improves CTM forecasts mainly within the first 24 h. Statistical and machine learning methods reduce computational cost and capture nonlinearity but often miss complex spatiotemporal correlations, especially for regional forecasting. Recent deep learning (DL) models (e.g., CNN-LSTM variants) have improved single-city, short-horizon predictions but typically focus on daily or sub-daily horizons and urban scales. This study aims to develop a regional, short-range (0–72 h) spatiotemporal DL model (PPN) that integrates numerical weather prediction inputs and preceding PM2.5 observations, along with a weighted loss to better handle interpolation biases and extremes, to improve PM2.5 forecasts over the Beijing-Tianjin-Hebei region.
Literature Review
The paper reviews two main forecasting approaches. (1) CTMs (WRF-Chem, CMAQ, GEOS-Chem) are widely used for prediction, source apportionment, and mechanism analysis, but are affected by emission and meteorological uncertainties and simplified parameterizations. Data assimilation (e.g., 3DVAR, 4D-LETKF) improves PM forecasts out to about 24 h, but the benefit decays with lead time and the computation is intensive. (2) Statistical/ML methods, including linear regression and algorithms such as Random Forest and XGBoost, show reasonable accuracy for site-level predictions but struggle with regional spatiotemporal dependencies. DL approaches (CNNs for spatial features, RNN/LSTM/GRU for temporal dependencies) have improved PM2.5 prediction (e.g., CNN-LSTM hybrids), yet are often limited to urban scales and shorter horizons. Prior DL studies report RMSEs of around 22–53 µg m−3 for the first 6 h and ~24 µg m−3 for the next 24 h in Beijing. The literature highlights the need for models that jointly capture spatial and temporal dynamics at regional scale over multi-day horizons.
Methodology
Model: The Pollution-Predicting Net (PPN) is a spatiotemporal DL model with an encoder-decoder architecture that uses PredRNN as its backbone. Spatially, it mimics the process scales of a CTM through stacked convolutional layers: a local layer (1×1 kernel; 9×9 km) for local processes (chemistry, turbulence), followed by non-local layers for short-distance (3×3; 27×27 km) and long-distance (5×5; 45×45 km) transport. Features are injected at the layer where they are most relevant: local variables (e.g., emissions, T2, RH2, precipitation, SLP, PBLH) enter at the local layer, while transport/synoptic variables (winds, 700 hPa geopotential height, terrain) enter at the short-transport layer. Temporally, the preceding timesteps form the encoder (spin-up) and the forecast steps form the decoder, which outputs PM2.5 at each timestep. During decoding, the inputs are meteorology, emissions, and the PM2.5 forecast from the previous step. During encoding, the model additionally ingests observed PM2.5 in the first convolutional layer, which provides an initialization constraint similar to four-dimensional data assimilation (FDDA).
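The kernel footprints quoted above follow directly from the 9 km grid spacing. The short sketch below reproduces that arithmetic and also estimates the combined receptive field of the three layers. The combined figure is an assumption: it presumes the layers are applied in sequence with stride 1 and no dilation, which this summary does not state, and the function names are illustrative only.

```python
GRID_KM = 9  # grid spacing given in the summary


def footprint_km(kernel, grid_km=GRID_KM):
    """Side length (km) covered by one square kernel of `kernel` cells."""
    return kernel * grid_km


def stacked_receptive_field(kernels):
    """Receptive field (cells per side) of convolutions applied in sequence.

    Assumes stride 1 and no dilation, so each layer adds (kernel - 1) cells.
    """
    field = 1
    for kernel in kernels:
        field += kernel - 1
    return field


kernels = [1, 3, 5]  # local, short-distance, long-distance layers
for kernel in kernels:
    side = footprint_km(kernel)
    print(f"{kernel}x{kernel} kernel covers {side} x {side} km")

cells = stacked_receptive_field(kernels)
print(f"stacked (assumed sequential): {cells} cells, {cells * GRID_KM} km per side")
```

Under these assumptions a single pass through the three layers reaches only a few tens of kilometres, which is consistent with the difficulty the model has with rapid long-distance transport.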
Experimental region and data: The study domain is the Beijing-Tianjin-Hebei (BTH) region (36.3–42.0°N, 111.8–121.6°E), gridded at 9 km resolution with 80×80 cells. Meteorology comes from WRF downscaling of ERA5 (0.25°) over China (9 km, 38 vertical levels, top at 50 hPa), with parameterization schemes following Feng et al. 2022 (details in Supplementary Table 2). Emissions are taken from the MEIC inventory (SO2, NO, VOCs, primary PM2.5), interpolated from 0.25° to 9 km. Derived features are the 24 h change in SLP (dSLP), the 24 h change in T850 (dT850), annual mean PM2.5 (2020–2021), and preceding PM2.5 observations. PM2.5 observations from CNEMC stations are interpolated to the 9 km grid by inverse distance weighting (IDW), both for the training targets and for the PM2.5-based features (annual mean and preceding PM2.5). All inputs are standardized to zero mean and unit variance.
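The two preprocessing steps named above, IDW interpolation of station values and standardization, can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the distance exponent of 2, the planar distances in km, and the station values are all assumptions introduced here.

```python
import math


def idw(target, stations, power=2.0):
    """Inverse-distance-weighted value at target=(x, y).

    stations: list of (x, y, value) tuples. power=2 is an assumed exponent;
    the summary does not state which one was used.
    """
    num = 0.0
    den = 0.0
    for x, y, value in stations:
        dist = math.hypot(target[0] - x, target[1] - y)
        if dist == 0.0:
            return value  # target coincides with a station
        weight = 1.0 / dist ** power
        num += weight * value
        den += weight
    return num / den


def standardize(values):
    """Rescale values to zero mean and unit (population) variance."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0.0:
        return [0.0 for _ in values]  # constant field: nothing to scale
    return [(v - mean) / std for v in values]


# Hypothetical stations: (x_km, y_km, PM2.5 in ug m-3)
stations = [(0.0, 0.0, 40.0), (18.0, 0.0, 80.0)]
midpoint = idw((9.0, 0.0), stations)  # equidistant, so the plain average
scaled = standardize([40.0, 60.0, 80.0])
print(midpoint, scaled)
```

A cell far from every station still receives a value from IDW, but that value is weakly constrained, which is the motivation for the weighted loss described in the training setup.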
Training and evaluation: Training and validation data cover 2020–2021 (5800 sequences). Each sequence consists of 2 days of encoder inputs and 3 days of decoder predictions at 3-hourly resolution. The training/validation split is 90%/10%. Test datasets are January 2022 (winter; the focus of the evaluation) and June 2022 (summer). Optimization uses Adam with a one-cycle learning rate schedule, weight decay of 1e-4, and gradient clipping. The loss is a proposed weighted MSE (WMSE) based on IDW weights: wi=1 for grid cells that contain a station; for other cells, wi is the average of the squared inverse-distance sums to stations within the surrounding 3×3 cells, with a floor of wi=0.1 where no stations are nearby. This upweights errors near observations and so mitigates interpolation biases in sparsely monitored areas.
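A weighted MSE of this kind can be sketched in a few lines. The version below takes the per-cell weights as given, applies the 0.1 floor, and normalizes by the sum of weights. The normalization is an assumption, since the summary does not say whether the loss is divided by the weight sum or by the number of cells, and the example values are hypothetical.

```python
def wmse(pred, target, weights, floor=0.1):
    """Weighted mean squared error over flattened grid cells.

    weights: per-cell weights (1.0 at station cells, smaller elsewhere).
    floor:   minimum weight, 0.1 as stated in the summary.
    Normalizing by the sum of weights is an assumption made here.
    """
    if not (len(pred) == len(target) == len(weights)):
        raise ValueError("pred, target and weights must have equal length")
    clipped = [max(w, floor) for w in weights]
    num = sum(w * (p - t) ** 2 for w, p, t in zip(clipped, pred, target))
    return num / sum(clipped)


# Hypothetical two-cell example: a station cell and a poorly observed cell.
pred = [10.0, 20.0]
target = [12.0, 30.0]
weights = [1.0, 0.05]  # the second weight is raised to the 0.1 floor

loss = wmse(pred, target, weights)
plain = wmse(pred, target, [1.0, 1.0])  # reduces to the ordinary MSE
print(loss, plain)
```

In this example the large error in the poorly observed cell contributes far less to the weighted loss than to the ordinary MSE, so the optimizer is not pushed hard toward interpolated targets that may themselves be wrong.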
Baselines and comparisons: Three ML baselines (Random Forest, XGBoost, and a shallow MLP with 2 hidden layers of 32 and 8 neurons) are trained on the same features and periods without explicit spatiotemporal modeling. Two ablations are also tested: PPN_no_IDW (standard MSE loss) and PPN_no_PO (no preceding observations in the encoder). The CTM benchmark is WRF-Chem (CBMZ chemistry, MOSAIC aerosols) at 9 km resolution with MEIC emissions and the same WRF configuration, using 3DVAR assimilation of surface pollutants (PM2.5, PM10, SO2, NO2, O3, CO). Metrics include R2, RMSE, and mean bias (MB), evaluated across spatial and temporal dimensions and lead times (0–72 h).
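For reference, the three metrics can be computed as in the sketch below. Two conventions are assumed because the summary does not define them: R2 is taken as the squared Pearson correlation (the coefficient of determination is another common definition), and MB is taken as the mean of forecast minus observation, so negative values indicate underestimation. The sample values are hypothetical.

```python
import math


def mean_bias(pred, obs):
    """Mean of (forecast - observation); negative means underestimation."""
    return sum(p - o for p, o in zip(pred, obs)) / len(obs)


def rmse(pred, obs):
    """Root mean squared error."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def r_squared(pred, obs):
    """Squared Pearson correlation (an assumed definition of R2)."""
    mean_p = sum(pred) / len(pred)
    mean_o = sum(obs) / len(obs)
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(pred, obs))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    return cov ** 2 / (var_p * var_o)


# Hypothetical forecast and observed PM2.5 (ug m-3)
pred = [10.0, 20.0, 30.0]
obs = [12.0, 18.0, 33.0]
print(mean_bias(pred, obs), rmse(pred, obs), r_squared(pred, obs))
```

The same functions apply whether the values are pooled over grid cells at one time (spatial metrics) or over times at one site (temporal metrics); only the way the inputs are sliced changes.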
Key Findings
- Overall performance (Jan 2022, 3-hourly, BTH): PPN achieved R2=0.70 and RMSE=17.7 µg m−3; in June 2022, R2=0.49 and RMSE=6.9 µg m−3 (lighter pollution).
- Spatial patterns: PPN captured higher PM2.5 in the south and lower in the north; higher RMSE (40–50 µg m−3) and negative mean bias (−15 to −5 µg m−3) in the south, linked to underestimation and challenges with rapid long-distance transport during north-to-south cold surges.
- Lead-time dependence: Best performance within 0–24 h; spatial R2 declines from ~0.74 at initialization to ~0.58 at 24 h; RMSE increases from ~12 to ~17 µg m−3 by 24 h; stabilizes for 24–72 h with R2≈0.57 and RMSE≈17–18 µg m−3.
- Weighted loss function: Compared to PPN_no_IDW, PPN reduced the "high-values underestimation, low-values overestimation" bias. Under clean conditions (PM2.5 ≤35 µg m−3), the probability of |MB|>15 µg m−3 decreased. Under heavy pollution (PM2.5 >115 µg m−3), average MB improved from −35 to −28 µg m−3. Improvements were less pronounced for intermediate ranges (35–115 µg m−3).
- Preceding observation restraint: PPN_no_PO had R2=0.63, RMSE=19.9 µg m−3 (vs PPN R2=0.70, RMSE=17.7). For 0–24 h, PPN achieved spatial R2=0.58–0.74 and RMSE=12–17 µg m−3 vs ~0.57 and ~18 µg m−3 for PPN_no_PO. At city sites (13 BTH cities), PPN R2=0.42–0.84 and RMSE=15–42 µg m−3, outperforming PPN_no_PO, with RMSE reductions of 20–28% in northern cities and 3–6% in southern cities.
- Comparison to WRF-Chem with assimilation: Across the 13 cities (Jan 2022), WRF-Chem had R2=0.30–0.77 and RMSE=19–45 µg m−3, while PPN achieved R2=0.42–0.84 and RMSE=15–42 µg m−3. PPN RMSEs were 1–35% lower (reductions >10% in 10 cities). By lead time: average RMSE was 25 µg m−3 (PPN) vs 30 µg m−3 (WRF-Chem) at 0–24 h (−17%), with 14–16% reductions at 24–72 h; average R2 was 0.61–0.68 (PPN) vs 0.53–0.57 (WRF-Chem). Gains were larger under clean/good air (PM2.5 ≤75 µg m−3; up to 25% RMSE reduction) than on polluted days (max ~13%).
- Against RF/XGB/MLP baselines: PPN outperformed traditional ML models trained per-grid without explicit spatiotemporal coupling, attributed to convolutional layers (spatial relations), LSTM (temporal), preceding observations, and weighted loss.
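The percentage reductions quoted in the findings are relative differences in RMSE against a reference model. The sketch below recomputes two of them from the figures given above; the function name is illustrative.

```python
def rmse_reduction_pct(model_rmse, reference_rmse):
    """Relative RMSE reduction of a model against a reference, in percent."""
    return 100.0 * (reference_rmse - model_rmse) / reference_rmse


# 0-24 h average RMSE: PPN 25 vs WRF-Chem 30 (ug m-3)
vs_wrf_chem = rmse_reduction_pct(25.0, 30.0)

# January 2022 overall RMSE: PPN 17.7 vs PPN_no_PO 19.9 (ug m-3)
vs_no_po = rmse_reduction_pct(17.7, 19.9)

print(f"vs WRF-Chem: {vs_wrf_chem:.1f}%  vs PPN_no_PO: {vs_no_po:.1f}%")
```

The first value, about 16.7%, matches the roughly 17% reduction reported for 0–24 h. The second, about 11.1%, is not quoted in the summary but follows from the overall RMSEs of the two model variants.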
Discussion
The study demonstrates that a spatiotemporal DL framework integrating NWP meteorology, emissions, and preceding PM2.5 observations can improve regional short-range PM2.5 forecasting accuracy and efficiency relative to both traditional ML and a state-of-the-art CTM with data assimilation. The encoder ingestion of observations acts like an FDDA constraint, markedly enhancing 0–24 h forecasts by providing a better initial PM2.5 field. The weighted loss mitigates biases from gridded targets produced by IDW interpolation, particularly improving clean and heavily polluted regimes where models often over/under-estimate extremes. PPN consistently outperforms WRF-Chem across cities and forecast ranges, with especially strong gains under clean conditions, while maintaining stable skill out to 72 h. The model effectively captures regional spatial gradients driven by emissions, meteorology, and orography, though it underestimates southern high concentrations during rapid north-to-south transport events, reflecting limited representation of long-range transport within a single timestep due to small convolutional receptive fields. These findings indicate that physics-informed feature injection and observation-constrained encoding are valuable strategies for regional air quality DL models.
Conclusion
The paper introduces PPN, a regional, short-range PM2.5 forecasting model that fuses spatiotemporal deep learning with NWP inputs, emissions, and preceding observations. Trained on 2020–2021 data and evaluated in January 2022, PPN achieved R2≈0.70 and RMSE≈17.7 µg m−3, reproducing the regional spatial gradients and performing best within the first 24 h, with skill stabilizing through 72 h. Two key innovations, the weighted loss (WMSE) and the preceding observation restraint in the encoder, reduced interpolation-induced bias and significantly improved early lead-time forecasts. Compared with WRF-Chem with data assimilation, PPN delivered higher R2 and lower RMSE in most cities and across lead times. Future work could expand the transport receptive field (larger convolution kernels) to better capture rapid long-range transport, leverage additional observation types and assimilation strategies, and test broader regions and seasons to assess generalization.
Limitations
- Limited spatial receptive field for transport (up to 5×5 kernels; ~45×45 km per timestep) constrains representation of rapid, long-distance pollutant transport, contributing to underestimation in southern BTH during north-to-south cold surges.
- Training targets rely on IDW-interpolated PM2.5 grids, which can introduce biases in sparsely monitored areas despite the weighted loss mitigation.
- Performance degrades with lead time, with most of the degradation occurring within the first 24 h.
- Underestimation in high-pollution southern areas and overestimation under very clean conditions remain, though both are reduced by WMSE.
- Results focus on January 2022 (and briefly June); broader seasonal/regional validation would further assess generalizability.