Introduction
Anthropogenic air pollution is a major global public health issue that causes millions of premature deaths each year. Fine particulate matter (PM2.5) is a key pollutant linked to a range of diseases. Exposure assessment models that were adequate for distinguishing inter-city PM2.5 exposure when ambient levels were high are now insufficient, because ambient PM2.5 has declined sharply and the contribution of local sources has risen to 50-80%. Even as ambient levels fall, PM2.5 in individual micro-environments can remain high, so assessment needs to move beyond ambient concentrations. Personal internal dose, which reflects the cumulative interaction between human activities and pollution, offers a more accurate measure. Direct measurement with portable monitoring suits is costly, which limits its widespread use. Indirect methods based on observed or simulated PM2.5 distributions are more feasible, but they are often limited by the representativeness of monitoring stations and by the coarse resolution of chemical transport models (CTMs), which is typically no finer than about 1 km. Improved exposure assessment therefore requires ultra-high-resolution concentration fields combined with high-resolution population activity data. Recent studies have fused auxiliary data (satellite measurements, CTM outputs, meteorological variables) with observations to improve simulated PM2.5 distributions. However, the limited resolution of these inputs, and resampling them to coarser grids, introduce uncertainties. High-resolution satellite data, such as aerosol optical depth (AOD) from Sentinel-2A and Landsat-8, now make more accurate ultra-high-resolution mapping possible. Linear land use regression (LUR) models have been used for this purpose, but machine learning methods, particularly Random Forest (RF), offer better performance and robustness.
This study aims to develop an ultra-high-resolution PM2.5 distribution model from multi-source data using RF, to combine it with high-resolution population activity data in order to calculate personal internal doses that account for indoor/outdoor exposure differences, and to assess the resulting impact on the estimated mortality burden.
Literature Review
Numerous studies have established links between PM2.5 exposure and adverse health outcomes, including premature mortality from cardiovascular disease, respiratory disease, and cancer. Early exposure assessments relied on ambient PM2.5 levels measured at monitoring stations, which give only a coarse representation of exposure across a city or region. As air quality improved in many areas, the limitations of this approach became apparent, because local sources began to play a larger role in determining individual exposure. Chemical transport models (CTMs) improved spatial resolution and allowed more nuanced exposure assessments, but limits on their resolution and their computational cost persist. Land use regression (LUR) models have been widely used to estimate spatially varying pollutant concentrations; however, their assumption of linearity often limits accuracy, and the spatial scale of the predictors is a frequent source of uncertainty. The growing availability of high-resolution remote sensing data and advances in machine learning offer ways to overcome these limitations. Studies incorporating satellite data, for example, have improved PM2.5 mapping, although limited temporal resolution and data gaps caused by cloud cover remain concerns. Machine learning techniques, particularly RF, have shown greater predictive power and robustness than traditional regression models. Some studies have attempted to account for indoor/outdoor exposure differences, but a comprehensive assessment that integrates multiple data sources with high-resolution population activity data remains a challenge.
Methodology
This study used a four-step approach to estimate personal PM2.5 exposure in Beijing in 2019. First, the WRF-CMAQ chemical transport modeling system simulated PM2.5 concentrations at 1.33 km resolution: the WRF (Weather Research and Forecasting) model provided meteorological inputs, and the CMAQ (Community Multiscale Air Quality) model simulated pollutant concentrations. Second, ultra-high-resolution PM2.5 maps were produced by assimilating the WRF-CMAQ output with multi-source auxiliary data using a Random Forest (RF) model. The auxiliary data included 30-m land use type data, satellite top-of-atmosphere (TOA) reflectance from Landsat-8 and Sentinel-2A, point of interest (POI) data, building location data, population distribution data from Baidu Smart Eye (interpolated to 30 m by inverse distance weighting), traffic emission data, and meteorological parameters. The RF model was trained on PM2.5 measurements from 34 monitoring stations in Beijing and evaluated with 10-fold cross-validation. Third, personal exposure was estimated by combining the 30-m PM2.5 concentrations with high-resolution population activity data from Baidu Smart Eye, treating weekday and weekend patterns separately. Indoor PM2.5 concentrations were estimated by applying an indoor/outdoor (I/O) ratio taken from the literature to the outdoor concentrations according to land use type. Finally, the estimated daily personal PM2.5 internal dose was used in a health risk model, the Global Exposure Mortality Model (GEMM), to estimate the mortality burden attributable to PM2.5 exposure, and results based on the original WRF-CMAQ output (1.33 km) were compared with those based on the high-resolution (30 m) assimilated fields. A sensitivity analysis assessed how variations in the I/O ratio and the population distribution affect the estimated internal dose.
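The inverse distance weighting used to bring the Baidu Smart Eye population data onto the 30-m grid can be illustrated with a short standard-library sketch. It is not the study's code: the distance exponent (`power=2`), the use of every sample point rather than a search radius or nearest-neighbour limit, and the function names `idw_interpolate` and `idw_grid` are assumptions made for illustration.

```python
import math


def idw_interpolate(samples, target, power=2.0):
    """Inverse-distance-weighted estimate at `target`.

    samples: list of (x, y, value) tuples, e.g. population values at
             heat-map points, with coordinates in metres (assumed).
    target:  (x, y) point to estimate, e.g. a 30 m cell centre.
    power:   distance exponent; 2 is a common default, not a study value.
    """
    tx, ty = target
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in samples:
        d = math.hypot(x - tx, y - ty)
        if d == 0.0:
            # Target coincides with a sample point: return its value exactly.
            return value
        w = 1.0 / d ** power
        weighted_sum += w * value
        weight_total += w
    return weighted_sum / weight_total


def idw_grid(samples, x0, y0, nx, ny, cell=30.0, power=2.0):
    """IDW estimates at the centres of an ny-by-nx grid with lower-left corner (x0, y0)."""
    return [
        [idw_interpolate(samples, (x0 + (i + 0.5) * cell, y0 + (j + 0.5) * cell), power)
         for i in range(nx)]
        for j in range(ny)
    ]
```

For two hypothetical points, `(0, 0)` with value 100 and `(60, 0)` with value 300, the midpoint `(30, 0)` receives equal weights and an estimate of 200, while cells closer to either point lean toward its value.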
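The 10-fold cross-validation used to evaluate the RF model can be sketched with the standard library alone. To keep the example self-contained, it swaps the Random Forest for a one-predictor least-squares line (`fit_line`), which is not the study's model. The random fold assignment, the seed, and scoring with R² = 1 - SS_res/SS_tot over pooled out-of-fold predictions are also assumptions; the source does not state how its R² values were computed.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Shuffle 0..n-1 and split the indices into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def fit_line(xs, ys):
    """Least-squares fit y = a + b*x; a stand-in for the Random Forest."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x


def cross_validated_r2(xs, ys, k=10, fit=fit_line, seed=0):
    """Predict each station from a model trained on the other folds, then score."""
    preds = [0.0] * len(ys)
    for fold in k_fold_indices(len(ys), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(ys)) if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            preds[i] = model(xs[i])
    return r_squared(ys, preds)
```

With 34 stations and k = 10, each fold holds out 3 or 4 stations. Any function with the same `fit(xs, ys) -> predict` shape, such as a wrapper around a Random Forest library, could be passed as `fit`.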
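The third step, combining 30-m concentrations with activity data and an I/O ratio, can be sketched as a time-weighted sum of inhaled mass. This is a hypothetical illustration rather than the study's implementation: the `Visit` structure, the simple indoor/outdoor flag (the study assigns the I/O ratio by land use type), the inhalation rate of 0.6 m³/h, the example concentrations and hours, and the 5:2 weekday/weekend weighting are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    """Time one person spends in one 30 m grid cell (hypothetical structure)."""
    outdoor_pm25: float  # ambient PM2.5 in the cell, µg/m³
    hours: float         # time spent in the cell, h
    indoor: bool         # True if the time is spent inside a building


def daily_internal_dose(visits, io_ratio, inhalation_rate):
    """Daily inhaled PM2.5 dose in µg/d.

    Indoor concentration is approximated as io_ratio * outdoor concentration.
    inhalation_rate is in m³/h; its value is an assumption, not from the study.
    """
    if abs(sum(v.hours for v in visits) - 24.0) > 1e-6:
        raise ValueError("visits must cover exactly 24 hours")
    dose = 0.0
    for v in visits:
        conc = io_ratio * v.outdoor_pm25 if v.indoor else v.outdoor_pm25
        dose += conc * inhalation_rate * v.hours  # µg/m³ * m³/h * h = µg
    return dose


def weekly_mean_dose(weekday_dose, weekend_dose):
    """Average daily dose over a week of 5 weekdays and 2 weekend days (assumed weighting)."""
    return (5 * weekday_dose + 2 * weekend_dose) / 7


# Hypothetical activity patterns: home, commute, and office on a weekday;
# home and outdoor leisure at the weekend.
weekday = [Visit(45.0, 14.0, True), Visit(45.0, 2.0, False), Visit(60.0, 8.0, True)]
weekend = [Visit(45.0, 20.0, True), Visit(50.0, 4.0, False)]
```

With `io_ratio=0.6` and `inhalation_rate=0.6`, the weekday pattern gives 453.6 µg/d, compared with 720 µg/d if every hour were assigned the ambient concentration (`io_ratio=1.0`), which mirrors why the indoor-adjusted dose in the study falls below the ambient-only estimates.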
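The fourth step relies on GEMM. A minimal sketch of the GEMM hazard-ratio form and the standard attributable-fraction calculation follows. The values in `PARAMS` are placeholders, not published GEMM coefficients (the real ones are endpoint- and age-specific). The counterfactual `cf=2.4` µg/m³ is the value commonly used with GEMM but should be checked against the coefficient source. How the study maps its internal dose onto GEMM's concentration axis is not described here, so the function simply takes a concentration.

```python
import math


def gemm_hazard_ratio(conc, theta, alpha, mu, nu, cf=2.4):
    """Hazard ratio in the GEMM functional form.

    z  = max(0, conc - cf)
    HR = exp(theta * ln(z / alpha + 1) / (1 + exp(-(z - mu) / nu)))
    """
    z = max(0.0, conc - cf)
    weight = 1.0 / (1.0 + math.exp(-(z - mu) / nu))
    return math.exp(theta * math.log(z / alpha + 1.0) * weight)


def attributable_deaths(baseline_deaths, hazard_ratio):
    """Deaths attributable to exposure: baseline deaths * (1 - 1/HR)."""
    return baseline_deaths * (1.0 - 1.0 / hazard_ratio)


# Placeholder parameters for illustration only; not GEMM estimates.
PARAMS = {"theta": 0.1, "alpha": 1.5, "mu": 10.0, "nu": 3.0}
```

Below the counterfactual the hazard ratio is exactly 1 and no deaths are attributed; above it, both the logarithmic term and the logistic weight grow with concentration, so the hazard ratio rises with exposure. Because the curve is nonlinear, the attributable burden can shift by a different proportion than the underlying exposure.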
Key Findings
The 30-m PM2.5 maps produced by assimilating multiple auxiliary predictors with the RF model were substantially more accurate (R² = 0.78-0.82) than the WRF-CMAQ output alone (R² = 0.31-0.64). The assimilation corrected the WRF-CMAQ overestimation in southeastern Beijing and underestimation in the northwest. The average indoor PM2.5 concentration was estimated at 26.5 µg/m³. Population-weighted ambient PM2.5 concentrations varied among districts, but these differences did not fully capture differences in individual exposure, which also depend on activity patterns. Population-weighted PM2.5 concentrations also differed markedly between the WRF-CMAQ output and the high-resolution assimilation results. The 30-m population distribution showed clear differences in population density between weekdays and weekends, highlighting the importance of temporal variation in exposure. The age-standardized daily PM2.5 internal dose was estimated at 512.9 µg/d, about 10% and 14% lower than the values obtained from the ambient WRF-CMAQ output (568.2 µg/d) and the ambient assimilation results (594.5 µg/d), respectively, which shows the importance of accounting for indoor-outdoor exposure differences. The estimated annual mortality burden for four health endpoints (ischemic heart disease, IHD; stroke; chronic obstructive pulmonary disease, COPD; and lung cancer, LC) was 24% higher with the high-resolution assimilated PM2.5 data than with the coarser WRF-CMAQ results, indicating that high-resolution exposure data matter for mortality burden assessments. The sensitivity analysis showed a sublinear relationship between the I/O ratio and total internal dose: relatively large variations in I/O ratios and population distributions produced only minor changes in the estimated internal dose.
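A population-weighted concentration weights each grid cell's PM2.5 by the number of people in it, so the same concentration field can yield different exposure summaries when the population shifts. The sketch below uses hypothetical numbers; the two population vectors stand in for weekday and weekend distributions over the same two cells, and the function name is illustrative.

```python
def population_weighted_mean(concentrations, populations):
    """Population-weighted mean concentration: sum(P_i * C_i) / sum(P_i)."""
    if len(concentrations) != len(populations):
        raise ValueError("inputs must have the same length")
    total_pop = sum(populations)
    if total_pop <= 0:
        raise ValueError("total population must be positive")
    return sum(c * p for c, p in zip(concentrations, populations)) / total_pop


cells_pm25 = [30.0, 50.0]            # µg/m³ in two hypothetical cells
weekday_pop = [100.0, 300.0]         # more people in the polluted cell
weekend_pop = [300.0, 100.0]         # more people in the cleaner cell
```

Here the weekday distribution gives 45 µg/m³ and the weekend distribution 35 µg/m³, although the unweighted cell mean is 40 µg/m³ in both cases.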
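The relative size of the reported internal-dose differences can be checked directly from the values above. This sketch only re-expresses the study's reported numbers; `percent_difference` is an illustrative helper.

```python
def percent_difference(value, reference):
    """Signed difference of value relative to reference, in percent."""
    return 100.0 * (value - reference) / reference


# Age-standardized daily internal doses reported in the study (µg/d).
DOSE_IO_ADJUSTED = 512.9
DOSE_AMBIENT_CMAQ = 568.2
DOSE_AMBIENT_ASSIMILATED = 594.5

vs_cmaq = percent_difference(DOSE_IO_ADJUSTED, DOSE_AMBIENT_CMAQ)                # about -9.7 %
vs_assim = percent_difference(DOSE_IO_ADJUSTED, DOSE_AMBIENT_ASSIMILATED)        # about -13.7 %
assim_vs_cmaq = percent_difference(DOSE_AMBIENT_ASSIMILATED, DOSE_AMBIENT_CMAQ)  # about +4.6 %
```

The indoor-adjusted dose is therefore roughly 10% and 14% below the two ambient-based estimates, while the two ambient-based estimates differ from each other by under 5%.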
Discussion
The findings show that integrating multiple data sources with machine learning substantially improves both the accuracy and the spatial resolution of ambient PM2.5 maps. The higher accuracy of the 30-m map relative to the 1.33 km WRF-CMAQ output highlights the inability of coarser models to capture local variation in PM2.5 levels caused by heterogeneous emissions and atmospheric dispersion within the urban environment. The substantial increase in the estimated PM2.5-attributable mortality burden when high-resolution data are used underlines the need to account for finer-scale variation in exposure to assess the population health risks of air pollution accurately. Incorporating an I/O ratio to consider both indoor and outdoor exposure refined the internal dose estimates, indicating that indoor air quality is a crucial factor in overall exposure assessment. The study also underscores the importance of temporal variation in population activity patterns, given its effect on exposure estimates. These findings reinforce the value of high-resolution exposure models for health impact assessment and policy-making, as they allow a more accurate evaluation of local source contributions to air pollution and the identification of areas that require focused mitigation efforts. High-resolution population activity patterns are likewise important for assessing both individual and population-level risks.
Conclusion
This study presents a novel method for ultra-high-resolution PM2.5 mapping in Beijing that combines multi-source data with an RF model. The resulting 30-m map was substantially more accurate than the CTM output and enabled more precise personal exposure assessment that accounts for indoor/outdoor differences. The estimated mortality burden attributable to PM2.5 increased substantially when the higher-resolution data were used. Future research should refine population activity data, incorporate more complex representations of the I/O ratio, and test the approach in other urban settings. The method provides a valuable framework for improving air pollution health assessments and informing more effective mitigation strategies.
Limitations
Several limitations exist. Uncertainties inherent in WRF-CMAQ modeling and RF regression affect the results. The assumption that heat maps accurately represent population activity patterns may lead to underestimating the exposure of non-permanent residents. The lack of age-stratified activity data introduces uncertainty into age-specific exposure estimates. A single, unified I/O ratio neglects the effects of air cleaners and complex indoor sources; omitting indoor sources in particular may cause indoor PM2.5 to be underestimated. The I/O ratio approach also does not account for the influence of meteorological factors on PM2.5 transport. Finally, the study provides daily personal indoor PM2.5 internal doses only for the downtown area.