Introduction
Air pollution, particularly PM2.5, poses a significant global health problem, especially in developing nations. Traditional monitoring networks are often sparse due to cost, prompting the use of low-cost sensors for improved spatial and temporal resolution. These sensors offer advantages like real-time data, ease of deployment and maintenance, and cost-effectiveness. However, inconsistencies exist between low-cost sensor readings and reference-grade measurements from regulatory stations. These discrepancies are often influenced by factors such as aerosol schemes, temperature, and high relative humidity, leading to bias and uncertainty in the data. Calibration is crucial to mitigate these issues and ensure the reliability of low-cost sensor data. Previous calibration methods, including linear models and machine learning techniques, have limitations, particularly when dealing with large-scale datasets and spatial variability. This study aims to develop a real-time, regional, and simple spatial calibration method for low-cost PM2.5 sensors using reference-grade measurements from regulatory stations. The goals are to establish a spatially varying relationship between low-cost sensors and regulatory stations, calibrate regional low-cost sensors simultaneously, and generate reliable PM2.5 concentration maps.
Literature Review
Numerous studies highlight the inconsistencies between low-cost and regulatory air quality sensors. These discrepancies arise from various factors including sensor limitations, environmental conditions (temperature, humidity), and the inherent characteristics of different sensor types. Previous research has explored various calibration techniques, such as linear models and machine learning approaches, but these methods often struggle with the complexity of spatial variability and large-scale datasets. Some studies have focused on field calibration formulas and multivariate calibration models, but these methods can still have limitations in reducing measurement errors, particularly in high relative humidity conditions. The need for a robust spatial calibration approach that addresses these challenges has been widely acknowledged in the literature.
Methodology
This study used data from 2963 AirBox low-cost PM2.5 sensors and 76 Taiwanese Environmental Protection Agency (TWEPA) regulatory stations across Taiwan. The TWEPA stations report hourly PM2.5 data that comply with regulatory monitoring procedures, while the AirBox sensors report in near real time (updated roughly every 5 minutes). A specific time slice (12:00 pm UTC, February 24, 2020), characterized by high relative humidity and poor air quality in western Taiwan, was selected as a case study. The methodology involved several key steps. First, the data were preprocessed, including collocation with a nearest neighbor approach that matched low-cost sensors with their closest TWEPA station. Nonspatial and spatial calibration models were then fitted. The nonspatial model was a simple linear regression, a single global linear transformation that is susceptible to bias from outliers. The spatial model was a kernel-based varying-coefficient model (spatial regression), a local linear transformation that uses the 2D coordinates of each observation so that the regression coefficients can vary across space and capture spatially varying relationships between low-cost sensor readings and TWEPA station data. These coefficients were estimated by weighted least squares, with weights computed from the distances between observations using a Gaussian kernel function whose bandwidth was optimized by cross-validation. Model performance was evaluated using R-squared and RMSE. Finally, inverse distance weighting (IDW) interpolation was used to generate a 2 km-resolution PM2.5 concentration map from the calibrated low-cost sensor data.
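The spatial calibration step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the synthetic data, the kernel form exp(-0.5 (d/h)^2), the candidate bandwidths, and the use of leave-one-out as the cross-validation scheme are all assumptions made for illustration. It fits a local intercept and slope at each location by Gaussian-weighted least squares and compares the result with a single global linear fit.

```python
import numpy as np

def gaussian_kernel(dist, bandwidth):
    """Gaussian distance-decay weights: nearby pairs count more."""
    return np.exp(-0.5 * (dist / bandwidth) ** 2)

def local_coefficients(target_xy, pair_xy, sensor_pm, reference_pm, bandwidth):
    """Fit reference = b0 + b1 * sensor by weighted least squares at each target.

    Returns an array of shape (n_targets, 2) holding [intercept, slope].
    """
    design = np.column_stack([np.ones(len(sensor_pm)), sensor_pm])
    coefs = np.empty((len(target_xy), 2))
    for i, xy in enumerate(target_xy):
        dist = np.linalg.norm(pair_xy - xy, axis=1)
        root_w = np.sqrt(gaussian_kernel(dist, bandwidth))
        # Scaling rows by sqrt(w) turns weighted least squares into ordinary least squares.
        solution = np.linalg.lstsq(design * root_w[:, None], reference_pm * root_w, rcond=None)[0]
        coefs[i] = solution
    return coefs

def rmse(observed, predicted):
    """Root-mean-square error between two arrays."""
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))

def loo_cv_rmse(pair_xy, sensor_pm, reference_pm, bandwidth):
    """Leave-one-out RMSE for one candidate bandwidth (assumed CV scheme)."""
    n = len(sensor_pm)
    predicted = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        b0, b1 = local_coefficients(
            pair_xy[i:i + 1], pair_xy[keep], sensor_pm[keep], reference_pm[keep], bandwidth
        )[0]
        predicted[i] = b0 + b1 * sensor_pm[i]
    return rmse(reference_pm, predicted)

# Synthetic collocated pairs (hypothetical data): the true slope drifts from west to east.
rng = np.random.default_rng(0)
n_pairs = 200
pair_xy = rng.uniform(0, 100, size=(n_pairs, 2))      # coordinates in km
sensor_pm = rng.uniform(10, 80, size=n_pairs)         # raw low-cost readings
true_slope = 0.3 + 0.004 * pair_xy[:, 0]
reference_pm = 2.0 + true_slope * sensor_pm + rng.normal(0, 1.0, n_pairs)

# Nonspatial model: one global straight line.
slope, intercept = np.polyfit(sensor_pm, reference_pm, 1)
rmse_global = rmse(reference_pm, intercept + slope * sensor_pm)

# Spatial model: pick the bandwidth with the lowest leave-one-out RMSE.
candidates = [5.0, 10.0, 20.0, 40.0]                  # km, assumed search grid
cv_scores = [loo_cv_rmse(pair_xy, sensor_pm, reference_pm, h) for h in candidates]
best_bandwidth = candidates[int(np.argmin(cv_scores))]

coefs = local_coefficients(pair_xy, pair_xy, sensor_pm, reference_pm, best_bandwidth)
rmse_local = rmse(reference_pm, coefs[:, 0] + coefs[:, 1] * sensor_pm)
print(f"bandwidth={best_bandwidth} km, global RMSE={rmse_global:.2f}, local RMSE={rmse_local:.2f}")
```

To calibrate sensors that have no collocated station, the same `local_coefficients` call can be given their coordinates as `target_xy`, and the returned intercept and slope applied to their raw readings.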
Key Findings
The study revealed significant differences between raw data from low-cost sensors and regulatory stations. The average PM2.5 concentration from low-cost sensors was more than double that of the regulatory stations, with a larger variance. The spatial calibration model significantly outperformed the nonspatial model. The nonspatial model had an R-squared value of 0.64 and reduced the RMSE from 17.7 µg/m³ to 10.5 µg/m³. In contrast, the spatial model achieved an R-squared of 0.94 and reduced the RMSE to 4.1 µg/m³. After spatial calibration, the distribution of the low-cost sensor data was more similar to that of the regulatory stations. The spatial coefficients (slope and intercept) varied significantly across locations, highlighting the importance of spatial heterogeneity in the calibration process. Analysis of the model residuals showed that the spatial model yielded smaller and more normally distributed residuals than the nonspatial model. In mapping PM2.5 concentrations, the nonspatial and spatial models had RMSEs of 7.9 µg/m³ and 4.8 µg/m³, respectively, a 39% reduction in RMSE for the spatial model. Maps produced with the spatial calibration model were consistent with the regulatory station data, particularly in identifying pollution hotspots in central and southwestern Taiwan. The average nearest neighbor distance between low-cost sensors and TWEPA stations was approximately 610 m (311 m after excluding distances over 2 km), indicating sufficient data for effective calibration. Relative humidity had a negligible association with the spatial calibration results, given the uniformly high humidity across the region. The computational time for the spatial calibration model was minimal (<5 min).
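The mapping RMSEs above come from interpolating calibrated sensor readings onto a grid with IDW. A minimal sketch of that interpolation follows. The function names, the distance power of 2, and the toy coordinates are assumptions, since the summary does not state the exponent or the grid construction used in the study.

```python
import numpy as np

def make_grid(x_min, x_max, y_min, y_max, step):
    """Regular grid of points, returned as an (n_points, 2) array."""
    xs = np.arange(x_min, x_max + step / 2, step)
    ys = np.arange(y_min, y_max + step / 2, step)
    gx, gy = np.meshgrid(xs, ys)
    return np.column_stack([gx.ravel(), gy.ravel()])

def idw_interpolate(obs_xy, obs_values, grid_xy, power=2.0, eps=1e-12):
    """Inverse distance weighting: each grid value is a distance-weighted mean.

    power=2 is an assumed default; eps avoids division by zero when a grid
    point coincides with an observation.
    """
    dist = np.linalg.norm(grid_xy[:, None, :] - obs_xy[None, :, :], axis=2)
    weights = 1.0 / np.maximum(dist, eps) ** power
    return (weights @ obs_values) / weights.sum(axis=1)

# Toy example (hypothetical values): two calibrated sensors 4 km apart.
obs_xy = np.array([[0.0, 0.0], [4.0, 0.0]])   # km
obs_pm = np.array([10.0, 30.0])               # calibrated PM2.5, µg/m³
grid = make_grid(0.0, 4.0, 0.0, 0.0, 2.0)     # 2 km spacing along one row
mapped = idw_interpolate(obs_xy, obs_pm, grid)
print(mapped)
```

The dense distance matrix keeps the sketch short. For thousands of sensors and a fine island-wide grid, processing the grid in chunks or limiting each grid point to its nearest sensors would reduce memory use.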
Discussion
The findings demonstrate the effectiveness of the proposed spatial calibration approach for improving the accuracy and reliability of low-cost PM2.5 sensor data. The superior performance of the spatial model over the nonspatial model underscores the importance of considering spatial heterogeneity when calibrating low-cost sensors. Accurate PM2.5 maps generated from calibrated low-cost sensor data provide spatial detail on air pollution beyond what sparse regulatory networks can offer. This has significant implications for air quality monitoring, public health, and environmental policy. The consistency between pollution hotspots identified using calibrated low-cost sensors and those in the regulatory data supports the proposed methodology. The negligible effect of relative humidity on the spatial model simplifies calibration in this case of uniformly high humidity, but it might not hold under other conditions. The study highlights the potential of low-cost sensors, when properly calibrated, to provide high-resolution air quality data for effective air pollution management and public health initiatives.
Conclusion
This study presents a novel spatial calibration and mapping approach for low-cost PM2.5 sensors, effectively addressing challenges posed by spatial heterogeneity and high relative humidity. The spatial regression model significantly improves accuracy compared to nonspatial methods. The resulting high-resolution PM2.5 maps provide valuable insights into air pollution patterns, surpassing the information available from a limited number of regulatory stations. Future work could incorporate additional spatial variables, such as elevation, into the kernel function and explore the application of the model under diverse environmental conditions and with various sensor types.
Limitations
The study focuses on a specific time slice with uniformly high relative humidity. The generalizability of the findings to other time periods or to regions with varying humidity conditions needs further investigation. The nearest neighbor approach for data collocation might introduce some uncertainty because the distance between sensors and regulatory stations varies. The study uses a specific type of low-cost sensor (AirBox), so the applicability of the findings to other low-cost sensor types should be explored further.
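One way to examine the collocation uncertainty is to record the distance of every nearest neighbor match and apply a cutoff, such as the 2 km exclusion mentioned in the key findings. The sketch below is only illustrative: the function name, the toy coordinates, and the direction of matching (which set is matched to which) are assumptions, not details taken from the study.

```python
import numpy as np

def nearest_neighbor(from_xy, to_xy, max_dist=None):
    """Match every point in from_xy to its nearest point in to_xy.

    Returns (index into to_xy, matched distance, keep mask). The mask is
    False for matches farther apart than max_dist.
    """
    dist = np.linalg.norm(from_xy[:, None, :] - to_xy[None, :, :], axis=2)
    nearest = dist.argmin(axis=1)
    nearest_dist = dist[np.arange(len(from_xy)), nearest]
    if max_dist is None:
        keep = np.ones(len(from_xy), dtype=bool)
    else:
        keep = nearest_dist <= max_dist
    return nearest, nearest_dist, keep

# Toy coordinates in km (hypothetical): three stations and three sensors.
station_xy = np.array([[0.0, 0.0], [10.0, 0.0], [100.0, 100.0]])
sensor_xy = np.array([[0.3, 0.4], [10.0, 1.0], [50.0, 50.0]])

idx, matched_dist, keep = nearest_neighbor(station_xy, sensor_xy, max_dist=2.0)
print("mean distance, all pairs:", matched_dist.mean())
print("mean distance, pairs within 2 km:", matched_dist[keep].mean())
```

Comparing calibration results with and without the distant pairs gives a rough sense of how sensitive the fitted coefficients are to collocation distance.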