Environmental Studies and Forestry
Spatial calibration and PM2.5 mapping of low-cost air quality sensors
H. Chu, M. Z. Ali, et al.
This study by Hone-Jay Chu, Muhammad Zeeshan Ali, and Yu-Chen He presents a spatial calibration and mapping method for low-cost PM2.5 sensors that addresses measurement discrepancies under humid conditions. The proposed spatial regression model lowers station RMSE from 17.7 µg/m³ (uncalibrated) to 4.1 µg/m³, improving air quality monitoring for communities and agencies.
Introduction
Air pollution, particularly fine particulate matter (PM2.5), is a major urban health concern. Regulatory air quality monitoring networks are sparse due to high costs, limiting spatial resolution. Low-cost sensors provide high-density, real-time data and can supplement regulatory monitors, but prior studies report inconsistencies with reference instruments, especially under high relative humidity, temperature variations, and aerosol composition effects. Biases can arise from sensor type, light-scattering dependencies, and hygroscopic growth, often leading to overestimation at high humidity. Calibration is essential before, during, and after deployment. While linear and machine learning models have been used, maintaining well-calibrated regional networks and reducing errors across large-scale deployments remain challenging. This study proposes a real-time, regional, spatial calibration procedure based on reference-grade regulatory measurements to address spatial heterogeneity. Objectives: (1) develop spatially varying relationships between low-cost sensors and regulatory stations for calibration; (2) calibrate regional low-cost sensors against regulatory stations at a single time slice; (3) estimate reliable PM2.5 maps from calibrated low-cost sensors to identify pollution hotspots.
Literature Review
Prior work highlights performance issues of low-cost PM sensors compared with Federal Reference/Equivalent Methods, with biases related to environmental conditions (e.g., high relative humidity), aerosol characteristics, and temperature. Field and laboratory studies show dependence on particle size, shape, composition, and RH, with near-exponential bias increases at RH >80–85%. Calibration approaches range from simple linear regressions to nonlinear and machine learning models (e.g., random forests) to improve accuracy. However, keeping distributed sensors calibrated in the field is difficult; multivariate models can struggle to reduce large-scale measurement errors and may overfit small datasets. Few studies implement spatially explicit, real-time calibration across a region, motivating a spatially varying (location-specific) calibration framework.
Methodology
Study area and data: Taiwan hosts a dense low-cost sensor network (AirBox, PMS5003-based, ~2963 sensors) that reports PM2.5, temperature, and relative humidity roughly every 5 minutes, alongside 76 Taiwanese EPA (TWEPA) regulatory stations that report hourly PM2.5. Data sources: AirBox (https://pm25.lass-net.org/) and TWEPA (https://opendata.epa.gov.tw/Data/Contents/ATM00625/). Case time slice: 12:00 UTC, 2020-02-24, characterized by high RH and poor air quality in western Taiwan due to weak dispersion. Average RH across low-cost sensors was 82.1% and average temperature was 22 °C. Temperatures of 17–27 °C were mostly not significant for sensor bias in this setting.
Preprocessing and collocation: Each regulatory station observation was paired with its nearest low-cost sensor (nearest neighbor strategy). The overall average low-cost-to-regulatory nearest-neighbor distance was ~610 m; after excluding 7 pairs >2 km, average distance decreased to ~311 m.
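The nearest-neighbor pairing described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function name `pair_nearest`, the `max_dist_m` parameter, and the assumption that coordinates are already projected to metres are all hypothetical. Whether the study dropped the pairs beyond 2 km from the calibration itself, or only from the distance average, is not stated, so the cutoff here is optional.

```python
import numpy as np

def pair_nearest(station_xy, sensor_xy, max_dist_m=None):
    """Pair each regulatory station with its nearest low-cost sensor.

    Coordinates are assumed to be projected (x, y) in metres (an assumption).
    Returns (station indices kept, matched sensor indices, distances in metres).
    """
    station_xy = np.asarray(station_xy, dtype=float)
    sensor_xy = np.asarray(sensor_xy, dtype=float)

    # Pairwise Euclidean distances, shape (n_stations, n_sensors).
    diff = station_xy[:, None, :] - sensor_xy[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))

    nearest = dist.argmin(axis=1)
    nearest_dist = dist[np.arange(len(station_xy)), nearest]

    # Optionally discard pairs farther apart than the cutoff.
    if max_dist_m is None:
        keep = np.ones(len(station_xy), dtype=bool)
    else:
        keep = nearest_dist <= max_dist_m
    return np.flatnonzero(keep), nearest[keep], nearest_dist[keep]

# Toy example with made-up coordinates (metres).
stations = [[0, 0], [5000, 0], [10000, 10000]]
sensors = [[300, 400], [5100, 0], [20000, 20000]]
station_idx, sensor_idx, dist_m = pair_nearest(stations, sensors, max_dist_m=2000.0)
```

In this toy example the third station has no sensor within 2 km, so only the first two pairs are kept.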
Calibration models: (1) Nonspatial (global) linear regression: Yi = β0 + β1 xi1, where Yi is regulatory PM2.5 and xi1 is the collocated low-cost PM2.5. A single coefficient set (β0, β1) applies to all sensors. (2) Spatial calibration model: a kernel-based varying-coefficient spatial regression where coefficients vary with location (ui, vi): E(Yi|Xi,(ui,vi)) = β0(ui,vi) + β1(ui,vi) xi1. Coefficients are estimated via weighted least squares using a Gaussian distance-decay kernel Wij = exp(-(dij^2)/(b^2)), with Euclidean distance dij and bandwidth b selected by cross-validation. Residuals are ei = Yi − Ŷi. Model performance metrics include R2 and RMSE at regulatory stations.
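As a rough sketch of the spatial calibration model, the code below fits local coefficients by weighted least squares with the Gaussian kernel Wij = exp(-(dij^2)/(b^2)) and picks the bandwidth by cross-validation. The source does not say which cross-validation scheme was used, so leave-one-out is an assumption here, and all function names (`local_coefficients`, `loo_cv_score`, `select_bandwidth`, `calibrate_sensors`) are hypothetical.

```python
import numpy as np

def local_coefficients(coords, x, y, target, bandwidth):
    """Weighted least-squares fit of y = b0 + b1 * x centred on `target`."""
    d2 = ((coords - target) ** 2).sum(axis=1)      # squared Euclidean distances
    w = np.exp(-d2 / bandwidth ** 2)               # Gaussian distance-decay weights
    sw = np.sqrt(w)
    X = np.column_stack([np.ones_like(x), x])
    # Scaling rows by sqrt(w) turns WLS into an ordinary least-squares problem.
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta                                    # array([b0, b1])

def loo_cv_score(coords, x, y, bandwidth):
    """Mean squared leave-one-out prediction error (assumed CV scheme)."""
    n = len(y)
    total = 0.0
    for i in range(n):
        mask = np.arange(n) != i
        b0, b1 = local_coefficients(coords[mask], x[mask], y[mask],
                                    coords[i], bandwidth)
        total += (y[i] - (b0 + b1 * x[i])) ** 2
    return total / n

def select_bandwidth(coords, x, y, candidates):
    """Return the candidate bandwidth with the lowest cross-validation score."""
    scores = [loo_cv_score(coords, x, y, b) for b in candidates]
    return candidates[int(np.argmin(scores))]

def calibrate_sensors(coords, x, y, sensor_coords, sensor_pm, bandwidth):
    """Apply location-specific coefficients to every low-cost sensor reading."""
    out = np.empty(len(sensor_pm))
    for k, (loc, pm) in enumerate(zip(sensor_coords, sensor_pm)):
        b0, b1 = local_coefficients(coords, x, y, loc, bandwidth)
        out[k] = b0 + b1 * pm
    return out

# Toy collocated pairs (made-up numbers; coordinates in km).
coords = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 2]], dtype=float)
x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])       # low-cost PM2.5
y = 5.0 + 0.5 * x                                  # regulatory PM2.5
b = select_bandwidth(coords, x, y, [1.0, 2.0, 4.0])
calibrated = calibrate_sensors(coords, x, y,
                               np.array([[0.5, 0.5]]), np.array([60.0]), b)
```

Because the toy data follow a single linear relation everywhere, the local fit recovers the same coefficients at any location; with real data the coefficients would differ from place to place, which is the point of the spatial model.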
Spatial mapping: After calibration, PM2.5 fields were interpolated using inverse distance weighting (IDW) with power = 2 to a 2 km grid to produce regional maps. Computational performance: end-to-end calibration completes within ~5 minutes on an Intel Core i5-10210U, enabling hourly real-time operation aligned with regulatory updates (low-cost sensors update every ~5 minutes; regulatory hourly).
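The IDW mapping step can be illustrated with the short sketch below. It is an assumption-laden example rather than the study's implementation: the function names `make_grid` and `idw`, the `eps` guard against zero distances, and the use of all samples for every grid node (no search radius) are choices made here for illustration.

```python
import numpy as np

def make_grid(xmin, xmax, ymin, ymax, step=2000.0):
    """Regular grid of (x, y) nodes; step in metres (2 km as in the text)."""
    xs = np.arange(xmin, xmax + step, step)
    ys = np.arange(ymin, ymax + step, step)
    gx, gy = np.meshgrid(xs, ys)
    return np.column_stack([gx.ravel(), gy.ravel()])

def idw(sample_xy, sample_values, grid_xy, power=2.0, eps=1e-12):
    """Inverse distance weighting: weights are 1 / distance**power."""
    sample_xy = np.asarray(sample_xy, dtype=float)
    sample_values = np.asarray(sample_values, dtype=float)
    grid_xy = np.asarray(grid_xy, dtype=float)

    diff = grid_xy[:, None, :] - sample_xy[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))        # (n_grid, n_samples)
    # eps avoids division by zero when a node coincides with a sample.
    w = 1.0 / np.maximum(dist, eps) ** power
    return (w @ sample_values) / w.sum(axis=1)

# Toy example: two calibrated sensor values (made-up numbers).
sensors_xy = [[0.0, 0.0], [2000.0, 0.0]]
pm25 = [10.0, 30.0]
nodes = [[1000.0, 0.0], [0.0, 0.0]]
estimates = idw(sensors_xy, pm25, nodes)
grid = make_grid(0.0, 4000.0, 0.0, 2000.0)
```

A node midway between the two sensors gets their average, while a node on top of a sensor reproduces that sensor's value.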
Key Findings
- Data characteristics at the case time: Low-cost sensors averaged 51.9 µg/m³ (SD 26.1), regulatory stations averaged 36.7 µg/m³ (SD 17.5); variances 681 vs 306 (µg/m³)^2, respectively. Collocated correlation was 0.8. Low-cost data showed higher means and variance, indicating overestimation and need for calibration in high RH conditions.
- Model performance at stations: Nonspatial calibration R2 = 0.64; spatial calibration R2 = 0.94. RMSE reduced from 17.7 µg/m³ (raw low-cost) to 10.5 µg/m³ (nonspatial) and further to 4.1 µg/m³ (spatial).
- Mapping performance: Mapping RMSE = 7.9 µg/m³ (nonspatial) vs 4.8 µg/m³ (spatial), a 39% reduction relative to the nonspatial model.
- Coefficients: Spatial model average slope ≈ 0.33 and intercept ≈ 21.4; nonspatial model slope ≈ 0.58 and intercept ≈ 9.4, indicating substantial spatial non-stationarity.
- Residual analysis: Residual range narrowed from −16 to 36 µg/m³ (nonspatial) to −11 to 14 µg/m³ (spatial). Spatial model residuals were closer to normal with minimal spatial structure, indicating better fit.
- Spatial patterns: After spatial calibration, PM2.5 maps from low-cost sensors aligned with regulatory-station-derived patterns while preserving spatial detail; hotspots were identified in central and southwestern Taiwan. Nonspatial calibration underestimated hotspots in these regions.
- Relative humidity covariate: Including RH in the spatial model raised RMSE to 5.4 µg/m³ rather than lowering it, so RH added no benefit in this time slice, likely because RH was uniformly high (avg 82.1%).
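The relative improvements implied by the figures above can be checked with a few lines of arithmetic. The helpers below are a sketch: the function names are hypothetical, and the R2 definition used (1 minus residual over total sum of squares) is an assumption, since the source does not say which definition it reports.

```python
import numpy as np

def rmse(observed, predicted):
    """Root-mean-square error between observations and predictions."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))

def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def percent_reduction(baseline, improved):
    """Relative reduction of `improved` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - improved) / baseline

# RMSE values reported in the findings (µg/m³).
mapping_gain = percent_reduction(7.9, 4.8)    # nonspatial -> spatial, mapping
station_gain = percent_reduction(10.5, 4.1)   # nonspatial -> spatial, stations
raw_gain = percent_reduction(17.7, 4.1)       # raw low-cost -> spatial, stations
```

With the reported values, the mapping RMSE reduction works out to about 39%, the station RMSE reduction relative to the nonspatial model to about 61%, and the reduction relative to the raw low-cost readings to about 77%.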
Discussion
The study demonstrates that spatially varying calibration reduces the biases and inconsistencies of low-cost PM2.5 sensor networks, particularly under high relative humidity. By modeling local relationships between collocated low-cost and regulatory measurements via kernel-weighted regression, the approach captures spatial heterogeneity that global linear models miss, substantially improving accuracy (R2 up to 0.94; station RMSE 4.1 µg/m³; mapping RMSE 4.8 µg/m³). The narrower residual range and the agreement of calibrated low-cost sensor maps with regulatory patterns indicate that spatial calibration mitigates systematic overestimation and yields reliable hotspot detection. Operationally, nearest-neighbor collocation is feasible because the sensor network is dense (average nearest-neighbor distance of ~610 m, or ~311 m after excluding pairs more than 2 km apart). Real-time deployment is practical within hourly cycles given the computation time (~5 minutes) and the data update frequencies (roughly every 5 minutes for sensors, hourly for regulatory stations). Incorporating RH as an explicit covariate did not improve performance for the selected time slice because RH was uniformly high across the region; the spatial model’s local weighting already accommodated much of the heterogeneity. Overall, the method strengthens the utility of low-cost sensor networks for fine-scale air quality assessment by harmonizing them with reference-grade measurements in real time.
Conclusion
This work introduces a real-time, regional spatial calibration and mapping framework for low-cost PM2.5 sensors that leverages spatial regression and IDW interpolation. The method effectively adjusts low-cost sensor measurements to match regulatory-grade observations, achieving R2 = 0.94 and reducing mapping RMSE to 4.8 µg/m³, about 39% lower than the nonspatial model’s 7.9 µg/m³. Calibrated maps revealed reliable PM2.5 hotspots in central and southwestern Taiwan, offering improved spatial detail over sparse regulatory networks. The approach is computationally efficient and suitable for both real-time and offline applications. Future research will incorporate additional spatial information (e.g., elevation) into the kernel function and extend offline calibration using long-term monitoring data.
Limitations
- Collocation constraints: Strict co-location is challenging; nearest-neighbor pairing may introduce spatial mismatch despite high network density.
- Generalizability: Results are from a single time slice with uniformly high relative humidity; performance may vary across seasons, meteorology, and aerosol regimes.
- Model assumptions: Spatial calibration assumes space-time homogeneity within the calibration window; this assumption is difficult to verify and may be violated.
- Overfitting risk: Small calibration datasets can lead to overfitting in flexible models; care is needed in bandwidth selection and validation.
- Temporal synchronization: Different update rates (5-minute for sensors vs hourly for regulatory stations) limit calibration to hourly intervals.
- Environmental covariates: While RH did not improve performance in this case, omission of other covariates (e.g., aerosol composition) could limit transferability across conditions.

