A machine learning paradigm for necessary observations to reduce uncertainties in aerosol climate forcing

J. Redemann and L. Gao

Jens Redemann and Lan Gao propose a machine learning approach to reduce uncertainties in quantifying the climate cooling caused by anthropogenic aerosols. The study shows that neural networks trained on lidar observations can estimate aerosol properties that conventional retrievals struggle to provide, which could improve the accuracy of climate modeling.

Introduction

Anthropogenic aerosols exert a net cooling that partially offsets greenhouse-gas-induced warming, yet uncertainties in aerosol climate forcing have remained large across recent IPCC reports. A key barrier is the limited accuracy, resolution, and coverage of observational constraints on aerosol properties relevant to Earth System Models, particularly vertical distributions and properties near clouds. The study proposes and demonstrates a machine-learning-based paradigm that uses only lidar observables, augmented by reanalysis temperature and relative humidity, to retrieve higher-level aerosol properties not traditionally derivable from lidar alone—specifically aerosol light absorption (ABS) and cloud condensation nuclei (CCN) concentrations. The goal is to provide accurate, vertically resolved ABS and CCN from suborbital and future spaceborne lidars to better constrain aerosol–cloud interactions and reduce forcing uncertainties.

Literature Review

Conventional aerosol remote sensing predominantly relies on physics-based optimal estimation using forward radiative transfer models. While combining lidars with polarimeters promises enhanced information content, major challenges persist: incomplete knowledge of scattering for non-spherical particles and polarized surfaces, sensitivity to a priori assumptions, and high computational cost. New missions (NASA AOS; ESA/JAXA EarthCARE/ATLID) will provide vertically resolved lidar observations crucial for addressing uncertain aerosol–cloud interactions. Prior empirical approaches linking lidar extinction/backscatter to CCN are aerosol-type specific and sensitive to relative humidity; they fail across broader conditions because hygroscopic growth changes aerosol optical properties non-linearly while CCN concentrations stay relatively invariant. Recent physics-based satellite CCN retrievals (e.g., from CALIPSO) have uncertainties near a factor of two, and robust ABS retrievals from spaceborne lidars remain scarce. This context motivates machine learning trained on collocated high-accuracy lidar and in situ data to learn multivariate, non-linear relationships, with reanalysis temperature (T) and relative humidity (RH) as physical constraints.

Methodology

Data and observables: Lidar data come from NASA Langley HSRL-2, measuring aerosol backscatter and depolarization at 355/532/1064 nm and aerosol extinction at 355/532 nm. Collocated in situ ABS and CCN measurements were obtained from coordinated aircraft flights during four field campaigns (DISCOVER-AQ 2013–2014, ACTIVATE 2019–2022, ORACLES 2016–2018, CAMP2Ex 2019), covering diverse aerosol types (smoke, dust, marine, urban/pollution, mixtures) and pollution regimes. Collocation criteria: horizontal ≤1100 m, vertical ≤45 m, temporal ≤30 min; HSRL-2 horizontal resolution ≈1.5–2 km (10 s averages), vertical resolution 15 m; in situ at 1 Hz. This yielded 9873 lidar–CCN pairs (CCN mostly at 0.35–0.4% supersaturation) and 2516 lidar–ABS pairs. Data filters removed CCN <10 cm−3 and ABS <0.1 Mm−1 to avoid low-SNR regimes.

Instruments: CCN counters (continuous-flow, standard and scanning modes; nominal ≈10% uncertainty at high SNR); ABS from PSAP at 467/530/660 nm with scattering corrections. PSAP uncertainty can be large in clean/variable pressure conditions; data were quality filtered (e.g., excluding periods with aircraft altitude variability >5 m).

Reanalysis predictors: ERA5 temperature and relative humidity were bilinearly and temporally interpolated (0.25° grid, 37 vertical levels, hourly) to the lidar locations and added as additional input features.

Machine learning models: Fully Connected Neural Networks (FCNN) for regression trained with Levenberg–Marquardt optimization; Random Forests were tested with comparable skill but roughly double training time, so FCNNs were selected. Inputs: lidar observables (HSRL-2 full set or UV-only subset for ATLID experiments) with optional ERA5 T and RH; targets: in situ CCN and ABS. Data split: 70% train, 15% validation, 15% test; 10-fold cross-validation; Bayesian hyperparameter optimization; optimal architectures documented in supplementary materials.
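The collocation criteria listed under "Data and observables" (horizontal ≤1100 m, vertical ≤45 m, temporal ≤30 min) can be illustrated with a minimal Python sketch. This is not the authors' code: the dictionary keys `lat`, `lon`, `alt`, and `time`, the function names, and the use of a haversine great-circle distance are all assumptions made for illustration.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, used for the haversine distance


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))


def collocate(lidar, insitu, max_horiz_m=1100.0, max_vert_m=45.0, max_dt_s=30 * 60):
    """Return (lidar_index, insitu_index) pairs that meet all three criteria.

    `lidar` and `insitu` are dicts of equal-length 1-D arrays with the
    hypothetical keys "lat", "lon" (degrees), "alt" (m) and "time" (s).
    """
    pairs = []
    for j in range(len(insitu["time"])):
        dt = np.abs(lidar["time"] - insitu["time"][j])
        dz = np.abs(lidar["alt"] - insitu["alt"][j])
        dh = haversine_m(lidar["lat"], lidar["lon"],
                         insitu["lat"][j], insitu["lon"][j])
        ok = (dt <= max_dt_s) & (dz <= max_vert_m) & (dh <= max_horiz_m)
        pairs.extend((int(i), j) for i in np.flatnonzero(ok))
    return pairs
```

The loop over in situ samples keeps the logic easy to read; for campaign-scale data a spatial index or time-sorted search would likely be preferable.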
To mitigate higher errors in clean regimes for CCN, training was reweighted with a factor of 3 for CCN <100 cm−3; ABS reweighting degraded overall performance and was not used. Spatial autocorrelation checks (removing points within ±5 km) showed only slight performance decreases, likely due to reduced sample size rather than autocorrelation.

Two predictor configurations were used:
(1) HSRL-2 full information content (backscatter, depolarization at 355/532/1064 nm; extinction at 355/532 nm).
(2) Simulated EarthCARE/ATLID configuration using UV-only (355 nm) backscatter, depolarization, and extinction.

For ATLID simulations, noise was added to airborne UV observables to emulate spaceborne uncertainties: Gaussian noise in backscatter based on mean relative differences between collocated CALIOP and HSRL-2; extinction uncertainties propagated using literature lidar-ratio uncertainties. This yields a ‘noisier-than-airborne’ UV dataset approximating ATLID error characteristics.

Evaluation metrics: correlation coefficient (R), mean absolute error (MAE), mean relative error (MRE), and counts/fractions within ±30% and ±50% error. Density scatter plots used logarithmic binning for visualization.

Additional analyses assessed training data completeness across aerosol types and conditions; frequency distributions compared to ARM SGP climatology suggested representativeness with some insufficiencies in certain aerosol-type/clean-condition bins.
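The clean-regime reweighting and the evaluation metrics described above can be sketched as follows. This is a hedged illustration, not the study's implementation: the function names are hypothetical, R is assumed to be the Pearson correlation coefficient, and MRE is assumed to be the mean of |prediction − truth| / truth, since the summary does not give exact definitions.

```python
import numpy as np


def clean_regime_weights(ccn, threshold=100.0, factor=3.0):
    """Weight samples below `threshold` (cm^-3) by `factor`, all others by 1."""
    ccn = np.asarray(ccn, dtype=float)
    return np.where(ccn < threshold, factor, 1.0)


def evaluate(pred, truth):
    """Return R, MAE, MRE and the fractions of predictions within 30% and 50%."""
    pred = np.asarray(pred, dtype=float)
    truth = np.asarray(truth, dtype=float)
    abs_err = np.abs(pred - truth)
    # Assumes truth > 0, which holds after the low-value filters on CCN and ABS.
    rel_err = abs_err / truth
    return {
        "R": float(np.corrcoef(pred, truth)[0, 1]),  # Pearson correlation (assumed)
        "MAE": float(abs_err.mean()),
        "MRE": float(rel_err.mean()),
        "within_30": float((rel_err <= 0.30).mean()),
        "within_50": float((rel_err <= 0.50).mean()),
    }
```

For example, `evaluate([110, 150, 400, 1300], [100, 200, 400, 800])` gives an MAE of 140 and places three of the four predictions within ±30% of the reference values.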

Key Findings

HSRL-2 configuration (airborne, full observables):
- CCN, lidar observables only: R=0.93; MRE=22%; MAE=133 cm−3; 66% within ±30%, 82% within ±50%.
- CCN, with ERA5 T and RH: R=0.97; MRE=13%; MAE=80 cm−3; 85% within ±30%, 93% within ±50% (N=1480 test points).
- ABS, lidar only: R=0.80; MRE=25%; MAE=0.45×10−6 m−1; 65% within ±30%, 83% within ±50%.
- ABS, with reanalysis: R=0.90; MRE=21%; MAE=0.38×10−6 m−1; 73% within ±30%, 84% within ±50% (N=377 test points).

Simulated ATLID (UV-only with added noise):
- CCN, lidar only: R=0.57; MRE=51%; MAE=310 cm−3; 32% within ±30%, 52% within ±50%.
- CCN, with reanalysis: R=0.88; MRE=23%; MAE=141 cm−3; 70% within ±30%, 83% within ±50% (N=1280).
- ABS, lidar only: R=0.55; MRE=40%; MAE=0.76×10−6 m−1; 46% within ±30%, 67% within ±50%.
- ABS, with reanalysis: R=0.74; MRE=28%; MAE=0.52×10−6 m−1; 66% within ±30%, 82% within ±50% (N=305).

Comparative significance: For CCN, simulated ATLID ML predictions achieve within ±50% in 83% of cases with reanalysis, markedly better than previously reported physics-based satellite approaches (≈factor-of-two uncertainties). For ABS, the ML MAE for simulated ATLID (0.52×10−6 m−1) is more than three times lower than uncertainties implied by propagating plausible SSA uncertainties (≈1.7–2.0×10−6 m−1). The methodology produces detailed vertical CCN structures (e.g., curtain plots) near clouds that are challenging for passive retrievals.
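As a quick arithmetic check of the "more than three times lower" statement, dividing the two ends of the quoted SSA-propagated uncertainty range by the simulated-ATLID ML MAE gives the ratios below. The unit identity is included only because the ABS thresholds elsewhere on this page are quoted in Mm−1.

```latex
\[
\frac{1.7\times10^{-6}\,\mathrm{m^{-1}}}{0.52\times10^{-6}\,\mathrm{m^{-1}}} \approx 3.3,
\qquad
\frac{2.0\times10^{-6}\,\mathrm{m^{-1}}}{0.52\times10^{-6}\,\mathrm{m^{-1}}} \approx 3.8,
\qquad
10^{-6}\,\mathrm{m^{-1}} = 1\,\mathrm{Mm^{-1}}.
\]
```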

Discussion

The study demonstrates that machine learning trained on high-quality HSRL-2 lidar observables and in situ measurements, augmented by ERA5 temperature and relative humidity, can retrieve higher-level aerosol properties (CCN and ABS) with substantially improved accuracy relative to conventional retrievals. With full HSRL-2 inputs, CCN is retrieved within ±30% in 85% of cases and within ±50% in 93%, enabling unprecedented analysis of aerosol cloud-nucleating properties near clouds and throughout the vertical column. Even with UV-only, noise-added inputs emulating EarthCARE/ATLID, the approach yields useful performance, retrieving CCN within ±50% in 83% of cases and ABS with lower error than expected from physics-based uncertainty propagation. These results directly address the observational gap limiting constraints on aerosol–cloud interactions and forcing in ESMs, providing vertically resolved, near-cloud-capable datasets. While ML does not replace physics-based methods, it can complement them—offering computationally efficient, high-accuracy products in regimes where forward-model uncertainties or computational cost hinder optimal-estimation retrievals. The importance of independent extinction and backscatter (HSRL or Raman capability) suggests that future spaceborne systems with such measurements can unlock global CCN and ABS distributions to confront and improve ESMs.

Conclusion

This work introduces and validates a machine-learning paradigm to retrieve aerosol light absorption and CCN concentrations using only lidar observables and reanalysis (T, RH) as predictors, trained against collocated in situ references. Using airborne HSRL-2 data, the approach achieves unprecedented accuracy for vertically resolved CCN and ABS, and maintains strong performance with UV-only, noise-added inputs approximating EarthCARE/ATLID. The paradigm enables retrievals close to clouds and of higher-level properties not traditionally accessible from lidar. It can be readily adapted to future satellite or suborbital lidars by retraining on the specific observable set. Future research should apply and validate the models with actual ATLID and other mission data, expand training across additional aerosol types and regions, explore hybrid ML–physics approaches for enhanced interpretability and robustness, and assess impacts of these new constraints on ESM aerosol–cloud forcing estimates.

Limitations

- Training data representativeness: Although campaigns cover diverse conditions, most CCN (≈96%) and ABS (≈76%) training data came from a single campaign (ACTIVATE). Some aerosol-type/clean-condition bins remain sparsely sampled (<2% occurrence), potentially degrading performance in those regimes.
- Clean-regime performance: Higher relative uncertainties occur at very low CCN (<100 cm−3) and ABS (<1 Mm−1). CCN performance was improved via triple weighting of clean data; ABS reweighting degraded overall accuracy and was not used.
- Instrument uncertainties: PSAP ABS exhibits substantial and condition-dependent uncertainties (notably under clean conditions and pressure variability); despite filtering, residual uncertainty propagates to training targets. Threshold filtering (CCN <10 cm−3; ABS <0.1 Mm−1) may bias the lowest-value regimes.
- Simulated satellite inputs: ATLID performance estimates rely on simplified noise models derived from CALIOP–HSRL differences and lidar-ratio uncertainty propagation; true ATLID error characteristics may differ. Results are thus indicative bounds pending validation with actual satellite data.
- Model dependence on observables: Best performance leverages independent extinction and backscatter (HSRL/Raman). Systems lacking this independence may yield reduced skill.
- ML interpretability: The approach retrieves properties effectively but does not directly elucidate physical mechanisms; it is intended to complement, not replace, physics-based retrievals.
- Potential sampling autocorrelation was tested and found to have minimal impact; however, reduced sample sizes can degrade performance, highlighting sensitivity to training dataset volume and diversity.
