Earth Sciences
Machine learning-based observation-constrained projections reveal elevated global socioeconomic risks from wildfire
Y. Yu, J. Mao, et al.
This study by Yan Yu and colleagues uses a machine learning framework to constrain wildfire projections with observations and to assess the associated socioeconomic risks. It points to a need for strategic wildfire preparedness, particularly in western and central Africa, where wildfire activity is projected to rise alongside socioeconomic development.
Introduction
Wildfires are major disturbances influencing the global carbon budget, climate, and human systems. Recent extreme fire seasons (e.g., 2019–2020 Australia) underscore escalating social and economic impacts. While Earth system models (ESMs) can represent human–vegetation–fire–climate interactions, substantial uncertainties and biases persist in CMIP6 simulations of historical fire carbon emissions, undermining confidence in default future projections. Traditional approaches often use fire weather as a proxy, but fire activity depends on multiple factors (terrain, fuels, moisture, ignition sources) and complex interactions. Emergent constraint (EC) methods reduce uncertainty for some Earth system projections but need many diverse models and typically assume linear relations, which are insufficient for local-scale wildfire projections and nonlinear fire drivers. The study asks whether a machine learning-based, observation-constrained framework can leverage historical observed joint states of fire-relevant climate, ecosystem, and socioeconomic variables together with ESM history–future relationships to reduce spatial inaccuracies in projected wildfire carbon emissions and associated socioeconomic risks globally.
Literature Review
Prior work highlights four points: (1) ESMs can simulate coupled fire–climate–ecosystem dynamics, but their fire modules carry significant uncertainty and their historical fire emissions are biased (CMIP6). (2) Model-derived fire weather indices are used as emulators of fire potential, though real fire activity is also governed by fuels, moisture, ignition, and terrain. (3) Emergent constraint frameworks have successfully constrained large-scale climate/ecosystem metrics using linear across-model relationships, but they are limited by small model sample sizes for fire, linearity assumptions, and poor suitability for local-scale, spatially detailed wildfire projections. (4) Alternative performance-based constraints (bias correction, model weighting) and process-oriented regressions help but often rely on univariate, temporally stable, linear assumptions that are ill-suited to nonlinear fire processes. Recent machine learning applications have shown promise for uncovering nonlinear drivers and forecasting burned area/emissions regionally (e.g., Africa), motivating a more advanced, multivariate, nonlinear, observation-constrained approach for global wildfire projections.
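To make the linear emergent constraint idea concrete, here is a minimal sketch, assuming one historical value and one future value per model and a straight-line fit across models. The function names and all numbers are hypothetical illustrations, not values or code from the study.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def emergent_constraint(hist_by_model, future_by_model, observed_hist):
    """Fit future vs. historical across models, then evaluate at the observation."""
    intercept, slope = fit_line(hist_by_model, future_by_model)
    return intercept + slope * observed_hist


# Hypothetical per-model values (arbitrary units), one entry per model.
hist_by_model = [1.0, 2.0, 3.0, 4.0]
future_by_model = [1.5, 2.5, 3.5, 4.5]
observed_hist = 2.0  # hypothetical observed historical value

unconstrained = sum(future_by_model) / len(future_by_model)
constrained = emergent_constraint(hist_by_model, future_by_model, observed_hist)
print(unconstrained, constrained)
```

Because the fit uses a single point per model, the sample size equals the number of models, which is the small-sample limitation noted in point (3).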
Methodology
The study develops a machine learning-based observational constraining framework to project future fire carbon emissions and wildfire-related socioeconomic exposure.
Data: 38 ensemble members from 13 CMIP6 ESMs provide historical and future fire carbon emissions (primarily SSP5-8.5; a subset is analyzed for SSP2-4.5). Predictors comprise historical observed fire-relevant variables (Supplementary Table 2): fire carbon emission, LAI, soil moisture, temperature, precipitation, wind, relative humidity, lightning flash rate, orography, and socioeconomic drivers (land use, population). These capture fuel abundance (LAI, temperature, precipitation), fuel moisture (soil moisture, humidity, precipitation, temperature), spread conditions (wind, orography), and ignitions (lightning, land use, population).
Processing: All ESM outputs are bilinearly interpolated to 0.25°×0.25° to match observation-based fire emission products, yielding 11,325 spatial samples per model. For atmospheric/terrestrial predictors, both annual means and monthly climatologies (12 months) are used. Training and prediction are decadal: for each target decade (2011–2020, 2021–2030, …, 2091–2100), the ML models learn relations between historical (2001–2010) predictors and future decadal fire emissions using the full spatial fields from each ESM and the multimodel aggregate, enlarging effective sample size by spatial sampling.
Machine learning: Three algorithms, random forest (rf), support vector machine with RBF kernel (svmRadialCost), and gradient boosting machine (gbm), are trained using 10-fold cross-validation (n≈10,193 training, ≈1,132 validation per fold) with hyperparameter tuning (svm: grid over C and sigma; gbm: interaction.depth=3, shrinkage=0.2, 10–200 trees; rf: mtry 5–50; number of trees left at package defaults). Optimized models achieve cross-validated R²>0.8 across future periods.
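The 10-fold cross-validation described above can be sketched as follows. This is a minimal illustration, assuming shuffled, nearly equal folds and using a one-variable least-squares line as a stand-in for the rf, svm, and gbm learners. The helper names and the synthetic data are hypothetical and not taken from the study.

```python
import random


def kfold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k nearly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_linear(xs, ys):
    """Stand-in learner: least-squares line, returned as (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope


def predict_linear(model, x):
    intercept, slope = model
    return intercept + slope * x


def cross_validate(xs, ys, fit, predict, k=10):
    """Mean held-out R^2 over k folds."""
    scores = []
    for held_out in kfold_indices(len(xs), k):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        preds = [predict(model, xs[i]) for i in held_out]
        scores.append(r_squared([ys[i] for i in held_out], preds))
    return sum(scores) / k


# Fold sizes for the 11,325 spatial samples quoted in the text.
fold_sizes = [len(fold) for fold in kfold_indices(11325)]

# Synthetic (hypothetical) predictor/target pairs to exercise the loop.
rng = random.Random(1)
xs = [rng.uniform(0.0, 2.0) for _ in range(200)]
ys = [3.0 * x + 1.0 + rng.gauss(0.0, 0.1) for x in xs]
cv_r2 = cross_validate(xs, ys, fit_linear, predict_linear)
print(sorted(set(fold_sizes)), round(cv_r2, 3))
```

With 11,325 samples, each fold holds 1,132 or 1,133 cells and leaves 10,192 or 10,193 for training, which is consistent with the approximate counts quoted in the text.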
Observational constraint: Observational predictors are fed into the trained ML models to generate observation-constrained projections of decadal fire carbon emissions, forming a multimodel, multi-dataset ensemble.
Validation: Projections are assessed against observed fire emissions for 2007–2016 using RMSE and spatial R², and compared to the unconstrained ESM ensemble and to a traditional EC (linear) constraint. Sensitivity tests examine training resolution effects (1°, 2.5°, 5°, 10°) and application to burned area fraction in six ESMs.
Socioeconomic risk quantification: Exposure metrics are defined as products of decadal mean fire carbon emissions and co-located population, GDP (PPP, 2005 USD), and agricultural area under SSP5-8.5, resampled to 0.25°. Relative trends (% decade⁻¹) are computed versus the 2010s baseline for the default and constrained ensembles.
Mechanistic interpretation: ML variable importance is used to evaluate (1) historical drivers' contributions to future spatial patterns and (2) dynamic contributions of projected trends in drivers (environmental and socioeconomic) to projected wildfire trends in targeted regions and land cover types; environmental drivers are also observation-constrained for self-consistent analysis.
Data and code availability: Data and code are available via ESGF nodes and upon request to the corresponding author.
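The exposure metric and its relative trend can be illustrated with a short sketch. It assumes exposure is summed over grid cells and that the trend is a least-squares slope across decades, expressed as a percentage of the first-decade (baseline) value. The study may compute the trend differently, and all names and numbers here are hypothetical.

```python
def total_exposure(emissions, asset):
    """Sum over grid cells of fire emission times the co-located asset."""
    return sum(e * a for e, a in zip(emissions, asset))


def relative_trend_pct_per_decade(series, baseline):
    """Least-squares slope of a decadal series as a percentage of the baseline."""
    n = len(series)
    mean_t = (n - 1) / 2.0
    mean_v = sum(series) / n
    sxx = sum((t - mean_t) ** 2 for t in range(n))
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in enumerate(series))
    return 100.0 * (sxy / sxx) / baseline


# Hypothetical fields: three grid cells, three consecutive decades.
emissions_by_decade = [
    [1.0, 2.0, 0.0],
    [1.0, 2.5, 0.0],
    [1.0, 3.0, 0.0],
]
population_by_decade = [
    [100.0, 10.0, 5.0],
    [110.0, 10.0, 5.0],
    [120.0, 10.0, 5.0],
]

exposure = [
    total_exposure(e, p)
    for e, p in zip(emissions_by_decade, population_by_decade)
]
trend = relative_trend_pct_per_decade(exposure, baseline=exposure[0])
print(exposure, trend)
```

Because the metric is a product, exposure grows when either fire emissions or the exposed asset grows in the same cell, and grows fastest where both rise together.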
Key Findings
Historical validation (2007–2016): The observation-constrained product markedly improves agreement with observations in magnitude and spatial patterns. It reduces overestimation over sparsely vegetated regions, tropical rainforests, northern boreal areas, and densely populated regions in North America/Europe, and mitigates underestimation over African savannahs. RMSE between simulated multimodel mean and observed annual total fire carbon emissions drops from 0.020 to 0.014 (0.010–0.017) kg m⁻² yr⁻¹; spatial R² for decadal mean emissions increases from 0.36 to 0.66 (0.47–0.92) (all p<0.001). The ML-based constraint outperforms traditional linear EC. Individual ESMs show RMSE reductions of 46% (NorESM2-LM) to 74% (MRI-ESM2.0) and R² increases by 0.30 (E3SM-1.1) to 0.56 (EC-Earth3-CC).
Future global totals (SSP5-8.5): Default ensemble projects a 6.0% (0.6%–9.4%) decade⁻¹ increase, from 2.7×10³ (1.6×10³–4.7×10³) Tg yr⁻¹ in the 2010s to 4.0×10³ (2.1×10³–1.4×10⁴) Tg yr⁻¹ in the 2090s. Observation-constrained ensemble projects a smaller 4.1% (2.6%–7.2%) decade⁻¹ increase, from 2.0×10³ (1.7×10³–2.4×10³) Tg yr⁻¹ (consistent with observed ≈2.0×10³ Tg yr⁻¹) to 2.8×10³ (2.7×10³–3.4×10³) Tg yr⁻¹ in the 2090s, with a much narrower spread and relatively stable emissions from 2010s to 2050s influenced by CLM5-based models.
Spatial patterns: The constrained ensemble projects robust increases over most global land, with limited decreases (e.g., boreal Eurasia, North American Great Lakes). It notably reverses default-projected decreases to increases over West Africa, Congo, northern Australia, and parts of eastern South America; reduces spurious default increases over sparsely vegetated North Africa, Middle East, and Central Asia. Latitudinally, default shows weak increases near the equator (10°S–10°N), whereas constrained shows stronger positive trends; constrained trends in higher northern bands are modest: 40°–50°N 0.6% (0.5%–0.7%), 50°–60°N 0.5% (0.2%–0.9%), 60°–70°N 0.03% (−0.1%–0.6%) decade⁻¹.
Socioeconomic exposure (SSP5-8.5): Observation-constrained global wildfire exposure increases by 5.5% (5.0%–6.2%) decade⁻¹ for population, 40.6% (33.7%–48.5%) decade⁻¹ for GDP, and 2.5% (1.9%–3.7%) decade⁻¹ for agricultural area. Default ensemble yields smaller relative increases: 3.2% (1.1%–7.9%), 12.6% (7.0%–28.5%), and 1.8% (0.9%–5.5%) decade⁻¹, respectively, due to higher simulated historical risks. The constrained ensemble indicates especially elevated risks in western and central African countries, with some (e.g., Niger, Sierra Leone) among the most vulnerable.
Mechanisms: Constrained projections link increased tropical wildfire activity to drying (soil moisture, relative humidity) and, in regions like the Congo, increasing fuel abundance (e.g., LAI). The framework suggests weaker influence of projected local socioeconomic trends on fire trends than in default ESMs due to the global-spatial training approach.
Scenario dependence: Under SSP2-4.5, both default and constrained ensembles show milder increases; regional differences in sign/magnitude emerge, particularly across subtropical/tropical regions and the Appalachian Mountains.
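The two historical validation metrics quoted above, RMSE and spatial R², can be sketched as follows. The sketch assumes spatial R² is the squared Pearson correlation between simulated and observed values across grid cells, which is a common convention but may differ from the study's exact definition. The fields are hypothetical four-cell examples, not data from the study.

```python
import math


def rmse(sim, obs):
    """Root-mean-square error between simulated and observed grid-cell values."""
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs))


def spatial_r2(sim, obs):
    """Squared Pearson correlation across grid cells (assumed definition)."""
    n = len(obs)
    mean_s = sum(sim) / n
    mean_o = sum(obs) / n
    sxy = sum((s - mean_s) * (o - mean_o) for s, o in zip(sim, obs))
    sxx = sum((s - mean_s) ** 2 for s in sim)
    syy = sum((o - mean_o) ** 2 for o in obs)
    return sxy * sxy / (sxx * syy)


# Hypothetical four-cell fields (arbitrary units).
observed = [0.0, 1.0, 2.0, 3.0]
default_sim = [1.0, 1.0, 3.0, 5.0]
constrained_sim = [0.1, 1.1, 1.9, 3.1]

rmse_default = rmse(default_sim, observed)
rmse_constrained = rmse(constrained_sim, observed)
r2_default = spatial_r2(default_sim, observed)
r2_constrained = spatial_r2(constrained_sim, observed)
print(rmse_default, rmse_constrained, r2_default, r2_constrained)
```

The two metrics are complementary: RMSE penalizes errors in magnitude, while the squared correlation only measures how well the spatial pattern is reproduced.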
Discussion
The ML-based observational constraint directly addresses the challenge of biased and uncertain ESM wildfire projections by integrating multitype observations with the mechanistic history–future relationships inherent in ESMs. By learning nonlinear, multivariate relations and leveraging full spatial fields across models, it substantially improves historical spatial accuracy and narrows uncertainty in future projections. The constrained results imply that default ESMs overestimate the magnitude and growth of global fire emissions, which would otherwise bias assessments of fire-related feedbacks to climate (carbon cycle, albedo, aerosols/greenhouse gases). Importantly, the constrained spatial patterns yield markedly different risk assessments—especially highlighting western and central Africa where projected increases in wildfire activity coincide with rapid socioeconomic growth, leading to greater exposure of people, GDP, and agriculture than suggested by default models. Physical drivers underpinning these changes include enhanced drying in the tropics and increases in fuel availability, consistent with broader evidence of intensified drought and ecological responses under warming. The findings provide actionable insights for preparedness and adaptation planning in identified hotspots and demonstrate the feasibility of extending observation-constrained, ML-based approaches to other local-impact variables. Scenario analysis indicates that socioeconomic pathways (e.g., SSP2-4.5 vs SSP5-8.5) can substantially alter regional outcomes, underscoring the importance of mitigation choices for future wildfire risk landscapes.
Conclusion
The study introduces and validates a machine learning-based observational constraining framework that corrects spatial biases in CMIP6 wildfire carbon emission projections and yields more credible estimates of future wildfire activity and socioeconomic exposure. Compared to the default ensemble, the observation-constrained projections indicate a smaller global increase in fire carbon emissions—4.1% (2.6%–7.2%) decade⁻¹ vs 6.0% (0.6%–9.4%) decade⁻¹—but higher relative growth in exposure of population (5.5% per decade), GDP (40.6% per decade), and agricultural area (2.5% per decade) during the twenty-first century, driven largely by concurrent increases in wildfire activity and rapid socioeconomic development in western and central Africa. These insights call for targeted mitigation and adaptation strategies (e.g., fuel management, improved monitoring, air quality planning) in identified high-risk regions. The framework is generalizable to other climate and ecosystem variables with strong local impacts and can support more reliable risk assessments. Future work should couple constrained fire emissions dynamically with other Earth system components, incorporate finer-scale topography and vegetation processes, improve representation of human–fire interactions, expand high-resolution observations (e.g., small fires, biomass), and evaluate potential tipping points in fire regimes.
Limitations
- Observational uncertainties: reliance on single datasets for lightning and some socioeconomic variables; reanalysis wind and humidity biases in data-sparse regions; coarse burned-area products may miss many small fires (notably in Africa), potentially underestimating historical and projected emissions.
- Variable mismatch: lack of long-term, reliable above-ground biomass observations precludes direct use of key fuel metrics; omission of sub-grid topographic details (slope, aspect, ruggedness) that affect spread and intensity.
- Model interdependence: overlapping components (e.g., multiple ESMs using CLM5) reduce ensemble diversity and can influence constraints.
- Feedbacks not explicitly represented: the framework constrains fire emissions but does not dynamically couple feedbacks to climate, vegetation, and atmospheric composition; socioeconomic–fire interactions beyond current ESM parameterizations (ignitions, suppression, urbanization, prescribed burning) are not fully captured.
- Resolution sensitivity: performance depends on spatial resolution of ESM inputs; finer resolution improves constraint via richer spatial sampling.
- Potential nonlinearity/tipping points: the method may not capture abrupt regime shifts or thresholds (e.g., critical fuel moisture) that lead to extreme fire behavior.
- Extrapolation risk: evaluated to be minimal given overlap between observational and historical simulation data spaces, but always a consideration for ML-based projection.

