Large language models reveal big disparities in current wildfire research

Environmental Studies and Forestry

Z. Lin, A. Chen, et al.

Dive into groundbreaking research by Zhengyang Lin and colleagues that uncovers surprising geographical and thematic trends in wildfire studies. Despite extensive burning in regions like Siberia and Africa, the spotlight shines disproportionately on the western United States. Discover the crucial need for collaboration and AI-driven insights in sustainable wildfire management!
Introduction

Wildfires are increasingly severe and frequent, with rising risks to ecosystems and society. A Web of Science search returned over 100,000 wildfire-related papers, with publication rates increasing more than fourfold in the last two decades and contributions spanning multiple disciplines. Despite this growth, cross-disciplinary communication remains limited, creating a need to understand how research is distributed across wildfire topics and regions in order to identify gaps and set priorities. The sheer volume of literature strains traditional expert-based syntheses, motivating the use of AI and, specifically, large language models (LLMs) that can process and extract information at scale. This study asks: (1) What preferences and trends have emerged in wildfire research over recent decades? (2) How do spatio-temporal variations in research paradigms manifest? (3) What are the implications of disparities between research focus and actual wildfire activity for populations and socioeconomic development?

Literature Review

The paper situates its approach within the shift from traditional literature reviews and meta-analyses, which typically handle hundreds of papers, to AI-based methods capable of large-scale analysis. Prior deep learning and NLP methods (e.g., BERT) have been used, but recent LLMs (e.g., ChatGPT/gpt-3.5-turbo) demonstrate strong performance in inference and question answering, and have “emergent abilities.” These properties make LLMs promising for tasks such as textual geographic entity recognition (geoparsing) and thematic classification across vast corpora. The study references evidence of LLM performance comparable or superior to conventional NLP in certain tasks and highlights the growing use of AI for synthesizing large bodies of scientific literature.

Methodology
  • Corpus assembly: Queried Web of Science for wildfire-related literature (temporal range 1900–Feb 13, 2023; keywords in Supplementary Table 1). After deduplication, 103,720 peer-reviewed papers and conference reports were identified.
  • Relevance filtering and validation: Used gpt-3.5-turbo via the OpenAI API, without fine-tuning, to screen records for wildfire relevance. Against 1,569 independently screened items (899 judged relevant by humans), cross-validation with human-supervised classification achieved an average F1 score of 0.85 (bootstrapping with 1,000 repetitions). Depending on the prompt variant, 72,352–80,297 papers were initially labeled as wildfire-related.
  • Information extraction via prompting: Designed and refined prompts to extract major/minor disciplines, study area, study period, fire stage (whole process/pre-fire/actively burning/post-fire), article type, and other key abstract-derived attributes. Medical-relevant papers were excluded per prompt rules.
  • Geoparsing: Used the LLM to extract geographic entities from titles and abstracts and convert them to coordinates at 1° spatial resolution, returning maximum outer boundaries (MOB) or central points (CP) and ISO-3166 country codes where applicable. Valid geoparsing yielded 60,488 articles for analysis.
  • Topic/thematic categorization: Classified papers into topics including vegetation, zoological, atmospheric, climate change, ecological, environmental, anthropogenic, hydrological, modeling, remote sensing, site-level observation, and soil.
  • External datasets and harmonization: Integrated satellite-derived and gridded products for comparison with publication patterns: AVHRR-LTDR burned area (1982–2018) at 0.05°, GFED4.1s fire emissions (1997–2021), GPWv4 population counts (2000–2020, 5-year intervals), and gridded GDP (1990–2015). Datasets were resampled and reprojected to align with the publication maps.
  • Imbalance assessment: Compared spatial distributions of publications to burned area and emissions using percentile thresholds. For publications, the 50th and 90th percentiles corresponded to multi-year averages of 19.5 and 31.6 papers per year. For burned area, a 90th percentile threshold equivalent to 554.3 ha yr⁻¹ was applied, and pixels were required to have at least 90% temporal coverage within AVHRR-LTDR. Pixels were grouped into imbalance levels (1–6) to indicate relative under- or over-representation.
  • Analytical focus: Assessed temporal trends in topics, spatial concentration of research, author-affiliation distributions by country income level, disparities across biomes, and ignition/climate/vegetation attention within the fire triangle framework.
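The relevance-validation step reports an average F1 of 0.85 from 1,000 bootstrap repetitions against 1,569 human-screened items. A minimal sketch of that kind of evaluation in plain Python, assuming binary labels (1 = wildfire-relevant) and resampling the validation items with replacement; the function names and the percentile interval are illustrative choices, not the authors' code.

```python
import random


def f1_score(y_true, y_pred):
    """F1 for the positive class (1 = wildfire-relevant)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # convention: no true positives means F1 = 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def bootstrap_f1(y_true, y_pred, n_boot=1000, seed=0):
    """Mean F1 over n_boot resamples (with replacement) plus a rough 95% interval."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(f1_score([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    scores.sort()
    mean = sum(scores) / n_boot
    lo = scores[int(0.025 * n_boot)]
    hi = scores[min(n_boot - 1, int(0.975 * n_boot))]
    return mean, (lo, hi)
```

In practice `y_true` would hold the human judgments and `y_pred` the LLM labels for the same items; the toy call `bootstrap_f1([1, 0] * 50, [1, 0] * 50)` simply returns a mean of 1.0 because the predictions are perfect.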
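The information-extraction step asks the model for disciplines, study area, study period, fire stage, and article type, and excludes medical-relevant papers. A minimal sketch of how such a prompt could be built and its reply validated, assuming the model is instructed to answer in JSON; the prompt wording, key names, and the `medical` flag are hypothetical and do not reproduce the authors' prompts, and no API call is made here.

```python
import json
from dataclasses import dataclass
from typing import Optional

# The four fire-stage labels listed in the methodology.
FIRE_STAGES = {"whole process", "pre-fire", "actively burning", "post-fire"}

# Hypothetical prompt wording for illustration only.
PROMPT_TEMPLATE = """You are screening scientific abstracts about wildfire.
Return only a JSON object with the keys:
"major_discipline", "minor_discipline", "study_area", "study_period",
"fire_stage" (one of: whole process, pre-fire, actively burning, post-fire),
"article_type", "medical" (true if the paper is medical-relevant).

Title: {title}
Abstract: {abstract}"""


@dataclass
class Extraction:
    major_discipline: str
    minor_discipline: str
    study_area: str
    study_period: str
    fire_stage: str
    article_type: str


def build_prompt(title: str, abstract: str) -> str:
    return PROMPT_TEMPLATE.format(title=title, abstract=abstract)


def parse_response(raw: str) -> Optional[Extraction]:
    """Return None for malformed replies, medical papers, or unknown fire stages."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or data.get("medical"):
        return None  # mirrors the rule that medical-relevant papers are excluded
    stage = str(data.get("fire_stage", "")).strip().lower()
    if stage not in FIRE_STAGES:
        return None
    return Extraction(
        major_discipline=str(data.get("major_discipline", "")),
        minor_discipline=str(data.get("minor_discipline", "")),
        study_area=str(data.get("study_area", "")),
        study_period=str(data.get("study_period", "")),
        fire_stage=stage,
        article_type=str(data.get("article_type", "")),
    )
```

Validating the reply against a closed label set keeps malformed or off-schema answers out of the downstream statistics instead of silently counting them.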
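The geoparsing step places each paper on a 1° grid using either a maximum outer boundary (MOB) or a central point (CP). A minimal sketch of how such locations could be turned into per-cell publication counts, assuming records of the hypothetical form `{"box": (south, west, north, east)}` or `{"point": (lat, lon)}` in degrees, no boxes crossing the antimeridian, and that a paper counts once in every cell it covers (an assumption, not a documented rule of the study).

```python
import math
from collections import Counter


def cell_index(lat, lon):
    """Map a point to (row, col) on a global 1-degree grid; row 0 starts at -90 deg."""
    row = min(int(math.floor(lat + 90.0)), 179)   # clamp lat = 90 into the last row
    col = min(int(math.floor(lon + 180.0)), 359)  # clamp lon = 180 into the last column
    return row, col


def cells_for_box(south, west, north, east):
    """All 1-degree cells touched by a bounding box (edge-touching cells included)."""
    r0, c0 = cell_index(south, west)
    r1, c1 = cell_index(north, east)
    return {(r, c) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)}


def publication_counts(records):
    """Count papers per grid cell from MOB ("box") or CP ("point") records."""
    counts = Counter()
    for rec in records:
        if "box" in rec:
            cells = cells_for_box(*rec["box"])
        else:
            cells = {cell_index(*rec["point"])}
        counts.update(cells)
    return counts
```

Counting each paper in every covered cell makes large study areas (for example, a whole country) spread their weight widely; weighting by the inverse of the number of cells would be one alternative design.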
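The harmonization step brings the 0.05° AVHRR-LTDR burned area onto the 1° publication grid, a factor of 20 in each direction (a 3600 × 7200 global grid becomes 180 × 360). A minimal NumPy sketch, assuming per-pixel burned area in hectares (an extensive quantity, so blocks are summed rather than averaged) and interpreting the 90% temporal-coverage rule as the mean fraction of valid years across a coarse cell's fine pixels; both the interpretation and the function names are assumptions, not the authors' procedure.

```python
import numpy as np


def block_sum(grid, factor):
    """Sum non-overlapping factor x factor blocks of a 2-D array."""
    rows, cols = grid.shape
    if rows % factor or cols % factor:
        raise ValueError("grid shape must be divisible by factor")
    return grid.reshape(rows // factor, factor, cols // factor, factor).sum(axis=(1, 3))


def mean_annual_burned_area(stack, factor=20, min_coverage=0.9):
    """Aggregate fine-grid annual burned area (ha) to a coarse grid.

    stack: array of shape (years, rows, cols), NaN where a pixel-year is missing.
    Returns mean annual burned area per coarse cell, or NaN where the average
    temporal coverage of its fine pixels falls below min_coverage.
    """
    valid = ~np.isnan(stack)
    coverage = valid.mean(axis=0)  # fraction of years each fine pixel is observed
    mean_fine = np.where(valid, stack, 0.0).mean(axis=0)  # missing years count as zero
    coarse = block_sum(mean_fine, factor)
    coarse_cov = block_sum(coverage, factor) / factor**2
    return np.where(coarse_cov >= min_coverage, coarse, np.nan)
```

Intensive variables such as burned fraction would be averaged instead of summed; population counts and GDP, like burned area, are extensive and would be summed.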
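The imbalance assessment combines publication percentiles (50th and 90th) with a 90th-percentile burned-area threshold. Three publication tiers times two burned-area tiers happen to give six combinations, so a minimal sketch can encode each combination as a code from 1 to 6. This encoding is a hypothetical illustration: the summary does not define how the paper's levels 1–6 are ordered. Inputs are assumed to be per-cell values already aligned on the same 1° grid.

```python
import numpy as np


def imbalance_codes(pubs_per_year, burned_ha_per_year,
                    pub_thresholds=None, burn_threshold=None):
    """Combine a 3-tier publication class with a 2-tier burned-area class.

    pub_thresholds: (p50, p90) in papers per year; computed from the data if None.
    burn_threshold: 90th-percentile burned area in ha per year; computed if None.
    Returns integer codes 1-6 (hypothetical encoding).
    """
    pubs = np.asarray(pubs_per_year, dtype=float)
    burned = np.asarray(burned_ha_per_year, dtype=float)
    if pub_thresholds is None:
        pub_thresholds = np.percentile(pubs, [50, 90])
    if burn_threshold is None:
        burn_threshold = np.percentile(burned, 90)
    # 0: below p50, 1: from p50 up to p90, 2: at or above p90
    pub_tier = np.digitize(pubs, pub_thresholds)
    # 1 where burned area reaches the top-decile threshold, else 0
    burn_tier = (burned >= burn_threshold).astype(int)
    return burn_tier * 3 + pub_tier + 1
```

Passing the reported thresholds, `imbalance_codes([5, 25, 40], [1000, 100, 1000], (19.5, 31.6), 554.3)` yields codes `[4, 2, 6]`. Under this encoding, code 4 marks heavily burned cells with few papers (the under-represented case) and code 3 marks lightly burned cells with many papers.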
Key Findings
  • Scale and coverage: 60,488 geoparsed wildfire-related articles (1980–2022) were analyzed from an initial corpus exceeding 100,000.
  • Topic prevalence: “Vegetation” was the most frequent topic (47% of papers); within it, forest fires accounted for 72%.
  • Temporal trends: Publications increased rapidly over the past two decades, comparable to climate-change literature growth. Fast-growing themes since the 1990s include hydrological and atmospheric impacts; post-2000, remote sensing interest surged (e.g., MODIS era). In the most recent decade, climate change and anthropogenic influences gained prominence.
  • Geographic disparities: The western United States received 15% of publications but accounted for <0.5% of global burned area. Regions like Siberia and Africa are underrepresented relative to their burned area and fire emissions.
  • Author-country concentration: 87.1% of contributions came from the top 20 countries by publication count, over half of which are high-income. Of 732 wildfire dataset papers, 517 (70.4%) originated from high-income countries; <10% came from lower-middle income countries, reflecting resource inequality.
  • Biome-level mismatch: Grasslands/savannas in Africa and northern Australia host 72% of global burned area but only 8% of the global population; regions with medium and low disparity levels host 14% and 69% of the population with 8% and 18% of burned area, respectively.
  • Population, GDP, and emissions exposure: 39% of the population and 24% of GDP (a proxy for socioeconomic development) are in regions with high imbalance levels, which together account for 77% of fire-induced carbon emissions. Conversely, 41% of the population and 48% of GDP are in low-imbalance regions with only 5% of emissions.
  • Fire triangle attention shifts: Research on vegetation (fuel) dominates overall; attention to climate factors has risen with changing fire weather. Human-caused ignitions increasingly dominate research in South America, Europe, and parts of Asia; in the Amazon and Southeast Asia, land-clearing fires are common and can escape control.
  • Inequities and implications: Publication and data-generation biases align with economic capacity, potentially hampering monitoring, policy, and management in under-resourced regions.
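The exposure figures above are shares of global totals summed over the grid cells in each imbalance group. A minimal sketch of that bookkeeping, assuming per-cell population, GDP, and emission values already aligned with the imbalance labels; the group names and toy numbers are invented for illustration and are not the study's data.

```python
from collections import defaultdict


def shares_by_group(groups, values):
    """Fraction of the grand total of `values` falling in each group label."""
    totals = defaultdict(float)
    for g, v in zip(groups, values):
        totals[g] += v
    grand = sum(totals.values())
    return {g: t / grand for g, t in totals.items()}


def exposure_table(groups, variables):
    """Apply shares_by_group to several per-cell variables at once."""
    return {name: shares_by_group(groups, vals) for name, vals in variables.items()}


# Toy example: three grid cells labeled by imbalance group.
cells = ["high", "low", "high"]
table = exposure_table(cells, {
    "population": [2.0, 6.0, 2.0],
    "emissions": [9.0, 1.0, 0.0],
})
```

Here `table["population"]["high"]` is 0.4 and `table["emissions"]["high"]` is 0.9, showing how a group can hold a modest population share but most emissions.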
Discussion

The study shows that contemporary wildfire research is unevenly distributed across topics and geographies relative to where fires occur and where emissions originate. By mapping publications against burned area, emissions, population, and GDP, the analysis reveals systematic underrepresentation of highly burned regions such as Africa and Siberia and overrepresentation of wealthier, often less-burned regions like the western United States. These disparities are linked to resource inequalities that affect the ability to fund research, develop datasets, and implement monitoring and management systems. The thematic dynamics indicate growing attention to climate drivers and anthropogenic influences, consistent with warming-induced changes in fire weather and expanding human fire use/ignition. The fire triangle framing clarifies imbalances among fuel, climate, and ignition research emphases across regions. The findings address the core research questions by (1) documenting prevailing preferences/trends, (2) exposing spatio-temporal shifts in paradigms, and (3) quantifying the real-world implications of research imbalances for populations and socioeconomic development. The results underscore the need for AI-aided, transdisciplinary collaborations and better support for underrepresented regions to inform sustainable wildfire management under rapid global change.

Conclusion

Using an AI-aided, LLM-based workflow, the study efficiently tracked research trends and quantified disparities across tens of thousands of wildfire papers. Pronounced imbalances exist between where wildfires occur (and emit) and where research attention is focused, notably underrepresenting Africa, the Amazon, and Siberia while concentrating on high-income regions such as the western United States. These gaps weaken our understanding of fire’s ecological and societal roles and hinder the development of effective mitigation and adaptation strategies amid expected increases in fire risk. The authors call for expanded, AI-supported transdisciplinary efforts and resource investments to rectify geographic and thematic imbalances and to enhance resilience in less-developed, high-risk regions.

Limitations

The authors note that their analysis may not fully capture the complexity and scale of imbalances due to diverse regional fire regimes and impacts. For example, a large share of Africa’s burned area arises from routine seasonal burning in savannas and croplands, while high-latitude forests act as critical carbon sinks vulnerable to severe fires with disproportionate carbon-cycle impacts. Additionally, reliance on bibliographic abstracts/titles for geoparsing at 1° resolution and topic inference may introduce classification and spatial uncertainties, despite validation efforts. Publication and dataset availability biases driven by global funding inequalities also constrain representativeness.
