Political Science
Political instability patterns are obscured by conflict dataset scope conditions, sources, and coding choices
C. Raleigh, R. Kishi, et al.
The paper interrogates how definitions, sourcing practices, and coding choices in global, event-based conflict datasets shape assessments of political violence. It poses practical questions (Which country is most violent? Where are civilians most at risk? Is conflict increasing or decreasing?) and shows that the answers depend heavily on the dataset used. The authors argue that conceptualizations of conflict and measurement strategies have drastic implications for reliability and validity. They motivate the inquiry with consequential use cases across policy, humanitarian response, and academic research, and illustrate the stakes with examples such as Mexico, where ACLED records thousands of civilian fatalities tied to cartel-related violence while UCDP-GED records only a handful because its inclusion rules exclude violence by unnamed actors. The introduction sets out the purpose: to critically evaluate four public, global conflict event datasets (ACLED, UCDP-GED, ICEWS, GDELT), explain why they diverge, and outline the trade-offs between internal reliability and external validity that affect the visibility of conflict risks and trends.
The paper situates its contribution within longstanding debates on the use of statistics in public policy and the trade-offs in simplifying complex social phenomena (e.g., Levy 2001; Danzger 1975; Franzosi 1987). It references parallel discussions in social movement and communications research about media biases and event reporting (e.g., Galtung & Ruge 1965; Earl et al. 2004; Harcup & O'Neill 2001, 2016; Krippendorff 2013). Prior comparative efforts often documented empirical differences without interrogating underlying concepts or data-generating processes (e.g., Donnay et al. 2019; Weidmann 2013; Ward et al. 2013; Stundal et al. 2021). The authors note the predominance of UCDP data use in top conflict journals and the relative absence of automated datasets in academic work, potentially shaping research agendas toward state-centric and high-fatality conflicts while neglecting militia violence and non-lethal consequences. They also engage recent calls to address bias and errors in conflict data (Miller et al. 2022; Demarest & Langer 2022), and frameworks like Total Error approaches, arguing these improve reliability but not validity when source corpora are static and biased.
The study conducts a structured comparative analysis of four public, global, event-based datasets: ACLED, UCDP-GED, ICEWS, and GDELT. It defines and applies two evaluative concepts: internal reliability (consistency with project rules over time and space) and external validity (accuracy and representativeness relative to real-world conflict patterns). It operationalizes two key dimensions that shape these qualities: (1) conflict catchment (inclusion criteria and event definitions) and (2) source catchment (the diversity, languages, number and scale, and dynamism of sources). The authors describe the data-generating processes (researcher-led coding versus automated NLP using CAMEO/Goldstein), summarize inclusion rules (e.g., UCDP’s stated-incompatibility requirement and threshold of 25 battle-related deaths; ACLED’s wider typology covering varied actor types and forms of violence), and detail sourcing strategies (ACLED’s multilingual sourcing and local partners versus reliance on English-language aggregators in UCDP/ICEWS; automated scraping in GDELT). They then present illustrative case comparisons (Mexico, the Philippines, Kenya’s Pokot militias) and examples of sourcing geography (Syria 2017) to demonstrate how these choices translate into divergent patterns. Tables summarize the benefits and trade-offs of conflict and source catchments.
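The effect of conflict catchment can be made concrete with a toy filter. The sketch below is a deliberately simplified illustration, not either project's actual coding procedure: the `Event` fields, the sample events, and both function names are hypothetical, and the narrow rule compresses threshold-style criteria (named actors, at least one death per event, at least 25 deaths per dyad-year) into three checks.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical, simplified event record; real datasets carry many more fields.
@dataclass(frozen=True)
class Event:
    dyad: str           # label for the pair of actors involved
    year: int
    fatalities: int
    actors_named: bool  # False when the perpetrator is unidentified

DYAD_YEAR_THRESHOLD = 25  # deaths per dyad-year, as in the threshold described above

def wide_catchment(events):
    """Keep every reported political violence event (illustrative wide rule)."""
    return list(events)

def narrow_catchment(events):
    """Keep events with named actors, at least one death, and a dyad-year
    total that reaches the threshold (illustrative narrow rule)."""
    totals = defaultdict(int)
    for e in events:
        if e.actors_named:
            totals[(e.dyad, e.year)] += e.fatalities
    return [
        e for e in events
        if e.actors_named
        and e.fatalities >= 1
        and totals[(e.dyad, e.year)] >= DYAD_YEAR_THRESHOLD
    ]

# Invented sample events, for illustration only.
EVENTS = [
    Event("state vs rebels", 2021, 20, True),
    Event("state vs rebels", 2021, 10, True),
    Event("state vs rebels", 2021, 0, True),             # non-lethal clash
    Event("militia vs civilians", 2021, 5, True),        # dyad-year total below 25
    Event("militia vs civilians", 2021, 3, True),
    Event("unidentified vs civilians", 2021, 4, False),  # unnamed perpetrator
    Event("unidentified vs civilians", 2021, 30, False),
]

wide = wide_catchment(EVENTS)
narrow = narrow_catchment(EVENTS)
print(len(wide), sum(e.fatalities for e in wide))
print(len(narrow), sum(e.fatalities for e in narrow))
```

On this invented sample the same underlying reports yield seven events under the wide rule and two under the narrow one, which is the kind of divergence the case comparisons describe.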
- Automated datasets (ICEWS, GDELT) suffer from fundamental validity problems: keyword-driven inclusion without expert oversight produces high rates of false positives, duplicates, miscoding, and centroid geolocation errors, and leaves the data vulnerable to misinformation and disinformation. Examples include thousands of misclassified U.S. “aerial weapon” events in 2019; ICEWS coding mere rhetoric (US–Iran) as -10 intensity events and misinterpreting Zimbabwe reporting; and widespread duplicate inflation caused by syndicated news and narrow de-duplication windows. Because these errors are non-random, they distort trends and hotspots and make reliable baselines and comparisons infeasible.
- Researcher-led datasets diverge due to conflict definitions and inclusion thresholds:
  • ACLED prioritizes external validity with a wide conflict catchment (including militia violence, anonymous/unidentified armed groups, violence against civilians, remote violence) and dynamic, multilingual, local sourcing, yielding more comprehensive coverage of heterogeneous, evolving conflict forms.
  • UCDP-GED prioritizes internal reliability with strict inclusion: state-based, non-state, and one-sided categories contingent on an active dyad with stated incompatibility and at least 25 battle-related deaths per year; events must have at least one direct death with specific date/location; sourcing relies heavily on English-language international/national media aggregators.
- These choices drive stark discrepancies:
  • Mexico 2021: ACLED records 6,739 civilian fatalities (over 81% of violent fatalities), often by unnamed groups tied to cartels; UCDP-GED records 28 civilian fatalities (0.15% of its total fatalities), largely omitting anonymous-perpetrator violence.
  • Philippines 2020: ACLED records 1,287 events and 1,496 fatalities, including 901 civilian fatalities (>62%); UCDP-GED records 157 events and 374 fatalities, including 37 civilian deaths (~10%). Drug war-related, state-linked vigilante violence is largely excluded by UCDP-GED criteria.
  • Kenya (Pokot militias): ACLED codes 110+ battle fatalities (1997–2020) with state forces and 480 fatalities from civilian targeting; UCDP-GED omits this conflict due to thresholds and dyad rules, obscuring militia violence.
- Fatality thresholds are unreliable as inclusion criteria or intensity measures because of pervasive reporting bias and because medical advances have reduced battle deaths. In 2021, ACLED recorded more than 93,700 political violence events, over 95% of them with fewer than 10 fatalities and more than half with none; thresholded datasets largely miss these forms of violence.
- Source catchment critically shapes coverage: English-language international media emphasize high-fatality, high-profile, urban events and exhibit temporal/urban/reporting biases; local/multilingual sources and partnerships are essential to capture peripheral, small-scale, and sensitive violence. Example: In Syria 2017, traditional media under-reported areas and event types that local partners captured (e.g., small battles, civilian targeting, mass arrests). ACLED reports that 58% of its single-source events are in non-English languages, and significant shares of events in countries like Somalia and Guatemala come from unique local networks.
- Trade-offs: Wider conflict and source catchments increase external validity but complicate temporal consistency; static, narrow definitions and sources increase internal reliability but reduce validity and comprehensiveness. Without transparency, users misinterpret trends, and policy risks arise (e.g., UNICEF workflows using GDELT).
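The scale of the discrepancies can be checked with simple arithmetic on the counts quoted above. The snippet below is only a back-of-the-envelope sketch: the figures are copied from this summary rather than from the datasets, and the dictionary layout and function name are illustrative.

```python
# Counts as quoted in the summary above; nothing here is read from the datasets.
PHILIPPINES_2020 = {
    "ACLED": {"events": 1287, "fatalities": 1496, "civilian_fatalities": 901},
    "UCDP-GED": {"events": 157, "fatalities": 374, "civilian_fatalities": 37},
}
MEXICO_2021_CIVILIAN_FATALITIES = {"ACLED": 6739, "UCDP-GED": 28}

def civilian_share(record):
    """Civilian fatalities as a fraction of all recorded fatalities."""
    return record["civilian_fatalities"] / record["fatalities"]

# Civilian share of fatalities in each dataset (Philippines 2020).
for name, record in PHILIPPINES_2020.items():
    print(f"{name}: civilian share = {civilian_share(record):.1%}")

# How many times more events and civilian deaths one dataset records than the other.
event_ratio = (
    PHILIPPINES_2020["ACLED"]["events"] / PHILIPPINES_2020["UCDP-GED"]["events"]
)
mexico_ratio = (
    MEXICO_2021_CIVILIAN_FATALITIES["ACLED"]
    / MEXICO_2021_CIVILIAN_FATALITIES["UCDP-GED"]
)
print(f"Philippines event ratio: {event_ratio:.1f}")
print(f"Mexico civilian fatality ratio: {mexico_ratio:.0f}")
```

On these counts, ACLED records about eight times as many Philippine events and roughly 240 times as many Mexican civilian fatalities as UCDP-GED. The ACLED Philippines civilian share works out to roughly 60% from the counts as quoted, slightly below the quoted ">62%", so the quoted percentage may rest on a different denominator; that is an inference from the arithmetic, not something the summary states.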
The findings demonstrate that dataset construction choices—not underlying conflict dynamics—often determine answers to core policy and research questions about which countries are most violent, where civilians are most at risk, and whether conflict is rising or falling. Automated NLP datasets lack valid inclusion criteria and oversight, producing artificial trends and hotspots, and should not be used to infer conflict patterns. Among researcher-led datasets, strict thresholded definitions (UCDP-GED) privilege state-insurgent warfare and exclude prevalent forms like militia and anonymous perpetrator violence, undercount civilian harm, and misrepresent fragmented conflict landscapes. In contrast, a validity-first approach (ACLED) that adapts definitions and sources to evolving conflict forms, integrates multilingual and local partners, and disaggregates actor types better reflects real-world patterns, albeit with challenges for strict temporal comparability. The study clarifies that reliability and validity involve unavoidable trade-offs; maximizing both requires transparent, dynamic sourcing across contexts and careful communication of priorities and limitations. These insights urge analysts to align dataset choice with research questions, to avoid equating high event counts with higher violence when driven by media attention, and to anticipate biases in fatality-based metrics and English-language media dependence.
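The duplicate inflation attributed to automated datasets can be illustrated with a minimal sketch. It assumes reports have already been reduced to a matching key (for example a normalized actor, location, and event type), which is itself a strong assumption; the key format, the sample reports, and the `count_events` function are hypothetical and do not reproduce the de-duplication logic of ICEWS or GDELT.

```python
from datetime import date

def count_events(reports, window_days):
    """Count distinct events, treating a report as a duplicate when it shares
    a key with an already counted report dated at most `window_days` earlier."""
    last_counted = {}
    count = 0
    for key, day in sorted(reports, key=lambda r: (r[1], r[0])):
        previous = last_counted.get(key)
        if previous is None or (day - previous).days > window_days:
            count += 1
            last_counted[key] = day
    return count

# Invented reports: two real events, re-published by syndicated outlets over a week.
REPORTS = [
    ("airstrike|town-a", date(2019, 1, 1)),
    ("airstrike|town-a", date(2019, 1, 1)),
    ("airstrike|town-a", date(2019, 1, 2)),
    ("airstrike|town-a", date(2019, 1, 4)),
    ("airstrike|town-a", date(2019, 1, 8)),
    ("clash|town-b", date(2019, 1, 3)),
    ("clash|town-b", date(2019, 1, 3)),
]

narrow_count = count_events(REPORTS, window_days=1)
wide_count = count_events(REPORTS, window_days=7)
print(narrow_count, wide_count)
```

With a one-day window the two underlying events are counted as four, while a seven-day window recovers two; the apparent rise in events then reflects media re-publication rather than more violence.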
The paper provides the first systematic investigation linking conflict dataset scope conditions, sourcing, and coding choices to divergent portrayals of political violence. It shows that automated datasets are invalid for conflict measurement due to fundamental errors and lack of oversight, while researcher-led datasets differ substantially based on whether they privilege internal reliability or external validity. The authors call for standards and transparency: datasets should explicitly state their priorities, definitions, and sourcing strategies; regularly publish source ecosystems; and undertake error identification and documentation. They recommend validity-focused practices—broad, adaptive conflict definitions; multilingual, local, and diverse source integration; partnerships with local observatories—to ensure more comprehensive coverage of evolving conflict forms and civilian harm. They caution against fatality thresholds as inclusion or intensity measures and encourage users to consider bias and coverage limitations. Future research should develop and share methods for error assessment suitable to dynamic, heterogeneous source environments and further evaluate impacts of sourcing strategies on cross-context comparability.
- The authors are creators of one dataset (ACLED), introducing potential bias; they disclose this and ACLED’s non-profit status.
- The review focuses on four public, global event datasets and excludes private/closed or region-specific datasets; demonstration/strategic development events are excluded from comparisons.
- Conclusions rely on documented methodological features, illustrative case studies, and examples rather than comprehensive quantitative matching across datasets; precise error rates for automated datasets are difficult to estimate given lack of oversight and transparency.
- Dynamic sourcing models, while improving validity, can complicate strict time-series comparability; all event data remain constrained by the underlying information environment and reporting biases, especially around fatalities.
- No single source type is unbiased; even extensive, multilingual sourcing cannot eliminate all errors or omissions.

