Crime and its fear in social media

R. P. Curiel, S. Cresci, et al.

This research explores the link between social media posts and crime in 18 Spanish-speaking Latin American countries. The study, by Rafael Prieto Curiel, Stefano Cresci, Cristina Ioana Muntean, and Steven Richard Bishop, finds that tweets reflect regional fear of crime, even though only a small percentage of them are crime-related.

Introduction

The study investigates whether social media, specifically Twitter, provides an accurate representation of criminal reality or primarily reflects fear of crime. Motivated by the democratization of content production on social platforms and their growing role in news consumption, the authors question if social media reduces the biases seen in traditional media, which overemphasizes rare and violent crimes. Crime is chosen as the focus because it is routinely measured and widely discussed in both mass and social media. The research aims to quantify how crime and fear of crime are expressed on Twitter across Latin American countries, identify who posts crime-related content, and assess correlations between social media expressions, actual crime (murders), and survey-based fear of crime. The importance lies in evaluating social media’s utility for monitoring crime trends and as a potential timely proxy for fear of crime, as traditional surveys are costly and delayed.

Literature Review

Prior research shows traditional media substantially misrepresents crime, heavily overreporting violent and sexual offenses relative to their actual incidence (e.g., only a tiny fraction of crimes make the news; violence dominates coverage). Differences exist between traditional and social media event coverage, with social media sometimes offering immediacy and broader participation. Social media has been used to study activism, emergencies, disease spread, mobility, politics, and sentiment (e.g., hedonometer), raising the possibility it could reflect crime trends or fear of crime. However, fear of crime often diverges from actual victimization, and measuring it typically relies on lagged and costly surveys. Related work on crime and Twitter includes using keyword detection to classify posts as crime-related, leveraging social media for crisis mapping, and studies on public reactions to homicides and the role of networked actors, bots, and misinformation. The paper situates itself within this literature by systematically comparing social media signals to official crime statistics and fear-of-crime survey data at national and city levels.

Methodology

Data collection: Using Twitter’s Streaming API, all geolocated, non-retweeted tweets from within 18 Spanish-speaking Latin American countries were collected over 70 days (May 22–July 30, 2017), yielding 32,513,684 distinct tweets (27% Mexico, 23% Argentina, 12% Colombia, remainder other countries). City-level location was recorded when available, resulting in 2,678,783 tweets (8.2%) with city-level geographic resolution across 64 larger cities.

Crime-related keyword list and classification: An incremental lexicon of 392 crime-related words/hashtags (274 Spanish, the rest English) was compiled from tweets and crime news (publicly available). Words were assigned to overlapping categories: violence-related, property-crime-related, organized-crime-related, sexual-crime-related, murder-related, and gun-related. A tweet containing any lexicon term was flagged as crime-related, and it was assigned to every category from which at least one term appeared. Categories can overlap (e.g., murder ⊂ violence).
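The flagging rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the mini-lexicon, the accent-stripping step, and the function names `normalize`, `classify`, and `is_crime_related` are all assumptions introduced here, and the study's actual 392-term list is not reproduced.

```python
import re
import unicodedata

# Hypothetical mini-lexicon for illustration only; the study's 392-term
# list is not reproduced here. Murder terms are also violence terms,
# mirroring the overlap (murder within violence) described in the text.
LEXICON = {
    "violence": {"asesinato", "homicidio", "golpiza", "balacera"},
    "murder": {"asesinato", "homicidio"},
    "property": {"robo", "asalto"},
    "gun": {"pistola", "balacera"},
}


def normalize(text):
    """Lowercase and strip combining marks (assumed preprocessing step)."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def classify(tweet):
    """Return every category with at least one lexicon term in the tweet."""
    # \w+ also captures the word part of a hashtag such as "#robo".
    tokens = set(re.findall(r"\w+", normalize(tweet)))
    return {category for category, terms in LEXICON.items() if tokens & terms}


def is_crime_related(tweet):
    """A tweet is flagged as crime-related if any category matches."""
    return bool(classify(tweet))
```

Matching on whole tokens rather than substrings is a design choice made for this sketch; it avoids flagging words that merely contain a lexicon term, at the cost of missing inflected forms that are not listed.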

Validation of classification: To assess false positives (tweets flagged as crime-related but not actually about crime), 3,000 flagged tweets were manually annotated (including following links and inspecting media). About 66% were truly related to crime, fear of crime, or security/justice demands. With 95% confidence, the estimated precision lies between 64% and 68% (66% ± 1.8 percentage points). The analyses assume that approximately two-thirds of automatically identified crime-related tweets are genuine, and that this share is uniform across categories and geographies.
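As a rough check on the reported interval, a normal-approximation confidence interval for a proportion can be computed from the annotated sample size and the observed precision. The summary does not say which interval method the authors used, so the formula below is an assumption, and `precision_interval` is a name introduced for this sketch.

```python
import math


def precision_interval(n_annotated, precision, z=1.96):
    """Normal-approximation interval for a proportion (assumed method).

    Returns (lower bound, upper bound, half-width) as fractions.
    """
    half_width = z * math.sqrt(precision * (1 - precision) / n_annotated)
    return precision - half_width, precision + half_width, half_width


# 3,000 manually annotated tweets, about 66% genuinely crime-related.
low, high, half_width = precision_interval(3000, 0.66)
print(f"{low:.3f} to {high:.3f} (half-width {half_width:.3f})")
```

Under this approximation the half-width comes out at about 1.7 percentage points, in line with the 64% to 68% range quoted above.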

External data sources: National-level crime was proxied by intentional homicides (murders) and murder rates (per 100,000), primarily from UNODC (circa 2015). Fear of crime was measured using LAPOP (2017), including a homicide-related fear question aggregated into a fear index. City-level comparisons focus on Mexico’s 23 metropolitan areas (≥750k inhabitants) using the 2016 ENVIPE victimization survey (crime rates by type and fear metrics). Cross-national city comparisons were avoided due to definitional and reporting differences.

Analytical approach: The proportion of crime-related and category-specific tweets per 1,000 tweets was computed at national and city levels. Concentration of posting activity was quantified via the Gini coefficient for all tweets, crime-related, and violence-related subsets, and by shares from top posters. Account typologies for crime-related tweets were estimated by manually labeling a sample of 100 tweets by source (media/journalists, involved users, government, regular users). Linear models assessed associations between crime-related tweet rates (overall and by category) and national murders, murder rate, and fear-of-crime metrics. At city level (Mexico), crime-related tweet rates were compared to ENVIPE indicators (e.g., murders, hard crimes per 100,000, robbery of a person, total crimes, local fear indices). Outliers and biases (e.g., a media consortium in La Laguna that geotagged most crime tweets) were identified; La Laguna was excluded from statistical analyses due to outsized influence. Temporal misalignment between social media (2017) and crime/fear data (2015–2016) was acknowledged; stability of national trends was used to justify comparisons.
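The two concentration measures mentioned above, the Gini coefficient and the share of tweets from the top posters, can be sketched as follows. This is an illustrative implementation using a standard sorted-rank formula for the Gini coefficient; the function names and the toy input are assumptions, not the authors' code or data.

```python
def gini(counts):
    """Gini coefficient of non-negative per-user tweet counts.

    0 means every user posts the same amount; values near 1 mean a few
    users account for almost all posts.
    """
    xs = sorted(counts)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over ascending values, with 1-based ranks.
    weighted = sum((rank + 1) * x for rank, x in enumerate(xs))
    return (2 * weighted) / (n * total) - (n + 1) / n


def top_share(counts, fraction=0.01):
    """Share of all tweets posted by the most active `fraction` of users."""
    xs = sorted(counts, reverse=True)
    total = sum(xs)
    if total == 0:
        return 0.0
    k = max(1, int(len(xs) * fraction))  # always include at least one user
    return sum(xs[:k]) / total


# Toy example: one user posts everything, three users post nothing.
print(gini([0, 0, 0, 10]))             # 0.75, the maximum for four users
print(top_share([0, 0, 0, 10], 0.25))  # 1.0
```

Applied separately to all tweets and to the crime-related subset, the same two functions would give the kind of comparison reported in the findings, where concentration is higher for crime-related content.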

Assumptions and constraints: Only a minority of tweets are geolocated; city-level analyses face severe data sparsity and potential source biases (e.g., local newspapers). Keyword methods trade recall and precision; more sophisticated NLP was not deployed to preserve real-time applicability.

Key Findings
  • Prevalence: Of 32.5M tweets, 501,057 were classified as crime-related, i.e., 15.41 per 1,000 tweets. On average during the period, 317.5 tweets/min were posted from the 18 countries, ~5 of which were crime-related.
  • Categories: Violence-related tweets were most frequent at 6.51/1,000. Other categories: murder-related ~4.0/1,000, property-crime-related 1.7/1,000, organized-crime-related 1.4/1,000, robbery-related 0.8/1,000, gun-related 0.7/1,000, sexual-crime-related 0.4/1,000.
  • Country variation: Large heterogeneity across countries (e.g., Venezuela ~38.1 crime-related tweets per 1,000; Nicaragua, Panama, Bolivia, Costa Rica <10/1,000).
  • Bias similar to traditional media: 28.3% of crime-related tweets were about murder, despite murders constituting only about 0.072% of crimes in Mexico, mirroring newspaper overrepresentation of homicides.
  • Who posts: 90% of active users posted no crime-related content. The top 1% of users contributed 61% of crime-related and 62% of violence-related tweets, versus 35% of all tweets. Concentration measured by Gini rose from 0.838 (all tweets) to 0.965 (crime-related). By source type: ~33% of crime-related tweets came from media/journalists, ~22% from “involved users,” ~7% from government, leaving ~38% from regular users.
  • National-level associations: Countries with more murders and higher murder rates had higher rates of crime-related, violence-related, and murder-related tweets per 1,000. Roughly, per additional murder there were ~8.4 murder-related, ~13.7 violence-related, and ~32.4 crime-related tweets. Crime-related tweeting was also associated with higher fear-of-crime measures; Venezuela had the highest fear and the highest crime-related/violence-related tweet rates.
  • Mexico comparisons (bias quantification): Per 100 crimes, there were 1.44 crime-related tweets overall; per 100 property crimes, 0.21 property-crime tweets; per 100 sexual crimes, 1.41 sexual-crime tweets; per 100 murders, 567.5 murder-related tweets. A sexual crime was tweeted 6.6 times more than a property crime; murders were tweeted 401 times more than sexual crimes.
  • City-level scarcity and weak correlations: Only ~2.68M tweets (≈8.2%) were geolocated to cities; among these, 19,912 were crime-related (7.4/1,000 at city level), less than half the national-level rate. Examples: Mexico 10.68 → 5.98/1,000; Colombia 24.64 → 5.13/1,000; Venezuela 38.14 → 21.85/1,000 when restricting to geotagged tweets. In 31 cities, fewer than one crime-related tweet was posted per day over 70 days; some cities had only 2–5 crime-related tweets total. City-level correlations between crime-related tweeting and victimization or fear metrics were generally negligible; property-crime tweets were only loosely related to hard crime rates. A strong local bias from a media consortium (La Laguna) demonstrated potential distortions; it was excluded from analyses.
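The headline rates in the list above follow from simple arithmetic on the reported counts, which the short sketch below reproduces. All input numbers are taken from this summary; the variable names and the helper `per_thousand` are introduced here for illustration.

```python
# Counts reported in the summary.
TOTAL_TWEETS = 32_513_684
CRIME_TWEETS = 501_057
CITY_TWEETS = 2_678_783
CITY_CRIME_TWEETS = 19_912


def per_thousand(part, whole):
    """Rate of `part` per 1,000 units of `whole`."""
    return 1000 * part / whole


national_rate = per_thousand(CRIME_TWEETS, TOTAL_TWEETS)
city_rate = per_thousand(CITY_CRIME_TWEETS, CITY_TWEETS)
city_share = 100 * CITY_TWEETS / TOTAL_TWEETS  # percent with city-level location

# Tweets per 100 crimes in Mexico, as reported (already rounded).
TWEETS_PER_100 = {"property": 0.21, "sexual": 1.41, "murder": 567.5}
sexual_vs_property = TWEETS_PER_100["sexual"] / TWEETS_PER_100["property"]
murder_vs_sexual = TWEETS_PER_100["murder"] / TWEETS_PER_100["sexual"]

print(f"national: {national_rate:.2f}/1,000, city: {city_rate:.2f}/1,000")
print(f"city-level share: {city_share:.1f}%")
print(f"sexual vs property: {sexual_vs_property:.1f}x")
print(f"murder vs sexual: {murder_vs_sexual:.0f}x")
```

The ratios computed from the rounded per-100 figures come out slightly above the reported 6.6 and 401; the gap is presumably due to rounding in the published inputs.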
Discussion

The findings indicate that social media exhibits biases similar to those of traditional media, overemphasizing rare, violent, and sexual crimes rather than reflecting the distribution of everyday crime. At the national level, crime-related tweeting correlates with objective crime measures (murders, murder rates) and with fear-of-crime indices, suggesting that social media activity reflects public concern and attention, amplified by media and engaged users, more than it reflects crime incidence itself. The strong role of mass media accounts and “involved users” in driving crime-related content further underscores that social media mirrors news agendas and activism rather than grassroots reporting of routine crimes. At the city level, by contrast, the scarcity of geolocated crime-related tweets and source biases impede reliable inference: data are too sparse and concentrated to track urban crime patterns or fear with precision. Thus, social media can provide timely, low-cost national-level signals aligned with fear of crime and high-salience violent crime trends, but it is ill-suited for fine-grained, city-scale monitoring or forecasting of crime.

Conclusion

The paper provides a large-scale, cross-country assessment of how crime and fear of crime are expressed on Twitter in Latin America. Main contributions include: (i) quantifying the prevalence and categorical composition of crime-related tweets; (ii) demonstrating substantial overrepresentation of violent and sexual crimes akin to traditional media bias; (iii) revealing strong concentration of posting among a small subset of accounts, with major influence from mass media and engaged users; and (iv) establishing that national-level crime-related tweeting aligns more with fear of crime and high-salience violent crime than with overall crime incidence, while city-level analyses are hindered by data sparsity and bias. Future research should explore improved text understanding (e.g., advanced NLP beyond keyword lists), enhanced geoparsing to enrich location coverage of non-geotagged posts, longitudinal designs to study temporal dynamics of fear responses to salient events, and robustness to manipulation (bots/fake news), potentially yielding more reliable national-level proxies for fear of crime and insights into media–public attention dynamics.

Limitations
  • Temporal misalignment: Tweets (2017) were compared to national crime (circa 2015) and Mexico city-level survey data (2016). Although national trends are relatively stable, timing differences limit causal inference.
  • Geolocation sparsity and bias: Only ~8% of tweets had city-level geotags; city-level analyses suffered from extremely low volumes and potential source distortions (e.g., local media heavily geotagging content in La Laguna).
  • Keyword-based classification: Reliant on a fixed lexicon (392 terms), leading to false positives and negatives; manual validation estimated ~66% precision. Uniform precision across contexts is assumed.
  • Cross-national comparability: Differences in crime definitions and underreporting limit comparisons other than murders; city-level cross-country comparisons were avoided.
  • Source concentration and agenda-setting: Heavy influence from media, government, and involved users skews content toward high-salience events, not representative of everyday crime.
  • Potential manipulation: Although no evidence was found of orchestrated manipulation in crime discussions, inherent risks from bots and misinformation remain in social media analyses.