
Extracting Useful Emergency Information from Social Media: A Method Integrating Machine Learning and Rule-Based Classification
H. Shen, Y. Ju, et al.
This study introduces a machine learning and rule-based integration method (MRIM) for extracting useful emergency information from social media content. Conducted by Hongzhou Shen, Yue Ju, and Zhijing Zhu, the research demonstrates MRIM's advantage over single-method baselines on microblogs from the Zhengzhou rainstorm and draws practical implications for emergency management.
Introduction
Social media platforms such as Twitter and Facebook have become essential channels for communication and significant sources of emergency information during crises. Users generate near-real-time content in text, images, and videos that can inform emergency diagnosis and response (e.g., damage severity, rescue needs, missing persons) and support resource allocation. Yet, despite their availability, social media posts vary in quality and authenticity due to differences in cognition, motivation, communication skills, and emotions of authors. Platforms cannot authenticate all publishers or content. Therefore, efficient approaches are needed to extract reliable and useful emergency information post-event to support decision making by emergency management agencies. This paper addresses this gap by proposing and testing an integrated machine learning and rule-based method to classify useful emergency information from social media and by examining which content and contextual characteristics affect its performance.
Literature Review
Utilization of Social Media in Emergencies: Social media is widely used by the public to disseminate and obtain emergency-related information, including locations, timings, sentiments, and requests for help. It has enabled coordination during infrastructure outages (e.g., Haiti 2010), revealed demographic differences in response behaviors, and served as a channel to receive official updates. Emergency management agencies (EMAs) use social media to issue announcements, interact with the public, monitor attitudes, collect first-hand situational information, and leverage smartphone geolocation. Building teams of trusted digital volunteers can further support emergency information management.
Extracting Information from Social Media through Machine Learning: Machine learning (ML), particularly supervised learning with SVMs, is effective for text classification tasks due to strong generalization and robustness with limited training data. Feature extraction via TF-IDF and n-grams improves performance. ML has been applied to detect situational information, damage assessment, situational awareness, and vaccine information dissemination. However, classification performance depends heavily on features and labeled data, and high-dimensional, noisy features can hinder results.
Rule-Based Classification Method: Pure ML methods often cannot incorporate expert knowledge and may be sensitive to training/test splits. Rule-based methods encode expert knowledge as IF-THEN rules and can consider communication and user characteristics (e.g., likes, comments, shares; user verification, followers) alongside text. Rule-based approaches have been used in flood content analysis and extent detection (often with expert data), but not widely for extracting useful emergency information directly from social media posts. Integrating ML with rule-based methods is expected to leverage both strengths.
Research Review: Prior work largely focuses on pure ML for extracting information, with limited integration of human intelligence. Few studies identify, extract, and analyze useful emergency information for improving relief efficiency. This study explores the feasibility of integrating ML and rule-based methods to extract useful emergency information post-event and offers suggestions for EMAs to incorporate social media data into formal emergency information management.
Methodology
Research Setting and Data: The study focuses on Sina Weibo because of its large user base (573 million monthly active users as of December 2021), topic-based data access, and rich post and user metadata. The event studied is the July 20, 2021 heavy rainstorm in Zhengzhou, Henan, China (July 17–22, 2021), which left 380 people dead or missing and caused CNY 40.9 billion in direct economic losses. The hashtag "Henan rainstorm mutual assistance" was launched on July 20 by official media; more than 100,000 microblogs were posted under it, many lacking actionable details. The study collected original microblogs under this hashtag.
Data Collection and Preparation: Using a Python crawler, the authors collected posts from July 20 to August 14, 2021. After de-duplication and removal of posts published after August 3 (when rescue relevance had declined), 7,979 microblogs remained (July 20–August 3). For each post, they captured the text, likes, comments, shares, and user attributes (number of blogs, accounts followed, and fans).
Labeling: Eight trained student annotators labeled all microblogs as useful, useless, or uncertain regarding direct support for emergency rescue. Each post was labeled independently by two annotators. Conflicts or any uncertain label were resolved via discussion with a third researcher; final labels were binary: 1 useful (n=1,936) and 0 useless (n=6,043).
Text and Feature Processing: Text was cleaned (HTML removed), segmented with Jieba, and stripped of stop-words; word counts were computed. Sentiment values were calculated using sentiment dictionaries (HowNet, the NTU Sentiment Dictionary, a Chinese emotional polarity dictionary, and the Dalian University of Technology ontology), taking degree adverbs and negation into account; a sentiment value >0 was treated as positive, <0 as negative, and 0 as neutral. Exact addresses were extracted by combining named entity recognition with address feature words; addresses containing one to five levels of address feature words were marked as exact. Contact information was extracted with regular expressions. Binary indicators recorded the presence of an exact address and of contact information. To limit the influence of extreme values, parameters exceeding their 90th percentiles were capped, and scoring formulas were defined relative to the non-zero averages.
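As an illustration of the indicator extraction, the following Python sketch derives the contact-information and exact-address flags and the 90th-percentile cap; the regular expression, the address feature words, and the function names are assumptions for illustration, not the authors' exact implementation.

```python
import re
import numpy as np

# Hypothetical address feature words covering administrative levels
# (province, city, district/county, street, road/number).
ADDRESS_FEATURE_WORDS = ["省", "市", "区", "县", "镇", "街道", "路", "号", "小区", "村"]

# Rough pattern for Chinese mobile numbers (11 digits starting with 1).
PHONE_PATTERN = re.compile(r"1[3-9]\d{9}")

def has_contact_info(text: str) -> int:
    """Binary indicator: 1 if the post contains a phone-number-like string."""
    return int(bool(PHONE_PATTERN.search(text)))

def has_exact_address(text: str) -> int:
    """Binary indicator: 1 if the post contains address feature words
    (a stand-in for the paper's NER plus feature-word rule)."""
    return int(any(word in text for word in ADDRESS_FEATURE_WORDS))

def cap_extreme_values(values):
    """Cap values at their 90th percentile to limit the influence of
    outliers, mirroring the capping step described above."""
    cap = np.percentile(values, 90)
    return [min(v, cap) for v in values]
```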
Machine Learning Component: The ML classifier was a linear-kernel SVM, which suits short texts and smaller training sets. Features were unigram TF-IDF weights over the segmented tokens, restricted to terms appearing at least three times, and performance was evaluated with tenfold cross-validation.
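A minimal scikit-learn sketch of this component, assuming the microblog texts are segmented with Jieba and joined by spaces; the pipeline mirrors the stated settings (unigram TF-IDF, terms appearing at least three times approximated by min_df=3, linear-kernel SVM, tenfold cross-validation), while names and defaults are illustrative.

```python
import jieba
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

def segment(text: str) -> str:
    """Jieba word segmentation, joined by spaces for the vectorizer."""
    return " ".join(jieba.cut(text))

def evaluate_svm(texts, labels):
    """texts: raw microblog strings; labels: 1 = useful, 0 = useless."""
    segmented = [segment(t) for t in texts]
    model = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 1), min_df=3),  # unigram TF-IDF, rare terms dropped
        LinearSVC(),                                    # linear-kernel SVM
    )
    # Tenfold cross-validated F-measure, matching the paper's evaluation protocol
    return cross_val_score(model, segmented, labels, cv=10, scoring="f1").mean()
```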
Rule-Based Component: A rule-based scoring system evaluated each microblog across six dimensions grouped into three aspects, with expert-determined weights via a Delphi process with three experts (two academic researchers, one EMA staff):
- Communication characteristics (Attention score; weight 20%): likes, comments, shares.
- User characteristics (User score; weight 10%): number of blogs, accounts followed, fans.
- Content characteristics: sentiment value (5%), text word count (5%), exact address presence (20%), contact information presence (40%).
Each dimension was scored from parameter values using predefined formulas and caps (e.g., likes/10 capped at 33.3). Experts recommended a usefulness boundary score between 35 and 45; testing found 40 optimal. The steps are: extract parameters, score the six dimensions, compute the weighted sum, and classify a microblog as useful if its total score is ≥40 (a scoring sketch follows).
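A sketch of the weighted rule-based score under the stated weights and threshold; the per-dimension scoring functions are placeholders, since the summary gives only one example cap (likes/10 capped at 33.3), and the treatment of the other parameters is assumed.

```python
# Expert-determined weights for the six dimensions (sum to 1.0)
WEIGHTS = {
    "attention": 0.20,      # likes, comments, shares
    "user": 0.10,           # number of blogs, accounts followed, fans
    "sentiment": 0.05,
    "word_count": 0.05,
    "exact_address": 0.20,
    "contact_info": 0.40,
}
THRESHOLD = 40  # expert-recommended boundary in [35, 45]; 40 found optimal

def attention_score(likes: int, comments: int, shares: int) -> float:
    # Example cap from the text: likes/10 capped at 33.3; the other two
    # terms are assumed here to follow the same pattern.
    return sum(min(x / 10, 33.3) for x in (likes, comments, shares))

def rule_based_score(dimension_scores: dict) -> float:
    """dimension_scores maps each dimension name to its 0-100 score."""
    return sum(WEIGHTS[d] * dimension_scores[d] for d in WEIGHTS)

def rule_based_classify(dimension_scores: dict) -> int:
    """1 = useful if the weighted score reaches the threshold, else 0."""
    return int(rule_based_score(dimension_scores) >= THRESHOLD)
```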
Integration Strategy (MRIM): Final classification integrates ML and rule-based outputs under a conservative OR rule to avoid missing useful information: a microblog is classified as useless only if both methods classify it as useless; otherwise it is useful. A score-weighted integration alternative was also tested for comparison.
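The conservative OR rule reduces to a single comparison; the sketch below assumes labels of 1 = useful and 0 = useless from each component.

```python
def mrim_classify(svm_label: int, rule_label: int) -> int:
    """Useless (0) only if both components say useless; useful (1) otherwise."""
    return int(svm_label == 1 or rule_label == 1)
```

This choice deliberately trades some precision for recall, reflecting the priority of not missing actionable posts.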
Experiments: Comparative evaluations were conducted for SVM alone (Model 1), rule-based alone (Model 2), and integrated MRIM (Model 3). Additional comparisons included Naive Bayes (NB), Decision Tree (DT), and their integrations with SVM. Further analyses examined performance across microblog characteristics: word count bins (10 equal groups), presence of exact address/contact info (four groups), and attention bins (sum of likes, comments, shares; 10 equal groups).
Key Findings
Overall performance (tenfold CV on 7,979 labeled posts):
- SVM alone (Model 1): Precision 0.916, Recall 0.585, F-measure 0.714.
- Rule-based alone (Model 2): Precision 0.919, Recall 0.633, F-measure 0.750.
- MRIM (Model 3): Precision 0.894, Recall 0.777, F-measure 0.831 (best overall and recall).
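As a quick consistency check, these values follow from the standard F-measure formula, F = 2 × P × R / (P + R); for MRIM, 2 × 0.894 × 0.777 / (0.894 + 0.777) ≈ 0.831.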
Integration weighting test: A weighted score integration (Integrated_Score = (1−w)*SVM_score + w*Rule_score; threshold 40) achieved peak F-measure 0.823 at w=0.51, still below MRIM’s OR-integration F-measure 0.831.
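The weight sweep can be reproduced roughly as below, assuming both component scores have already been mapped to the same 0-100 scale (the summary does not specify how the SVM output is converted to a score, so svm_scores is a placeholder).

```python
import numpy as np
from sklearn.metrics import f1_score

def best_integration_weight(svm_scores, rule_scores, labels, threshold=40):
    """Sweep w over [0, 1] for Integrated_Score = (1 - w) * SVM_score + w * Rule_score
    and return the weight with the highest F-measure at the given threshold."""
    svm_scores, rule_scores = np.asarray(svm_scores), np.asarray(rule_scores)
    best_w, best_f1 = 0.0, 0.0
    for w in np.linspace(0.0, 1.0, 101):
        integrated = (1 - w) * svm_scores + w * rule_scores
        predictions = (integrated >= threshold).astype(int)
        f1 = f1_score(labels, predictions)
        if f1 > best_f1:
            best_w, best_f1 = w, f1
    return best_w, best_f1
```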
Algorithm comparisons:
- NB (Model 4): P 0.934, R 0.506, F 0.656.
- DT (Model 5): P 0.817, R 0.626, F 0.709.
- NB+SVM (Model 6): P 0.914, R 0.535, F 0.675.
- DT+SVM (Model 7): P 0.809, R 0.724, F 0.764.
- Rule-based+SVM (MRIM, Model 3): P 0.894, R 0.777, F 0.831 (best among combinations).
Effect of microblog characteristics (a per-group evaluation sketch follows this list):
- Word count (10 groups): F-measure generally increased with word count. SVM/MRIM F-measures rose from 0.222/0.250 in the shortest group (avg 3.16 words) to 0.867/0.898 in the longest (avg 97.73 words), reaching 0.831/0.925 in the group averaging about 69 words (MRIM's peak).
- Exact address/contact info (4 groups): MRIM outperformed SVM overall. Both methods underperformed for posts with exact address only (about 40% had non-standard address writing), degrading keyword-dependent detection and rule scoring.
- Attention (10 groups): MRIM outperformed SVM in most groups (notably groups 3–6). Both methods performed worse in the highest-attention groups (9–10), where ~50% were comprehensive media reports that do not directly support rescue actions, leading to misclassification as useful.
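The per-group analysis can be reproduced with a simple binning routine; the sketch below splits posts into ten equal-size groups by a chosen characteristic (word count, or attention taken as likes + comments + shares) and computes the F-measure within each group. The column names are assumptions.

```python
import pandas as pd
from sklearn.metrics import f1_score

def f_measure_by_group(df: pd.DataFrame, characteristic: str, n_groups: int = 10,
                       label_col: str = "label", pred_col: str = "prediction") -> pd.Series:
    """Split posts into n_groups equal-size bins by `characteristic`
    (e.g., word count, or attention = likes + comments + shares)
    and compute the F-measure within each bin."""
    bins = pd.qcut(df[characteristic].rank(method="first"), n_groups, labels=False)
    return df.groupby(bins).apply(
        lambda group: f1_score(group[label_col], group[pred_col], zero_division=0)
    )
```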
Misclassification analysis:
- False positives (useless classified as useful): many were reposts or copies of professional media reports with comprehensive content and high attention; the SVM was also sensitive to sparse posts with missing feature data.
- False negatives (useful classified as useless): informal/colloquial language hindered feature detection; irregular or incorrect address/contact formatting prevented accurate extraction; very short posts lacked detectable features. Integrating rule-based scores mitigated some errors.
Discussion
The study addresses the challenge of extracting actionable emergency information from noisy social media content by integrating algorithmic classification with expert-informed rules. MRIM’s superior recall and F-measure demonstrate that incorporating expert knowledge about communication, user, and content characteristics enhances identification of posts that can directly support rescue activities. The OR-integration strategy aligns with the operational priority to minimize missed actionable information, even at the cost of admitting some non-actionable content.
Findings clarify conditions affecting performance: posts with richer textual content and standardized address/contact details are more reliably detected; informal expressions and non-standard address formats degrade both ML and rule-based extraction. High-attention posts often reflect media summaries that, while informative, are less actionable for rescue, explaining reduced performance at the top attention tiers. These insights guide both methodological design (e.g., feature engineering for short/informal text, improved address parsers) and operational triage (prioritizing individual eyewitness posts).
Conclusion
This paper proposes MRIM, an integrated method combining SVM-based text classification and expert rule-based scoring to extract useful emergency information from social media. Tested on Weibo data from the July 20 Zhengzhou rainstorm, MRIM outperforms SVM alone and rule-based alone methods, with higher recall and F-measure. It reveals that word count, standardized exact address and contact information, and user attention influence classification performance. The study contributes a practical framework for EMAs to incorporate social media analysis into emergency information management and highlights the value of integrating expert knowledge with ML. Future research should validate MRIM across different emergency types and platforms with larger datasets, expand UGC features (temporal/spatial, multimedia, emoji, user behaviors), and develop specialized address dictionaries and more intelligent address extraction methods to further improve performance.
Limitations
1) The integrated rules were developed for a rainstorm scenario on Weibo and may not transfer directly to other emergency types without customization and validation. 2) The method is tailored to Weibo; adaptation is required for other platforms (e.g., Twitter) despite the generalizable integration concept. 3) The Delphi process involved only three experts; involving more experts could strengthen rule authority and robustness.