
Medicine and Health
Predicting state-level suicide fatalities in the United States with real-time data and machine learning
D. Patel, S. A. Sumner, et al.
This study presents a deep learning approach to predicting weekly suicide counts at the state level in the US using social media, online search, and health services data. Devashru Patel, Steven A. Sumner, and colleagues show that models combining all data sources estimate state-specific suicide rates within roughly 5% annual error, offering timely insights for suicide prevention efforts.
~3 min • Beginner • English
Introduction
The United States experiences over 47,000 suicides annually, with rates increasing over the past two decades and varying substantially by geography. Western states and rural or small/medium metropolitan areas often have higher rates, and temporal changes differ across locations. Public health prevention requires timely, local data, yet official suicide death reporting lags by months to over a year, with timeliness varying by jurisdictional infrastructure. Experts advocate leveraging complementary near real-time data sources—online behavior, social media, economic and clinical data—to understand population-level suicide trends. Prior work has primarily focused on individual-level prediction from social media, with fewer studies addressing population surveillance. National-level models (e.g., Choi et al.) have shown promise combining heterogeneous real-time data to estimate US weekly suicide fatalities, but sub-national, state-level estimation remains underdeveloped. The study’s research question is whether diverse near real-time data sources can be combined via machine learning to nowcast weekly suicide fatalities at the state level, providing accurate and timely estimates for public health planning across geographically diverse states.
Literature Review
Prior studies indicate geographic disparities in suicide rates and emphasize the need for timely surveillance. Social media and search trends can reflect suicide risk factors; for example, Twitter content has been associated with geography-specific suicide rates, but most research targets individual-level outcomes (e.g., ideation detection) rather than population surveillance. Won et al. combined social media with economic and meteorological data to predict national suicides in South Korea. In the US, Choi et al. developed a national-level ensemble model combining heterogeneous real-time data to estimate weekly suicides with high accuracy. However, the literature lacks robust frameworks for integrating multiple, diverse real-time data sources to nowcast/forecast suicide at sub-national (state) levels. Concerns around representativeness and biases in convenience samples (e.g., social media usage variability by geography and demographics, limitations of Google Trends validity) and variability in access to psychiatric services across regions further motivate multi-source integration to mitigate individual source biases.
Methodology
Study setting and ethics: Secondary administrative and online data were used; the Georgia Institute of Technology IRB deemed this non-human subjects research. Four states with diverse populations and available data were included: Colorado (CO), Louisiana (LA), New York (NY), and Utah (UT).
Data sources (weekly, 2015–2018 unless noted):
- Online data (state-level):
- Google Search Trends: normalized popularity (0–100) for 42 suicide-related terms.
- YouTube Search Trends: normalized popularity (0–100) for the same 42 terms.
- Twitter: counts of public posts containing 38 suicide-related keywords/phrases/hashtags. Tweets were geo-assigned to states using users’ self-reported profile location strings, geocoded via HERE and OpenStreetMap APIs; only users whose inferred US state was one of the four target states were retained; weekly counts were aggregated by state.
- Health services data (near real-time):
- NSSP Emergency Department (ED) visits: weekly counts of visits for suicide ideation or attempt (Suicide-Related Syndrome) from participating facilities in each state.
- Mental Health America (MHA) PHQ-9: weekly averages from public online assessments, assigned to states via IP-based location.
- Suicide fatalities (outcome and predictor): Weekly state-level suicide deaths from CDC’s National Vital Statistics System (ICD-10 U03, X60–X84, Y87.0). When used as a predictor, only lagged historical data older than one year was used to reflect real-world availability.
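The constraint that historical fatalities enter only as a lagged predictor can be made concrete with a small sketch. This is illustrative only, assuming weekly counts keyed by week-start date; the helper name and the toy data are not from the study:

```python
from datetime import date, timedelta

def lagged_fatality_feature(fatalities_by_week, current_week, min_lag_weeks=52):
    """Return the most recent historical fatality count that is at least
    one year older than the week being estimated, mimicking real-world
    reporting delays in vital statistics data."""
    eligible = {wk: n for wk, n in fatalities_by_week.items()
                if (current_week - wk).days >= min_lag_weeks * 7}
    if not eligible:
        return None
    return eligible[max(eligible)]  # count from the latest eligible week

# Hypothetical weekly counts, one per week starting 2016-01-04
weeks = {date(2016, 1, 4) + timedelta(weeks=i): 10 + i for i in range(120)}
print(lagged_fatality_feature(weeks, date(2018, 1, 1)))  # → 62
```

The latest week at least 52 weeks before 2018-01-01 is 2017-01-02, so more recent (and in practice not-yet-reported) counts are never exposed to the model.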
Modeling approach:
- Primary model: Long short-term memory (LSTM) recurrent neural networks per state. Inputs included individual data sources and combinations (online-only, health services-only, health+online, and all sources; historical fatalities included in some baselines). For real-time data sources, models used a sliding window with the prior two weeks plus the current week to estimate fatalities for the current week. Sequences were rolled forward to generate weekly predictions.
- Train/validation/test: 2016 for training, 2017 for validation, 2018 for testing. Historical suicide fatalities served as the gold-standard for evaluation.
- Hyperparameter tuning: Limited grid search per state over number of LSTM layers {1,2}, hidden dimensions {16,32,64}, epochs {150,200,250,300}. Defaults: dropout 0.2 between LSTM layers; activations: Sigmoid (gates), Tanh (cells); learning rate 0.001; optimizer Adam; Xavier/Glorot initialization; L2 weight decay 0.01. Best models minimized RMSE on validation data.
- Evaluation metrics: Root Mean Squared Error (RMSE), Pearson correlation between weekly predicted vs. actual deaths, Mean Absolute Difference (MAD; median absolute weekly difference), and annual error rate (%) comparing estimated vs. actual crude suicide rate per 100,000.
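A minimal sketch of the sliding-window setup and evaluation metrics described above, in pure Python. The three-week window and the metric definitions follow the text; everything else, including the toy numbers, is illustrative:

```python
import math
from statistics import median

def make_windows(series, window=3):
    """Pair each week with a window of the prior two weeks plus the
    current week, as used for the real-time data sources."""
    X, idx = [], []
    for t in range(window - 1, len(series)):
        X.append(series[t - window + 1 : t + 1])
        idx.append(t)
    return X, idx

def rmse(actual, pred):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual))

def pearson_r(actual, pred):
    n = len(actual)
    ma, mp = sum(actual) / n, sum(pred) / n
    cov = sum((a - ma) * (p - mp) for a, p in zip(actual, pred))
    sa = math.sqrt(sum((a - ma) ** 2 for a in actual))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    return cov / (sa * sp)

def mad(actual, pred):
    """Median absolute weekly difference."""
    return median(abs(a - p) for a, p in zip(actual, pred))

def annual_error_pct(actual_deaths, predicted_deaths, population):
    """Percent difference between estimated and actual crude rates per 100,000."""
    asr = sum(actual_deaths) / population * 100_000
    esr = sum(predicted_deaths) / population * 100_000
    return (esr - asr) / asr * 100

# Toy example: one feature per week, then four weeks of counts
weekly = [[float(i)] for i in range(10)]
X, idx = make_windows(weekly)            # 8 windows covering weeks 2..9
actual = [10, 12, 11, 9]
pred = [9, 13, 10, 10]
print(rmse(actual, pred), mad(actual, pred))  # → 1.0 1.0
```

A negative `annual_error_pct` corresponds to the underestimation pattern reported in the findings below.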
Comparators:
- Baseline LSTM using only lagged historical suicide fatalities (simulating autoregressive approaches) per state.
- Two-phase ensemble (adapted from Choi et al.): fit optimal model per data stream, then combine stream-level predictions via an ANN into a single estimate, tuned for RMSE at the state level.
- PCA-based feature fusion: concatenate all data sources’ time series, apply PCA for dimensionality reduction, then train regressors (elastic net, LASSO, linear, random forest, ridge, support vector regression); best model per state selected by RMSE.
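The PCA-based fusion comparator can be sketched compactly with NumPy. This uses a closed-form ridge regressor as a stand-in for the authors' sweep over regression models, and entirely made-up data and dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical state: 52 weeks x 6 concatenated weekly data streams
X = rng.normal(size=(52, 6))
y = X @ rng.normal(size=6) + rng.normal(scale=0.1, size=52)  # toy weekly counts

# PCA via SVD: center the features, then project onto the top-k components
k = 3
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:k].T                      # reduced feature matrix

# Ridge regression in closed form on the reduced features
lam = 1.0
w = np.linalg.solve(Z.T @ Z + lam * np.eye(k), Z.T @ (y - y.mean()))
pred = Z @ w + y.mean()                # weekly estimates for this state
```

In the study, the best regressor per state was selected by validation RMSE; here a single ridge fit stands in for that selection step.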
Key Findings
Model performance (test year 2018):
- All-sources LSTM models yielded annual error rates within ~5% in all four states and generally outperformed models using historical fatalities alone.
- Utah (ASR 21.04): ESR 20.458; error −2.768%; RMSE 3.765; MAD 3; Pearson r 0.065.
- Louisiana (ASR 15.45): ESR 15.014; error −2.823%; RMSE 4.156; MAD 7; r 0.061.
- New York (ASR 8.82): ESR 8.516; error −3.449%; RMSE 7.414; MAD 11; r 0.475.
- Colorado (ASR 22.51): ESR 21.312; error −5.323%; RMSE 5.889; MAD 11; r 0.223.
- Across states, LSTM models combining all data sources outperformed models trained only on lagged historical fatalities; errors tended to be negative, indicating underestimation of actual counts.
- State-wise optimal hyperparameters for best LSTM: layers=1 for all; hidden dims: UT 32, LA 16, NY 64, CO 16; epochs: UT 200, LA 200, NY 250, CO 150.
Individual sources (New York, 2018):
- Online sources showed lower annual error and higher week-to-week correlation than health services sources:
- Google: ESR 8.422 (−4.511%); RMSE 7.849; MAD 12; r 0.382.
- YouTube: ESR 8.455 (−4.137%); RMSE 7.929; MAD 13; r 0.244.
- Twitter: ESR 8.644 (−1.99%); RMSE 7.916; MAD 13; r 0.286.
- MHA (PHQ-9): ESR 8.348 (−5.352%); RMSE 8.730; MAD 20; r 0.037.
- ED visits: ESR 8.243 (−6.54%); RMSE 8.288; MAD 16; r −0.240.
- Historical fatalities baseline: ESR 8.512 (−3.495%); RMSE 8.092; MAD 14; r 0.016.
Sensitivity analyses (alternate combination methods; annual error on ESR, Pearson r):
- Two-phase ensemble (adapted from Choi et al.): CO ESR 22.15 (−1.60%), r 0.271; LA 18.32 (+18.58%), r 0.234; NY 10.57 (+19.85%), r 0.299; UT 17.61 (−16.30%), r 0.315.
- PCA-based fusion and regression: CO 20.92 (−7.05%), r 0.043; LA 14.04 (−9.10%), r 0.180; NY 8.40 (−4.76%), r 0.277; UT 20.46 (−2.78%), r −0.233.
Overall, alternate approaches were more heterogeneous and generally exhibited higher error rates than the LSTM all-sources models.
Discussion
The study demonstrates that integrating heterogeneous near real-time data—online search behavior, social media signals, and health services utilization—via LSTM models can produce accurate weekly estimates of state-level suicide fatalities, addressing the surveillance lag in official mortality reporting. The all-sources models consistently outperformed lagged historical-only baselines, underscoring the value of combining complementary data streams to capture contemporaneous trends. Variation in performance across states likely reflects differences in population size, urbanization, internet and social media penetration, and access to health services. A consistent underestimation bias was observed, potentially due to weekly sparsity and rapidly rising national suicide rates during the period, which may be difficult for models to fully capture. Findings suggest that real-time proxy data can provide timely situational awareness for public health planning, and that including historical fatalities as a predictor may sometimes over-index to past trends compared to models emphasizing current proxy signals.
Conclusion
This work introduces and validates a state-level, weekly nowcasting framework for suicide fatalities using an LSTM model that fuses diverse real-time data sources. Across four demographically and geographically diverse states, the models achieved approximately ≤5% annual error and outperformed baselines relying solely on historical deaths. The approach offers a pathway for more timely, localized suicide surveillance to inform prevention programs and rapid responses, particularly during societal disruptions. Future research should expand to more states, incorporate additional real-time and contextual data (e.g., environmental, economic), investigate reasons for state-level performance heterogeneity, refine methods to mitigate underestimation and sparsity challenges, and systematically evaluate whether excluding historical fatalities as predictors improves robustness.
Limitations
- Generalizability is limited by inclusion of only four states; performance may differ in smaller-population states and weeks with very low counts, where sparsity is greater.
- Potential biases in online data (e.g., social media representativeness, geographic/demographic variability in internet use) and in geolocation of Twitter data (public accounts only, profile-location-based inference) can affect accuracy.
- Health services data (ED visits, PHQ-9) may be influenced by access to and utilization of services, which varies geographically.
- The models tended to underestimate suicides; rapid secular increases in suicide rates may not be fully captured by inputs and windows used.
- Inclusion of historical fatalities as predictors may overfit to past trends; optimal use of historical vs. real-time proxies warrants further study.
- Additional relevant data sources (including environmental factors) were not included; some are not real-time.
- Gold-standard mortality classification may be imperfect due to misclassification (e.g., opioid-related fatalities) and varying timeliness of death certification.