Machine learning based suicide prediction and development of suicide vulnerability index for US counties



V. Kumar, K. K. Sznajder, et al.

Vishnu Kumar, Kristin K. Sznajder, and Soundar Kumar analyze suicide trends across US counties. Their machine learning model reports an R² of 0.98 and identifies the county-level factors that most influence its predictions. These factors form the basis of a Suicide Vulnerability Index (SVI) intended to support targeted prevention efforts.

Introduction
The study addresses the rising public health problem of suicide in the United States, where over 47,500 deaths occurred in 2019 and rates increased by 33% between 1999 and 2019. While numerous prevention strategies exist, most operate at the state or national level and may not capture local variation. Counties are the smallest geographic units available in CDC WONDER, and prior county-level studies have not fully explored predictive modeling of suicide, the impact of contextual factors, or the identification of high-risk regions at the county level. The research aims to fill this gap in four steps: analyzing US county-level suicides from 2010–2019, developing a machine learning prediction model using demographic, socioeconomic, and health factors, interpreting feature importance via SHAP, and constructing a Suicide Vulnerability Index (SVI) to identify high-risk counties for targeted prevention and resource allocation.
Literature Review
Prior work has documented state-level suicide trends, contextual factors, and prediction models, including associations with social capital, personality factors, firearm ownership, and socioeconomic determinants. County-level research has examined associations with factors such as poverty, mental health provider shortages, and urbanization, and has described trends over periods beginning in 1999 and ending in 2015 or 2016. However, existing county-level studies have provided neither detailed, explainable predictive models of suicide rates that incorporate multiple contextual factors, nor tools to identify the counties most vulnerable to high suicide rates. This study addresses these gaps by building an explainable machine learning model at the county level and deriving an SVI from the most influential predictors.
Methodology
Data for 3140 US counties from 2010–2019 were compiled from publicly available sources. Suicide deaths were obtained from CDC WONDER Multiple Cause of Death (ICD-10 codes X60–X84 for intentional self-harm). Seventeen county-level features were selected based on the literature and data availability and grouped into three categories:
- Demographics: Total Population, % Female, % White, % African American, % Other races, Median Age (from US Census County Population by Characteristics 2010–2019).
- Socioeconomic Factors: Median Income, % Poverty (US Census SAIPE), % Unemployed (BLS LAU), % Some College, % Single-Parent Households, Social Association Rate, Violent Crime Rate (County Health Rankings & Roadmaps).
- Health: Opioid Dispensing Rate (CDC), % Uninsured (US Census SAHIE), Mentally Unhealthy Days, % Excessive Drinking (County Health Rankings & Roadmaps).
The task was formulated as a regression problem: predicting county suicide rates with an XGBoost regressor. Data were split 80:20 into training and testing sets, and hyperparameters were tuned via grid search with 10-fold cross-validation. Model explainability relied on SHAP values (TreeSHAP, as integrated with XGBoost) to quantify each feature's contribution and rank feature importance.
For SVI development, the top five features by SHAP importance (Population, % African American, % White, Median Age, % Female) were transformed to percentile ranks per county. Features with a positive impact on suicide rate (Population, % White, Median Age) were ranked high-to-low, while those with a negative impact (% African American, % Female) were ranked low-to-high, so that a higher percentile rank consistently corresponds to greater vulnerability. The SVI was computed as a weighted sum of the percentile ranks, with each feature's weight equal to its average absolute SHAP value. The SVI was scaled to 0–1, with higher values indicating greater suicide vulnerability. Geographic visualizations were created with Plotly choropleth maps.
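The SVI construction described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the four counties, their feature values, and the SHAP values are all made up, the helper names (`percentile_ranks`, `mean_abs`, `suicide_vulnerability_index`) are hypothetical, and the final 0–1 scaling is assumed to be min-max scaling, which the summary does not specify.

```python
from typing import Dict, List


def percentile_ranks(values: List[float], higher_is_riskier: bool) -> List[float]:
    """Percentile ranks in [0, 1]; tied values share the average rank.

    When higher_is_riskier is False the ordering is reversed, so a higher
    rank always means greater vulnerability (an assumed reading of the text).
    """
    n = len(values)
    if n == 1:
        return [0.0]
    signed = [v if higher_is_riskier else -v for v in values]
    order = sorted(range(n), key=lambda i: signed[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and signed[order[j + 1]] == signed[order[i]]:
            j += 1
        average_rank = (i + j) / 2  # zero-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank / (n - 1)
        i = j + 1
    return ranks


def mean_abs(column: List[float]) -> float:
    """Average absolute value, used here as the per-feature SHAP weight."""
    return sum(abs(x) for x in column) / len(column)


def suicide_vulnerability_index(
    features: Dict[str, List[float]],
    higher_is_riskier: Dict[str, bool],
    weights: Dict[str, float],
) -> List[float]:
    """Weighted sum of percentile ranks, min-max scaled to [0, 1]."""
    n = len(next(iter(features.values())))
    ranks = {
        name: percentile_ranks(col, higher_is_riskier[name])
        for name, col in features.items()
    }
    raw = [sum(weights[name] * ranks[name][c] for name in features) for c in range(n)]
    lo, hi = min(raw), max(raw)
    if hi == lo:
        return [0.0] * n
    return [(r - lo) / (hi - lo) for r in raw]


# Hypothetical data for four counties (A, B, C, D); not taken from the study.
features = {
    "population": [1_000_000, 50_000, 200_000, 10_000],
    "pct_white": [90.0, 70.0, 80.0, 60.0],
    "median_age": [45.0, 43.0, 41.0, 35.0],
    "pct_african_american": [2.0, 15.0, 8.0, 30.0],
    "pct_female": [48.0, 50.0, 49.0, 52.0],
}
# Direction of each feature's impact, as reported in the text.
higher_is_riskier = {
    "population": True,
    "pct_white": True,
    "median_age": True,
    "pct_african_american": False,
    "pct_female": False,
}
# Made-up per-county SHAP values; real ones would come from the fitted model.
shap_values = {
    "population": [0.9, -0.4, 0.2, -0.7],
    "pct_white": [0.2, -0.1, 0.1, -0.2],
    "median_age": [0.1, 0.1, 0.0, -0.2],
    "pct_african_american": [0.3, -0.1, 0.1, -0.5],
    "pct_female": [0.1, 0.0, 0.0, -0.1],
}
weights = {name: mean_abs(col) for name, col in shap_values.items()}
svi = suicide_vulnerability_index(features, higher_is_riskier, weights)

for county, score in zip("ABCD", svi):
    print(f"County {county}: SVI = {score:.3f}")
```

In this toy data, county A ranks highest on every risk-increasing feature and lowest on every risk-decreasing one, so it receives the maximum index, while county D receives the minimum. Because the weights scale each feature's ranks, a feature with a large mean absolute SHAP value (here, population) dominates the ordering of the counties in between.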
Key Findings
- Total suicides among US residents from 2010–2019: 365,286, with the highest annual counts in 2017–2019.
- Trend: ~25% of counties had at least a 10% increase in suicides from 2010 to 2019; ~12% had increases of at least 50%.
- Geographic distribution: Higher county suicide rates were consistently located across the western US (parts of Alaska, Washington, Oregon, California, Nevada, Arizona, New Mexico, Colorado, Wyoming, Montana), while many West North Central states showed lower rates.
- Model performance: The XGBoost regressor using 17 features achieved R² = 0.98 on the test set.
- Feature importance (SHAP): The top 5 features driving predictions were Population, % African American, % White, Median Age, and % Female. Population, % White, and Median Age had positive impacts on the predicted suicide rate; % African American and % Female had negative impacts.
- SVI results: High-SVI counties were observed in parts of Washington, Oregon, California, Nevada, Arizona, Florida, North Carolina, New York, Pennsylvania, New Jersey, Washington DC, and Michigan; SVI was lower in many West North Central states.
- Validation examples: The 10 most populous counties all had high SVI (≥0.86–0.90), consistent with the positive population–suicide correlation; counties with high suicide ideation (Los Angeles, Maricopa, Cook, San Diego, Orange) also had high SVI (≥0.87); and SVI geographic patterns aligned with the regions where high suicide rates were observed.
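For readers unfamiliar with the R² metric reported above, the following sketch shows how the coefficient of determination is computed from observed and predicted values. The numbers are hypothetical and chosen only to make the arithmetic easy to follow; they are not results from the study, and `r_squared` is an illustrative helper rather than the authors' code.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted suicide rates for four held-out counties.
observed = [10.0, 14.0, 18.0, 22.0]
predicted = [11.0, 13.0, 19.0, 21.0]

r2 = r_squared(observed, predicted)
print(f"R^2 = {r2:.2f}")
```

Here the residual sum of squares is 4 and the total sum of squares is 80, giving R² = 1 − 4/80 = 0.95. A value of 0.98 on a held-out test set therefore means the model's squared error is about 2% of the error obtained by always predicting the mean rate.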
Discussion
The findings support the view that county-level modeling can reveal local nuances not captured at the state or national level. The highly accurate XGBoost model indicates that demographic composition and age structure, especially total population, racial composition, and median age, are the features most strongly associated with county suicide rates, while higher proportions of African American and female residents are associated with lower predicted rates. The SVI, derived from the most influential features, flags counties prone to higher suicide rates and corresponds with independent indicators such as population size and reports of suicide ideation. The index can inform targeted suicide prevention, resource allocation, and monitoring, which may be particularly useful during crises such as pandemics. Variability in SVI within states underscores the need for county-specific interventions rather than reliance on broader trends alone.
Conclusion
This study contributes an explainable machine learning framework to predict county-level suicide rates and a novel Suicide Vulnerability Index based on the top SHAP-identified features. The approach accurately models suicide patterns (R² = 0.98), identifies key predictors, and produces an actionable index that highlights high-risk counties across the US for targeted prevention and policy planning. Future work could incorporate additional contextual variables (e.g., tribal populations, firearm ownership, mental health service availability at finer spatial scales), address potential underreporting biases, and extend the framework to sub-county geographies or temporal forecasting to enhance early warning capabilities.
Limitations
- Misclassification and underreporting: Some suicide deaths may not be classified or reported as suicides, leading to underestimation.
- Exclusion of unintentional self-harm cases that may have been intentional.
- County-level aggregation may mask within-county variation; sub-county differences are not captured.
- Feature set limited by data availability; other influential factors may be missing (e.g., explicit accounting for tribal populations, other unmeasured contextual variables).
- The degree of these limitations is unknown.