Introduction
Suicide is a major public health concern in the United States, consistently ranking among the ten leading causes of death. Despite numerous prevention strategies, most interventions target large geographical areas and overlook trends at the county level. State-level suicide trends and prediction models exist, but they may not reflect regional variation. Previous county-level studies have not focused on detailed prediction or on the impact of contextual factors. This research addresses that gap by analyzing US county-level suicide data (2010-2019) to build a machine learning model that predicts suicide rates. The model incorporates 17 county-level characteristics grouped into demographics, socioeconomic factors, and health indicators. The study aims to identify the importance of these factors and to develop a Suicide Vulnerability Index (SVI) for all US counties, allowing for more targeted resource allocation and prevention efforts.
Literature Review
The introduction cites several studies on suicide prevention strategies, state-level suicide trends, and contextual factors. It highlights the lack of detailed county-level suicide prediction models and the need for a granular understanding of suicide patterns to improve targeted interventions. The authors mention previous research focusing on state-level trends or county-level suicide rates without in-depth prediction or impact analysis of contextual factors. The literature review implicitly sets the stage for this study by pointing out the existing research gap.
Methodology
The study used publicly available data from the CDC WONDER database (suicide data), the US Census Bureau (demographic and socioeconomic data), the US Bureau of Labor Statistics, and County Health Rankings & Roadmaps. The dataset included 3140 US counties and covered 17 variables categorized as demographics (population, sex distribution, ethnicity, median age), socioeconomic factors (median income, poverty rate, unemployment, education levels, single-parent households, social association rate, crime rate), and health indicators (opioid dispensing rate, uninsured rate, mental health days, excessive drinking). An XGBoost regressor was used to predict suicide rates, chosen for its compatibility with SHAP values for feature importance analysis. The dataset was split 80/20 into training and test sets, and hyperparameters were tuned with a grid search and 10-fold cross-validation. SHAP (SHapley Additive exPlanations) values were used to determine feature importance, providing a measure of each feature's contribution to the model's predictions. The five most influential features were then used to create the Suicide Vulnerability Index (SVI), calculated as a weighted function of the percentile ranks of these features. The weights were derived from the average SHAP value magnitudes.
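The index construction described above can be illustrated with a minimal sketch. The summary does not give the exact formula, so this example assumes the SVI is a weighted sum of percentile ranks, with each weight equal to that feature's mean absolute SHAP value divided by the total across features. The function names, the two toy features, and all numbers are hypothetical and are not taken from the study.

```python
def percentile_ranks(values):
    """Return each value's percentile rank in [0, 1]; tied values share the average rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2  # average 0-based rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank / (n - 1) if n > 1 else 0.0
        i = j + 1
    return ranks


def mean_abs(xs):
    """Mean of absolute values, used here as the feature-importance measure."""
    return sum(abs(x) for x in xs) / len(xs)


def vulnerability_index(features, shap_values):
    """Weighted sum of percentile ranks (assumed form of the index).

    features:    dict of feature name -> list of per-county feature values
    shap_values: dict of feature name -> list of per-county SHAP values
    """
    importance = {name: mean_abs(shap_values[name]) for name in features}
    total = sum(importance.values())
    weights = {name: imp / total for name, imp in importance.items()}
    ranks = {name: percentile_ranks(col) for name, col in features.items()}
    n_counties = len(next(iter(features.values())))
    return [
        sum(weights[name] * ranks[name][c] for name in features)
        for c in range(n_counties)
    ]


# Toy data for four hypothetical counties and two features.
features = {
    "total_population": [1000, 50000, 250000, 8000],
    "median_age": [45.0, 38.0, 36.0, 41.0],
}
shap_values = {
    "total_population": [-3.0, 1.0, 3.0, -1.0],
    "median_age": [0.5, -0.5, -1.0, 0.0],
}
scores = vulnerability_index(features, shap_values)
print([round(s, 3) for s in scores])
```

This sketch treats every feature as raising vulnerability as its value rises. A feature that lowers predicted rates would need its rank inverted (one minus the percentile rank), and whether the study did so is not stated in this summary.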
Key Findings
Analysis of the 2010-2019 data showed that nearly 25% of US counties experienced at least a 10% increase in suicides, with 12% showing at least a 50% increase. The XGBoost model achieved an R² of 0.98 in predicting county-level suicide rates. SHAP analysis revealed that total population, % White population, and median age positively correlated with suicide rates, while % African American population and % female population showed negative correlations. The SVI, generated using the top five features, identified counties with high suicide vulnerability, particularly in the western and eastern coastal regions and Florida. The SVI's validity was assessed by comparing it with county populations, suicide ideation rates, and county-level suicide rate distribution; these comparisons demonstrated a strong positive correlation between SVI scores and suicide rates.
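For readers unfamiliar with the R² figure reported above, the following sketch shows how the coefficient of determination is conventionally computed from observed and predicted values. The function name and the toy numbers are hypothetical. The summary does not say whether the reported 0.98 was measured on the held-out test set, so nothing here attempts to reproduce that value.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical observed and predicted suicide rates for four counties.
observed = [10.0, 12.0, 14.0, 16.0]
predicted = [10.5, 11.5, 14.5, 15.5]
r2 = r_squared(observed, predicted)
print(round(r2, 2))  # 0.95
```

An R² of 1 means the predictions match the observations exactly, while 0 means the model does no better than always predicting the mean.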
Discussion
The study's findings highlight the importance of county-level analysis for suicide prevention. The high accuracy of the prediction model and the identification of key factors through SHAP analysis provide valuable insights for targeted interventions. The SVI offers a practical tool for identifying vulnerable counties, enabling more efficient resource allocation and tailored prevention programs. The correlation between SVI and population suggests that higher population density may be associated with increased suicide risk. The study's results emphasize the need for region-specific strategies, as state-level data may not accurately capture localized trends.
Conclusion
This research successfully developed a highly accurate machine learning model for predicting county-level suicide rates in the US and created a novel SVI for identifying at-risk counties. The identified key features and the SVI provide valuable tools for targeted suicide prevention efforts. Future research could explore additional contributing factors, refine the SVI, and evaluate the impact of interventions based on this index.
Limitations
Several limitations were acknowledged: potential misclassification of suicide deaths, underreporting of suicides, exclusion of unintentional self-harm deaths, potential intra-county variations, and the exclusion of certain factors (e.g., tribal populations) due to data limitations. The authors note that these limitations could affect the interpretation and generalizability of the results.