Introduction
The COVID-19 pandemic necessitated widespread social distancing measures, but adherence to these measures varied significantly across US counties. Previous research has established links between socioeconomic factors, political affiliation, and social distancing compliance, but often lacked comprehensive demographic data or precise measures of human interaction. This study addresses these gaps by developing a prediction model that incorporates a broader range of demographic and socioeconomic variables to predict county-level social distancing adherence (SoDA). A more accurate model can help policymakers allocate resources more effectively and target interventions at areas with lower adherence. Understanding the drivers of SoDA is crucial for mitigating the pandemic's impact and managing future outbreaks.
Literature Review
Existing studies have explored the relationship between various factors and social distancing adherence. Some found a correlation between political leaning and SoDA, with Republican-leaning counties exhibiting lower adherence than Democratic-leaning counties. Others found that lower per capita income and a higher proportion of racial minorities were each associated with lower SoDA. These studies typically used mobile phone data to estimate SoDA but often lacked granular human encounter data or a comprehensive set of socioeconomic variables. Some research used social network models or probabilistic event-based models to assess SoDA and infection risk, but did not consider socioeconomic factors when predicting SoDA. This study builds on that work with a more comprehensive model that incorporates a wider range of predictor variables to improve the accuracy of SoDA prediction.
Methodology
The researchers developed a multivariable bagging regression algorithm to predict SoDA scores for US counties. The model utilized 45 predictor features, including data from Unacast (mobile phone data for SoDA estimation), the CDC (obesity rates, diabetes rates, COVID-19 cases and deaths), the MIT Election Data and Science Lab (2016 presidential election voting data), CNN news reports (days since state Stay-At-Home orders), and the American Community Survey (ACS, 5-year averages from 2014–2018) for various socioeconomic factors. Unacast data were processed to generate a SoDA score based on three metrics: percent difference in average distance traveled, percent difference in visitation to non-essential places, and rate of human encounters per square kilometer. Univariable linear regressions were performed to assess the association between each feature, taken on its own, and SoDA, yielding beta coefficients. Three models were compared: a base model using all 45 features, a COVID-19-related features model using only pandemic-related data, and a top 25 features model using the features most strongly correlated with SoDA in the univariable analysis. Model accuracy was evaluated using mean squared error and the coefficient of determination (R²). The analysis was conducted using Python and the scikit-learn library.
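As a rough illustration of the modeling step described above, the following minimal sketch fits scikit-learn's `BaggingRegressor` and reports mean squared error and R². It is not the authors' code. The synthetic feature table, the 80/20 train/test split, the number of estimators, and the helper names `make_synthetic_counties` and `fit_and_score` are all assumptions made for the example; only the use of bagging regression, 45 features, and the two evaluation metrics comes from the study description.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def make_synthetic_counties(n_counties=500, n_features=45, seed=0):
    """Hypothetical stand-in for the county feature table (not real data)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_counties, n_features))
    # Assumption: only the first five synthetic features drive the target.
    weights = np.zeros(n_features)
    weights[:5] = [0.8, -0.6, 0.5, -0.4, 0.3]
    y = X @ weights + rng.normal(scale=0.1, size=n_counties)
    return X, y


def fit_and_score(X, y, seed=0):
    """Fit a bagging regressor and return (MSE, R^2) on a held-out split."""
    # Assumption: an 80/20 split; the summary does not state the validation scheme.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    # The default base estimator is a decision tree regressor.
    model = BaggingRegressor(n_estimators=50, random_state=seed)
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    return mean_squared_error(y_test, pred), r2_score(y_test, pred)


X, y = make_synthetic_counties()
mse, r2 = fit_and_score(X, y)
print(f"MSE={mse:.3f}  R^2={r2:.3f}")
```

To mimic the comparison between the three models, the same `fit_and_score` helper could be called on column subsets of `X`, for example `X[:, cols]` for a hypothetical list `cols` of pandemic-related or top-ranked feature indices.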
Key Findings
The base model predicted county SoDA with 91.6% accuracy, demonstrating high predictive power. The univariable analysis revealed several significant correlations. Owner-occupied housing unit rate was the strongest negative correlate (β = -0.322, *P* < 0.00001), indicating lower adherence in areas with more owner-occupied housing, potentially reflecting suburban living patterns and greater reliance on private vehicles. Conversely, the percentage of persons working from home prior to the pandemic had the strongest positive correlation (β = 0.259, *P* < 0.00001). Other key findings included positive associations with higher per capita income, older populations, and suburban areas, and negative associations with a higher African American population, high obesity rates, earlier first COVID-19 case/death, and more Republican-leaning residents. The model using only COVID-19-related features achieved 64% accuracy, substantially lower than the base model, highlighting the importance of socioeconomic and demographic features in predicting SoDA. The top 25 features model performed almost as well as the base model (89% accuracy), suggesting that a subset of features is sufficient for high-accuracy prediction. Importantly, cumulative COVID-19 deaths and cases showed negligible predictive power.
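The β coefficients reported above come from regressing SoDA on one feature at a time. A minimal sketch of that step, and of ranking features to pick a top-k subset, is given below. It assumes the coefficients are standardized (the summary does not say whether they are), uses synthetic data, and the function names `univariable_betas` and `top_k_features` are hypothetical.

```python
import numpy as np


def univariable_betas(X, y):
    """Standardized slope from regressing y on each column of X separately."""
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)
    yz = (y - y.mean()) / y.std()
    # With both sides standardized, each single-feature OLS slope
    # equals the Pearson correlation between that feature and y.
    return Xz.T @ yz / len(yz)


def top_k_features(betas, k):
    """Indices of the k features with the largest absolute coefficient."""
    return np.argsort(-np.abs(betas))[:k]


# Synthetic example (not real data): features 3 and 7 drive the target.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 10))
y = 2.0 * X[:, 3] - 1.0 * X[:, 7] + rng.normal(scale=0.5, size=300)

betas = univariable_betas(X, y)
top2 = top_k_features(betas, 2)
print(np.round(betas, 3))
print(top2)
```

Ranking by absolute value keeps strong negative correlates, such as the owner-occupied housing rate in the study, alongside strong positive ones.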
Discussion
The findings emphasize the influence of socioeconomic factors on social distancing adherence. Suburban lifestyles, characterized by car dependency, may hinder social distancing practices. Economic disparities contribute to differential adherence, with those in poverty facing greater challenges in complying with guidelines. Health-related factors showed mixed correlations: older populations (at higher risk) demonstrated greater adherence, whereas counties with high obesity and diabetes rates had lower adherence, potentially reflecting increased vulnerability and a need for tailored interventions. The strong negative correlation with the African American population highlights existing health disparities and calls for focused community-based interventions. Republican-leaning counties showed clearly lower adherence, possibly related to differing perspectives on pandemic mitigation strategies. The model's high predictive ability, particularly when incorporating socioeconomic factors, suggests that interventions tailored to these factors could significantly improve SoDA.
Conclusion
This study successfully developed a highly accurate model for predicting county-level social distancing adherence, highlighting the importance of economic, health, and political factors. The model provides a valuable tool for health policy planning and resource allocation, enabling more targeted interventions to improve compliance and mitigate the spread of future pandemics. Future research should investigate the daily fluctuations in SoDA to refine the model's predictive capability and explore the underlying causes of the observed correlations in greater depth.
Limitations
The model's accuracy could be affected by several factors. The data do not account for mask use or other personal protective measures, which may influence transmission rates and adherence. The reliance on mobile phone data might exclude people without phones, skewing the results. Aggregating SoDA scores over the entire quarantine period might obscure daily variations in adherence. Lastly, the 2014–2018 ACS data might not perfectly reflect current demographics.