A fair individualized polysocial risk score for identifying increased social risk in type 2 diabetes

Y. Huang, J. Guo, et al.

Researchers from the University of Florida developed a machine learning pipeline that produces an individualized polysocial risk score (iPsRS) for patients with type 2 diabetes. The score uses social determinants of health to predict hospitalization risk. It was also optimized to give fair predictions across racial and ethnic groups, which carry a disproportionate share of the disease burden.

Introduction
Type 2 diabetes (T2D) affects more than 529 million people worldwide, a number projected to more than double by 2050. Social determinants of health (SDOH), such as education, income, and access to healthy food, strongly influence both the development and the prognosis of T2D. The disparities are marked: racial and ethnic minorities bear a disproportionate burden of T2D and its complications, so effective management of social needs is essential for reducing these inequities and improving outcomes. Yet SDOH screening in healthcare settings remains limited. Existing tools are not automated and are not tailored to predicting outcomes of specific conditions such as T2D. They also often examine individual SDOH items in isolation rather than their complex interplay. A comprehensive, automated, and fair screening tool is therefore needed. The growing availability of real-world data (RWD), such as electronic health records (EHRs), together with advances in artificial intelligence (AI), particularly machine learning (ML), makes such a tool feasible. Challenges remain, however, including bias in the data and the "black box" nature of some ML models. This study addresses them by developing an EHR-based ML pipeline that produces an individualized polysocial risk score (iPsRS). The iPsRS predicts hospitalization risk in T2D patients from both individual-level and contextual-level SDOH. The pipeline incorporates explainable AI (XAI) techniques and algorithmic fairness optimization to promote equitable predictions across racial and ethnic groups. The long-term goal is to integrate social risk management into clinical care through an EHR-based platform, facilitating a paradigm shift in healthcare delivery.
Literature Review
Existing research highlights the substantial impact of SDOH on T2D outcomes and the disproportionate burden on minority groups. The US healthcare system increasingly recognizes the need to address social needs. Even so, SDOH screening rates remain low, in part because of the limitations of current tools: they are often not automated, lack validation for specific conditions such as T2D, and fail to account for the complex interactions among SDOH factors. Previous studies have explored polysocial risk scores (PsRS), but these scores often do not include both contextual-level and individual-level SDOH, which limits their generalizability. This research addresses these limitations by incorporating a broader range of SDOH factors and using advanced analytical techniques to build a more robust and generalizable model.
Methodology
This retrospective cohort study used EHR data (2015-2021) from the University of Florida Health Integrated Data Repository covering more than 10,000 T2D patients. The study population included patients aged 18 years or older who had a T2D diagnosis and at least one encounter in each of the baseline and follow-up periods. The outcome was all-cause hospitalization within one year after the index date (the date of first T2D diagnosis).

Covariates included demographics (age, sex, race/ethnicity), clinical characteristics (comorbidities, medications, laboratory values), and both individual-level and contextual-level SDOH. Individual-level SDOH were extracted from clinical notes with a natural language processing (NLP) pipeline. They covered education, employment, financial constraints, housing stability, food security, marital status, smoking, alcohol use, and drug abuse. Contextual-level SDOH were obtained through spatiotemporal linkage with external exposome data, including measures of food access, walkability, neighborhood disadvantage, and crime rates.

The iPsRS was developed by training machine learning models, including XGBoost and ridge regression, on these covariates. Model performance was evaluated using the area under the receiver operating characteristic curve (AUROC), F1-score, precision, recall, and specificity. Explainable AI (XAI) techniques, such as SHAP values, and causal structure learning (MGM-PC-Stable) were used to identify important SDOH factors and their causal relationships. Algorithmic fairness was assessed with seven metrics: predictive parity, balancing false positive rates (FPR), equalized odds, conditional use accuracy equality, treatment equality, balancing false negative rates (FNR), and overall accuracy equality. The analysis focused on balancing FNR across racial and ethnic groups. Three bias mitigation techniques were then applied to optimize fairness: the disparate impact remover (DIR), adversarial debiasing (ADB), and calibrated equalized odds postprocessing (CEP).
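The inclusion criteria and outcome definition described above can be expressed as a short filter. This is a minimal sketch, not the study's code. The `Patient` record, its field names, and the helper functions are hypothetical. The 365-day baseline window before the index date is also an assumption: the summary says only that patients needed encounters in the baseline and follow-up periods.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List, Optional


# Hypothetical, simplified patient record; field names are illustrative only.
@dataclass
class Patient:
    patient_id: str
    birth_date: date
    first_t2d_dx: Optional[date]  # index date = first T2D diagnosis
    encounter_dates: List[date] = field(default_factory=list)
    admission_dates: List[date] = field(default_factory=list)  # all-cause inpatient admissions


BASELINE = timedelta(days=365)   # ASSUMPTION: one-year look-back before the index date
FOLLOW_UP = timedelta(days=365)  # one-year outcome window after the index date


def age_on(birth: date, on: date) -> int:
    """Age in completed years on a given date."""
    return on.year - birth.year - ((on.month, on.day) < (birth.month, birth.day))


def is_eligible(p: Patient) -> bool:
    """Adult at index, has a T2D diagnosis, and has >= 1 encounter in each window."""
    if p.first_t2d_dx is None:
        return False
    index = p.first_t2d_dx
    if age_on(p.birth_date, index) < 18:
        return False
    in_baseline = any(index - BASELINE <= d < index for d in p.encounter_dates)
    in_follow_up = any(index < d <= index + FOLLOW_UP for d in p.encounter_dates)
    return in_baseline and in_follow_up


def one_year_hospitalization(p: Patient) -> int:
    """Outcome label (call only on eligible patients): 1 if any admission
    falls within one year after the index date, else 0."""
    index = p.first_t2d_dx
    return int(any(index < d <= index + FOLLOW_UP for d in p.admission_dates))
```

Keeping eligibility and labeling as separate functions mirrors the two distinct steps in the text: cohort construction first, outcome assignment second.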
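The evaluation metrics named in the methodology all follow from the confusion matrix at a chosen decision threshold. AUROC is the exception, since it is threshold-free. Below is a dependency-free sketch with hypothetical function names. The 0.5 cut-off is an assumption for illustration, because the summary does not state which threshold the study used. The pairwise AUROC loop is O(n²), so on real cohorts a library routine such as scikit-learn's `roc_auc_score` is the practical choice.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) for binary labels coded 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def _safe_div(num: float, den: float) -> float:
    return num / den if den else 0.0


def threshold_metrics(y_true: Sequence[int], y_score: Sequence[float],
                      threshold: float = 0.5) -> Dict[str, float]:
    """Precision, recall (sensitivity), specificity, and F1 at a fixed cut-off.

    ASSUMPTION: threshold=0.5 is illustrative; the study's cut-off is not given here.
    """
    y_pred = [int(s >= threshold) for s in y_score]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    precision = _safe_div(tp, tp + fp)
    recall = _safe_div(tp, tp + fn)
    specificity = _safe_div(tn, tn + fp)
    f1 = _safe_div(2 * precision * recall, precision + recall)
    return {"precision": precision, "recall": recall, "specificity": specificity, "f1": f1}


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Probability that a random positive is scored above a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For example, with labels `[0, 0, 1, 1]` and scores `[0.1, 0.4, 0.35, 0.8]`, three of the four positive/negative pairs are ordered correctly, so the AUROC is 0.75.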
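SHAP attributes a prediction to its input features using Shapley values from cooperative game theory. For a handful of features, these values can be computed exactly by enumerating feature subsets, which makes the idea concrete. This sketch is not the `shap` library and not the study's code. It assumes a simplified "interventional" value function in which features outside a subset are replaced by a single reference (baseline) vector. Its cost grows as 2^n, so it is practical only for a few features.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(f: Callable[[List[float]], float],
                   x: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley value of each feature for model f at input x, relative to baseline.

    ASSUMPTION: simplified value function using one reference vector (not the shap library).
    """
    n = len(x)

    def value(subset: frozenset) -> float:
        # Features in `subset` keep the patient's values; all others take the baseline value.
        z = [x[j] if j in subset else baseline[j] for j in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):  # k = size of the coalition that excludes feature i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                coalition = frozenset(s)
                # Weighted marginal contribution of feature i to this coalition.
                total += weight * (value(coalition | {i}) - value(coalition))
        phi.append(total)
    return phi
```

By construction, the values sum to f(x) minus f(baseline) (the efficiency property). For a linear model, each value reduces to w_i (x_i minus baseline_i), which is a quick sanity check.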
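The disparate impact remover (DIR), proposed by Feldman et al. (2015), is a pre-processing repair. It edits feature values so that each feature's distribution looks alike across groups while preserving the rank order within each group. The sketch below is a simplified, from-scratch version of that idea, not the study's code and not AIF360's `DisparateImpactRemover`. It maps each value to its within-group quantile and replaces it with the median across groups of the values at that quantile. It then blends the result with the original value through `repair_level` (0 means no change, 1 means full repair). All helper names are hypothetical.

```python
from bisect import bisect_left, bisect_right
from statistics import median
from typing import Dict, Hashable, List, Sequence

# Simplified sketch of DIR-style "geometric" repair for ONE numeric feature.
# Not AIF360's implementation and not the study's code; helper names are hypothetical.


def _quantile(sorted_vals: List[float], q: float) -> float:
    """Linearly interpolated value at quantile q (0..1) of an ascending list."""
    if len(sorted_vals) == 1:
        return float(sorted_vals[0])
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def _within_group_quantile(sorted_vals: List[float], v: float) -> float:
    """Mid-rank of v inside its own group, scaled to 0..1 (tied values share one quantile)."""
    n = len(sorted_vals)
    if n == 1:
        return 0.5
    mid_rank = (bisect_left(sorted_vals, v) + bisect_right(sorted_vals, v) - 1) / 2
    return mid_rank / (n - 1)


def repair_feature(values: Sequence[float], groups: Sequence[Hashable],
                   repair_level: float = 1.0) -> List[float]:
    """Move each value toward the cross-group median distribution at its within-group quantile."""
    by_group: Dict[Hashable, List[float]] = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)
    for vals in by_group.values():
        vals.sort()

    repaired = []
    for v, g in zip(values, groups):
        q = _within_group_quantile(by_group[g], v)
        # Target: median across groups of each group's value at quantile q.
        target = median(_quantile(s, q) for s in by_group.values())
        # repair_level = 0 keeps v; repair_level = 1 replaces it with the target.
        repaired.append((1 - repair_level) * v + repair_level * target)
    return repaired
```

In a pipeline like the one described, such a repair would typically be applied to each predictor column. The race/ethnicity attribute only defines the groups, and the model is then retrained on the repaired features.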
Key Findings
The final analysis included 10,192 T2D patients. The iPsRS models (XGBoost and ridge regression) that incorporated both individual-level and contextual-level SDOH outperformed models using only one type of SDOH (AUROC 0.72 vs. 0.70-0.71). In the independent testing set, the one-year hospitalization rate in the top iPsRS decile was 27.1%, approximately 21 times the rate in the bottom decile. After adjustment for demographics and clinical characteristics, the iPsRS explained 37.7% of the one-year hospitalization risk. Each one-decile increase in iPsRS was associated with 24% higher odds of hospitalization (adjusted OR = 1.24, 95% CI 1.17-1.32). SHAP values and causal structure learning consistently identified housing stability as a key predictive factor. The ridge regression model initially showed bias against non-Hispanic Black (NHB) and Hispanic patients, with higher FNRs for these groups (a larger share of their hospitalizations went unflagged). After the DIR bias mitigation technique was applied, the iPsRS achieved a good balance between predictive utility (AUROC 0.71) and fairness: the FNR ratio for NHB vs. non-Hispanic White (NHW) patients fell from 1.44 to 1.07.
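The decile comparison in these results comes from ranking patients by iPsRS, cutting the ranking into ten near-equal groups, and comparing observed hospitalization rates. Below is a minimal sketch with hypothetical function names. It reports crude rates only. The adjusted odds ratio per decile would come from a separate covariate-adjusted regression, which is not shown.

```python
from typing import List, Sequence

# Hypothetical helpers; crude (unadjusted) rates only.


def rates_by_decile(scores: Sequence[float], outcomes: Sequence[int]) -> List[float]:
    """Observed outcome rate in each score decile, lowest-risk decile first."""
    n = len(scores)
    if n < 10:
        raise ValueError("need at least 10 patients to form deciles")
    order = sorted(range(n), key=lambda i: scores[i])  # patient indices by ascending score
    rates = []
    for d in range(10):
        members = order[d * n // 10:(d + 1) * n // 10]
        rates.append(sum(outcomes[i] for i in members) / len(members))
    return rates


def top_to_bottom_ratio(rates: Sequence[float]) -> float:
    """How many times higher the top-decile rate is than the bottom-decile rate."""
    if rates[0] == 0:
        raise ValueError("bottom-decile rate is zero; ratio undefined")
    return rates[-1] / rates[0]
```

The integer slicing `d * n // 10` spreads any remainder across bins, so every decile is non-empty once there are at least ten patients.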
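The fairness result above is reported as a ratio of false negative rates. FNR is the share of truly hospitalized patients whom the model fails to flag, so a ratio of 1 means two groups are missed equally often. Below is a minimal sketch with hypothetical function names and toy labels. The group codes are placeholders.

```python
from typing import Hashable, List, Sequence, Tuple

# Hypothetical helper names; group labels are placeholders.


def false_negative_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """FN / (FN + TP): share of actual positives that were predicted negative."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not preds_for_positives:
        raise ValueError("no positive cases in this group")
    return sum(1 for p in preds_for_positives if p == 0) / len(preds_for_positives)


def fnr_ratio(y_true: Sequence[int], y_pred: Sequence[int], groups: Sequence[Hashable],
              group: Hashable, reference: Hashable) -> float:
    """FNR of `group` divided by FNR of `reference` (e.g., NHB vs. NHW)."""
    def subset(g: Hashable) -> Tuple[List[int], List[int]]:
        idx = [i for i, gi in enumerate(groups) if gi == g]
        return [y_true[i] for i in idx], [y_pred[i] for i in idx]

    ref_fnr = false_negative_rate(*subset(reference))
    if ref_fnr == 0:
        raise ValueError("reference group has FNR 0; ratio undefined")
    return false_negative_rate(*subset(group)) / ref_fnr
```

In these terms, the reported NHB-vs-NHW ratio moving from 1.44 to 1.07 means that, after mitigation, NHB patients who were hospitalized were missed only slightly more often than NHW patients.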
Discussion
This study developed a fair and explainable ML pipeline, the iPsRS, for identifying social risk factors associated with hospitalization in T2D patients. The iPsRS identifies individuals at high risk and highlights modifiable factors, such as housing instability, for targeted interventions. It explains 37.7% of the hospitalization risk beyond demographic and clinical factors, which underscores the substantial contribution of unmet social needs. Fairness optimization narrowed racial and ethnic differences in false negative rates, mitigating potential bias in who is flagged as high risk. This approach addresses critical barriers in current SDOH screening and management, paving the way for integrating social risk management into routine clinical care.
Conclusion
The iPsRS offers a promising tool for efficient and effective social risk screening in T2D patients. It combines individual and contextual SDOH with fairness optimization and explainability, which makes it suitable for integration into EHR systems. Future research should focus on broader geographical generalizability through federated learning. It should also expand the range of SDOH factors captured, using improved NLP techniques. This work represents a significant step toward integrating social risk management into clinical practice to improve T2D outcomes and health equity.
Limitations
The study's findings may not be fully generalizable beyond the Florida population. The inclusion of SDOH variables was limited by the capabilities of the NLP pipeline, potentially excluding important factors. Incomplete or biased SDOH information in EHR notes might also affect the model's performance. Further research with larger, more diverse datasets across different geographical regions is needed to enhance generalizability.