
A fair individualized polysocial risk score for identifying increased social risk in type 2 diabetes
Y. Huang, J. Guo, et al.
Researchers at the University of Florida developed a machine learning pipeline that produces an individualized polysocial risk score (iPsRS) for patients with type 2 diabetes. The score uses social determinants of health to predict hospitalization risk, with attention to fair performance for racial and ethnic minority groups.
Introduction
The study addresses how social determinants of health (SDOH) contribute to disparities in type 2 diabetes (T2D) outcomes and whether a polysocial risk score can identify patients at heightened social risk for hospitalization. Diabetes affects hundreds of millions globally, with T2D comprising over 90% of cases. SDOH such as education, income, and access to healthy food substantially influence T2D development and prognosis, and racial/ethnic minorities disproportionately bear T2D burdens. Despite recognition of SDOH, routine clinical screening remains infrequent, is often limited to single items, is not automated, and is not tailored to specific outcomes such as T2D. Existing polysocial risk score studies largely rely on individual-level SDOH in small cohorts and lack generalizability. Advances in real-world data (EHRs, claims) and machine learning create opportunities but also pose challenges: limited integration of SDOH with clinical data, biases in observational data that can yield unfair predictions for disadvantaged groups, and limited adoption of black-box models. Explainability methods (e.g., SHAP) and causal structure learning (e.g., the PC algorithm) can clarify factor contributions and interactions but are rarely used together. The authors therefore aim to develop an EHR-based, explainable, and fair ML pipeline (iPsRS) that integrates individual- and contextual-level SDOH to predict 1-year hospitalization among T2D patients and to identify modifiable social risk factors for intervention.
Literature Review
The paper notes low adoption of SDOH screening in U.S. clinical settings and limitations of current tools: manual workflows, universal rather than outcome-specific designs, and inadequate capture of complex, interacting SDOH. Prior calls to use a polysocial risk score (PsRS) exist, yet published PsRS efforts have focused on individual-level SDOH and small cohorts, limiting generalizability. ML applications in healthcare often overlook biases inherent in real-world data, potentially harming minority and socioeconomically disadvantaged groups. Explainable AI methods like SHAP, while common for feature attribution, inadequately capture joint effects among SDOH, motivating complementary use of causal structure learning (e.g., PC-based algorithms) to elucidate interdependencies and potential causal pathways.
Methodology
Design and data source: Retrospective cohort study using 2015–2021 EHR data from the UF Health Integrated Data Repository (UF Health IDR), approved as exempt by the University of Florida IRB (IRB202201196). UF Health serves >1 million patients annually across Gainesville, Jacksonville, and satellite clinics.
Population: Adults (≥18 years) with T2D identified by ≥1 inpatient or outpatient T2D diagnosis (ICD-9 250.x0/250.x2 or ICD-10 E11) and ≥1 glucose-lowering drug prescription, using a validated EHR case-finding algorithm (PPV >94%). Patients required ≥1 encounter in both the baseline period and the follow-up year. Index date: first recorded T2D diagnosis. Baseline period: 3 years prior to index for predictors. Follow-up: 1 year after index for outcome.
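The baseline and follow-up windows described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, a year is approximated as 365 days, and the treatment of encounters that fall exactly on the index date is an assumption the source does not specify.

```python
from datetime import date, timedelta

def observation_windows(index_date):
    """Return (baseline_start, followup_end) around the index date."""
    # Assumption: a year is counted as 365 days.
    baseline_start = index_date - timedelta(days=3 * 365)
    followup_end = index_date + timedelta(days=365)
    return baseline_start, followup_end

def is_eligible(index_date, encounter_dates):
    """Require at least one encounter in baseline and one in follow-up."""
    baseline_start, followup_end = observation_windows(index_date)
    in_baseline = any(baseline_start <= d < index_date for d in encounter_dates)
    in_followup = any(index_date < d <= followup_end for d in encounter_dates)
    return in_baseline and in_followup

def hospitalized_within_year(index_date, inpatient_dates):
    """Outcome flag: any inpatient encounter during the follow-up year."""
    _, followup_end = observation_windows(index_date)
    return any(index_date < d <= followup_end for d in inpatient_dates)
```

For example, a patient indexed on 2018-06-01 with encounters on 2017-01-10 and 2018-09-01 would be eligible under these assumptions.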
Outcome: First all-cause hospitalization within 1 year post-index (any inpatient encounter during follow-up).
Covariates: Demographics (age, sex, race/ethnicity: NHW, NHB, Hispanic, Other) and clinical information (comorbidities, co-medications, labs, clinical observations). Residential ZIP codes collected for contextual linkage.
Individual-level SDOH: Extracted from clinical notes via an in-house NLP pipeline (SODA), including education (college/above, high school/lower, unknown), employment (employed, unemployed, retired/disabled, unknown), financial constraints (has constraints, unknown), housing stability (homeless/shelter, stable housing, unknown), food security (has food insecurity, unknown), marital status (single; married/partner; widowed/divorced; unknown), smoking status (ever, never, unknown), alcohol use (yes, no, unknown), drug abuse (yes, no, unknown). Insurance (private, Medicare, Medicaid, no-pay, unknown, other) from structured data.
Contextual-level SDOH: 114 built and social environment variables (e.g., food access, walkability, vacant land, neighborhood disadvantage, social capital, crime and safety) from six validated sources. Spatiotemporal linkage used patient 9-digit ZIPs, with area-weighted averages within a 250-mile buffer around ZIP centroid and time-weighted averages across residential history in the baseline period.
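The time-weighting step can be illustrated with a short sketch. It assumes each residence is recorded as a (move_in, move_out, value) tuple, where value is the already area-weighted contextual measure for that address; this record layout and the function name are hypothetical, not taken from the study.

```python
from datetime import date

def time_weighted_average(residences, window_start, window_end):
    """Average a contextual value over residences, weighted by days lived there.

    residences: list of (move_in, move_out, value) tuples (hypothetical layout).
    Only the part of each stay that overlaps the baseline window is counted.
    """
    total_days = 0
    weighted_sum = 0.0
    for move_in, move_out, value in residences:
        start = max(move_in, window_start)
        end = min(move_out, window_end)
        days = (end - start).days
        if days <= 0:
            continue  # stay does not overlap the window
        total_days += days
        weighted_sum += value * days
    if total_days == 0:
        return None  # no residential history inside the window
    return weighted_sum / total_days
```

A patient who spent 4 days of a 10-day window at an address with value 2.0 and 6 days at an address with value 4.0 would receive (4 × 2.0 + 6 × 4.0) / 10 = 3.2.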
Preprocessing: Missing values imputed using an "unknown" category for categorical variables and the mean for continuous variables. Created dummy variables for categorical features; applied min–max normalization to continuous variables for regularized models. Addressed outcome imbalance using random oversampling (ROS), random undersampling (RUS), and undersampling by matching on Charlson Comorbidity Index (CCI) to create balanced training sets.
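A dependency-free sketch of these preprocessing steps is given below. The study used scikit-learn and imbalanced-learn; the helper names here are illustrative stand-ins, and the oversampler simply duplicates randomly chosen minority-class rows until the classes are the same size.

```python
import random

def impute_unknown(values):
    """Replace missing categorical values with an explicit 'unknown' level."""
    return ["unknown" if v is None else v for v in values]

def impute_mean(values):
    """Replace missing continuous values with the mean of the observed ones."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def min_max(values):
    """Rescale continuous values to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column carries no information
    return [(v - lo) / (hi - lo) for v in values]

def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until all classes match the largest."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels
```

Resampling of this kind is applied to the training data only, so that validation and test sets keep the original outcome prevalence.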
Model development: Built models with three feature sets: (1) individual-level SDOH only, (2) contextual-level SDOH only, and (3) combined individual + contextual SDOH. Algorithms included linear models (logistic, lasso, ridge, ElasticNet) and tree-based Extreme Gradient Boosting (XGBoost). Data split: modeling dataset (2015–2020) and independent test set (2021). Within modeling data: 70% train, 10% validation, 20% internal test. Performed five-fold cross-validated grid search on the training set to optimize hyperparameters; early stopping on validation to avoid overfitting. Baseline models using demographics and clinical factors (e.g., CCI) were trained for comparison. Performance metrics: AUROC, F1 score, precision, recall, specificity. Each patient received an iPsRS hospitalization risk score; scores were grouped into 11 risk strata (top 1–5%, top 6–10%, then deciles) for calibration analyses.
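The 11-stratum grouping can be sketched as below. The assumption here is that "top" means the highest predicted risk, so patients are ranked by descending score; the function name and the tie-handling (ties broken by input order) are illustrative choices, not details from the paper.

```python
def assign_strata(scores):
    """Map each score to a stratum from 1 (top 1-5%) to 11 (lowest decile).

    Stratum 1: top 5% of scores; stratum 2: next 5%;
    strata 3-11: the remaining nine deciles in descending order of risk.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    strata = [0] * n
    for rank, i in enumerate(order):
        pos = rank + 1  # 1-based position from the top
        if pos * 100 <= 5 * n:
            strata[i] = 1
        elif pos * 100 <= 10 * n:
            strata[i] = 2
        else:
            # Integer ceiling of (percent from top - 10) / 10 avoids float rounding.
            a = pos * 100 - 10 * n
            b = 10 * n
            strata[i] = 2 + (a + b - 1) // b
    return strata
```

Observed hospitalization rates can then be compared across strata to check calibration, as in the risk stratification results.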
Explainability and causal analysis: Employed SHapley Additive exPlanations (SHAP) to rank feature contributions. Used Mixed Graphical Models with PC-Stable (MGM-PC-Stable) to learn a directed acyclic graph (DAG) over top SDOH features and hospitalization to explore potential causal relations and interactions.
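To make the idea behind SHAP concrete, the sketch below computes exact Shapley values for a tiny hypothetical risk model by averaging each feature's marginal contribution over all feature orderings. It does not use the SHAP library, which relies on faster model-specific algorithms, and the toy model and its coefficients are invented for illustration only.

```python
from itertools import permutations

def shapley_values(predict, instance, baseline):
    """Exact Shapley values relative to a single baseline input.

    predict: function taking a dict of feature values and returning a number.
    instance, baseline: dicts with the same feature names.
    """
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for ordering in orderings:
        current = dict(baseline)
        prev = predict(current)
        for name in ordering:
            current[name] = instance[name]  # switch this feature on
            new = predict(current)
            totals[name] += new - prev      # marginal contribution
            prev = new
    return {name: total / len(orderings) for name, total in totals.items()}

def toy_risk(x):
    """Hypothetical risk model with one interaction term (not from the study)."""
    return (0.1 + 0.2 * x["unstable_housing"] + 0.1 * x["uninsured"]
            + 0.05 * x["smoker"]
            + 0.1 * x["unstable_housing"] * x["uninsured"])

patient = {"unstable_housing": 1, "uninsured": 1, "smoker": 1}
reference = {"unstable_housing": 0, "uninsured": 0, "smoker": 0}
phi = shapley_values(toy_risk, patient, reference)
```

In this toy case the interaction term is split evenly between the two interacting features, which illustrates why attribution scores alone do not reveal how SDOH act jointly and why the authors pair SHAP with causal structure learning.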
Fairness assessment and mitigation: Assessed fairness using seven metrics: predictive parity, predictive equality (FPR balance), equalized odds, conditional use accuracy equality, treatment equality, equality of opportunity (FNR balance), and overall accuracy equality. Primary focus: balancing the false negative rate (FNR) across racial/ethnic groups (NHB and Hispanic vs NHW), with parity ratios between 0.80 and 1.25 deemed statistically fair. Applied mitigation techniques: pre-processing Disparate Impact Remover (DIR), in-processing Adversarial Debiasing (ADB), and post-processing Calibrated Equalized Odds Postprocessing (CEP). Tooling: Python 3.7 with scikit-learn, imbalanced-learn, statsmodels; AI Fairness 360 for fairness mitigation; Tetrad for causal structure learning.
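The primary fairness check can be sketched in a few lines. The false negative rate is the share of truly hospitalized patients that the model fails to flag, and the parity ratio divides a group's FNR by that of the reference group (NHW). The function names and the toy labels in the usage example are hypothetical; the 0.80 to 1.25 range comes from the study.

```python
def false_negative_rate(y_true, y_pred):
    """FNR = missed positives / all actual positives."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not preds_for_positives:
        raise ValueError("no positive cases in this group")
    misses = sum(1 for p in preds_for_positives if p == 0)
    return misses / len(preds_for_positives)

def fnr_parity_ratio(y_true, y_pred, groups, group, reference):
    """Ratio of the FNR in `group` to the FNR in `reference`."""
    def fnr_for(g):
        idx = [i for i, label in enumerate(groups) if label == g]
        return false_negative_rate([y_true[i] for i in idx],
                                   [y_pred[i] for i in idx])
    ref_fnr = fnr_for(reference)
    if ref_fnr == 0:
        raise ValueError("reference FNR is zero; ratio is undefined")
    return fnr_for(group) / ref_fnr

def is_fair(ratio, low=0.80, high=1.25):
    """Apply the study's predefined fairness range to a parity ratio."""
    return low <= ratio <= high
```

A ratio above 1 means the model misses more true hospitalizations in the comparison group than in the reference group, so those patients are more likely to be overlooked for intervention.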
Key Findings
Cohort: 10,192 T2D patients; mean age 58±13 years; 58% women; race/ethnicity: 50% NHW, 39% NHB, 6% Hispanic, 5% Other. Insurance: 41% Medicare, 31% private, 15% Medicaid, 5.7% uninsured.
Model performance: Individual-level SDOH models achieved AUROC 0.70–0.71; contextual-only models were suboptimal (AUROC 0.60–0.62); combining individual + contextual SDOH modestly improved performance (AUROC up to 0.72). Without imbalance preprocessing, models showed very low F1, precision, and recall. Compared to baseline demographic/clinical models, iPsRS improved AUROC by ~10%.
Risk stratification and association: In the 2021 independent test set, the top 10% iPsRS group had a 1-year hospitalization rate of 27.1%, about 21 times higher than the bottom decile. In multivariable logistic regression adjusting for demographics and clinical factors, iPsRS explained 37.7% of the risk of 1-year hospitalization; per decile increase, hospitalization odds rose by 24% (adjusted OR 1.24; 95% CI 1.17–1.32).
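As a reading aid for the per-decile odds ratio, the sketch below shows how a constant adjusted OR compounds across several deciles. It assumes the log-odds are linear in iPsRS decile, which is what a single per-decile OR implies; the function name is illustrative.

```python
def compounded_odds_ratio(or_per_step, steps):
    """Odds ratio implied by `steps` increments of a constant per-step OR."""
    return or_per_step ** steps

# Nine decile steps separate the lowest and highest deciles.
top_vs_bottom = compounded_odds_ratio(1.24, 9)
```

Under this assumption, 1.24 raised to the ninth power is roughly 6.9. This adjusted figure is not directly comparable to the roughly 21-fold difference in crude hospitalization rates, because the latter is an unadjusted rate ratio between risk strata rather than an odds ratio adjusted for demographics and clinical factors.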
Explainability and causal insights: SHAP identified housing stability as the most predictive feature, followed by insurance type and smoking status. Housing stability had high missingness (57.5%); smoking status missingness was low (5%). The causal DAG (MGM-PC-Stable) over 21 key SDOH plus the outcome showed direct links from insurance type, housing stability, and neighborhood aggravated assault rate to hospitalization. Aggravated assault rate appeared as a common cause of both housing stability and hospitalization, suggesting that contextual SDOH modulate the effect of individual-level SDOH.
Fairness: XGBoost exhibited fairer FNR balance than linear models. Ridge regression showed FNR ratios biased against minorities: NHB vs NHW 1.44 and Hispanic vs NHW 1.32. After applying DIR to the ridge model, AUROC remained comparable (0.71 vs 0.72 original) while NHB vs NHW FNR ratio improved to 1.07, within the predefined fairness range. CEP improved fairness most but substantially reduced AUROC (0.722 to 0.550), whereas ADB and DIR achieved better utility–fairness trade-offs. Table 2 parity values further showed better parity for XGBoost with full or individual-level SDOH compared to contextual-only models.
Discussion
The iPsRS pipeline identifies T2D patients at elevated social risk for hospitalization by integrating individual and contextual SDOH into explainable, fair ML models. The models show acceptable discriminative ability (AUROC up to 0.72), particularly when individual-level SDOH are included, with contextual variables adding modest gains. Stratification by iPsRS reveals substantial risk gradients, supporting its use for targeted social risk interventions. Explainability analyses consistently highlight housing instability, insurance type, and smoking as key contributors, while causal discovery suggests that contextual factors (e.g., aggravated assault rates) may act as upstream drivers influencing both housing stability and hospitalization risk. Fairness assessment revealed FNR disparities in linear models that were substantially mitigated by pre-processing (DIR) with minimal loss in predictive utility, addressing the concern that disadvantaged groups could be under-identified and miss interventions. Collectively, the findings indicate that a fair, explainable, EHR-based polysocial risk score can support outcome-specific social risk screening, guide resource allocation, and inform intervention prioritization in T2D care.
Conclusion
This work introduces a fair and explainable EHR-based individualized polysocial risk score (iPsRS) to predict 1-year hospitalization among patients with T2D by integrating individual- and contextual-level SDOH. The approach improves discrimination over demographic/clinical baselines, reveals modifiable social factors such as housing instability, and, with fairness optimization, achieves equitable performance across racial/ethnic groups. The iPsRS is positioned for integration into EHR workflows to augment SDOH screening and guide tailored interventions. Future directions include expanding NLP extraction to additional SDOH (e.g., stress), leveraging AutoML to enhance performance and efficiency, and improving generalizability via federated learning and multi-region data. The authors plan to co-design an EHR-embedded individualized social risk management platform with stakeholders to translate the model into practice.
Limitations
Generalizability may be limited because the data come from a single health system in Florida, though the cohort is diverse with mixed urban/rural representation; future work will broaden geography and use federated learning. Individual-level SDOH were limited to variables already supported by the NLP pipeline (SODA), omitting potentially important factors such as stress; ongoing NLP development aims to expand coverage. SDOH documentation in EHR notes may be incomplete or biased (e.g., high sensitivity, lower specificity), though a separate analysis suggested better completeness among disadvantaged populations. Model development followed standard ML practices with a constrained hyperparameter/model search space; AutoML is planned to enhance accuracy and robustness. Contextual-only models showed weaker discrimination, and certain key features (e.g., housing stability) had high missingness, which may affect performance and interpretation.