Machine learning methods for “wicked” problems: exploring the complex drivers of modern slavery

Social Work

R. Lavelle-Hill, G. Smith, et al.

This paper examines the complex drivers of modern slavery using machine learning methods. Conducted by Rosa Lavelle-Hill, Gavin Smith, Anjali Mazumder, Todd Landman, and James Goulding, it finds that weak protection of women's physical security is strongly associated with higher national slavery prevalence, particularly where access to basic resources is low.

Introduction
The study addresses how to identify and quantify national-level drivers of modern slavery in the face of sparse, noisy data and many potential predictors. Modern slavery persists globally and is included in SDG 8.7, yet measurement and causal understanding remain difficult due to hidden victimization and broad, umbrella definitions. Traditional regression approaches struggle in small n, large p contexts marked by multicollinearity and non-linear interactions. The authors aim to develop and apply an inductive, machine-learning methodology that can generalize under data scarcity, capture non-linearities and interactions, and provide robust, stable explanations of which factors most strongly predict national slavery prevalence. The work seeks to improve understanding of conditions that allow slavery to persist and to inform more effective policy interventions.
Literature Review
The paper reviews challenges in estimating slavery prevalence, focusing on Gallup World Poll–derived Global Slavery Index (GSI) estimates and critiques of the GSI’s extrapolation and vulnerability model (e.g., conflation of risk with prevalence, limited geographic representativeness, changing definitions, limited transparency, and unreported uncertainty). Prior research links higher slavery incidence with low GDP, corruption, poverty, conflict, orphanhood, health crises (e.g., HIV/AIDS), and environmental stressors; however, much of this evidence is qualitative or based on small samples. Quantitative efforts have often employed linear, theory-driven models with few variables or the GSI’s five-dimension vulnerability framework derived via factor analysis, which drops correlated variables and shows only moderate correlation with prevalence (r ≈ 0.33). The authors argue that non-linear, multivariate, inductive methods with cross-validation and model stability assessment are needed to uncover novel predictors, handle collinearities, and better quantify variable importance.
Methodology
Data and outcome: The dependent variable is national slavery prevalence (percentage of the population), derived directly from Gallup World Poll (GWP) surveys as reported in the GSI for 2016 and 2018. It covers 48 unique countries and 70 country-year observations (22 countries appear in both years). The sample excludes Western Europe and North America, which limits generalizability.
Predictors: 106 national-level indicators were compiled from open sources (World Bank, UNAIDS, WomanStats, Early Warning Project, CIRI, SDGs, Landman and Silverman), spanning economics, governance, rights, conflict, health, demographics, and infrastructure. A theory-based subset of 34–35 variables was also created through literature review and expert input.
Preprocessing: For variables available in multiple years, the most recent value was selected within two broad time windows aligned with the prediction targets: ≤2016 for predicting 2016 and >2016 to 2019 for predicting 2018. Variables with >50% missingness were dropped. Remaining missing values were imputed in two steps: (i) where a country had a value for only one of the two years (2016 or 2018), that value was used for both; (ii) residual missingness was imputed using multivariate regression trees (CART). All features were then normalized to [0, 1].
Analysis design: The approach was exploratory and inductive, emphasizing generalization and interpretability. Leave-one-out cross-validation (LOOCV) was used for both hyperparameter selection and performance evaluation. The pipelines explored combined three choices: feature space (all 106 features vs. the theory-based subset); feature compression (non-negative matrix factorization, NMF, with a varying number of components k; partial least squares, PLS; or none); and model class (linear regression, or Lasso for high-p settings; a decision tree with tuned max-features; or a random forest). Grid search tuned model and compression parameters jointly, including the NMF rank k (2–8). Performance was measured as LOOCV mean absolute error (MAE) against the survey-based prevalence estimates.
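The preprocessing steps described above can be sketched in standard-library Python. This is a minimal illustration, not the authors' code: it assumes a toy layout in which each (country, year) key maps to a list of indicator values with `None` for missing entries, and the function names and toy indicators (`gdp`, `fuel`, `sparse`) are hypothetical. It drops indicators with more than 50% missingness, applies imputation step (i), and rescales each indicator to [0, 1]. The most-recent-value selection within time windows and the CART-based step (ii) are not shown; in the paper's order, step (ii) would run before scaling.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, int]          # (country, survey year); year is 2016 or 2018
Row = List[Optional[float]]    # one value per indicator; None marks a missing value


def drop_sparse_indicators(data: Dict[Key, Row], names: List[str],
                           max_missing: float = 0.5) -> Tuple[Dict[Key, Row], List[str]]:
    """Drop indicators whose share of missing values exceeds max_missing."""
    n_rows = len(data)
    keep = [j for j in range(len(names))
            if sum(row[j] is None for row in data.values()) / n_rows <= max_missing]
    cleaned = {key: [row[j] for j in keep] for key, row in data.items()}
    return cleaned, [names[j] for j in keep]


def fill_from_other_year(data: Dict[Key, Row]) -> Dict[Key, Row]:
    """Step (i): if a value exists for only one survey year, reuse it for the other year."""
    filled = {key: list(row) for key, row in data.items()}
    for (country, year), row in filled.items():
        partner = data.get((country, 2018 if year == 2016 else 2016))
        if partner is None:
            continue  # country surveyed in one year only: nothing to copy
        for j, value in enumerate(row):
            if value is None and partner[j] is not None:
                row[j] = partner[j]
    return filled


# Step (ii), CART-based imputation of the remaining gaps, is omitted from this sketch.


def min_max_scale(data: Dict[Key, Row]) -> Dict[Key, Row]:
    """Rescale each indicator to [0, 1] using its observed min and max; None stays None."""
    scaled = {key: list(row) for key, row in data.items()}
    n_features = len(next(iter(data.values())))
    for j in range(n_features):
        observed = [row[j] for row in data.values() if row[j] is not None]
        if not observed:
            continue
        lo, hi = min(observed), max(observed)
        for key, row in data.items():
            if row[j] is not None:
                scaled[key][j] = (row[j] - lo) / (hi - lo) if hi > lo else 0.0
    return scaled


# Toy example: "sparse" is missing in 2 of 3 rows (> 50%) and is dropped.
data = {
    ("A", 2016): [1.0, None, None],
    ("A", 2018): [3.0, 5.0, None],
    ("B", 2016): [2.0, 7.0, 4.0],
}
clean, kept = drop_sparse_indicators(data, ["gdp", "fuel", "sparse"])
scaled = min_max_scale(fill_from_other_year(clean))
print(kept)
print(scaled)
```

Here country A's missing 2016 `fuel` value is filled from its 2018 value before scaling, mirroring step (i).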
Model interpretation and stability: The best model was refit to extract the NMF components and interpret them through their loadings. Component-level permutation importance (Altmann et al., 2010) was computed. To assess explanation stability, a Rashomon set was constructed from all models within 2.5% of the best model's MAE (MAE < 0.233), and variable importance was compared across these models to examine volatility and masking/interaction effects. Partial dependence and individual conditional expectation (ICE) plots were used to probe non-linear interactions, especially among the most influential components.
Out-of-sample estimation: Using the best-performing, interpretable pipeline trained on the 70 country-year observations, the authors generated prevalence predictions for countries lacking GWP-based estimates (172 countries in 2018), applying the same data sourcing and imputation procedures to the predictors. Uncertainty was illustrated with bootstrapped predictions (10,000 resamples).
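The summary does not specify exactly how the 10,000 bootstrapped predictions were produced. A common scheme, assumed here, is to resample the training observations with replacement, refit the model on each resample, predict the new country, and take percentile bounds of the resulting predictions. In this sketch `fit_line` is a hypothetical stand-in for the NMF-plus-decision-tree pipeline, and the toy data are invented.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, Tuple

Model = Callable[[Sequence[float]], float]
Fit = Callable[[List[Sequence[float]], List[float]], Model]


def bootstrap_prediction(X: List[Sequence[float]], y: List[float], x_new: Sequence[float],
                         fit: Fit, n_boot: int = 10_000, alpha: float = 0.05,
                         seed: int = 0) -> Tuple[float, float, float]:
    """Refit on n_boot resamples (with replacement); return (mean, lower, upper) for x_new."""
    rng = random.Random(seed)
    n = len(y)
    preds = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]          # resample training rows
        model = fit([X[i] for i in idx], [y[i] for i in idx])
        preds.append(model(x_new))
    preds.sort()
    lower = preds[int(alpha / 2 * (n_boot - 1))]            # simple percentile interval
    upper = preds[int((1 - alpha / 2) * (n_boot - 1))]
    return mean(preds), lower, upper


def fit_line(X: List[Sequence[float]], y: List[float]) -> Model:
    """Hypothetical stand-in model: least-squares line on the first feature only."""
    xs = [row[0] for row in X]
    mx, my = mean(xs), mean(y)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (v - my) for x, v in zip(xs, y))
    slope = sxy / sxx if sxx > 0 else 0.0                   # degenerate resample: flat line
    intercept = my - slope * mx
    return lambda row: intercept + slope * row[0]


X = [[0.1], [0.3], [0.4], [0.6], [0.9]]
y = [0.2, 0.5, 0.4, 0.8, 1.1]
print(bootstrap_prediction(X, y, [0.5], fit_line, n_boot=2000))
```

The spread between the lower and upper bounds reflects how sensitive the refitted model is to which observations it sees, which matters most when n is as small as 70.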
Key Findings
- Best-performing pipeline: The full 106-feature set with NMF compression (k = 6), followed by a decision tree with restricted max-features. Its LOOCV MAE of 0.227 was significantly lower than that of the mean (MAE = 0.366, p < 0.001) and median (MAE = 0.349, p < 0.001) baselines (Wilcoxon signed-rank test). The model tended to underestimate prevalence in the highest-prevalence countries.
- Latent components: NMF identified six stable, interpretable components: Democratic Rule; Armed Conflict; (lack of) Physical Security of Women; Social Inequality and Discrimination; Access to Resources; and Religious and Political Freedoms. These largely align with the GSI vulnerability dimensions but add a distinct Physical Security of Women component.
- Importance in the best model: Permutation importance ranked Access to Resources as the most influential predictor of national slavery prevalence.
- Stability across good models (Rashomon set): Five additional models within 2.5% of the best MAE (0.228–0.232) showed that the six-component structure was stable (k = 6 was consistently optimal on the full feature set), but component importance varied: in several Rashomon models, Democratic Rule and Armed Conflict emerged as most important, in contrast to the best model's emphasis on Access to Resources.
- Component correlations and interactions: Religious and Political Freedoms was negatively correlated with Armed Conflict (r = -0.62), and Physical Security of Women with Access to Resources (r = -0.44). Partial dependence and ICE analyses revealed a strong non-linear interaction: poor Physical Security of Women sharply increases predicted prevalence, particularly where Access to Resources (fuel, electricity, water/sanitation, education) is low. An extreme lack of women's physical security produced the largest increases in predicted prevalence of any component.
- Out-of-sample estimates: The trained model generated 2018 prevalence estimates for 172 countries without survey data, alongside bootstrapped uncertainty. These estimates were compared with GSI model outputs, illustrating areas of convergence/divergence and highlighting potential data gaps.
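The model comparisons in these findings rest on two simple calculations, sketched here in standard-library Python under the assumption that data are plain lists: LOOCV absolute errors for predictor-free mean and median baselines, and a filter that keeps every model within 2.5% of the best MAE. The helper names and the model labels in the example are hypothetical; only the MAE values 0.227, 0.232, and 0.349 come from the findings above.

```python
from statistics import mean, median
from typing import Callable, Dict, List, Sequence

# fit_predict(X_train, y_train, x_test) -> prediction for x_test
FitPredict = Callable[[List[Sequence[float]], List[float], Sequence[float]], float]


def loocv_abs_errors(X: List[Sequence[float]], y: List[float],
                     fit_predict: FitPredict) -> List[float]:
    """Leave-one-out: train on every row except i, predict row i, keep |error|."""
    errors = []
    for i in range(len(y)):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        errors.append(abs(fit_predict(X_train, y_train, X[i]) - y[i]))
    return errors


def mean_baseline(X_train, y_train, x):
    return mean(y_train)  # ignores the predictors entirely


def median_baseline(X_train, y_train, x):
    return median(y_train)


def rashomon_set(mae_by_model: Dict[str, float], tolerance: float = 0.025) -> List[str]:
    """Models whose MAE is within `tolerance` (relative) of the best MAE, best first."""
    threshold = min(mae_by_model.values()) * (1 + tolerance)
    members = [name for name, err in mae_by_model.items() if err <= threshold]
    return sorted(members, key=mae_by_model.__getitem__)


X = [[0.0]] * 4                 # dummy predictors; the baselines never look at them
y = [1.0, 2.0, 3.0, 4.0]
print(mean(loocv_abs_errors(X, y, mean_baseline)))
print(mean(loocv_abs_errors(X, y, median_baseline)))
print(rashomon_set({"best": 0.227, "alt": 0.232, "median": 0.349}))
```

With a best MAE of 0.227, the 2.5% threshold is about 0.2327, consistent with the "MAE < 0.233" cut-off. Because `loocv_abs_errors` returns one error per observation, two models' error lists are paired, which is the input a paired test such as `scipy.stats.wilcoxon` expects; this assumes the paper's significance tests were set up on paired per-observation errors.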
Discussion
Findings demonstrate that an inductive, cross-validated machine-learning framework can uncover non-linear relationships and interactions among drivers of modern slavery that are difficult to capture with traditional linear, theory-first approaches. Access to Resources appears central in the best model, while the discovery and validation of a distinct Physical Security of Women component underscores the importance of gendered protection and law enforcement as a core national-level predictor. The strong interaction between women’s physical security and resource access suggests that protective interventions for women may be most impactful where basic resources and services are scarce. The Rashomon analysis highlights that multiple, similarly predictive models can yield differing explanatory narratives, cautioning against over-reliance on any single model’s importance rankings. Overall, non-linear models with feature compression outperformed linear approaches and PLS in this small n, large p setting, and latent-factor approaches appear valuable for summarizing complex predictor spaces.
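The interaction claim above relies on model-agnostic interpretation tools. The sketch below shows their mechanics on a hypothetical toy predictor (`toy_predict`, invented for illustration and not the paper's fitted tree) in which lack of women's physical security raises the prediction more when access to resources is low. The permutation step is the basic shuffle-one-column variant; Altmann et al. (2010) describe a corrected version that also permutes the outcome to build a null distribution, which is not shown. All function names are illustrative.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence

Predict = Callable[[List[float]], float]


def mae(y: Sequence[float], preds: Sequence[float]) -> float:
    """Mean absolute error."""
    return mean([abs(a - b) for a, b in zip(y, preds)])


def permutation_importance(predict: Predict, X: List[List[float]], y: List[float],
                           n_repeats: int = 50, seed: int = 0) -> List[float]:
    """Average rise in MAE when one column is shuffled, breaking its link to y."""
    rng = random.Random(seed)
    base = mae(y, [predict(r) for r in X])
    scores = []
    for j in range(len(X[0])):
        rises = []
        for _ in range(n_repeats):
            column = [r[j] for r in X]
            rng.shuffle(column)
            X_perm = [r[:j] + [v] + r[j + 1:] for r, v in zip(X, column)]
            rises.append(mae(y, [predict(r) for r in X_perm]) - base)
        scores.append(mean(rises))
    return scores


def ice_curves(predict: Predict, X: List[List[float]], j: int,
               grid: Sequence[float]) -> List[List[float]]:
    """One curve per row: the prediction as feature j sweeps the grid, others held fixed.
    Averaging these curves pointwise gives the one-way partial dependence."""
    return [[predict(r[:j] + [g] + r[j + 1:]) for g in grid] for r in X]


def partial_dependence_2d(predict: Predict, X: List[List[float]], j: int, k: int,
                          grid_j: Sequence[float], grid_k: Sequence[float]) -> List[List[float]]:
    """Average prediction with features j and k jointly fixed at each grid pair."""
    table = []
    for a in grid_j:
        cells = []
        for b in grid_k:
            preds = []
            for r in X:
                r2 = list(r)
                r2[j], r2[k] = a, b
                preds.append(predict(r2))
            cells.append(mean(preds))
        table.append(cells)
    return table


def toy_predict(r: List[float]) -> float:
    """Hypothetical model: r[0] = lack of women's physical security,
    r[1] = access to resources, r[2] = an unused feature."""
    return 0.1 + 0.8 * r[0] * (1.0 - r[1])


rng = random.Random(1)
X = [[rng.random() for _ in range(3)] for _ in range(30)]
y = [toy_predict(r) for r in X]
print(permutation_importance(toy_predict, X, y))
print(partial_dependence_2d(toy_predict, X, 0, 1, [0.0, 1.0], [0.0, 1.0]))
```

In the two-way table, moving feature 0 from 0 to 1 changes the average prediction substantially when feature 1 is 0 but not at all when it is 1; that asymmetry is the signature of an interaction, and the differing slopes of the individual ICE curves show the same effect row by row.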
Conclusion
The study introduces a transparent, interpretable workflow for small n, large p social problems, combining NMF-based feature compression, non-linear decision trees, strict LOOCV, and Rashomon-set analysis to evaluate explanation stability. It identifies six robust latent components predicting national slavery prevalence and brings forward Physical Security of Women as a novel, under-emphasized quantitative predictor with important non-linear interactions, especially under poor resource access. The approach also delivers out-of-sample prevalence estimates with uncertainty for countries lacking survey data. Future work should: expand and improve data coverage (including regions absent from current samples); incorporate richer, disaggregated measures of exploitation types; explore complementary predictive models (including more flexible non-linear or ensemble methods) while balancing interpretability; and integrate expert judgement with data-driven models to address known data gaps and contextual nuances.
Limitations
- Outcome limitations: The dependent variable is a survey-derived estimate rather than a direct incidence measure, and it reflects overall risk/prevalence without a breakdown by type of exploitation.
- Sample representativeness: The training data contain no countries from Western Europe or North America, limiting generalization.
- Temporal alignment: Predictor data were grouped into broad pre- and post-2016 windows; some indicators used 2014 and 2019 values to predict 2016 and 2018 prevalence, assuming relative stability.
- Missing data and imputation: Variables with >50% missingness were dropped, and the remaining gaps were imputed (including with tree-based methods). Some countries reused a single year's predictor values for both 2016 and 2018.
- Small-n constraints: LOOCV was used and no separate test set was held out; multiple near-optimal models exist (the Rashomon effect), leading to variability in importance rankings.
- Multicollinearity: Correlations persist among the latent components, and non-linear dependencies may not be fully captured by correlation coefficients.
- Predictive performance: The model tended to underestimate the highest-prevalence cases and was optimized for interpretability and explanation rather than pure prediction.
- Data access: Lack of access to the raw GWP microdata may limit refinement of the estimates and their uncertainty quantification.