Introduction
Modern slavery, encompassing forced labor, debt bondage, and sexual exploitation, affects tens of millions globally. While formally abolished in many places, it persists in various forms. The challenge of understanding its drivers is amplified by its hidden nature and the resulting data scarcity. Existing research, largely qualitative and based on small samples, provides valuable insights but limited generalizability. Previous quantitative studies have relied on simplified linear models, neglecting the complex interactions between numerous potential factors. This "wicked problem" necessitates sophisticated analytical approaches capable of handling high dimensionality and non-linear relationships. This study introduces a novel machine learning methodology to address these challenges and quantify the impact of individual factors, even with limited data.
Literature Review
The literature identifies several factors contributing to modern slavery. At the individual level, vulnerability is associated with poverty, lack of resources, migration status, and illiteracy, and these factors weigh particularly heavily on children. At the national level, low GDP, corruption, and armed conflict are linked to higher slavery prevalence. Regionally, factors such as geographic location, natural disasters, and disease outbreaks have been implicated. Existing quantitative work is limited by its reliance on linear models and by the small variable sets that data constraints impose. This study aims to overcome these limitations by applying advanced machine-learning techniques.
Methodology
The study uses country-level prevalence estimates from the Gallup World Poll (GWP) for 2016 and 2018, covering 48 countries and yielding 70 data points. A total of 106 independent variables were selected from open-source datasets, encompassing economic, social, demographic, and contextual indicators. Preprocessing consisted of imputing missing values and normalizing the variables. The methodology followed an exploratory, inductive approach that evaluated multiple modeling strategies. The researchers compared models built on the full feature set (106 variables) with models built on a reduced set (34 variables selected from the literature), and they tested feature compression techniques such as Non-negative Matrix Factorization (NMF) and Partial Least Squares (PLS) Regression. Three model classes (linear regression, decision trees, and random forests) were evaluated using leave-one-out cross-validation (LOOCV), which served both to tune model parameters and to assess how well each model generalizes to held-out data points. Because several different models can perform almost equally well (the Rashomon effect), the authors examined a Rashomon set of models whose performance fell within a close threshold of the best model and evaluated how stable variable importance was across this set. The best model was then used to generate out-of-sample prevalence estimates for countries without GWP survey data, and these predictions were compared with those from the Global Slavery Index (GSI).
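The leave-one-out procedure described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the two stand-in models (a mean baseline and a one-feature least-squares line), the function names, and the toy data are assumptions chosen so the example runs with the Python standard library alone. Each data point is held out once, the model is fitted on the remaining points, and the held-out errors are averaged.

```python
from statistics import mean

def fit_mean(xs, ys):
    """Baseline (hypothetical stand-in): always predict the training mean."""
    m = mean(ys)
    return lambda x: m

def fit_line(xs, ys):
    """One-feature ordinary least squares (hypothetical stand-in model)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def loocv_mae(xs, ys, fit):
    """Mean absolute error over folds that each hold out exactly one point."""
    errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        model = fit(train_x, train_y)      # fit without point i
        errors.append(abs(model(xs[i]) - ys[i]))  # score on point i only
    return mean(errors)

# Toy data (illustrative only): an exact linear relationship y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0]

baseline_mae = loocv_mae(xs, ys, fit_mean)
line_mae = loocv_mae(xs, ys, fit_line)
print(f"baseline MAE: {baseline_mae:.3f}, line MAE: {line_mae:.3f}")
```

With only 70 data points, holding out one point per fold keeps almost all of the data available for fitting, which is presumably why LOOCV suits a dataset of this size. The same loop would apply to any model class by swapping in a different `fit` function.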
Key Findings
The best-performing model combined the full feature set (106 variables), NMF compression into six components (k=6), and a decision tree with a restricted maximum number of features. This model significantly outperformed baseline predictions (p<0.001). The six NMF components were interpreted as Democratic Rule, Armed Conflict, (lack of) Physical Security of Women, Social Inequality and Discrimination, Access to Resources, and Religious and Political Freedoms. Permutation importance analysis identified "Access to Resources" as the most important predictor in the best model. Rashomon set analysis showed that, while the NMF components themselves were stable across the set of well-performing models, their relative importance varied considerably, which highlights the limits of interpreting any single model. The analysis also revealed a non-linear interaction between "Physical Security of Women" and "Access to Resources," indicating that women's vulnerability to exploitation increases significantly where access to resources is low. Out-of-sample predictions for countries without survey data were generated using the best model and compared with the GSI estimates.
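As a rough illustration of permutation importance, the sketch below shuffles one feature column at a time and records how much the prediction error grows. It is a generic standard-library version under stated assumptions, not the study's code: `toy_model`, the toy rows, and the choice of mean absolute error as the metric are all hypothetical.

```python
import random
from statistics import mean

def mae(model, rows, targets):
    """Mean absolute error of a model over a list of feature tuples."""
    return mean(abs(model(r) - t) for r, t in zip(rows, targets))

def permutation_importance(model, rows, targets, n_features, n_repeats=20, seed=0):
    """Average increase in error when each feature column is shuffled."""
    rng = random.Random(seed)
    base = mae(model, rows, targets)
    importances = []
    for j in range(n_features):
        increases = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between feature j and the target
            shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
            increases.append(mae(model, shuffled, targets) - base)
        importances.append(mean(increases))
    return importances

def toy_model(row):
    """Hypothetical fitted model that depends only on feature 0."""
    return 3.0 * row[0]

# Toy data (illustrative only): feature 1 is noise the model never uses.
rows = [(float(i), float((i * 7) % 5)) for i in range(10)]
targets = [toy_model(r) for r in rows]

scores = permutation_importance(toy_model, rows, targets, n_features=2)
print(scores)  # feature 0 gets a positive score, feature 1 gets zero
```

A feature the model ignores scores zero because shuffling it cannot change any prediction. The score describes one fitted model only, which is why importance rankings can differ between models that perform almost equally well.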
Discussion
The findings address the research question by identifying key predictors of modern slavery, going beyond traditional linear approaches to capture non-linear relationships and interactions among variables. The discovery of the previously underemphasized importance of "Physical Security of Women," particularly in resource-scarce contexts, provides novel insights for policy interventions. This inductive, data-driven methodology is valuable in uncovering unexpected predictors and complex relationships, which might be missed by traditional approaches. Comparing out-of-sample predictions with the GSI highlights the value of transparent, data-driven methods that quantify uncertainty, promoting more informed policy discussions. The study demonstrates the power of machine learning in analyzing complex societal problems, generating both improved predictions and more nuanced understandings of underlying mechanisms.
Conclusion
This study presents a novel machine-learning approach for understanding the complex drivers of modern slavery, overcoming challenges of data scarcity and high dimensionality. The findings highlight the importance of considering a country's capacity to protect women's physical security, particularly in resource-poor settings. The methodology provides a robust framework for future research into "wicked problems", combining data-driven analysis with expert knowledge to generate both accurate predictions and detailed explanations. Future research should focus on integrating additional data sources (satellite imagery, mobile phone data) to enrich the datasets used in this type of analysis.
Limitations
The dependent variable relies on survey-based estimates, which are a proxy for actual slavery prevalence. The analysis covers only 48 countries and largely excludes Western Europe and North America, so its findings may not generalize to those regions. Data imputation, although carefully conducted, introduces additional uncertainty. Out-of-sample predictions, while valuable, carry the risks inherent in extrapolating a model beyond its training data. The choice of the epsilon value for the Rashomon set is subjective, though the analysis showed a clear performance gap that justifies the choice.
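To make the role of epsilon concrete, here is a small sketch of how a Rashomon set can be selected and how a gap in model errors makes the cutoff less arbitrary. The model names and error values are placeholders invented for illustration, and the gap heuristic is one possible reading of "a clear performance gap", not necessarily the authors' procedure.

```python
def rashomon_set(errors, epsilon):
    """Names of models whose error is within epsilon of the lowest error."""
    best = min(errors.values())
    return sorted(name for name, err in errors.items() if err <= best + epsilon)

def largest_gap(errors):
    """Return the (lower, upper) errors that bound the widest gap
    between consecutive models when sorted by error."""
    ordered = sorted(errors.values())
    gaps = [(b - a, a, b) for a, b in zip(ordered, ordered[1:])]
    _, lower, upper = max(gaps)
    return lower, upper

# Placeholder cross-validated errors (illustrative only, not from the study).
errors = {
    "model_a": 0.50,
    "model_b": 0.51,
    "model_c": 0.53,
    "model_d": 0.90,
    "model_e": 0.95,
}

lower, upper = largest_gap(errors)
# Place the cutoff in the middle of the gap, expressed relative to the best error.
epsilon = (lower + upper) / 2 - min(errors.values())
members = rashomon_set(errors, epsilon)
print(members)
```

When the errors show a wide gap like this, any epsilon that places the cutoff inside the gap selects the same set of models, so the subjective choice has little effect on the result.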