Exploring socioeconomic similarity-inequality: a regional perspective

Economics

M. L. Mouronte-López and J. S. Ceres

Discover how Mary Luz Mouronte-López and Juana Savall Ceres explore economic and social inequalities across global regions using machine learning and time-series data. Their innovative approach reveals critical relationships among key socioeconomic variables, aiming to influence policies that foster equality and sustainable development.

Introduction
Analyzing and monitoring the socioeconomic characteristics of geographical areas is crucial for developing effective regional and national policies. This study is motivated by the United Nations' 2030 Agenda for Sustainable Development, which emphasizes reducing inequality across various dimensions. Economic research underscores the need to evaluate countries not only by economic productivity but also by their capacity to reduce poverty and inequality. Inequality manifests in diverse ways, requiring a multifaceted analysis encompassing income, consumption, health, education, gender, and justice. Quantifying both similarity and inequality between regions is essential for defining improvement strategies that contribute to achieving the Sustainable Development Goals (SDGs). This research employs mathematical analysis, statistics, and machine learning to examine variables and indices related to socioeconomic aspects, with three aims: to identify similarities and inequalities between countries and regions globally, to uncover relationships between variables, and to build a predictive model for the Gini coefficient.
Literature Review
Existing comparative research often focuses on differences rather than similarities. Studies that have examined similarities across social variables within various categories (gender, age, etc.) have found surprisingly high levels of similarity. Research in health has highlighted the importance of analyzing similarities between marginalized groups to understand and address healthcare inequalities. Several studies have analyzed regional similarities from different perspectives: macroeconomic trends, comparisons between US states, and similarities between EU countries. Previous research has also examined the Gini coefficient and its relationships with various factors. While some studies have focused on its relationship with specific factors such as income distribution, others have highlighted the limitations of relying solely on the Gini coefficient, suggesting that multi-parametric models provide a more comprehensive understanding of inequality. The Gini coefficient has been applied in diverse fields, including fisheries, rainfall analysis, and healthcare, but its use in social research has not been fully exploited. Finally, prior research has used various machine learning techniques to model the Gini coefficient, with varying levels of success.
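For readers unfamiliar with the measure discussed above, the following is a minimal Python sketch of how a Gini coefficient can be computed from a list of incomes. It is illustrative only and is not taken from the paper: the function name `gini` and the sample values are assumptions, and the formula shown is the common rank-weighted sample form on a 0 to 1 scale. The World Bank indicator used in the study (SI.POV.GINI) is reported on a 0 to 100 scale.

```python
def gini(incomes):
    """Gini coefficient of non-negative incomes, on the 0-1 scale.

    Uses the rank-weighted form over values sorted in ascending order:
        G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,  i = 1..n
    """
    values = sorted(incomes)
    n = len(values)
    total = sum(values)
    if n == 0 or total <= 0:
        raise ValueError("need at least one income and a positive total")
    if values[0] < 0:
        raise ValueError("incomes must be non-negative")
    weighted = sum(rank * x for rank, x in enumerate(values, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical toy distributions, not data from the study.
equal_shares = gini([1, 1, 1, 1])       # everyone has the same income
one_holds_all = gini([0, 0, 0, 10])     # a single person holds everything
print(equal_shares, one_holds_all)
```

With this sample formula a perfectly equal distribution gives 0, and the most unequal distribution of n incomes gives (n - 1) / n rather than exactly 1.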
Methodology
The study uses data from several international repositories, including the World Bank's Gender Statistics Database and Gini Coefficient Dataset, and the United Nations' Gender Inequality Index. The R programming language was used for data analysis and model building. Exploratory data analysis was conducted first, involving visualization and identification of missing values. The study then applied time-series clustering with Ward's algorithm to group countries based on the time series of various socioeconomic variables. Multiple indices were used to determine the optimal number of clusters. To assess regional similarity and inequality, metrics were developed based on how often countries appear together in the same clusters. Furthermore, a supervised learning approach, specifically a random forest model, was employed to predict the Gini coefficient from several socioeconomic indicators in various domains (education, economic, labor market, and gender). The random forest model was selected because it has previously demonstrated robust performance. Five-fold cross-validation was used to estimate the model's generalization ability. Variable importance was evaluated using node purity and permutation importance metrics to select the most influential variables in the predictive model.
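The summary does not give the exact formula for the similarity metric, so the following is only a sketch of one plausible reading of "frequency of countries appearing together in the same clusters". It is written in Python rather than the R used in the study. The function name `co_clustering_frequency`, the country codes A, B, and C, the cluster labels, and the second variable name are all hypothetical, and treating inequality as the complement of similarity is an assumption.

```python
from itertools import combinations


def co_clustering_frequency(assignments, countries):
    """Share of (variable, country-pair) comparisons in which both
    countries of the pair fall in the same cluster.

    assignments: {variable name: {country: cluster label}}
    countries:   the countries of one region
    """
    together = 0
    compared = 0
    for labels in assignments.values():
        for a, b in combinations(countries, 2):
            # Skip pairs where either country has no label for this variable.
            if a in labels and b in labels:
                compared += 1
                if labels[a] == labels[b]:
                    together += 1
    return together / compared if compared else float("nan")


# Hypothetical cluster labels for two variables and three countries.
assignments = {
    "SI.POV.GINI": {"A": 1, "B": 1, "C": 2},
    "HYPOTHETICAL.VARIABLE": {"A": 1, "B": 1, "C": 1},
}
region = ["A", "B", "C"]

similarity = co_clustering_frequency(assignments, region)
inequality = 1 - similarity  # assumption: inequality as the complement
print(round(similarity, 3), round(inequality, 3))
```

In this toy case four of the six pairwise comparisons land in the same cluster. The cluster labels themselves would come from the Ward clustering step, which is not reproduced here.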
Key Findings
The time-series analysis revealed that several countries exhibited stationary behavior for certain variables, implying that their historical means could be used for future prediction. Correlations among socioeconomic variables were calculated using Spearman's method because the data were not normally distributed. The results indicated moderate to high correlations among educational variables (e.g., the gender parity index across education levels), as well as correlations between education levels and other socioeconomic domains. Clustering produced a different number of clusters for each variable across the domains. The Gini coefficient (SI.POV.GINI) formed three distinct clusters, suggesting considerable heterogeneity in income inequality across countries. The regional metrics showed substantial differences in socioeconomic similarity and inequality across regions. Europe showed the highest internal similarity in the economic domain, while North America displayed the highest inequality. Significant regional differences were also observed in the education, labor market, and gender domains. The random forest model successfully predicted the Gini coefficient, with 16 variables found to be most influential. These variables spanned several domains: health (9 variables), economics (2), social labor protection (4), and gender (1). The model achieved an average RMSE of 3.55701, representing a reasonable balance between the number of variables and predictive accuracy.
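As an illustration of the Spearman method mentioned above, here is a minimal Python sketch (the study itself used R, and none of this code comes from the paper). Spearman's coefficient is the Pearson correlation of the ranks, with tied values sharing their average rank. The helper names and the sample values are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values: y grows monotonically but not linearly with x.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(spearman(x, y))
```

Because only the ordering of the values matters, a monotone but nonlinear relationship such as the one above still yields a perfect rank correlation, which is why the method suits data that are not normally distributed.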
Discussion
The findings highlight significant regional disparities in socioeconomic indicators. Europe's high economic similarity is likely due to its regional policies, while North America showed the highest inequality. South America and Europe exhibited the greatest similarities in the economic and educational domains, potentially due to past policy initiatives. The analysis of correlations between variables provided insights into the interrelationships between educational attainment, economic indicators, and labor market outcomes. These findings are largely consistent with the existing literature. The successful Gini coefficient prediction model underscores the importance of considering a wider range of variables beyond purely economic indicators to achieve a more comprehensive understanding of inequality. The model's performance suggests that factors such as health, social protection, and gender play crucial roles in shaping income inequality.
Conclusion
This research contributes a novel approach to studying global socioeconomic similarities and inequalities through time-series analysis and machine learning. The findings reveal significant regional disparities and highlight the complex interrelationships between socioeconomic variables. The developed predictive model for the Gini coefficient offers a powerful tool for understanding and addressing income inequality. Future research could focus on a more in-depth study of the gender domain, given its significant regional variability, and on analyzing the random and trend components of the time series separately to obtain a more granular understanding of the underlying dynamics.
Limitations
The study's reliance on existing datasets may introduce limitations related to data availability and quality. Although many countries are included, data may not always be completely representative across all regions. The selection of variables and the use of a specific machine learning algorithm might have influenced the results. Further research would benefit from using more sophisticated machine learning techniques and expanding the dataset with additional variables.