Identifying the roots of inequality of opportunity in South Korea by application of algorithmic approaches

Sociology

S. Han

This study by Seungwoo Han explores the roots of inequality of opportunity in South Korea using innovative algorithmic approaches. By evaluating survey data, the research highlights critical factors such as the region of upbringing, gender, and father's occupation, uncovering the regional disparities and the continuation of gender inequality.

Introduction
Since the 1997 Asian financial crisis, rising inequality has become a prominent social and political issue in South Korea, particularly among millennials concerned about unequal opportunities. This study seeks to identify the roots of inequality of opportunity (IOp) in South Korea using algorithmic approaches. Grounded in Rawlsian justice and Roemer’s framework distinguishing circumstances (beyond individual control) from effort (individual responsibility), the study focuses on ex-ante IOp, that is, inequality between groups defined by shared circumstances. Unlike prior work that measures the magnitude of IOp, this study aims to uncover which circumstances most structure unequal opportunities. The author proposes using decision tree classification, LightGBM, and SHAP to estimate variable importance and interpret model outputs. The purpose is to provide policy-relevant insight into which childhood circumstances most predict adverse socio-economic outcomes (proxied by sub-minimum wages) among a specific cohort (millennials), thereby informing strategies to enhance equality of opportunity.
Literature Review
The background situates equality of opportunity within egalitarian justice (Rawls, Sen, Dworkin, Arneson, Cohen), emphasizing fair starting conditions rather than equal outcomes. Roemer and Fleurbaey formalized IOp measurement by separating circumstances from effort and proposing ex-ante (between-type) and ex-post (within-effort) approaches. Empirically, parametric models risk downward or upward bias and model selection issues, while nonparametric partitioning can suffer from arbitrary segmentation and small-sample overestimation. Decision trees have been proposed as data-driven, nonparametric tools aligned with Roemer’s theory and used for IOp estimation across regions (e.g., Brunori et al.). In the Korean context, generational experiences diverge markedly; millennials report strong perceptions of unfair opportunity distribution (e.g., “spoon class” discourse). Prior Korean studies link parental education and income to offspring outcomes, document regional disparities in services and development, and report persistent gender gaps in opportunity. Housing type and tenure also serve as proxies for economic status. This literature motivates focusing on youth circumstances (around age 14) such as region, gender, parental background, family structure, and housing as key determinants of subsequent socio-economic outcomes.
Methodology
Empirical approach: The study adopts an ex-ante utilitarian perspective, examining between-type inequality by classifying individuals into types based on observed childhood circumstances and assessing whether group means fall below or above a socio-economic threshold. Socio-economic achievement is proxied by wage; the adverse condition is earning below the minimum wage in 2017.
Data: Youth Panel Survey (Korea Employment Information Service). Baseline in 2007 surveyed nationwide males and females aged 15–29 using multi-stage area probability sampling; follow-up in 2017 collected current wages from the same respondents. In 2007, respondents reported circumstances around age 14.
Dependent variable: binary indicator of 2017 wage relative to minimum wage (6,470 KRW/hour, equivalent to 1,352,230 KRW/month at 209 hours).
Circumstance variables include: region lived at ~age 14; respondent’s gender; living with parents; father’s and mother’s job; father’s and mother’s occupational position; father’s and mother’s education; physical presence of parents; number of working parents; number of siblings; housing tenancy status (owned/jeonse/monthly rent); housing type. Python 3.7 and scikit-learn 0.22.2 were used.
Type classification and thresholding: Using the between-type approach, for each type defined by shared circumstances, the mean outcome is computed and then binarized relative to the minimum wage (below vs. at/above). This establishes a binary classification target consistent with analyzing the most adverse socio-economic condition.
Models: Two tree-based classifiers were used: (1) decision tree classification (CART) and (2) LightGBM (a gradient boosting decision tree ensemble). Decision trees split data to maximize purity (e.g., Gini or cross-entropy) and inherently provide variable importance via split gains, but can be unstable and prone to overfitting.
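The type classification and thresholding step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the study's code: the function `label_types`, the record layout, and the toy records are hypothetical, and only the 6,470 KRW hourly wage and the 209 monthly hours come from the text.

```python
from collections import defaultdict

# Figures stated in the text: 2017 hourly minimum wage and monthly hours.
HOURLY_MIN_WAGE_KRW = 6470
MONTHLY_HOURS = 209
MONTHLY_MIN_WAGE_KRW = HOURLY_MIN_WAGE_KRW * MONTHLY_HOURS  # 1,352,230 KRW


def label_types(records, circumstance_keys, threshold=MONTHLY_MIN_WAGE_KRW):
    """Group records into types by shared circumstances, then binarize each
    type's mean monthly wage: 1 if at/above the threshold, 0 if below.

    Hypothetical helper for illustration only.
    """
    wages_by_type = defaultdict(list)
    for rec in records:
        # A "type" is the tuple of circumstance values a respondent shares.
        type_id = tuple(rec[k] for k in circumstance_keys)
        wages_by_type[type_id].append(rec["wage"])
    return {
        type_id: int(sum(wages) / len(wages) >= threshold)
        for type_id, wages in wages_by_type.items()
    }


# Hypothetical toy records (not survey data); wage is monthly, in KRW.
toy_records = [
    {"region": "A", "gender": "F", "wage": 1_200_000},
    {"region": "A", "gender": "F", "wage": 1_400_000},
    {"region": "B", "gender": "M", "wage": 1_500_000},
    {"region": "B", "gender": "M", "wage": 1_300_000},
]
type_labels = label_types(toy_records, ["region", "gender"])
```

In this toy case the first type averages 1,300,000 KRW and is labeled 0 (below the threshold), while the second averages 1,400,000 KRW and is labeled 1. With many circumstance variables the number of types grows quickly, which is one reason a tree-based partition is attractive.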
LightGBM uses gradient boosting with leaf-wise growth and Gradient-based One-Side Sampling (GOSS) to improve efficiency and accuracy, mitigating overfitting and instability relative to a single tree.
Interpretability: SHapley Additive exPlanations (SHAP) were applied to provide consistent, theoretically grounded feature attributions both globally (average absolute SHAP values across the dataset) and locally. SHAP summary plots were used to assess variable rankings and directions of influence (positive/negative impact on the probability of being at/above the minimum wage).
Evaluation: Data were split into training and test sets (80/20). Predictive performance was evaluated using accuracy, precision, recall, F1, and ROC-AUC to compare model stability and discrimination, recognizing that the study’s main objective is interpretability and variable importance rather than pure prediction accuracy.
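SHAP attributions are built on Shapley values, so a small worked example may help show what a per-feature attribution means. The sketch below computes exact Shapley values by enumerating feature coalitions. It is not the algorithm the SHAP library uses for tree models, and `shapley_values`, `toy_predict`, and the all-zero baseline are hypothetical illustrations, not the fitted model from the study.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance x.

    Features outside a coalition are set to their baseline value, which is a
    simplifying assumption made for this illustration.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size: |S|! (n-|S|-1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i when added to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical toy score over three binary circumstances (not the fitted model).
def toy_predict(features):
    region, gender, father_job = features
    return 0.5 + 0.3 * region + 0.1 * gender + 0.05 * region * father_job


x = [1, 1, 1]
baseline = [0, 0, 0]
phi = shapley_values(toy_predict, x, baseline)
```

The attributions sum to the difference between the prediction for `x` and the prediction for the baseline, and the interaction term is shared equally between the two features involved. A global ranking like the one described above would average the absolute values of such attributions over all instances.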
Key Findings
- Variable importance: Across models, the top contributors to inequality of opportunity were region (childhood residence), gender, and father’s job, with region exerting the largest impact and intensity on outcomes. Father’s background (job and education) had stronger influence than mother’s background.
- SHAP interpretation (LightGBM):
  - Region: Largest positive/negative impact range, indicating substantial regional disparities in opportunity and outcomes.
  - Gender: Male generally associated with positive direction; female with negative, evidencing gendered opportunity differences.
  - Father’s job: Greater impact than mother’s job; certain high-value categories for mothers (e.g., housewife/retired at high value) showed positive effects in limited cases.
  - Education of parents: Father’s education shows a positive effect around college-level; very high values sometimes associated with negative marginal impact in the plot’s extremes.
  - Tenancy status: “Owned” housing linked to positive outcomes; lower-status tenancy linked to negative direction.
  - Family composition: Fewer siblings (below the average of 2.3) associated with positive outcomes; two working parents slightly more positive than otherwise.
- Model comparison (test set performance):
  - Decision tree: Accuracy 0.8636; Precision 0.9386; Recall 0.9144; F1 0.9263; ROC-AUC 0.5049.
  - LightGBM: Accuracy 0.9360; Precision 0.9406; Recall 0.9947; F1 0.9669; ROC-AUC 0.6654.
LightGBM demonstrated higher and more stable performance across all metrics and produced SHAP summaries that were clearer for interpretation than the single decision tree.
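As a quick consistency check on the reported test-set metrics, F1 can be recomputed from precision and recall, since it is their harmonic mean. The snippet below is a small sketch using only the figures listed above; the helper `f1_score` and the dictionary layout are illustrative choices.

```python
def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Test-set precision, recall, and F1 as reported in the findings above.
reported = {
    "Decision tree": {"precision": 0.9386, "recall": 0.9144, "f1": 0.9263},
    "LightGBM": {"precision": 0.9406, "recall": 0.9947, "f1": 0.9669},
}

# Recompute F1 for each model from its reported precision and recall.
recomputed = {
    name: f1_score(m["precision"], m["recall"]) for name, m in reported.items()
}
```

Both recomputed values agree with the reported F1 scores to four decimal places, so the precision, recall, and F1 figures are internally consistent.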
Discussion
The findings directly address the research objective by identifying specific childhood circumstances that most structure unequal opportunities in South Korea. The dominance of region underscores how spatial contexts (e.g., Seoul metropolitan vs. rural provinces; within-Seoul disparities like Gangnam vs. Gangbuk) shape access to resources and opportunities, leading to divergent socio-economic trajectories. The strong role of gender suggests persistent gendered barriers despite improvements in participation and wage gap statistics, indicating that opportunities remain unevenly distributed. The outsized influence of the father’s job and education relative to the mother’s background reflects enduring patriarchal and labor-market structures in South Korea, whereby paternal status channels advantages or disadvantages to offspring. Methodologically, combining tree-based models with SHAP reveals not just which variables matter but how their values shift the probability of achieving at least the minimum wage. LightGBM’s stability and superior discrimination support the reliability of its importance rankings. Substantively, the results imply that policy interventions aiming to equalize opportunities should prioritize mitigating regional disparities, advancing gender equality, and reducing the intergenerational transmission of advantage tied to paternal occupations and education.
Conclusion
Using tree-based classification models and SHAP on Korean Youth Panel data, the study identifies region, gender, and father’s job as the principal roots of inequality of opportunity among South Korean millennials, with region exerting the largest impact. LightGBM provided more stable and reliable interpretability than a single decision tree. The results indicate that individuals experience unequal opportunities due to the combined effects of where they grew up, their gender, and their father’s background, which significantly shape socio-economic achievement. Policy efforts to promote equality of opportunity should therefore target spatial inequalities, strengthen gender equity, and weaken intergenerational transmission of advantage. Future research should: (1) further articulate theoretical links between IOp frameworks and algorithmic methods; (2) explore alternative socio-economic criteria beyond minimum wage; and (3) conduct more granular analyses of regional stratification and disparities.
Limitations
- The study’s algorithmic application to IOp is preliminary; further work is needed to strengthen the linkage among theory, empirical strategy, and machine-learning methods.
- The outcome threshold is based on the 2017 minimum wage; other criteria of socio-economic achievement were not examined and may yield different insights.
- While region emerged as most influential, the study does not resolve how regions are internally stratified or quantify inter-regional disparities; detailed regional analysis is beyond scope and left for future research.