Interdisciplinary Studies
Developing China's workforce skill taxonomy reveals extent of labor market polarization
W. Xu, X. Qin, et al.
The study investigates how skills are distributed across China’s workforce and regions, and whether China’s labor market exhibits polarization between socio-cognitive and sensory-physical skills, similar to patterns documented in the US and Europe. Technological change complements some skills while substituting routine tasks, reshaping wages and employment. While the US has O*NET to quantify occupational skills, China lacks a comparable taxonomy, limiting research and policy responses to automation and structural change. The authors aim to build China’s first workforce skill taxonomy by mapping Chinese occupations to O*NET skills using a Naïve Bayes approach, quantify polarization in skills, and construct city-level skill profiles to explain regional economic inequality and differences in migration attractiveness. Given China’s heterogeneous industrial structure and rapid automation, understanding city skill endowments is important for policies on reskilling and regional development.
Prior research shows routine tasks are most susceptible to automation, with non-routine tasks complemented by technology, contributing to rising wage inequality and job polarization in the US and Europe. The US O*NET skill database has enabled analyses of changing task composition, earnings inequality, and AI impacts on labor. A recent US study revealed polarization into two skill clusters: social-cognitive (high-income jobs) and sensory-physical (low-income jobs). In China, evidence on polarization has been mixed due to limited occupation-specific data, with studies using macro or survey data reaching conflicting conclusions. This paper leverages O*NET to fill China’s data gap and reassess polarization and regional disparities through a skill-based lens.
Data sources and skill definition: The study uses O*NET 23.0 to characterize 161 skills, abilities, knowledge, and work activities for occupations, grouped into 15 categories. Skill importance scores (1–5) across O*NET occupations are converted to revealed comparative advantage (RCA); a skill counts as effectively used by an occupation when its RCA exceeds 1. Chinese occupational data come from the National Occupation Classification Code (NOCC), which provides titles and job descriptions but not skills; its 434 minor classes are the unit categories, matched to 353 occupations so that the analysis is consistent with the 2010 census.
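The RCA step can be illustrated with a minimal sketch. It assumes the standard revealed-comparative-advantage ratio (an occupation's share of importance on a skill, divided by that skill's share across all occupations); the summary does not spell out the exact formula, and the function names and the tiny 2×3 importance matrix are hypothetical.

```python
def rca_matrix(importance):
    """Return RCA for each occupation-skill pair.

    importance[o][s] is the importance score (1-5) of skill s for occupation o.
    Assumed formula: (x_os / sum_s x_os) / (sum_o x_os / sum_os x_os).
    """
    row_totals = [sum(row) for row in importance]
    col_totals = [sum(col) for col in zip(*importance)]
    grand_total = sum(row_totals)
    return [
        [(x / row_totals[o]) / (col_totals[s] / grand_total)
         for s, x in enumerate(row)]
        for o, row in enumerate(importance)
    ]


def effective_use(importance):
    """Binarize: a skill is 'effectively used' when RCA > 1."""
    return [[value > 1.0 for value in row] for row in rca_matrix(importance)]


# Hypothetical toy data: 2 occupations x 3 skills.
importance = [
    [5.0, 1.0, 2.0],
    [1.0, 5.0, 2.0],
]
effective = effective_use(importance)
```

In this toy case the third skill is equally important to both occupations, so its RCA is exactly 1 and it is not flagged under the strict `> 1` rule.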
Task–skill mapping via Naïve Bayes: From O*NET occupation descriptions, the authors extract K=1273 task tokens and use I=161 skills to build a task–skill relationship matrix based on mutual information of co-location across O*NET occupations. A tripartite network links 1273 tasks, 161 skills, and 696 O*NET occupations. Using a Naïve Bayes model, they infer P(Skill|Chinese Occupation) by decomposing P(Chinese Occupation|Skills) as the product over task likelihoods given skills, leveraging the task–skill relationships learned from O*NET and assuming tasks are transversal between the US and China. This yields a binary Chinese skill taxonomy indicating whether each of the 161 skills is effectively used by each Chinese occupation. Examples confirm face validity (e.g., sewing workers require Finger Dexterity, Equipment Maintenance, Near Vision; entrepreneurs do not).
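The general idea of inferring skills from task tokens can be sketched with a generic Bernoulli Naïve Bayes classifier. This is not the authors' implementation: the paper builds its task–skill matrix from mutual information, whereas the sketch below substitutes Laplace-smoothed task frequencies, and the decision rule (posterior log-odds above zero), the task tokens, and the toy occupations are all assumptions made for illustration.

```python
import math


def train_bernoulli_nb(onet_occupations, vocabulary, skills):
    """Estimate, per skill, a prior and P(task present | skill used / not used).

    onet_occupations: list of (set_of_task_tokens, set_of_effective_skills).
    Laplace (add-one) smoothing keeps every probability strictly in (0, 1).
    """
    n_total = len(onet_occupations)
    model = {}
    for skill in skills:
        with_skill = [t for t, used in onet_occupations if skill in used]
        without = [t for t, used in onet_occupations if skill not in used]
        prior = (len(with_skill) + 1) / (n_total + 2)
        p_yes = {tok: (sum(tok in t for t in with_skill) + 1) / (len(with_skill) + 2)
                 for tok in vocabulary}
        p_no = {tok: (sum(tok in t for t in without) + 1) / (len(without) + 2)
                for tok in vocabulary}
        model[skill] = (prior, p_yes, p_no)
    return model


def skill_log_odds(model, skill, occupation_tasks):
    """Posterior log-odds that the occupation uses the skill, given its tasks."""
    prior, p_yes, p_no = model[skill]
    log_odds = math.log(prior) - math.log(1 - prior)
    for tok in p_yes:
        if tok in occupation_tasks:
            log_odds += math.log(p_yes[tok]) - math.log(p_no[tok])
        else:
            log_odds += math.log(1 - p_yes[tok]) - math.log(1 - p_no[tok])
    return log_odds


# Hypothetical toy stand-in for O*NET: task tokens and effective skills.
onet = [
    ({"sew", "cut", "operate"}, {"Finger Dexterity", "Near Vision"}),
    ({"operate", "repair"}, {"Finger Dexterity"}),
    ({"manage", "negotiate", "plan"}, {"Negotiation"}),
    ({"plan", "manage", "analyze"}, {"Negotiation"}),
]
vocabulary = sorted(set().union(*(tasks for tasks, _ in onet)))
skills = ["Finger Dexterity", "Near Vision", "Negotiation"]
model = train_bernoulli_nb(onet, vocabulary, skills)

# A hypothetical Chinese occupation described only by its task tokens.
sewing_worker_tasks = {"sew", "operate"}
taxonomy_row = {s: skill_log_odds(model, s, sewing_worker_tasks) > 0 for s in skills}
```

Working in log space avoids numerical underflow when the product runs over more than a thousand task tokens, as it would with the full vocabulary.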
City skill profiles: For each city, the authors compute two measures: (1) the number of skills effectively used (skill diversity) and (2) a socio-cognitive score. They first construct a 353×161 binary occupation–skill matrix e(o,s). For each city c, CS(c,s) is the sum over occupations of census employment in c times e(o,s). City-level RCA is then computed for each skill, and the number of skills with RCA ≥ 1 gives Skills_c (skill diversity). For socio-cognitive content, community detection on the skill space identifies two skill communities (socio-cognitive vs sensory-physical); an occupation’s socio-cognitive level is defined as the fraction of the 97 socio-cognitive skills that it effectively uses. Occupations with a socio-cognitive level > 0.6 are labeled socio-cognitive (results are robust to thresholds of 0.7 and 0.8). A city’s socio-cognitive score is the share of its employment in socio-cognitive occupations.
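The two city-level measures can be sketched as follows. The sketch assumes the same RCA ratio at city level as at occupation level, takes the socio-cognitive occupation labels as given (they would come from the 0.6 threshold described above), and uses hypothetical toy numbers rather than census data.

```python
def city_skill_matrix(employment, e):
    """CS(c, s) = sum over occupations o of employment[c][o] * e[o][s]."""
    n_skills = len(e[0])
    return [
        [sum(workers * e[o][s] for o, workers in enumerate(city_row))
         for s in range(n_skills)]
        for city_row in employment
    ]


def skill_diversity(cs):
    """Count, per city, the skills whose city-level RCA is at least 1."""
    row_totals = [sum(row) for row in cs]
    col_totals = [sum(col) for col in zip(*cs)]
    grand_total = sum(row_totals)
    diversity = []
    for c, row in enumerate(cs):
        count = 0
        for s, value in enumerate(row):
            if row_totals[c] == 0 or col_totals[s] == 0:
                continue  # skill or city with no weighted employment
            rca = (value / row_totals[c]) / (col_totals[s] / grand_total)
            if rca >= 1.0:
                count += 1
        diversity.append(count)
    return diversity


def socio_cognitive_score(employment, is_socio_cognitive):
    """Share of each city's employment in socio-cognitive occupations."""
    return [
        sum(w for w, flag in zip(city_row, is_socio_cognitive) if flag) / sum(city_row)
        for city_row in employment
    ]


# Hypothetical toy data: 3 occupations x 4 skills, 2 cities.
e = [[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1]]
is_socio_cognitive = [True, False, False]
employment = [[60, 30, 10], [10, 30, 60]]

cs = city_skill_matrix(employment, e)
skills_c = skill_diversity(cs)
sc_score = socio_cognitive_score(employment, is_socio_cognitive)
```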
Skill space construction: Define proximity between skills as the minimum of their co-occurrence probabilities across occupations, constructing a network to reveal clusters and bridging skills.
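One reading of this proximity measure, sketched below, takes the minimum of the two conditional probabilities P(s|s') and P(s'|s) that a skill is effectively used given that the other is, which reduces to the co-occurrence count divided by the larger of the two skill counts. That interpretation, along with the toy matrix, is an assumption; the summary does not give the exact formula.

```python
def skill_proximity(e):
    """Pairwise proximity between skills from a binary occupation-skill matrix.

    Assumed definition: min(P(a | b), P(b | a))
                      = (#occupations using both) / max(#using a, #using b).
    """
    n_skills = len(e[0])
    counts = [sum(row[s] for row in e) for s in range(n_skills)]
    prox = [[0.0] * n_skills for _ in range(n_skills)]
    for a in range(n_skills):
        for b in range(a + 1, n_skills):
            if counts[a] == 0 or counts[b] == 0:
                continue  # a skill used by no occupation has no defined proximity
            both = sum(1 for row in e if row[a] and row[b])
            prox[a][b] = prox[b][a] = both / max(counts[a], counts[b])
    return prox


# Hypothetical toy matrix: 4 occupations x 3 skills.
e = [
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 1],
    [0, 0, 1],
]
prox = skill_proximity(e)
```

Taking the minimum keeps the measure symmetric and prevents a very common skill from appearing close to every rare skill that happens to co-occur with it.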
Economic analysis: Regress city GDP per capita (2010) on controls (capital per capita, population density) and compare the explanatory power of education (share with university degree) versus skill profile variables (skill number, socio-cognitive score). Additional analyses include per capita wages.
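The model comparison amounts to fitting nested ordinary least squares regressions and comparing R². A minimal sketch is given below; the six data points are synthetic and purely illustrative, so the resulting R² values have no connection to the paper's estimates, and the helper names are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, k):
            factor = M[r][col] / M[col][col]
            for c in range(col, k + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * k
    for r in range(k - 1, -1, -1):
        x[r] = (M[r][k] - sum(M[r][c] * x[c] for c in range(r + 1, k))) / M[r][r]
    return x


def ols_r2(regressors, y):
    """R^2 of an OLS fit of y on the given regressors plus an intercept."""
    n = len(y)
    X = [[1.0] + [col[i] for col in regressors] for i in range(n)]
    k = len(X[0])
    # Normal equations: (X'X) beta = X'y
    A = [[sum(X[i][p] * X[i][q] for i in range(n)) for q in range(k)] for p in range(k)]
    b = [sum(X[i][p] * y[i] for i in range(n)) for p in range(k)]
    beta = solve(A, b)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in X]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Synthetic, illustrative city-level data (not from the paper).
capital = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
density = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
socio_cognitive = [0.1, 0.3, 0.2, 0.5, 0.4, 0.6]
gdp_per_capita = [1.2, 2.9, 3.1, 5.2, 5.0, 7.1]

r2_baseline = ols_r2([capital, density], gdp_per_capita)
r2_with_skill = ols_r2([capital, density, socio_cognitive], gdp_per_capita)
```

Because the models are nested, adding a regressor can never lower R²; the informative comparison is how much each candidate variable (education or a skill measure) raises it relative to the same baseline.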
Migration modeling: Adapt the radiation model for inter-city migration by replacing total employment with counts of college-educated workers or skilled workers (from the skill taxonomy) as proxies for job opportunities. Validate predictions against Baidu Map daily city-to-city migration data (24 July 2019), using NDCG on top-10 destination rankings. Statistical tests compare model variants (skilled workers vs total employment vs college-educated).
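A minimal sketch of this pipeline follows, assuming the standard radiation-model form, score(i→j) ∝ m_i n_j / ((m_i + s_ij)(m_i + n_j + s_ij)), where m and n are the opportunity proxies (for example skilled-worker counts) and s_ij sums the opportunities of all other cities no farther from i than j is. The NDCG gain (observed flow used directly as relevance), the city labels, and all numbers are hypothetical.

```python
import math


def radiation_scores(source, opportunities, distance):
    """Relative radiation-model flow from `source` to every other city."""
    m_i = opportunities[source]
    scores = {}
    for dest, n_j in opportunities.items():
        if dest == source:
            continue
        r_ij = distance[source][dest]
        # Opportunities in cities at least as close to the source as dest,
        # excluding the source and the destination themselves.
        s_ij = sum(n_k for k, n_k in opportunities.items()
                   if k not in (source, dest) and distance[source][k] <= r_ij)
        scores[dest] = m_i * n_j / ((m_i + s_ij) * (m_i + n_j + s_ij))
    return scores


def ndcg_at_k(predicted_ranking, observed_gain, k=10):
    """NDCG@k with observed flows as graded relevance (an assumed gain)."""
    dcg = sum(observed_gain.get(city, 0.0) / math.log2(pos + 2)
              for pos, city in enumerate(predicted_ranking[:k]))
    ideal = sorted(observed_gain.values(), reverse=True)[:k]
    idcg = sum(gain / math.log2(pos + 2) for pos, gain in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical toy data: skilled-worker counts and distances from city "A".
skilled_workers = {"A": 100, "B": 400, "C": 50, "D": 200}
distance = {"A": {"B": 100.0, "C": 50.0, "D": 300.0}}
observed_flow = {"B": 3.0, "C": 2.0, "D": 1.0}

scores = radiation_scores("A", skilled_workers, distance)
ranking = sorted(scores, key=scores.get, reverse=True)
quality = ndcg_at_k(ranking, observed_flow, k=10)
```

Swapping `skilled_workers` for total employment or college-educated counts yields the other model variants, which can then be compared on the same NDCG scale.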
- Polarized skill structure: Community detection on the skill space reveals two clusters: 97 socio-cognitive skills (knowledge, social, cognitive) and 64 sensory-physical skills (physical, sensory, work output). Bridging skills include Mathematics, Judging the Qualities of Things/Services/People, and Estimating Quantifiable Characteristics.
- Occupational skill breadth and content: Across 353 Chinese occupations, the number of important skills varies widely (<40 to >80). Technicians and Professionals require the highest average number of skills (≈71), while Manufacturing Workers and Agricultural Workers require the fewest (≈55 and 58). White-collar groups rely more on social, mental process, and complex problem-solving skills; blue-collar groups rely more on psychomotor abilities, work output, and technical skills.
- Wages and socio-cognitive content: Occupations with higher socio-cognitive scores correspond to higher wages at the major-group level: Managers ≈131,929 CNY/year; Professionals ≈83,148 CNY/year; Business and Service Workers ≈49,502 CNY/year; Manufacturing Workers ≈50,703 CNY/year.
- City skill profiles: The number of skills effectively used by cities ranges from 55 to 103 (out of 161), with most between 70 and 80. Socio-cognitive scores vary widely: e.g., Beijing 0.52, Guangzhou 0.43, Putian 0.22, Nanyang 0.09. Cities with higher administrative rank (“sub-provincial or above”) tend to have higher socio-cognitive scores and GDP per capita.
- Economic performance: Regression results show skill profile variables outperform education in explaining GDP per capita. Baseline model with capital per capita and population density: R²≈0.61. Adding skill number: R²≈0.693. Adding university degree: R²≈0.645. Adding socio-cognitive score: R²≈0.761. Combined model including socio-cognitive score and university degree: R²≈0.777, with socio-cognitive score remaining significantly positive while university degree is not.
- Migration prediction: Radiation model using skilled worker counts (NDCG mean ≈0.65) or college-educated counts (≈0.67) predicts top-10 migration destinations better than using total employment (≈0.61). Skilled vs total employment is significantly better (p=0.02); skilled vs college-educated is not significantly different (p=0.19). Example: For Tianjin, the skilled-worker model correctly ranks Beijing as the top destination, matching observed data, outperforming the total-employment model.
The constructed Chinese workforce skill taxonomy, mapped from O*NET via a Naïve Bayes framework, enables quantification of occupational skills and reveals a pronounced polarization between socio-cognitive and sensory-physical skills in China. This polarization manifests both across occupations (white-collar vs blue-collar) and geographically across cities. The findings support the hypothesis that socio-cognitive skill intensity relates to higher wages and stronger economic performance, and they highlight bridging skills—particularly Mathematics—that facilitate potential transitions between clusters. Unlike the US, where smaller cities often have lower socio-cognitive levels, China’s legacy of central planning and administrative hierarchies leads to even some large cities exhibiting low socio-cognitive skill shares, reinforcing regional inequality. City-level skill profiles (diversity and socio-cognitive share) explain GDP per capita better than education levels alone, suggesting that the composition and diversity of skills capture dimensions of economic capacity and resilience beyond traditional human capital metrics. The adapted radiation model shows that skilled population size better proxies job opportunities and city attractiveness than total employment, aligning migration patterns with skill concentrations. These insights inform policies on reskilling, education (including STEM and vocational training), and regional development strategies aimed at transitioning workers and cities away from dependence on sensory-physical skill clusters.
This study develops and releases China’s first workforce skill taxonomy and an online tool (skills.sysu.edu.cn), mapped to O*NET, enabling quantitative measurement of occupational skills in China. The taxonomy uncovers strong polarization between socio-cognitive and sensory-physical skills and demonstrates that city skill profiles—particularly socio-cognitive intensity and skill diversity—outperform education in explaining economic performance. The skill space identifies bridging skills (e.g., Mathematics) critical for career mobility and suggests targeted reskilling pathways. Findings imply growing geographic inequality tied to uneven socio-cognitive skill distributions and emphasize the need for inclusive education and vocational training to support worker transitions and enhance city resilience. Future research could compute detailed career pathways using the skill space, evaluate policy interventions for reskilling at scale, and extend validation with longitudinal migration and wage data.
- Occupational wage data are only available at the major-group level and use a different coding scheme than NOCC, limiting granularity and exact comparability.
- The socio-cognitive occupation classification relies on a threshold (0.6) for the share of socio-cognitive skills, though robustness checks at 0.7 and 0.8 yield similar results.
- The radiation model validation uses a single day (24 July 2019) of Baidu Map migration data and only the top-10 destinations per source city, constraining evaluation breadth.
- The tripartite mapping leverages O*NET-derived task–skill relationships assumed to generalize to China; any cross-country differences in task semantics may introduce bias.
- Career pathway computation is noted as beyond the scope, limiting direct guidance on individual-level transitions.
- City-level analyses are aligned to the 2010 census; temporal changes in occupational structure and skills after 2010 are not captured in the main estimates.

