Market-oriented job skill valuation with cooperative composition neural network

Y. Sun, F. Zhuang, et al.

Ying Sun, Fuzhen Zhuang, Hengshu Zhu, Qi Zhang, Qing He, and Hui Xiong present a data-driven method for evaluating job skill value from a market perspective. Their Salary-Skill Composition Network (SSCN) models a job's salary as a composition of the context-aware values of its required skills, and it outperforms the baseline models it is compared against in salary prediction.

Introduction

In the era of the knowledge economy, understanding the value of job skills is crucial for individuals and employers. Prior work shows a positive association between skill mastery and salary; however, traditional market surveys lack fine-grained, up-to-date, and context-aware assessment. With the growth of online recruitment platforms, large-scale job advertisement data enables data-driven labor market intelligence. Yet most studies focus on modeling skill demand rather than quantifying skill value in terms of its influence on salary. The research question addressed here is how to quantitatively assess the value of individual job skills in a market-oriented, context-aware manner, despite the absence of labeled ground truth for skill value and the fact that job postings list multiple, interacting skills. The authors define skill value as the expected salary of a job that only requires a given skill at a specified mastery level under a given job context (e.g., company, time, location, required experience). They posit that job salary can be treated as a composition of context-aware skill values and that salary prediction can provide indirect supervision for learning skill values. To solve this, the paper formulates a Salary-Skill Value Composition Problem and introduces a cooperative neural network, SSCN, that simultaneously learns a context-aware skill valuation model and a salary prediction model by composing estimated skill values into job salaries. This framework enables separation of skills and context, interpretability via value and domination factors, and quantitative analysis from large-scale job postings.

Literature Review

The paper reviews evidence of a positive association between skill mastery distributions and salary outcomes across global labor markets. Traditional survey-based approaches are limited by dynamic market conditions, lack of granularity, and delayed updates. Recent availability of online job advertisements has driven labor market intelligence and skill analysis, but prior work largely emphasizes skill demand modeling, trend-aware factorization, and benchmarking rather than direct, quantitative valuation of skills by their impact on salary. Salary benchmarking methods (e.g., matrix factorization) and text-mining models have been explored for salary prediction, yet they typically do not disentangle individual skill values under varying contexts. This study addresses that gap by proposing a market-oriented, context-aware definition of skill value and a cooperative neural framework to infer it from job postings using salary as indirect supervision.

Methodology

Problem definition: The value of a skill is defined as the expected salary of a hypothetical job requiring only that skill at a specified mastery level under a given context (company, time, location, required experience, etc.). Let f(s, lv, C) estimate the value v of skill s at level lv under context C. Because postings list multiple skills and carry no explicit single-skill value labels, the authors rely on the intuition that salary reflects a composition of the required skills’ values.

Formulation: Given job postings J = {(C_j, S_j, Y_j)}, where C_j is the context set, S_j = {(s_i, lv_i)} is the set of required skills with levels, and Y_j is the salary range, jointly learn (1) a context-aware skill valuation model f: (skill, context) → value and (2) a skill-based salary prediction model g: ⟨skill, value⟩ → salary. Salary is modeled as a linear weighted average of skill values with learnable domination weights, which keeps the composition interpretable.

Model: The Salary-Skill Composition Network (SSCN) comprises two components.

1) Context-aware Skill Valuation Network (CSVN): Estimates each skill’s value as a range (non-negative lower and upper bounds) under a given context. CSVN uses temporal skill embeddings to capture dynamics over time, with a low-rank factorization and temporal regularization to avoid abrupt changes. It extracts context–skill interactions via a DeepFM-like design: a linear projection (first-order), multiplicative operations (second-order), and an MLP (higher-order). The bounds are enforced by the architecture: the lower bound is v = ReLU(linear(...)) and the upper bound is v' = v + p with p = ReLU(linear(...)). CSVN outputs both value bounds and also provides intermediate representations for domination modeling.

2) Attentive Skill Domination Network (ASDN): Models skill domination, i.e., the weights used to compose values into a salary. Each job’s skills form a graph (nodes: skills; edges: co-appearance relations weighted by normalized co-occurrence). ASDN takes CSVN features and computes two representations per skill: importance and influence. Local influence is learned with a GCN over the skill graph; global influence is the mean of the influence vectors. An attention mechanism takes the global influence as the query and the concatenation of local influence and importance as the keys; a softmax over the resulting scores yields the domination weights. Separate attention parameters are learned for the lower and upper salary bounds, so the allocation across skills can differ between the two bounds.

Salary composition and loss: For job i with N skills, the predicted bounds are ŷ_i = Σ_j v_j a_j^(l) and ŷ'_i = Σ_j v'_j a_j^(u). The loss is L_s = Σ_i (λ1|ŷ_i − y_i| + λ2|ŷ'_i − y'_i|) plus the temporal regularizer β Σ_t ||v^(t) − v^(t−1)||_F^2. The model is trained end-to-end with Adam and uses residual connections, Leaky ReLU activations, and Glorot initialization. Hyperparameters are λ1 = 2, λ2 = 1, and β = 0.004.

Data and preprocessing: IT-related job postings were collected from Lagou (https://www.lagou.com/): over 800,000 postings from July 2016 to June 2019. Preprocessing kept full-time postings, the top 16 cities (covering >90% of the data), and the top 1000 companies; removed salary outliers by boxplot; extracted 14 level words and 1,374 skills in structured form; and built a skill co-occurrence graph with thresholding and normalized weights. After preprocessing, 215,308 postings remained. Context features include both continuous and discrete variables; details are in the Supplementary. A designer-related dataset was also used for supplementary validation.

Baselines and validation: SSCN is compared against LR, SVM, and GBDT; a DNN with similar depth and variables; HSBMF salary benchmarking; text models (TextCNN, HAN, Transformer-XL) with Chinese embeddings; and pretrained models (BERT, RoBERTa, XLNet). Ablations are CSVN+Mean (ASDN replaced by mean pooling) and SSCN (Independ) (bounds predicted independently). Evaluation used 10 repetitions of 4:1 hold-out validation, reporting RMSE and MAE on both salary bounds.
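The bound construction, attention-weighted composition, and loss described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the pre-activation numbers and attention scores are made up, the helper names (`value_bounds`, `compose_salary`, `salary_loss`) are hypothetical, and in the paper these quantities are produced by CSVN and ASDN rather than supplied by hand. The temporal regularizer is omitted.

```python
import math


def relu(x):
    return max(0.0, x)


def softmax(scores):
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def value_bounds(lower_pre, gap_pre):
    """v = ReLU(.) and v' = v + ReLU(.), so 0 <= v <= v' by construction."""
    lower = relu(lower_pre)
    upper = lower + relu(gap_pre)
    return lower, upper


def compose_salary(bounds, lower_scores, upper_scores):
    """Weighted average of skill values, with separate weights per bound."""
    a_low = softmax(lower_scores)
    a_up = softmax(upper_scores)
    y_low = sum(v * a for (v, _), a in zip(bounds, a_low))
    y_up = sum(vp * a for (_, vp), a in zip(bounds, a_up))
    return y_low, y_up


def salary_loss(pred, target, lam1=2.0, lam2=1.0):
    """Per-job term of L_s: lam1*|y_hat - y| + lam2*|y_hat' - y'|."""
    return lam1 * abs(pred[0] - target[0]) + lam2 * abs(pred[1] - target[1])


# Hypothetical pre-activations for three skills (illustrative numbers only).
raw = [(18.0, 9.0), (25.0, 12.0), (-3.0, 4.0)]
bounds = [value_bounds(lo, gap) for lo, gap in raw]

# Hypothetical attention scores for the lower and upper bounds.
pred = compose_salary(bounds, [0.2, 1.0, -0.5], [0.1, 1.2, -0.3])
loss = salary_loss(pred, (20.0, 30.0))
print(bounds, pred, loss)
```

Because the weights come from a softmax, each predicted bound is a convex combination of the per-skill bounds and therefore stays within their range. Note that only the per-skill bounds are ordered by construction; with separate attention weights for the two bounds, the composition by itself does not force the predicted lower salary to sit below the predicted upper salary in this sketch.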

Key Findings

Dataset and setup: 215,308 IT job postings (2016-07 to 2019-06) after preprocessing; an additional designer dataset was used for supplementary checks.

Skill valuation under contexts:
- Mastery levels: CSVN distinguishes the impact of levels; most levels affect both bounds similarly, and more sophisticated levels raise value more. The lowest level, Can Read, decreases value by ~10%; Versatile increases value by ~10% on average. Bias instances causing atypical estimates account for only 0.96% of samples.
- Temporal trends: Architecture exhibits steady growth: its average value was 21.8 K RMB in 2016-H2, rising ~5% per half-year to 27.6 K RMB in 2019-H1. Some hot skills (GoLang, Recommender System) are volatile; e.g., GoLang decreased 26% in 2019-H1 (28.2 K → 20.8 K RMB). Many high-value skills declined in early 2019, possibly due to China’s “Internet Winter.” Designer skills were more stable.
- Experience effects: Longer experience raises skill value; 10 years of experience increases value by ~2.5× over graduates. Growth patterns differ: Architecture and Project Management rise slowly at first, then faster after 3–5 years. Example values for graduates: Algorithm 12.8 K RMB; Project Management 10.2 K RMB. With 1–3 years of experience, Machine Learning is valued at 24.2 K RMB vs 19.9 K RMB for Architecture; the ranking reverses after 5 years.
- Company differences: Skill valuations vary by company. ByteDance values Architecture and Algorithm similarly and, unlike other companies, values Python (23.9 K) above Java (21.0 K). JD.com shows a larger spread for Java (IQR gap 13 K vs 7 K elsewhere), implying higher potential salary growth. Baidu shows more stable, comprehensive skill valuations.

Salary prediction performance (mean ± std over 10 hold-outs):
- SSCN achieves the best results: lower bound RMSE 4.435 ± 0.061, MAE 3.244 ± 0.048; upper bound RMSE 7.686 ± 0.086, MAE 5.627 ± 0.060. Compared to BERT, SSCN reduces RMSE by ~3.5% (lower) and ~5.2% (upper).
- Ablations show performance drops with mean pooling (CSVN+Mean) and when predicting bounds independently (SSCN (Independ)), confirming the benefits of ASDN and joint range prediction.

Domination vs value and influence:
- Generic skills tend to have higher domination; specific skills have higher value. Examples: Unsupervised Learning has domination 37.8% and Multivariable Regression 46%; Graph Algorithm has domination 18.2% with value 35.2 K RMB. The trade-off implies that breadth aids employability while depth raises salary; Topic Model shows a high average contribution (8.5 K RMB).
- Skill influence on salary (drop-one analysis): High value combined with high domination yields high influence. Examples from Table 3: removing Matrix Calculation (value 32.306 K RMB, domination 25.2%) causes an average salary decrease of 18.4%; the corresponding decreases are 15.6% for POS Analysis, 13.7% for Information Theory, 12.0% for Computational Linguistics, 8.8% for Voiceprint Recognition, 4.7% for PLSA, and 3.6% for XGBoost.
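Several of the figures above can be checked with simple arithmetic, and the drop-one idea can be illustrated the same way. The sketch below is only that: it recomputes the GoLang decline and the implied half-yearly growth rate for Architecture from the reported endpoints, and it treats "contribution" as value × domination, which is an assumption based on the weighted-average composition rather than a definition quoted from the paper. The `drop_one_change` helper is hypothetical; it renormalizes the remaining weights after removing a skill, which may differ from how the authors re-run the model, and its input numbers are invented.

```python
def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


def compound_rate(start, end, steps):
    """Constant per-step growth rate (percent) that takes start to end."""
    return ((end / start) ** (1.0 / steps) - 1.0) * 100.0


def drop_one_change(values, weights, k):
    """Percent salary change after removing skill k and renormalizing weights.

    Assumption: salary is a weighted average of skill values, and the
    remaining domination weights are rescaled to sum to 1.
    """
    full = sum(v * w for v, w in zip(values, weights))
    rest = [(v, w) for i, (v, w) in enumerate(zip(values, weights)) if i != k]
    rest_weight = sum(w for _, w in rest)
    reduced = sum(v * w for v, w in rest) / rest_weight
    return pct_change(full, reduced)


# GoLang, 2019-H1: 28.2 K -> 20.8 K RMB (reported as a 26% decrease).
golang_drop = pct_change(28.2, 20.8)

# Architecture: 21.8 K (2016-H2) -> 27.6 K (2019-H1), i.e. 5 half-year steps.
arch_rate = compound_rate(21.8, 27.6, 5)

# Graph Algorithm: value 35.2 K RMB with domination 18.2%.
graph_contribution = 35.2 * 0.182

# Hypothetical three-skill job (values in K RMB, weights sum to 1).
change = drop_one_change([30.0, 20.0, 15.0], [0.25, 0.5, 0.25], 0)

print(round(golang_drop, 1), round(arch_rate, 2),
      round(graph_contribution, 2), round(change, 1))
```

In the hypothetical job, removing the highest-valued skill lowers the composed salary even though that skill carries only a quarter of the weight, which mirrors the observation that influence depends on both value and domination.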

Discussion

The study addresses the lack of labeled skill-value data by leveraging salary as indirect supervision in a cooperative architecture. By defining salary as a composition of context-aware skill values and learnable domination weights, SSCN disentangles the roles of individual skills and their interactions via domination under varying contexts. The method yields interpretable outputs—per-skill value ranges and domination—that explain how skills contribute to salary and allow analysis across time, experience, and companies. Empirical results confirm that the model produces meaningful, context-aware valuations and improves salary prediction accuracy over strong baselines, including pretrained language models. The findings (e.g., level effects, temporal volatility of hot skills, experience-driven value trajectories, and company-specific preferences) are consistent with labor market intuition, offering actionable insights for job seekers, employers, and educators. The domination–value trade-off highlights how generic foundations drive employability while specific expertise boosts salary, aligning with modern educational emphasis on transferable skills. Overall, the cooperative design demonstrates a viable strategy for learning latent, interpretable quantities (skill values) from related supervised tasks (salary prediction), with potential generalization to other domains with indirect supervision.

Conclusion

This work introduces a market-oriented, context-aware definition of skill value and formulates the Salary-Skill Value Composition Problem. The proposed SSCN jointly learns a skill valuation model (CSVN) and a salary prediction model with domination modeling (ASDN), composing per-skill values into salary via attention over a skill co-occurrence graph. Trained on large-scale job postings without labeled skill values, SSCN delivers accurate salary prediction and interpretable, context-aware skill valuations. Main contributions include: (1) a principled formulation linking skill value to salary under context; (2) a cooperative neural architecture with constrained value range modeling, temporal skill embeddings, and graph-attentive domination; (3) extensive empirical validation demonstrating superior predictive performance and rich insights into skill dynamics across levels, time, experience, and companies. Potential applications span recruitment (salary reference and competitiveness), market analysis (trend tracking), education (curriculum planning using experience-aware values), knowledge management and talent development (company-specific training), and job recommendation (company–skill fit). Future directions include expanding to more diverse and comprehensive datasets to mitigate bias and enhance generalizability, integrating additional data sources and collaborations for empirical validation of skill values, and exploring methods to further reduce bias from imbalanced data distributions.

Limitations

Data limitations include reliance on a single major Chinese online recruitment platform and two datasets (IT and designer), which constrains generalizability and may introduce bias, and a relatively short historical span, which limits long-term trend analysis. There is no ground truth for skill value, so validation is indirect, via salary prediction performance. Imbalanced data can bias estimates for specific skill–level combinations (e.g., “Know JavaScript”); such cases are rare (~0.96% of samples), and broader, more diverse data could alleviate them. The model captures temporal dynamics via embeddings but is not a forecasting model, because each time period requires its own training data.
