The effect of data element agglomeration on green innovation vitality in China



D. Han, H. Wu, et al.

This study examines the impact of data element agglomeration on green innovation vitality in China and finds that stronger government support and greater public environmental concern amplify that impact. Conducted by Dongri Han, Hongshuang Wu, and Ke Lu from Shandong University of Technology, it also documents marked spatial variation in the effect across provinces.

Introduction
The study investigates whether and how data element agglomeration (DATA)—the concentration and integration of data resources within the digital economy—enhances regional green innovation vitality (GIV) in China. Against the backdrop of the dual‑carbon goals and innovation-driven development, China has achieved strong growth in green patents but faces issues of low industrialization rates and quality gaps. GIV, characterized by high risk, high investment, and long cycles, remains insufficiently stimulated due to financing constraints and short-termism. The paper posits that DATA may act as a new production factor with non-exclusivity, replicability, and spillovers that can directly and indirectly drive GIV. It formulates three hypotheses: H1, DATA directly enhances GIV; H2, DATA indirectly improves GIV via government support (GS) and public environmental concern (PEC); and H3, DATA’s effect on GIV is nonlinear with threshold effects depending on GS and PEC. The purpose is to provide theoretical and empirical evidence on DATA’s role in green innovation and to inform policy for coordinated, high-quality green development.
Literature Review
The literature identifies unique properties of data elements—liquidity, permeability, spillovers, and capacity to transcend spatial constraints—shaping industrial cooperation and coordinated development. In platform economies, data is central to firm value creation, potentially leading to DATA, market power, and data monopolies as large platforms out-invest SMEs. DATA can improve innovation efficiency by enabling deeper insight into customer needs and competitive environments. Research on GIV divides into antecedents (e.g., environmental assessments, regulation stringency, local government incentives) and consequences (e.g., environmental governance and structural upgrading), with evidence of heterogeneous threshold effects. Gaps remain: no standard metric for DATA; limited work integrating DATA and GIV during the digital-sustainability convergence; insufficient empirical evidence on their relationship and nonlinearities. This study contributes by: (1) examining multidimensional effects of DATA on GIV; (2) analyzing mechanisms at macro (GS) and micro (PEC) levels; and (3) jointly modeling DATA and GIV with attention to regional heterogeneity and threshold nonlinearity.
Methodology
Design: Provincial panel study of 30 Chinese provinces (excluding Hong Kong, Macao, Taiwan, and Tibet) from 2011–2021, using fixed-effects (FE) baseline regressions, mediating effect models, and dynamic threshold regressions estimated via system GMM to address endogeneity and dynamics.
Models:
- Baseline FE regression: GIV_it = α + α1 DATA_it + α2 X_it + λ_i + ε_it, where X includes science and technology human capital (STHC), industrial structure (IS), marketization degree (MD), and unemployment degree (UD), and λ_i is the province fixed effect.
- Mediation models: mediation_it = β0 + β1 DATA_it + β2 X_it + λ_i + ε_it; GIV_it = ω0 + ω1 DATA_it + ω2 mediation_it + ω3 X_it + λ_i + ε_it, with mediators GS and PEC. Sobel and bootstrap tests assess mediation significance.
- Dynamic threshold regression (system GMM): GIV_it = μ + ρ GIV_{i,t−1} + θ1 DATA·I(threshold ≤ y) + θ2 DATA·I(threshold > y) + θ3 X_it + ε_it, where I(·) is an indicator function, the threshold variable is GS or PEC, and y is the threshold value to be estimated. Threshold significance is tested via bootstrap (Hansen 1999), and LR functions are used to identify threshold values.
Variables and measurement:
- Explained variable (GIV): log(number of patent applications for green inventions), based on the IPC Green Inventory (Dong & Bai, 2024).
- Core explanatory variable (DATA): multi-dimensional index of data element agglomeration built from internet broadband access, number of enterprise-owned websites, and number of e-commerce trading enterprises, aggregated via projection pursuit based on a real-coded accelerated genetic algorithm (RAGA-PP; Chao et al., 2020).
- Mediators/thresholds: GS (regional government financial expenditure) and PEC (frequency of public environmental reports).
- Controls: STHC (full-time equivalent of R&D personnel), IS (tertiary/secondary output value ratio), MD (budget expenditure/GDP), UD (urban registered unemployment rate). Some robustness checks add Urbanization.
Data handling: Data are sourced from the China Statistical Yearbook and the National Bureau of Statistics. Variables are log-transformed where appropriate; missing values are imputed via linear interpolation. Spatial-temporal patterns of DATA are depicted for 2011, 2016, and 2021.
Model selection and diagnostics: The Hausman test supports FE for the baseline. For the dynamic threshold GMM, AR(1)/AR(2) and Hansen tests assess serial correlation and instrument validity.
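As a rough illustration of two building blocks named above, the following Python sketch runs a one-regressor within (fixed-effects) estimator and constructs the two regime-specific DATA terms of a threshold model. It is not the authors' code: the synthetic panel, the function names `within_estimator` and `regime_terms`, the true slope of 2.0, and the cutoff of 0.0 are all assumptions chosen for illustration, and the system GMM estimation step is not reproduced.

```python
import numpy as np


def within_estimator(y, x, unit):
    """Slope of y on x after removing unit means (one-regressor FE estimator)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    unit = np.asarray(unit)
    y_dm = y.copy()
    x_dm = x.copy()
    for u in np.unique(unit):
        mask = unit == u
        # Subtracting each unit's mean removes the time-invariant effect lambda_i.
        y_dm[mask] -= y[mask].mean()
        x_dm[mask] -= x[mask].mean()
    return float(x_dm @ y_dm / (x_dm @ x_dm))


def regime_terms(data, q, cutoff):
    """Split DATA into DATA*I(q <= cutoff) and DATA*I(q > cutoff)."""
    data = np.asarray(data, dtype=float)
    q = np.asarray(q, dtype=float)
    low = data * (q <= cutoff)   # nonzero only in the low regime
    high = data * (q > cutoff)   # nonzero only in the high regime
    return low, high


# Synthetic panel: 30 units over 11 periods (hypothetical values, not the paper's data).
rng = np.random.default_rng(0)
n_units, n_periods = 30, 11
unit = np.repeat(np.arange(n_units), n_periods)
unit_effect = rng.normal(size=n_units)[unit]
x = rng.normal(size=unit.size) + unit_effect  # regressor correlated with the unit effect
y = 2.0 * x + unit_effect + rng.normal(scale=0.1, size=unit.size)

beta_fe = within_estimator(y, x, unit)
low, high = regime_terms(x, q=unit_effect, cutoff=0.0)
print(f"within estimate of the slope: {beta_fe:.3f}")
```

Demeaning by unit is what lets the estimator recover the slope even though the regressor is correlated with the unit effect. The two regime terms always sum back to the original regressor, so a threshold model simply lets the slope differ on either side of the cutoff.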
Key Findings
- Spatial-temporal evolution of DATA: DATA increased nationwide from 2011 to 2021 with marked spatial heterogeneity, generally higher in the east and lower in the west. Guangdong leads; Hainan lags. Growth rates are higher in the west, narrowing gaps; Sichuan ranks among the top five nationwide.
- Baseline regression (Table 2): DATA positively and significantly drives GIV nationally (coef. 5.850, p<0.01). Regional heterogeneity follows the order "central > western > eastern": central (7.117***), western (5.983***), eastern (5.369***). Control effects nationally: STHC positive (0.018***), IS positive (0.322***), MD positive (2.114***), UD negative (−14.290***).
- Mediation (Table 3): DATA strengthens GS and PEC, which in turn raise GIV. GS path: DATA → GS (3.527***), GS → GIV (0.234**); indirect effect ≈ 0.825 (3.527×0.234), 14.103% of total. PEC path: DATA → PEC (17.565***), PEC → GIV (0.036**); indirect effect ≈ 0.632 (17.565×0.036), 10.803% of total. Sobel and bootstrap tests support both mediations (p<0.05, or marginal in the bootstrap test for GS).
- Threshold nonlinearity (Tables 4–5; Fig. 3): Single thresholds are significant for both GS and PEC; estimated thresholds: GS = 8.7309, PEC = 7.1405. In the dynamic GMM, lagged GIV is significant (0.739*** to 0.803***) and instruments are valid (Hansen p>0.1; AR(1) p≈0.04–0.05; AR(2) p≈0.15–0.18). Above the thresholds, DATA's positive impact on GIV is stronger: GS regime coefficients 1.120*** (≤8.7309) vs 1.137*** (>8.7309); PEC regime coefficients 0.524* (≤7.1405) vs 0.755** (>7.1405).
- Robustness (Table 6): Results persist when using lagged GIV as DV, adding Urbanization, and dropping UD. DATA remains positive and significant across checks.
Overall: Evidence supports H1 (direct positive effect), H2 (indirect effects via GS and PEC), and H3 (nonlinear threshold effects with stronger impacts when GS and PEC exceed thresholds).
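The indirect-effect shares reported above follow from product-of-coefficients arithmetic, which the short Python check below reproduces from the figures in this summary. The helper name `indirect_share` is an assumption for illustration, and treating the national baseline coefficient (5.850) as the total effect is inferred from the reported percentages rather than stated explicitly in the text.

```python
def indirect_share(a, b, total):
    """Return the indirect effect a*b (rounded to 3 decimals) and its share of total in percent."""
    indirect = round(a * b, 3)
    return indirect, 100 * indirect / total


TOTAL = 5.850  # national baseline DATA coefficient, assumed here to be the total effect

gs_indirect, gs_share = indirect_share(3.527, 0.234, TOTAL)     # DATA -> GS -> GIV
pec_indirect, pec_share = indirect_share(17.565, 0.036, TOTAL)  # DATA -> PEC -> GIV

print(f"GS:  {gs_indirect:.3f} ({gs_share:.3f}%)")
print(f"PEC: {pec_indirect:.3f} ({pec_share:.3f}%)")
```

Under these assumptions the check returns 0.825 (14.103%) for the GS path and 0.632 (10.803%) for the PEC path, matching the reported values.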
Discussion
The findings demonstrate that agglomeration of data elements serves as a substantive driver of regional green innovation vitality in China. DATA directly enhances GIV by improving information flows, enabling knowledge sharing, and catalyzing complementary resource integration within innovation ecosystems. Indirectly, DATA amplifies GIV by reinforcing government support (e.g., fiscal capacity and policy tools that fund green R&D) and elevating public environmental concern (strengthening transparency, oversight, and market demand for green solutions). The observed regional heterogeneity—stronger marginal effects in central and western regions—suggests diminishing returns in already data- and innovation-rich eastern provinces, where innovation networks and technological resources are more mature. The threshold evidence shows that a conducive institutional and social environment (sufficient GS and high PEC) is critical for unlocking DATA’s full potential, aligning with the notion that data-driven innovation requires complementary public investment, governance, and citizen engagement. These results address the research questions by confirming both the direct and mediated channels and by quantifying conditions (thresholds) under which DATA’s effects intensify, offering actionable guidance for regionally tailored green development policies.
Conclusion
This study provides comprehensive empirical evidence that data element agglomeration significantly promotes green innovation vitality across Chinese provinces and that its impact is mediated by government support and public environmental concern, with notable nonlinear threshold effects. Key contributions include: (1) establishing the multidimensional role of DATA (direct, mediated, and nonlinear) in driving GIV; (2) integrating macro-level (GS) and micro-level (PEC) mechanisms; and (3) accounting for regional heterogeneity, revealing a gradient of effects (“central > western > eastern”). Policy implications emphasize coordinated regional strategies to expand DATA infrastructure and governance, bolster fiscal support for green R&D, and enhance public environmental engagement to surpass effective thresholds. Future research directions proposed by the authors include: (a) moving from provincial panels to finer-grained urban and industry-level analyses using micro-data; (b) expanding the mechanism framework to include additional mediators/moderators (e.g., resource mismatch); and (c) leveraging advances in digital technologies (AI, machine learning, big data) to refine DATA metrics and deepen causal inference regarding its impact on GIV.
Limitations
- Scope and granularity: The sample comprises 30 provinces (2011–2021), which may mask intra-provincial and industry-specific heterogeneity. Future work should use city-level and firm-level data and conduct sector-specific analyses.
- Mechanism coverage: While GS and PEC are examined, other mediators/moderators (e.g., resource misallocation, financial market development, institutional quality) may influence the DATA–GIV nexus and warrant inclusion.
- Measurement and technology dynamics: DATA is measured via a composite index (RAGA-PP) using available proxies; evolving digital technologies (AI/ML, big data) could improve measurement accuracy and causal identification. Incorporating such tools and richer data sources is a priority for future studies.