Climbing up the ladder of abstraction: how to span the boundaries of knowledge space in the online knowledge market?

H. Cui, T. Li, et al.

Explore the intriguing relationship between knowledge spanning and question appeal in online knowledge markets, unveiled by Haochuan Cui, Tiewei Li, and Cheng-Jun Wang. This study reveals how different levels of knowledge hierarchy can shape the attractiveness of questions on platforms like Zhihu.com.

Introduction

This study investigates how recombining knowledge across category boundaries (knowledge spanning) affects the appeal of questions in online knowledge markets and how this effect depends on the level of abstraction (knowledge hierarchy). Prior research presents an essential tension: recombination can broaden niches and enhance success but spanning multiple categories can also invite penalties due to audience confusion and reduced niche fitness. The authors propose that questions’ positions in a hierarchical knowledge space moderate the non-linear impact of spanning. They conceptualize question-asking on Q&A platforms as a form of knowledge recombination, argue that knowledge is hierarchically structured, and suggest that moving up the ladder of abstraction can loosen the tradeoff between novelty and comprehensibility. They focus on free knowledge markets (e.g., Zhihu/Quora), where reputation and attention—not direct payments—govern participation, making them suitable contexts to study the appeal of questions as a function of knowledge spanning and hierarchy. The study advances three hypotheses: H1, an inverted U-shaped relationship between knowledge spanning and question appeal; H2, knowledge hierarchy influences appeal; and H3, knowledge hierarchy moderates the inverted U-shape, weakening it as abstraction increases.

Literature Review

Theories of knowledge recombination (Schumpeter; Nelson & Winter) and Koestler’s ‘bisociation’ posit that novelty often arises from combining disparate ideas. Network perspectives (small-world networks; Uzzi & Spiro) highlight tradeoffs between local cohesion and long-range ties for idea diffusion. Bourdieu’s field theory frames strategic choices in research within habitus and capital, emphasizing an inherent tension between risky innovation and productive tradition (Kuhn’s ‘essential tension’). In cultural markets, category spanning can suffer penalties due to audience confusion and reduced niche fitness, but optimal differentiation can outperform typicality, producing inverted U-shaped effects between typicality/spanning and success. In Q&A and knowledge networks, similar inverted U-shapes emerge for tag distances and popularity. Knowledge is also hierarchical (Cole), and language expresses a ladder of abstraction (Hayakawa), suggesting that higher-level, coarse-grained categories are broader but fuzzier. Category fuzziness and contrast (Kovács & Hannan) moderate spanning penalties: fuzzy, high-level categories can reduce confusion, potentially mitigating penalties. Thus: H1 posits an inverted U-shaped relation between spanning and appeal; H2 posits that abstraction level affects appeal; H3 posits that abstraction moderates the inverted U, making it less pronounced at higher abstraction levels.

Methodology

Data source and scope: Data were collected from Zhihu.com, China’s largest Q&A site, covering questions asked from December 2010 to May 2018. The dataset includes question id, number of answers, number of followers, question text, and assigned categories. The paper reports an overall dataset size of N = 463,545; a descriptive subset mentions 312,053 questions; and the regression analyses use 404,577 observations. Zhihu maintains an official knowledge tree with categories organized as a directed acyclic graph (DAG); total categories N = 108,432.

Measures:
(1) Knowledge spanning (KS): Construct a knowledge space using Word2Vec skip-gram with negative sampling. Treat each question’s category list as a ‘sentence’ and categories as ‘words.’ Use window size = 5 (matching the 99th percentile of category list length), embedding dimension K = 50. Train on all category co-occurrences to obtain category vectors. For each question, compute KS as the average pairwise distance (1 − cosine similarity) across all assigned category vectors (KS = 0 if only one category). Due to skewness, log-transform KS.
(2) Knowledge hierarchy: Using the Zhihu category DAG, compute each category’s depth from the root (lowest level set to 0). A question’s hierarchy equals the average depth of its categories; higher values indicate more abstract (higher-level) categories.
(3) Appeal of questions: Measure appeal via the number of followers of the question; apply log transform due to heavy skew.

Controls: title length (characters), question lasting days (log), and day-of-week indicators.

Analysis: Fit multiple linear regression models predicting log followers from log(KS) and [log(KS)]^2 to test the inverted U-shape (H1), include main effect of hierarchy (H2), and include interactions log(KS)×hierarchy and [log(KS)]^2×hierarchy to test moderation (H3). Visualize nonlinearity and moderation with partial dependence plots and category-specific curves. Dimensionality reduction (t-SNE) is used only for visualizing the knowledge space.
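
The two question-level measures described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the category vectors are assumed to be already trained, and the 2-dimensional toy vectors, the toy category tree, and all function names are hypothetical. The summary does not say how a category's level is resolved when a DAG node has several parents, so the sketch assumes that a category's level is the length of its longest downward path to a leaf (leaves are 0, higher means more abstract). The log transform of KS is left out because the handling of KS = 0 is not specified.

```python
import math
from functools import lru_cache
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def knowledge_spanning(vectors):
    """Average pairwise (1 - cosine similarity); 0.0 with fewer than two categories."""
    if len(vectors) < 2:
        return 0.0
    distances = [1.0 - cosine_similarity(u, v) for u, v in combinations(vectors, 2)]
    return sum(distances) / len(distances)


def make_level_function(children):
    """Return a function mapping a category to its level in the hierarchy.

    ASSUMPTION (not from the source): level = longest path down to a leaf,
    so leaf categories are 0 and more abstract categories score higher.
    `children` maps each category to a list of its child categories.
    """
    @lru_cache(maxsize=None)
    def level(category):
        kids = children.get(category, ())
        return 0 if not kids else 1 + max(level(k) for k in kids)

    return level


def question_hierarchy(categories, level):
    """Average level over the categories assigned to one question."""
    return sum(level(c) for c in categories) / len(categories)


# Hypothetical toy inputs; the study uses 50-dimensional Word2Vec vectors
# and Zhihu's official category DAG.
TOY_VECTORS = {
    "history": [1.0, 0.0],
    "literature": [1.0, 1.0],
    "fitness": [0.0, 1.0],
}
TOY_CHILDREN = {
    "root": ["culture", "health"],
    "culture": ["history", "literature"],
    "health": ["fitness"],
}

level = make_level_function(TOY_CHILDREN)
ks = knowledge_spanning([TOY_VECTORS[c] for c in ("history", "literature", "fitness")])
hierarchy = question_hierarchy(["culture", "fitness"], level)
print(ks, hierarchy)
```

A question tagged with a single category gets KS = 0 by construction, which matches the rule stated in the measures.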

Key Findings

- H1 (inverted U-shaped effect of knowledge spanning): Supported. The quadratic term of log(KS) is negative and highly significant, indicating a parabolic relationship between knowledge spanning and appeal. From Fig. 4, appeal increases as log(KS) rises up to roughly −0.6, then declines beyond that point. The inverted U-shape appears across multiple top categories (life, history, movie, psychology, love, education, society, medicine, law, literature, internet, health, fitness, study abroad).
- H2 (main effect of knowledge hierarchy): Supported. Knowledge hierarchy has a positive and significant association with appeal (e.g., B ≈ 0.01–0.016, p<0.001 across models), indicating that more abstract questions attract more followers.
- H3 (moderation by knowledge hierarchy): Supported. Significant interactions between hierarchy and both the linear and quadratic log(KS) terms show that increasing hierarchy weakens the inverted U shape. Visualization (Fig. 5) shows that when hierarchy < ~6 (more concrete), the inverted U is pronounced; when hierarchy > ~6 (more abstract), the curve flattens and the inverted U largely disappears.
- Model performance: Across 404,577 observations, models yield R^2 ≈ 0.389–0.390. Controls behave as expected: title length negatively associated with appeal; lasting days (log) strongly positive; minor day-of-week effects.
- Descriptives (Table 1): Appeal (log) mean 4.018 (SD 2.224); knowledge spanning (log) mean −1.450 (SD 0.918); hierarchy mean 5.397 (SD 1.801); title length mean 21.58 (SD 10.186); lasting days (log) mean 5.439 (SD 1.697).
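
To make the moderation in H3 concrete: in a model with terms b_lin·x + b_sq·x² + b_lin_h·x·h + b_sq_h·x²·h, where x = log(KS) and h = hierarchy, the curvature at a given hierarchy level is b_sq + b_sq_h·h, and the peak of the curve (when the curvature is negative) sits at x* = −(b_lin + b_lin_h·h) / (2·(b_sq + b_sq_h·h)). The sketch below evaluates both quantities. The coefficient values are invented purely for illustration; they are not the paper's estimates, which this summary does not report, and the function names are hypothetical.

```python
def curvature(b_sq, b_sq_h, hierarchy):
    """Coefficient on x^2 at a given hierarchy level (negative means inverted U)."""
    return b_sq + b_sq_h * hierarchy


def turning_point(b_lin, b_sq, b_lin_h, b_sq_h, hierarchy):
    """x-value of the peak at a given hierarchy level, or None if there is no peak."""
    slope = b_lin + b_lin_h * hierarchy
    curv = curvature(b_sq, b_sq_h, hierarchy)
    if curv >= 0:
        return None  # the curve is flat or U-shaped, so no interior maximum
    return -slope / (2.0 * curv)


# HYPOTHETICAL coefficients, chosen only so the curve flattens as hierarchy grows.
B_LIN, B_SQ = -0.60, -0.50      # main effects of x and x^2
B_LIN_H, B_SQ_H = 0.10, 0.08    # interactions of x and x^2 with hierarchy

for h in (2, 4, 6, 7):
    print(h, curvature(B_SQ, B_SQ_H, h), turning_point(B_LIN, B_SQ, B_LIN_H, B_SQ_H, h))
```

A positive interaction on the squared term is what shrinks the curvature toward zero as hierarchy increases, which is the pattern the findings describe as the inverted U flattening for more abstract questions.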

Discussion

The findings demonstrate a tradeoff between similarity and differentiation in question framing: moderate knowledge spanning maximizes appeal, while too little spanning lacks novelty and too much spanning induces audience confusion. Crucially, this tradeoff depends on abstraction: moving up the ladder of abstraction reduces the penalties associated with spanning, likely because high-level categories are fuzzier and audiences apply less rigid expectations, diminishing confusion. Consequently, in free online knowledge markets, where many questions are relatively abstract on average, negative feedback against recombination is minimal and the environment is more tolerant of category spanning than tightly policed scientific fields. When spanning is below a threshold, concrete questions can be preferred; as spanning grows, abstract framing helps maintain appeal by easing interpretability and broadening audiences. These results align with category contrast and heterogeneity mechanisms and extend prior inverted U-shaped findings by identifying knowledge hierarchy as a key contextual moderator.

Conclusion

This study conceptualizes question-asking as knowledge spanning and shows that: (1) knowledge spanning has an inverted U-shaped relationship with question appeal; (2) knowledge hierarchy positively affects appeal; and (3) higher abstraction attenuates and can eliminate the inverted U-shaped penalty of spanning. Methodologically, the paper demonstrates how geometric (word embeddings) and network (hierarchy) representations can model knowledge spaces and distances. Practically, maintaining and leveraging a hierarchical category tree can encourage creative, boundary-spanning questions by enabling abstract framing. Future research directions include: integrating hierarchy and similarity into unified models (e.g., hyperbolic embeddings), examining knowledge spanning in answers in addition to questions, testing in fee-based knowledge markets and other platforms, and exploring how user attributes and social dynamics co-evolve with knowledge-space positioning.

Limitations

- Scope: The analysis focuses on question-asking; it does not examine knowledge spanning in answers.
- Measurement space: Similarity-based spanning (embeddings) and hierarchy-based spanning (network depth) are modeled separately; a unified representation is not implemented.
- Context: Findings are from a free knowledge market (Zhihu); generalizability to fee-based systems or other platforms requires testing.
- Conceptual premise: The study conceptualizes creativity primarily as recombination; other mechanisms (deduction, intuition, discovery) are not modeled.
- Coverage: Reliance on existing category systems may underrepresent infrequent or emerging topics not well captured by the taxonomy.
