Introduction
The exponential growth of global scientific output calls for a deeper understanding of the science of science, particularly of how scientific fields emerge and expand. Although scientific productivity has decreased, innovation often stems from converging technologies that rely heavily on interdisciplinary inputs. This study addresses the challenge of identifying emerging research topics that cross scientific fields, particularly topics that are less path-dependent than established ones. Existing methods often focus on local maps or predefined fields, which limits their ability to evaluate emerging topics because of canonical bias. The researchers argue for a novel approach that moves beyond frequency-based measures and considers the influence of interdisciplinary interactions when identifying emerging value and predicting future innovation trajectories. The goal is to analyze the complexity of scientific knowledge production and to anticipate scientific innovations as they emerge from converging areas of research. To that end, the study introduces a new bibliometric approach that combines network analysis with embedded topic modeling (BERTopic) to identify emergent scientific topics characterized by interdisciplinarity. A new measure for emergent topics, based on a network centrality index (Eigenvector centrality), is developed and used together with BERTopic to profile emergent topics that cross global domains within interdisciplinary science fields.
Literature Review
Existing studies on scientific emergence have evolved from citation analysis to methods combining network analysis and topic modeling. Network analysis, often using citation links, maps trends and patterns in scientific literature, revealing seminal discoveries that alter the course of a scientific specialization. Science mapping, through citation analysis, can demonstrate scientific development stages and identify transformative contributions. Science overlay maps represent subsets of global base maps to distinguish different research field categorizations. Emerging technologies can be defined by bibliometric indicators and text analysis, with methods such as clustering, national output analysis, and network analysis used to model emergence. However, many studies focus on local maps or predefined areas, limiting their scope. Existing approaches often rely on frequency-based topic modeling, potentially leading to canonical bias. The novelty of interdisciplinary research has been modeled via keyword co-occurrence, but existing studies often limit the analysis to local science maps and relative measures of emergence. Global science maps, preserving the entire context, are argued to provide more accurate partitions and higher textual coherence of topics. While some research differentiates between multi-, inter-, and trans-disciplinary approaches, their operationalization remains limited. This study distinguishes between growing and dominant sciences by focusing on interdisciplinarity across STEM domains.
Methodology
This study employs a two-stage, iterative process (see Figure 1); the datasets are presented in Table 1.
Stage 1 is a network analysis of an interdisciplinary science dataset derived from the Web of Science. First, a set of science category-subject co-occurrence pairs is constructed from publication metadata. A publication is classified as interdisciplinary if it is listed under at least two of the Web of Science categories Life Sciences & Biomedicine, Technology, and Physical Sciences. The dataset is divided into two three-year periods (2012-2014 and 2015-2017) to stabilize rankings. A co-occurrence network is then created for each period, with science category-subjects as nodes and publications forming the edges between them (Figure 5). Eigenvector centrality (EIG) is calculated for each node as a measure of its influence in the network. The top 10% of science category-subjects by EIG are classified as dominant sciences, and the top 10% by EIG growth rate (EIG.GR) as growing sciences. Publications involving growing sciences are passed on to Stage 2.
Stage 2 applies embedded topic modeling with BERTopic. BERTopic uses BERT embeddings to capture semantic information in the text, UMAP for dimensionality reduction, HDBSCAN for clustering, and c-TF-IDF for topic generation (see Figure 4). The hyperparameters (n-gram range, number of topics, minimum topic size) are tuned to optimize BERTopic performance, with the aim of minimizing information entropy so that topics are clearly distinguished from one another (Table 3).
Finally, a qualitative validation examines the most representative articles for each identified topic to verify the coherence and interpretability of the results. This includes checking whether the keywords define common themes and reviewing the journals in which the most representative articles are published. The validation is intended to confirm that the topics are reasonable and understandable to non-experts.
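The Stage 1 steps can be illustrated with a minimal sketch. It is not the authors' implementation: the publication lists and category-subject labels are made-up toy data, the function names are hypothetical, eigenvector centrality is computed with a plain power iteration on the adjacency matrix plus the identity (a shift that only aids convergence), and two details the summary does not specify are assumed, namely that EIG.GR is the relative change in EIG between the two periods and that dominance is ranked on the earlier period.

```python
from collections import defaultdict
from itertools import combinations
import math


def cooccurrence_network(publications):
    """Weighted, undirected network: each pair of category-subjects that
    appears on the same publication adds 1 to the weight of their edge."""
    weights = defaultdict(lambda: defaultdict(float))
    for labels in publications:
        for a, b in combinations(sorted(set(labels)), 2):
            weights[a][b] += 1.0
            weights[b][a] += 1.0
    return weights


def eigenvector_centrality(weights, max_iter=1000, tol=1e-10):
    """Power iteration on (A + I), normalised to unit Euclidean length."""
    nodes = sorted(weights)
    if not nodes:
        return {}
    x = {n: 1.0 / math.sqrt(len(nodes)) for n in nodes}
    for _ in range(max_iter):
        new = {n: x[n] + sum(w * x[m] for m, w in weights[n].items())
               for n in nodes}
        norm = math.sqrt(sum(v * v for v in new.values()))
        new = {n: v / norm for n, v in new.items()}
        if sum(abs(new[n] - x[n]) for n in nodes) < tol:
            return new
        x = new
    raise RuntimeError("power iteration did not converge")


def growth_rate(before, after):
    """Assumed definition of EIG.GR: relative change in centrality.
    Nodes that are absent (or zero) in the earlier period are skipped."""
    return {n: (after[n] - before[n]) / before[n]
            for n in after if before.get(n, 0.0) > 0.0}


def top_share(scores, share=0.10):
    """Labels in the top `share` of `scores`, always at least one label."""
    k = max(1, int(share * len(scores)))
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data (illustrative only): one set of category-subjects per publication.
period_1 = [
    {"Chemistry", "Materials Science"},
    {"Chemistry", "Materials Science"},
    {"Chemistry", "Engineering"},
    {"Chemistry", "Engineering"},
    {"Materials Science", "Engineering"},
    {"Engineering", "Environmental Sciences"},
]
period_2 = [
    {"Chemistry", "Materials Science"},
    {"Chemistry", "Engineering"},
    {"Engineering", "Environmental Sciences"},
    {"Engineering", "Environmental Sciences"},
    {"Materials Science", "Environmental Sciences"},
    {"Chemistry", "Environmental Sciences"},
]

eig_1 = eigenvector_centrality(cooccurrence_network(period_1))
eig_2 = eigenvector_centrality(cooccurrence_network(period_2))
eig_gr = growth_rate(eig_1, eig_2)

dominant = top_share(eig_1)   # highest EIG in the earlier period (assumption)
growing = top_share(eig_gr)   # highest EIG growth rate between periods

# Publications that involve a growing science would go on to Stage 2.
growing_pubs = [p for p in period_2 if set(p) & set(growing)]

print("dominant:", dominant)
print("growing:", growing)
print("publications passed to Stage 2:", len(growing_pubs))
```

With only four nodes the 10% cut-off rounds up to a single label, so the sketch mainly shows how a subject can rank low on centrality in the first period yet rank highest on growth. On real data a graph library would normally replace the hand-written power iteration.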
Key Findings
The network analysis revealed a clear distinction between dominant and growing interdisciplinary science fields (Figure 5 and Table 2). Growing interdisciplinary science category-subjects showed significantly higher Eigenvector centrality in the subsequent period compared to other fields (Figure 6), indicating a consistent growth trend. The BERTopic analysis, after hyperparameter tuning (Table 3), identified several key topics (Table 4) that emerge in different interdisciplinary science fields. Table 5 presents representative articles for each emergent topic, showing that the keywords and journal titles reflect the defined topics, validating the results and confirming their interpretability for non-experts. The distribution of publications with emergent interdisciplinary topics is skewed toward a small number of journals (Table 6), with approximately half the publications concentrated in the top quintile of journals for each interdisciplinary category. In particular, the emergence of green technologies and health-related topics was prominent across various interdisciplinary categories. The analysis highlights the specific combination of existing subjects that contributed to emergent topics.
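The journal concentration reported above can be checked with a short calculation. The sketch below is an assumption about how such a share might be computed, not the procedure behind Table 6: journals are ranked by publication count, the top fifth of journals is taken as the top quintile, and the journal names and counts are invented toy values chosen only to make the arithmetic easy to follow.

```python
from collections import Counter


def top_quintile_share(journal_per_publication):
    """Fraction of publications that appear in the top 20% of journals,
    with journals ranked by their publication count."""
    counts = Counter(journal_per_publication)
    if not counts:
        raise ValueError("no publications given")
    ranked = sorted(counts.values(), reverse=True)
    k = max(1, len(ranked) // 5)  # number of journals in the top quintile
    return sum(ranked[:k]) / sum(ranked)


# Toy data (illustrative only): ten hypothetical journals, 100 publications.
toy_counts = {
    "J01": 30, "J02": 20, "J03": 10, "J04": 10, "J05": 8,
    "J06": 7, "J07": 5, "J08": 4, "J09": 3, "J10": 3,
}
publications = [name for name, n in toy_counts.items() for _ in range(n)]

share = top_quintile_share(publications)
print(f"share of publications in top-quintile journals: {share:.0%}")
```

In this toy case the two most productive journals hold 50 of the 100 publications, which is the kind of skew the findings describe.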
Discussion
This study provides an alternative perspective on scientific emergence, focusing on the influence of changing boundaries across scientific categories rather than on breakthroughs or frequency-based dominant topics within specific fields. Using Eigenvector centrality as a measure of influence for emergent topics complements existing frequency-based approaches and helps capture the significance of interdisciplinary interactions. The identification of green and health-related topics as emergent across various interdisciplinary categories is relevant to contemporary global challenges and could inform research funding and policy initiatives. By combining network analysis with embedded topic modeling, the methodology offers a way to identify meaningful emergent topics in a large dataset.
Conclusion
This study makes significant contributions by expanding the definition of interdisciplinarity to a global domain-crossing level, using Eigenvector centrality to measure the influence of emergent topics, and applying embedded topic modeling to a global science map. The findings indicate that green and health-related topics are key drivers of emerging interdisciplinary science. However, limitations include the relatively small number of automatically generated topics and the reliance on Web of Science data, potentially excluding other types of publications and innovation sources. Future research could refine topic definition, reduce computational demands, and incorporate a broader range of research outputs and impact measures.
Limitations
The study’s limitations include the small number of automatically generated topics, suggesting potential for further emergent topics to be identified with refined approaches. The focus on scientific journal articles in the Web of Science could exclude innovations not originating in science and technology fields or disciplines that use different publication types. Furthermore, the method requires substantial computing power, limiting widespread applicability. Future research should consider more comprehensive measures of research productivity and impacts, particularly concerning socially oriented innovation.