Bibliometric analysis of Asian 'language and linguistics' research: A case of 13 countries
D. Lee
The study investigates how regional characteristics shape 'language and linguistics' research in Asia, where linguistic diversity and sociocultural heterogeneity are pronounced. Prior bibliometric work has often targeted narrow sub-topics, leaving a gap in understanding region-wide trends. Given that language reflects national culture and identity, national contexts likely drive differing research emphases across countries. This paper analyzes research produced between 2000 and 2021 across 13 Asian countries to answer: (1) What is the geographic distribution and relative contribution of each country? (2) How do authorship and collaboration patterns differ? (3) What topics are most studied and how have they evolved? (4) What is the scope and intensity of each country’s research impact? The study focuses on productivity, authorship/collaboration, top keywords, and impact to inform institutions and scholars about performance, strategic publishing venues, and benchmarking.
Existing bibliometric analyses in language and linguistics typically focus on sub-domains: children’s language, computational linguistics/NLP, discourse analysis, ELT, linguistic landscape, second language acquisition, second language writing, and vocabulary acquisition. Large-scale efforts (e.g., Guo, 2022 on child language; Radev et al., 2016 on computational linguistics) provide comprehensive mappings but lack regional comparative analysis, especially for Asia, or emphasize conference proceedings and computer science overlaps. Regional or country-level studies (e.g., Lei and Liao, 2017 on four Chinese-speaking regions; Barrot, 2017; Ngoc and Barrot, 2022 on Southeast Asia; Mohsen, 2021 on Saudi applied linguistics) are limited by scope, sample size, or regional activity levels and do not compare multiple Asian countries comprehensively. Nederhof (2011) contrasted domestic vs. international audiences in the Netherlands but used older, partly non-academic samples. Overall, despite many bibliometric studies, few assess Asian regional characteristics at scale across diverse topics, motivating the present comprehensive, multi-perspective analysis across 13 Asian countries.
Scope and country selection: The study targets 13 Asian countries (China, Hong Kong, India, Indonesia, Iran, Israel, Japan, Malaysia, Saudi Arabia, Singapore, South Korea, Taiwan, and Turkey), selected for academic advancement and/or R&D investment. Most were chosen because their R&D spending exceeded 0.5% of GDP (per Meo et al., 2013), with adjustments: Qatar excluded due to low output; Taiwan included due to significant output despite unclear R&D ratio; Indonesia and Saudi Arabia included for rapid recent growth.
Data source and search strategy: Elsevier’s Scopus was used. Advanced search: AFFILCOUNTRY(country) AND ALL(language AND linguistics). Results were restricted to Social Sciences and Arts and Humanities subject areas. Only journal articles were included to align with HSS evaluation norms.
Time window: Publications from 2000–2021 were included. For journals intermittently indexed in Scopus, only years during indexation were sampled to ensure quality. Predatory journals (per Beall’s list) were excluded: from an initial 32,379 articles in 2,380 journals, 1,864 articles across 31 journals were removed, yielding 30,515 articles from 2,349 journals.
Citation data: Citations were collected from Scopus using a five-year citation window (common in several disciplines) to normalize across publication years. Because full five-year windows were not available for 2016–2021 at data collection (Feb 2022), impact analyses used articles published 2000–2015 (n=11,329).
Metadata extraction: Using the Scopus API, titles, abstracts, journal titles, years, pages, author keywords, etc., were collected. Author identifiers, names, affiliations, and countries were used to analyze authorship.
Country attribution: Following Shen et al. (2018), a target article’s country was determined by the first author’s affiliation. If the first author was non-Asian, the corresponding author’s country was used (collected manually where needed). If neither first nor corresponding author was Asian, the first Asian author in the byline was used. For multi-affiliations, the Asian affiliation was prioritized.
Journal classification: Articles were classified as “international” if published in journals indexed in WoS core databases (SSCI, SCI-E, A&HCI) during the relevant period; otherwise, “regional.” Articles in journals indexed later in WoS core databases were treated as regional for pre-indexation years.
Keyword processing: Author-defined keywords were available for 27,214 articles. For 3,301 articles without keywords, KeyBERT was applied to titles/abstracts to extract up to five top-scoring keyphrases; 157 articles lacked sufficient text.
Collaboration and citation analyses: International collaboration networks were analyzed (including modularity-based community detection). Self-citations were identified at author- and country-levels; international citations excluded both forms of self-citation to assess external impact.
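The country-attribution rule described above can be sketched as a short fallback chain. This is a minimal illustration, not the study's code: the `Author` record, the `attribute_country` function, and the use of the 13 target countries as a stand-in for "Asian" are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: the 13 target countries stand in for "Asian" in this sketch.
# The study's own notion of "Asian" is broader (41 countries are mentioned).
ASIAN = {
    "China", "Hong Kong", "India", "Indonesia", "Iran", "Israel", "Japan",
    "Malaysia", "Saudi Arabia", "Singapore", "South Korea", "Taiwan", "Turkey",
}


@dataclass
class Author:
    """Hypothetical author record; one country per listed affiliation."""
    name: str
    countries: list[str]
    is_corresponding: bool = False


def asian_affiliation(author: Author, asian: set[str]) -> Optional[str]:
    """Return the author's first Asian affiliation country, or None.

    This implements the 'Asian affiliation is prioritized' rule for
    authors with several affiliations.
    """
    for country in author.countries:
        if country in asian:
            return country
    return None


def attribute_country(byline: list[Author], asian: set[str]) -> Optional[str]:
    """Assign one country to an article from its byline (in author order)."""
    if not byline:
        return None
    # Step 1: the first author's affiliation, if it is Asian.
    country = asian_affiliation(byline[0], asian)
    if country is not None:
        return country
    # Step 2: otherwise, the corresponding author's Asian affiliation.
    for author in byline:
        if author.is_corresponding:
            country = asian_affiliation(author, asian)
            if country is not None:
                return country
    # Step 3: otherwise, the first Asian author in the byline.
    for author in byline:
        country = asian_affiliation(author, asian)
        if country is not None:
            return country
    return None


example = [
    Author("A", ["United States"]),
    Author("B", ["United Kingdom", "Japan"], is_corresponding=True),
    Author("C", ["China"]),
]
print(attribute_country(example, ASIAN))  # Japan
```

In the example, the first author is not Asian, so the corresponding author's Asian affiliation (Japan) is used even though a Chinese co-author appears later in the byline.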
- Coverage and productivity
- Across 41 Asian countries (2000–2021), 35,830 language and linguistics articles were published; the 13 target countries produced 85.2% (n=30,515).
- Annual output grew from 212 articles (2000) to 6,290 (2021), averaging ~16.9% annual growth.
- Most prolific among the 13: China (5,828), Iran (3,456), Japan (3,301), Hong Kong (2,682), Taiwan (2,483), South Korea (2,121); the others were Malaysia (2,061), Israel (2,057), Turkey (1,888), Indonesia (1,637), Singapore (1,128), India (972), and Saudi Arabia (901).
- Highest average annual productivity growth: Indonesia 73.2% (SD 286.9), Iran 54.1% (SD 106.2), Saudi Arabia 42.9% (SD 81.6), Malaysia 39.1% (SD 86.1). China’s growth averaged 29.9% (SD 29.7). China became most prolific from 2010 onward, overtaking Japan.
- Publication venues (international vs. regional)
- Overall, 46.0% (n=14,045) of the 30,515 articles appeared in international journals (WoS core indexed).
- Countries prioritizing international journals: China, Hong Kong, Israel, Singapore, South Korea, Taiwan. Countries emphasizing regional journals: Indonesia, Iran, Malaysia, Saudi Arabia.
- Growth patterns confirmed this split: output from China, Hong Kong, Israel, Singapore, and Taiwan closely tracked their international publications, whereas the surges in Indonesia, Iran, Malaysia, and Saudi Arabia were driven by regional journals.
- Top outlets included System, Journal of Pragmatics, Lingua, Information Processing & Management, Language Teaching Research, Computer Assisted Language Learning, and prominent regional ELT/linguistics journals.
- Authorship and collaboration
- 39,929 distinct scholars authored 30,515 articles; mean authors per article = 2.3 (SD 1.7). Mean articles per author = 1.8 (SD 2.4).
- Authorship distribution: sole authorship 35.7% (10,904), two authors 31.1% (9,501), three authors 17.3% (5,287); 84.2% had ≤3 authors.
- Trend: sole authorship fell from ~55–60% (early 2000s) to ~30% (2020s), while multi-authorship rose from ~40–45% to ~70%. Two-author papers have recently been the most common.
- Of 19,611 co-authored papers, 64.1% (12,574) were domestic collaborations; 35.9% (7,037) international (majority between two countries).
- Country patterns: Hong Kong, Japan, Taiwan had high sole authorship yet, when collaborating, favored international ties. Indonesia, Iran, Malaysia had predominantly domestic collaborations and regional publishing. Israel, South Korea, Taiwan often published internationally with strong domestic collaborations, reflecting sufficient domestic research capacity. Saudi Arabia had the highest sole authorship and more international than domestic collaborations despite publishing mostly in regional journals.
- Most frequent international collaborators: United States, United Kingdom, Australia, Canada, and the Netherlands. Within Asia, collaborations among the 13 were active; top regional collaborators included China, Hong Kong, and Iran.
- Collaboration communities (modularity): (1) Europe-centric group including India, Israel, Turkey (avg weighted degree 532.6); (2) East Asia + US/Oceania including China, Hong Kong, Japan, Singapore, South Korea, Taiwan, Iran (avg weighted degree 1972.7); (3) Indonesia, Malaysia, Saudi Arabia with Gulf/South Asian ties (avg weighted degree 280.0).
- Topics and trends
- Keyword corpus combined author keywords and KeyBERT extraction (30,358 articles). Top topics: Asian languages (Chinese, Japanese, Hebrew, Cantonese), English and ELT (EFL, EFL learners, English), discourse-focused areas (conversation analysis, critical discourse analysis, discourse analysis), language education (second language acquisition, language learning, higher education), and computational topics (sentiment analysis).
- Temporal shift: Pre-2010, Asian languages and core linguistics (morphology, phonology, pragmatics, syntax) dominated. From ~2010 onward—especially post-2014—English-related and discourse topics surged.
- Computational and online learning keywords (deep learning, NLP, social media, sentiment/opinion mining, text mining, Twitter, speech recognition; E-learning, interactive/online learning) increased markedly in recent years; particularly prevalent in India, China, and South Korea. COVID-19-related online learning topics also emerged.
- Country-specific popular topics: English-related themes prevalent across countries (except India’s strong computational focus); localized English and language policy in Hong Kong/Singapore; national languages/dialects prevalent in each country (e.g., Cantonese/Chinese/Mandarin; Bengali/Hindi; Persian; Hebrew/Arabic; Japanese; Malay; Korean; Turkish). Israeli research reflected the Russian-speaking population.
- Impact and citations (5-year window; 2000–2015 articles, n=11,329)
- 78.4% (8,880) were cited at least once within five years; among cited items, mean citations ≈ 8.3 (SD 12.2). Articles with ≥19 citations were outliers.
- Highest average citations per article: Israel m=9.1 (SD 16.6), Singapore m=9.1 (SD 13.0), Hong Kong m=7.9 (SD 9.8). Lowest: Indonesia m=2.1 (SD 3.7; 47.5% uncited), Iran m=2.7 (SD 4.6; 33.3% uncited), Malaysia m=3.7 (SD 4.9; 24.7% uncited).
- Total citations to cited articles within five years: 73,688. Author-level self-citations: 18.8% (13,853) across 4,747 articles; country-level self-citations: 13.3% (9,832) across 3,939 articles. Highest author-level self-citation ratios: India, Israel, Malaysia; lowest: Iran, Turkey, Taiwan. Highest country-level self-citation ratios: Iran, Malaysia, Indonesia.
- After removing both self-citation types (and excluding 176 uncategorized), 67.9% (50,003) of citations were international. By ratio of international citations to total, Saudi Arabia, Singapore, South Korea, Turkey (followed by Japan and Hong Kong) showed relatively strong international impact. In raw counts, Japan, China, Hong Kong, Israel were highest.
- The United States was the top citer of every country’s output (e.g., citing Japan n=2,321; Israel n=1,848; Hong Kong n=1,655; China n=1,486). North America accounted for 28.1% of international citations; strong citing also from Eastern Asia, Northern/Western Europe. Intra-Asian citations were common, especially among Chinese-speaking regions.
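The self-citation breakdown reported in the impact findings can be sketched as a three-way classification. This is an illustrative sketch under stated assumptions: the function names, the input fields, and the precedence (author overlap is checked before country overlap, so each citation falls into exactly one category) are not taken from the study. The final lines only re-derive the reported shares from the reported counts.

```python
def classify_citation(citing_author_ids, citing_countries,
                      cited_author_ids, cited_country):
    """Label one citation as author_self, country_self, or international.

    Assumption: author-level self-citation takes precedence, so the three
    categories are mutually exclusive.
    """
    if set(citing_author_ids) & set(cited_author_ids):
        return "author_self"      # at least one shared author
    if cited_country in set(citing_countries):
        return "country_self"     # no shared author, same country
    return "international"        # neither form of self-citation


def percent_shares(counts):
    """Convert category counts into percentages rounded to one decimal."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# Toy citations with hypothetical author IDs.
print(classify_citation(["a1", "a2"], ["Japan"], ["a2"], "Japan"))          # author_self
print(classify_citation(["a3"], ["Japan"], ["a2"], "Japan"))                # country_self
print(classify_citation(["a4"], ["United States"], ["a2"], "Japan"))        # international

# Counts as reported in the findings (five-year window, 2000-2015 articles).
reported = {"author_self": 13_853, "country_self": 9_832, "international": 50_003}
print(sum(reported.values()))    # 73688
print(percent_shares(reported))  # {'author_self': 18.8, 'country_self': 13.3, 'international': 67.9}
```

The three reported counts sum to the reported total of 73,688 citations, which is consistent with treating the categories as mutually exclusive.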
The findings address the study’s core questions. First, productivity analyses show that the 13 countries dominate Asian language and linguistics output (85% of 2000–2021 publications), which supports their representativeness for regional trend analysis and reveals rapid field growth (~17% annually). Second, authorship/collaboration patterns demonstrate a regional shift toward co-authorship and sustained internationalization, with nuanced country differences: internationally oriented systems (e.g., Hong Kong, Singapore, Taiwan, Israel, China, South Korea) channel output into WoS-indexed journals, whereas Indonesia, Iran, Malaysia, and Saudi Arabia have expanded mainly through regional journals, with predominantly domestic collaboration in the first three and more international than domestic collaboration in Saudi Arabia. Third, topic analyses reveal a temporal reorientation from national languages and core linguistic subfields to English- and discourse-related research since 2010, aligning with globalization and English’s status as a lingua franca. The concurrent surge in computational/AI-enabled language analyses indicates deepening interdisciplinarity and responsiveness to technological advances. Fourth, impact analysis shows heterogeneity in citation performance: Hong Kong, Israel, and Singapore publish highly cited work, while countries emphasizing regional outlets exhibit lower citation averages and higher uncited rates. International citations predominantly come from the US and Europe, but intra-Asian influence is substantive among East Asian countries. Together, these results underscore how economic, policy, and publication venue choices shape visibility and impact, offering benchmarks for strategic research development and collaboration planning across Asia.
This comprehensive bibliometric study of 30,515 articles (2000–2021) across 13 Asian countries shows that these nations overwhelmingly lead Asian language and linguistics research. China has risen to the forefront since 2010; Japan and Hong Kong have maintained strong, steady contributions. Publication strategies bifurcate between international core journals and regional outlets, influencing visibility and impact. Authorship has shifted toward collaboration, with international networks centered on the US, UK, Australia, and active intra-Asian ties. Topical emphases have evolved: national languages and core subfields were prominent pre-2010; English- and discourse-related topics have surged since, alongside rapid growth in computational/AI-driven language analysis and online learning themes. Future research directions include: (1) topic modeling and keyword network analyses to identify cross-country thematic clusters; (2) investigating determinants of research impact in language and linguistics (beyond generic SSH bibliometrics); (3) deeper examination of domestic citation dynamics (e.g., institutional or prior-collaboration effects); (4) dedicated assessments of the impact of computational language analysis topics within Asian language and linguistics as citation data mature.
- Source coverage and indexing: Reliance on Scopus (chosen for superior SSH and Asian coverage) may omit content indexed elsewhere; journal classification (international vs. regional) depends on WoS core indexing timelines.
- Predatory journal exclusion: Use of Beall’s list may imperfectly classify journals; exclusion could remove some legitimate content or retain borderline cases.
- Country attribution: Assigning country by first/corresponding author (with manual steps) and prioritizing Asian affiliations in multi-affiliations may introduce classification bias in multinational teams.
- Citation window: Five-year windows exclude 2016–2021 articles, potentially underrepresenting recent impact and fast-moving subfields.
- Keywords: For 3,301 articles, keywords were machine-extracted (KeyBERT); semantic nuances and cross-language issues may affect topic granularity; 157 articles lacked sufficient text for keyword extraction.
- Network analyses: Modularity-based community detection and centrality measures are sensitive to data completeness and counting conventions (e.g., multiple counting of multi-country coauthorship and citations).
- Generalizability: Focus on 13 prolific countries; although they account for 85% of output, results may not fully generalize to less prolific Asian countries.
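The counting-convention caveat in the network-analysis limitation can be made concrete with a small comparison of full and fractional counting. This is a hedged sketch on toy data: the function names and the specific fractional scheme (each paper's single unit of credit is split evenly across its country pairs) are assumptions for illustration, and the summary does not state which scheme the study used.

```python
from collections import defaultdict
from itertools import combinations


def edge_weights(papers, fractional=False):
    """Build country-pair collaboration weights from per-paper country lists.

    Full counting adds 1 to every country pair on a paper.
    Fractional counting (an assumed scheme) splits one unit of credit
    evenly across all pairs on that paper.
    """
    weights = defaultdict(float)
    for countries in papers:
        pairs = list(combinations(sorted(set(countries)), 2))
        if not pairs:
            continue  # single-country paper: no international edge
        share = 1.0 / len(pairs) if fractional else 1.0
        for pair in pairs:
            weights[pair] += share
    return dict(weights)


def weighted_degree(weights):
    """Sum the weights of all edges touching each country."""
    degree = defaultdict(float)
    for (a, b), w in weights.items():
        degree[a] += w
        degree[b] += w
    return dict(degree)


# Toy data: one two-country paper, one three-country paper, one domestic paper.
papers = [
    ["China", "United States"],
    ["China", "United States", "Australia"],
    ["Japan"],
]

full = edge_weights(papers)
frac = edge_weights(papers, fractional=True)
print(weighted_degree(full)["China"])            # 3.0
print(round(weighted_degree(frac)["China"], 3))  # 1.667
```

Under full counting, the three-country paper contributes three edges and inflates the weighted degree of every participant, while fractional counting keeps each paper's total contribution at one. Measures such as average weighted degree and modularity-based communities can therefore shift with the convention chosen.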