
Otherness and suspiciousness: a comparative study of public opinions between the Confucius Institute and Goethe-Institut in developing countries
M. Huang
This summary covers Ming Huang's study of how Confucius Institutes and Goethe-Instituts are perceived in developing countries, based on news coverage from 2014 to 2023. The study reports differences in public perception, institutional identity, and the effect of governmental associations.
Introduction
The study investigates how public opinion in developing countries depicts the Confucius Institute (CI) compared with the Goethe-Institut (GI), addressing gaps in prior research that often lacked comparative designs and relied heavily on qualitative, content-focused discourse analyses. Motivated by challenges faced by CIs in the global communication environment (e.g., spiral of silence, semiotic hegemony, linguistic alliances, and limited international communication capacity), the paper asks how lexical priming features in news discourse differ between CI and GI and how these features influence audience attitudes and institutional images. Using Hoey’s lexical priming theory enhanced with AI methods (word2vec and LDA), the study seeks to clarify: (1) differences in collocation, colligation, semantic association, and semantic prosody between CI and GI coverage; (2) how these lexical features shape audience stance; and (3) how they jointly form public opinion images of CI and GI in developing countries.
Literature Review
The literature on Confucius Institutes spans language teaching, strategic analysis, and cultural communication. Some works emphasize CI’s educational benefits and positive perceptions; others frame CIs as diplomatic tools with political implications, including concerns about academic freedom and propaganda. Public opinion studies range from single-country (notably the U.S., U.K., Spain, Australia, Canada) to global analyses, evolving from qualitative content analysis to discourse-based approaches, including critical discourse analysis and, more recently, lexical priming. Findings are mixed: while many U.S.-centric analyses suggest politicization and negative portrayals (e.g., interference with academic freedom), other studies report neutral or positive views, with topic-dependent shifts toward skepticism when politics/academia are invoked. The review of lexical priming theory (Hoey, 2005) outlines collocation, colligation, semantic association, and semantic prosody, and its extensions to morphology, spoken language, and news discourse. Prior work indicates potential benefits of combining corpus methods with AI (e.g., word2vec) to better capture semantic relations and priming effects. The literature highlights the need for comparative, quantitatively controlled analyses—hence the inclusion of GI as a control, given its mature international image and relatively lower colonial associations in the sampled developing countries.
Methodology
Design and scope: A quantitative, corpus-based comparative study grounded in Hoey’s lexical priming theory. The analysis targets English-language news/media from developing countries to compare CI and GI public opinion images.
Data source and sampling: Data were collected from the NOW corpus for 01/01/2014–04/01/2023 using match strings “Confucius Institute” and “Goethe-Institut.” Because the NOW corpus largely covers English or officially English-speaking contexts, the developing countries sampled include India, Sri Lanka, Pakistan, Bangladesh, Malaysia, the Philippines, South Africa, Nigeria, Ghana, Kenya, Tanzania, and Jamaica. Raw texts were retrieved via URLs and compiled into two custom corpora: CI Developing Country Corpus (CIDC) and Goethe-Institut Developing Country Corpus (GIDC). In NOW, 1086 texts matched CI and 1065 matched GI; the constructed corpora contain CIDC with 16,483 lemma types and 420,605 lemma tokens, and GIDC with 25,834 lemma types and 436,140 lemma tokens.
Preprocessing: URL filtering, de-duplication, removal of non-English characters, tokenization, and stopword handling.
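The preprocessing steps listed above can be sketched as follows. This is a minimal illustration, not the study's pipeline: the `(url, text)` record format, the tiny `STOPWORDS` set, the ASCII-only filter, and the function names are all assumptions made for the example.

```python
import re
import unicodedata

# Tiny illustrative stopword list (assumption; the study's list is not given).
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "at", "is", "was"}


def drop_non_english(text):
    """Keep ASCII characters only, after Unicode normalization (a crude filter)."""
    normalized = unicodedata.normalize("NFKD", text)
    return normalized.encode("ascii", "ignore").decode("ascii")


def tokenize(text):
    """Lowercase and split into alphabetic tokens, keeping word-internal hyphens."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def preprocess(records):
    """records: iterable of (url, text) pairs. Returns one token list per kept text."""
    seen_urls, seen_texts, docs = set(), set(), []
    for url, text in records:
        if not url.startswith(("http://", "https://")):
            continue  # URL filtering: skip records without a usable web URL
        cleaned = " ".join(drop_non_english(text).split())
        if url in seen_urls or cleaned in seen_texts:
            continue  # de-duplication by URL and by exact cleaned text
        seen_urls.add(url)
        seen_texts.add(cleaned)
        docs.append([t for t in tokenize(cleaned) if t not in STOPWORDS])
    return docs
```

Hyphens are kept inside tokens so that a match string such as "Goethe-Institut" survives tokenization as a single unit.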
Collocation and semantic association: Two measures were combined. (1) PMI/MI3 quantified co-occurrence strength; MI3 = log2(J^3 E / B), as implemented in WordSmith 8.0, was preferred to emphasize robust co-occurrence, token counts, and distinctiveness. (2) word2vec (Skip-gram and CBOW) was used to learn word embeddings, and cosine similarity identified semantically related terms. Collocations were categorized as specific (high vector similarity only), general (high MI3 only), and typical (high on both). Semantic associations were assessed via MI3 and supplemented by word2vec similarity to capture local/latent semantics.
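The three-way categorization described above can be sketched as follows. Note the assumptions: the `mi3` function uses the commonly cited cubic mutual information, log2(J^3 · N / (F1 · F2)), which may differ from the J/E/B expression WordSmith applies; the top-`k` cut-off is a hypothetical stand-in for whatever thresholds the study used; and all function names are illustrative.

```python
import math


def mi3(joint, f1, f2, n_tokens):
    """Cubic mutual information: log2(joint^3 * N / (f1 * f2))."""
    if joint == 0:
        return float("-inf")
    return math.log2(joint ** 3 * n_tokens / (f1 * f2))


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k(scores, k):
    """Return the k highest-scoring words as a set."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return {word for word, _ in ranked[:k]}


def categorize(mi3_scores, sim_scores, k):
    """Split collocates into general (MI3 only), specific (similarity only), typical (both)."""
    by_mi3 = top_k(mi3_scores, k)
    by_sim = top_k(sim_scores, k)
    typical = by_mi3 & by_sim
    return {
        "general": by_mi3 - typical,
        "specific": by_sim - typical,
        "typical": typical,
    }
```

In use, `mi3_scores` would map each candidate collocate to its MI3 score against the target term, and `sim_scores` would map it to its cosine similarity with the target term's embedding.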
Colligation: Stanford CoreNLP dependency parsing was applied to sentences containing target terms to quantify voice (active/passive), subject depth, attributes/adjectives, and positional/grammatical patterns (e.g., Confucius_NNP Institute_NNP IN).
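As a rough illustration of the voice count, the sketch below works on dependency output that has already been produced by a parser. The input format (one list of `(dependent, relation, head)` triples per sentence) and the rule "a sentence is passive if it contains a passive-subject or passive-auxiliary relation" are assumptions for this example; the study's exact criteria are not stated. The relation labels cover both the older Stanford Dependencies names and their Universal Dependencies v2 equivalents.

```python
# Passive markers: Stanford Dependencies labels and their UD v2 equivalents.
PASSIVE_RELATIONS = {"nsubjpass", "auxpass", "nsubj:pass", "aux:pass"}


def is_passive(dependencies):
    """True if any (dependent, relation, head) triple carries a passive relation."""
    return any(relation in PASSIVE_RELATIONS for _, relation, _ in dependencies)


def voice_shares(parsed_sentences):
    """Return (passive_count, active_count, passive_percentage) for parsed sentences."""
    total = len(parsed_sentences)
    passive = sum(1 for deps in parsed_sentences if is_passive(deps))
    percentage = round(100 * passive / total, 2) if total else 0.0
    return passive, total - passive, percentage
```

Subject depth and attribute layering would need the full tree rather than a flat relation check, so they are left out of this sketch.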
Semantic prosody (topics): LDA extracted 10 topics per corpus; Euclidean distance between the target term vector and topic centers identified topics most associated with each institution, treated as semantic prosodies. AntConc was used to inspect typical collocational contexts and categorize them into event/location/relationship/function types.
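The topic-proximity step can be sketched as follows. The summary does not say how topic centers were computed, so this example assumes a center is the mean embedding of a topic's top words; the `embeddings` dictionary, the topic word lists, and the function names are hypothetical.

```python
import math


def topic_center(words, embeddings):
    """Mean vector of the topic words that have an embedding (assumed definition)."""
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def rank_topics(target_vector, topics, embeddings):
    """Sort topics by Euclidean distance from the target term vector, nearest first."""
    distances = []
    for name, words in topics.items():
        center = topic_center(words, embeddings)
        if center is not None:
            distances.append((name, math.dist(target_vector, center)))
    return sorted(distances, key=lambda pair: pair[1])
```

Under this reading, the nearest topics are the ones treated as the institution's semantic prosodies. `math.dist` requires Python 3.8 or later.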
Analytic outputs: Concordance patterns, MI3-ranked collocates, vector-similar terms, dependency distribution statistics, and LDA topic proximity profiles were integrated to interpret lexical priming features and their implications for audience stance and institutional image.
Key Findings
- Collocational patterns:
  - CI general collocates (high MI3) center on operations and educational functions: language, teachers, teaching, students; institutional terms like university, director, headquarters, and Hanban are prominent. Typical collocates include university, headquarters, held, Karachi, organized, Hanban. Specific collocates group into Chinese partner institutions/provinces (e.g., Hebei, Sichuan, Chongqing) and institutional cooperation/building (e.g., department, faculty, China-built, Lagos). Overlap between CI specific and general collocations is low (~15.5%), indicating macro- vs micro-context divergence.
  - GI collocates emphasize collaboration and international partnership: with, collaboration, partnership, cooperation, plus country/organization names (Bangladesh, Namibia, Chennai; UNESCO, Alliance Française). Typical collocations prominently include partnership, collaboration, and cooperation. Overlap between specific and general collocations is high (~42.2%), indicating stable and consistent priming across contexts. Names like Max Mueller Bhavan and personnel (e.g., Kirsten Hackenbroch) appear frequently, reinforcing a facilitative/art-related institutional identity.
- Concordance insights:
  - CI is tightly bound to Chinese/China references across many positions (China, Chinese; university appears in nearly all lexical positions), priming an active, institution-building image.
  - GI repeatedly co-occurs with terms denoting cooperation and with international cultural bodies; German/Germany terms appear but are less tightly bound than Chinese/China for CI.
- Semantic associations (context categories):
  - CI contexts cluster into: Event (establishment-focused), Location (e.g., Karachi, Lagos), Relationship (Hanban, director, university, headquarters), and Function (language teaching, cultural promotion). Reports often praise CI’s role in language/cultural exchange and bridging relations but can imply an “act of state.”
  - GI contexts: Projects across diverse arts/culture fields (kultur, project, auditorium), strong ties to art activities (film, music, exhibitions), relationships spanning places like Namibia/Chennai/Nicosia, and cooperation with non-state cultural organizations (UNESCO, Alliance Française). This breadth can blur explicit “German language” branding while solidifying its identity as an arts/culture facilitator.
- Colligation (dependency/grammar) results:
  - Passive voice predominates for both: GI passive 1387/1521 ≈ 91.19%; CI passive 1640/1880 ≈ 87.23%. CI displays relatively more active framing than GI, reinforcing an “assertive/active” portrayal for CI vs “supportive/collaborative” for GI.
  - Subject depth: GI has more first-level subjects (≈25.4%) vs CI (≈17.08%); CI has more third-level subjects (≈64.2%), suggesting CI discourse emphasizes actions/effects, while GI emphasizes institutional identity.
  - Descriptive complexity: CI shows higher adjective/attribute complexity (three or more layers: CI 170 vs GI 117); patterns like “CI in area/university” (Confucius_NNP Institute_NNP IN) frame CI as part of a university or regional organization.
- Topics and semantic prosody (LDA + vector proximity):
  - CIDC salient prosodies include: education/establishment operations (e.g., Africa, Sri Lanka), sudden events (e.g., Karachi attack), and international relations/policy frames (U.S. politics/government). High-similarity topics (e.g., with chinese, university, mandarin; and with attack, pakistan, karachi) indicate mixed educational and geopolitical/sudden-event salience.
  - GIDC salient prosodies concentrate on arts and cultural exchange: films/film festivals, artists, music/theatre, international festivals/programmes, partnerships/applications. German appears salient but without explicit government linkage; language learning coexists but is not dominant.
- Image-level conclusions from findings:
  - GI’s institutional image is stable, arts-oriented, and collaborative; it is perceived as separate from the German government.
  - CI remains primarily associated with language teaching and cultural dissemination, is closely tied to China/government, and invokes a stronger sense of “otherness.” Negative connotations around Hanban may transfer via collocational priming. Frequent use of quoted speech (said) reflects neutral reporting style but warrants deeper analysis of quoted attitudes.
  - The GI’s macro- and micro-collocational consistency supports a fixed semantic association; CI’s lower overlap indicates a still-evolving public image more sensitive to external frames.
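The passive-voice shares reported under the colligation results follow directly from the counts given there. The short check below recomputes them and derives the complementary active shares; the dictionary layout and function name are illustrative only.

```python
# (passive sentences, total sentences) as reported in the colligation results.
VOICE_COUNTS = {"GI": (1387, 1521), "CI": (1640, 1880)}


def voice_percentages(passive, total):
    """Return (passive %, active %) rounded to two decimals."""
    passive_pct = round(100 * passive / total, 2)
    active_pct = round(100 * (total - passive) / total, 2)
    return passive_pct, active_pct


for name, (passive, total) in VOICE_COUNTS.items():
    passive_pct, active_pct = voice_percentages(passive, total)
    print(f"{name}: passive {passive_pct}%, active {active_pct}%")
```

The active shares come out at roughly 8.81% for GI and 12.77% for CI, which is the gap behind the "relatively more active framing" reading for CI.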
Discussion
The comparative lexical priming analysis answers the research questions by showing that CI and GI are embedded in distinct collocational, colligational, and topical environments that shape audience attitudes differently. CI’s strong ties to China/government, institution-building vocabulary, and occasional linkage to political/sudden events prime audiences to perceive it as active, assertive, and state-associated, which can trigger skepticism and “otherness.” Conversely, GI’s dense priming around collaboration and the arts stabilizes a supportive, creative, and internationally networked image that is decoupled from government control. The grammatical patterns (greater passive voice and shallower subject depth for GI; more descriptive layering and deeper subject embedding for CI) reinforce these narrative roles in news discourse. These findings align with prior literature noting politicization of CI coverage and demonstrate that, in developing countries, while much reporting remains neutral or positive toward CI’s educational activities, its identity is more variable and susceptible to external ideological frames. The results highlight how lexical priming and topic prosody jointly mediate institutional images in public opinion.
Conclusion
Using a corpus-based application of lexical priming theory enhanced with word2vec and LDA, this study shows that the Goethe-Institut’s public image in developing countries is stable and collaboration/arts-oriented, largely separated from the German government, whereas the Confucius Institute remains primarily identified with language teaching and cultural dissemination, closely tied to China/government, and perceived with greater “otherness.” GI exhibits stronger consistency between macro- and micro-collocations (supporting a fixed semantic association), while CI shows lower overlap, indicating an evolving and externally influenceable image. Colligational patterns depict CI as more active/assertive and GI as more supportive/collaborative. Future research should extend computational and corpus-linguistic comparisons of collocation/association measures and deepen analysis of reported speech and quotation contexts to refine understanding of stance and prosody in public opinion discourse.
Limitations
- Geographic scope: Developing-country data are limited to those represented in the NOW corpus (e.g., India, Sri Lanka, Pakistan, Bangladesh, Malaysia, Philippines, South Africa, Nigeria, Ghana, Kenya, Tanzania, Jamaica), which may not generalize to all developing regions.
- Analytical scope: Quoted speech (said) was identified but not systematically analyzed via concordance/quotation processing to unpack stance and topics within quotes.
- Methodological constraints: Collocation and association measures were restricted to MI3 (a PMI variant) and word2vec similarity, with LDA for topics; alternative measures (e.g., Gries’s asymmetric association metrics, tupleization approaches) could yield different top collocates and complementary insights.






