Sociology
A Chinese Tale of Three Regions: A Century of China in Thousands of Films
Z. Chen, W. Ma, et al.
The study examines how films from mainland China, Taiwan, and Hong Kong portray the image of China, and how these portrayals have evolved alongside the regions’ divergent historical, political, and cultural trajectories. The authors argue that film both reflects and shapes national images for domestic and international audiences, especially through the selection and presentation of Chinese films on Western platforms like IMDb. Addressing limitations in traditional case-based film scholarship, the study poses a macroscopic research question: how have content, topics, and sentiments about “China” differed across the three regions over time, and what sociopolitical factors explain these differences? The purpose is to use large-scale computational text analysis to provide a more objective, comprehensive account of China’s image in global cultural circulation, grounded in postcolonial theory and agenda-setting perspectives.
The background situates Chinese-language cinema beyond a singular nation-state narrative, emphasizing the multiplicity of languages, colonial legacies, and religious affiliations. Debates include the tension between globalization and national specificity and the risk of “de-China-lization” when adopting Western frameworks. In mainland China, cinema historically served nationalist and socialist ideological functions, with shifts post-reform toward internationally legible aesthetics (e.g., Zhang Yimou) and, more recently, technologically sophisticated, nationally confident productions (e.g., The Wandering Earth). Taiwan’s film culture reflects colonial hybridity (Bhabha), identity crises within a China–Taiwan–Japan cultural triangle, and instances of nostalgia for Japanese culture (e.g., Cape No. 7). Hong Kong’s cinema commercialized early, drawing on British live-entertainment traditions and later developing strong action, kung fu, and crime genres, while negotiating its colonial and post-handover identities. Postcolonial theory frames Taiwan and Hong Kong as hybrid cultural spaces (“third space”), and agenda-setting concepts underscore how Western editors and platforms mediate international perceptions of China.
Data were sourced from IMDb and include film plots, casts, and production and review information. The corpus comprised 1,047 films produced prior to 2019 that reference China or Chinese people, spanning mainland China, Taiwan, and Hong Kong (primary analysis period 1949–2018). Four computational techniques were used. (1) Contextual measurement: Word2vec (Skip-gram) learned 128-dimensional word embeddings, and the terms most similar to “China” were identified to reveal contextual associations by region. (2) Topic modeling: Latent Dirichlet Allocation (LDA), which models documents as mixtures over topics and topics as distributions over words, uncovered latent thematic structures in plot synopses; topic counts of 5, 8, 10, and 15 were tested, and five topics were selected based on interpretability, differentiation, and research aims. (3) Sentiment analysis: the Google Cloud Natural Language API computed an overall sentiment score in the range −1.0 to +1.0 for each synopsis. (4) Temporal and statistical analysis: moving averages smoothed the annual time series of topic proportions (e.g., war) and sentiment, and group differences in topic proportions and sentiment across regions were tested using ANOVA with post-hoc least significant difference (LSD) comparisons. The analysis triangulates contextual semantics, thematic prevalence, and affective tone to build a spatiotemporal depiction of China’s image across the three film industries.
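The smoothing and group-comparison steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the window size of 3, the edge handling, and all toy numbers are hypothetical, and the LSD post-hoc step is omitted.

```python
from statistics import mean


def moving_average(values, window=3):
    """Centered moving average; near the edges only the part of the
    window that fits inside the series is averaged (an assumption)."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(mean(values[lo:hi]))
    return smoothed


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy inputs, invented for illustration only (not values from the study).
toy_war_share_by_year = [0.20, 0.30, 0.10, 0.20, 0.30]
toy_sentiment_by_region = [
    [0.10, 0.20, 0.15],   # hypothetical region A synopses
    [0.18, 0.12, 0.20],   # hypothetical region B synopses
    [0.00, 0.05, -0.02],  # hypothetical region C synopses
]

print(moving_average(toy_war_share_by_year, window=3))
print(one_way_anova_f(toy_sentiment_by_region))
```

The F statistic is the ratio of between-group to within-group mean squares; turning it into a p-value, or running LSD pairwise tests, needs an F or t distribution and is left to a statistics library.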
- Word2vec contextualization around “China” by region:
  - Mainland China: closest term “single,” often in “single child/mother,” indicating family policy themes; clusters around politics (e.g., “comrade,” military/foreign policy), enemies/opponents of the nation/socialism, minorities/ethnicity; economy and livelihoods (“resident,” “province,” “poverty,” “market,” “economic”); historical memory and future development (“overcome,” “development,” “today”).
  - Taiwan: everyday life orientation (“life,” “family,” “work,” “girl,” “love,” “village”); identity distinctions (“Chinese,” “Taiwanese”) and Japanese context (“Japanese,” “war,” “martial”).
  - Hong Kong: metropolitan/administrative framing (“revolve,” “canton”); genre markers of kung fu and police/bandit films (“hit,” “conflict,” “hero,” “worker,” “fortune”). Editorial summaries on IMDb may reflect Western ideological lenses (e.g., liberal values, sovereignty emphasis).
- LDA topics (5): Kung Fu; Kinship & Love; Rural & Urban; War; Crime (Table 2). Overall distributions (Fig. 3):
  - Kinship & Love is most prevalent (>50% of content words across regions).
  - War ≈ 20%; Kung Fu and Crime ≈ 10% each; Rural & Urban < 5% in all regions.
- Regional differences in topic proportions (Table 3, ANOVA/LSD):
  - Kinship & Love: Mainland higher (MC 0.547) than Hong Kong (HK 0.497) and Taiwan (TW 0.507); MC vs HK = +0.050***, MC vs TW = +0.040***.
  - Kung Fu: HK higher (0.148) than MC (0.106); MC vs HK = −0.042***.
  - Crime: HK higher (0.119) than MC (0.087) and TW (0.090); HK vs TW = +0.029***; MC vs HK = −0.032***.
  - War: TW slightly higher (0.226) than MC (0.219) and significantly higher than HK (0.200); MC vs HK = +0.018***; HK vs TW = −0.026***.
  - Rural & Urban: small differences; MC slightly higher than HK (+0.006, marginal).
- Temporal dynamics of the War topic (Fig. 4):
  - Mainland: peaks after victories/commemorations (e.g., 1954 ~33% after the Korean Armistice; ~1985 and ~1995 for the 40th/50th anniversaries of the Anti-Japanese War (AJW); 2009 ~24% at the PRC's 60th anniversary). Generally below 22% in the 21st century amid a focus on development.
  - Taiwan: peak ~28% in 1978 amid diplomatic setbacks (loss of UN seat in 1971; Japan and others severing ties), shifting resentment toward Japan; declines in the late 1990s during pro-independence politics; higher again (>22%) under the KMT (2008–2016).
  - Hong Kong: lower interest in war; e.g., 2011 ~13%.
- Sentiment analysis (Fig. 5, Table 4): average sentiment >0 in all regions; MC = 0.161, TW = 0.166 (no significant difference), HK = 0.030 (significantly lower). ANOVA-F = 14.474***; MC vs HK = +0.131***; HK vs TW = −0.136***.
- Sentiment over time (Fig. 6):
  - Mainland: early PRC negative (−0.2 to −0.1); high during political mobilizations (1959 peak 0.32); declines in the 1980s amid “Scar culture” (−0.01 in 1982; −0.1 in 1986); rises with the 1997 return of Hong Kong and the 2000s boom (0.24 in 2001).
  - Taiwan: under martial law (1949–1987) anti-Communist but not anti-China; negative in late 1960s–early 1970s (all <0 in 1967–1974); shifts to anti-Japanese focus mid-1970s (1976 peak 0.35); low in early 1980s (−0.04 in 1983); post-1987 opening sees rise (1993 peak 0.25); dips around 1995–1996 crisis (0.02 in 1996); rebounds 1999–2001 (~0.38); declines with DPP policies (low −0.1 in 2004); rises under KMT (0.33 in 2012); declines again post-2016.
  - Hong Kong: negative during 1967–1974 (<0) and pre-handover anxiety (1987–1995, low −0.11 in 1994); rises with integration/cooperation (0.25 in 2011); declines amid political protests (2018 ~0.03).
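As a quick arithmetic check on the figures quoted above, the pairwise differences can be recomputed from the regional means. The sketch below uses only numbers stated in this summary. The function name and the tolerance of 0.0015 (an allowance for means and differences each rounded to three decimals) are assumptions, and significance stars are not checked.

```python
# Regional means quoted in the summary
# (MC = mainland China, HK = Hong Kong, TW = Taiwan).
MEANS = {
    "Kinship & Love": {"MC": 0.547, "HK": 0.497, "TW": 0.507},
    "Kung Fu": {"MC": 0.106, "HK": 0.148},  # TW mean not given in the summary
    "Crime": {"MC": 0.087, "HK": 0.119, "TW": 0.090},
    "War": {"MC": 0.219, "HK": 0.200, "TW": 0.226},
    "Sentiment": {"MC": 0.161, "HK": 0.030, "TW": 0.166},
}

# (measure, first region, second region, stated difference first - second)
REPORTED = [
    ("Kinship & Love", "MC", "HK", 0.050),
    ("Kinship & Love", "MC", "TW", 0.040),
    ("Kung Fu", "MC", "HK", -0.042),
    ("Crime", "HK", "TW", 0.029),
    ("Crime", "MC", "HK", -0.032),
    ("War", "MC", "HK", 0.018),
    ("War", "HK", "TW", -0.026),
    ("Sentiment", "MC", "HK", 0.131),
    ("Sentiment", "HK", "TW", -0.136),
]


def check_differences(means, reported, tol=0.0015):
    """Recompute each pairwise difference and flag whether it matches
    the stated value within a rounding tolerance (tol is an assumption)."""
    rows = []
    for measure, a, b, stated in reported:
        computed = means[measure][a] - means[measure][b]
        ok = abs(computed - stated) <= tol
        rows.append((measure, a, b, round(computed, 3), stated, ok))
    return rows


for row in check_differences(MEANS, REPORTED):
    print(row)
```

Every stated difference agrees with the subtraction of the rounded means, except War (MC vs HK), where the rounded means give 0.019 against the stated +0.018. A gap of 0.001 is what rounding alone can produce, so it does not by itself indicate an error.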
Findings show that portrayals of China across the three regions are shaped by intertwined political, economic, cultural, and ideological factors and filtered through Western platform mediation (IMDb). Mainland films present a generally positive and realistic national image tied to historical memory and development agendas, with sentiment tracking political-economic cycles. Taiwan’s portrayals are more individualized and fluid, reflecting hybrid colonial legacies (notably Japanese elements) and shifting cross-strait politics. Hong Kong’s internationally oriented, commercial cinema emphasizes kung fu and crime, blending Chinese tradition with global genre conventions, while its sentiments reflect identity anxieties before and after the handover. The results align with postcolonial theory: Taiwan and Hong Kong operate as hybrid “third spaces,” where intersubjectivities emerge from cultural convergence. The study underscores that cinema, as a cultural product, both reflects and constructs reality, and international perceptions of China are co-produced by local self-portrayals and the selection/summary practices of Western platforms.
This study offers a panoramic, data-driven account of how films from mainland China, Taiwan, and Hong Kong have portrayed China over the past century. Methodologically, it advances cultural sociology by applying word embeddings, LDA topic modeling, and sentiment analysis at scale, moving beyond single-case approaches. Substantively, it documents shared and divergent topic emphases (e.g., kinship and love dominating overall; Hong Kong’s higher kung fu and crime; Taiwan’s higher war) and region-specific contextual semantics and sentiment trajectories shaped by historical events and political cycles. The work clarifies cultural commonalities and differences among the three regions and how these contribute to the image of China presented to international audiences, highlighting the combined effects of self-representation and Western platform mediation.
The study notes several limitations: (1) contemporary divergence in the film industries of the three regions, with some mainland films performing well domestically but not in Taiwan, Hong Kong, or overseas, potentially because of political factors or differing degrees of openness; (2) an inability to reliably classify and analyze co-productions, which constrained comparative assessments; (3) potential selection bias inherent in IMDb’s coverage and in editorial and user tastes, as suggested by differences from Douban counts; (4) the correlational, macro-level design, which does not establish the causal mechanisms behind topic or sentiment shifts.

