Meaning patterns of the NP de VP construction in modern Chinese: approaches of covarying collexeme analysis and hierarchical cluster analysis

J. Zhou

This study by Jiangping Zhou examines the meanings of the NP de VP construction in Modern Chinese. It identifies patterns such as the pairing of NPs denoting regulations with VPs denoting implementation, and describes the relationships among lexical items in the NP and VP slots using covarying collexeme analysis and hierarchical cluster analysis.

Introduction
The NP de VP construction in modern Chinese is a nominal construction consisting of an NP followed by the particle de and a VP (e.g., mubiao de shixian ‘realization of target’). Prior work debated the word class of the VP (nominalized vs. verb), the head of the construction, and the semantic relationship between NP and VP (agent vs. patient), and proposed informal meaning tendencies. Yet, there has been no large-corpus, statistics-based study identifying typical meanings and meaning patterns of the NP and VP slots based on significantly attracted instances. This study adopts Goldberg’s constructional view that constructions are form-meaning pairings and asks: (1) What are the typical meanings of the NP de VP construction? (2) What are the meaning patterns of the VP slot? (3) What are the meaning patterns of the NP slot? By employing covarying collexeme analysis on a large POS-tagged corpus and clustering, the study aims to reveal statistically supported typical meanings and semantic groupings.
Literature Review
Two main hypotheses address the VP slot: (i) the VP is nominalized, in line with the overall nominal nature of the construction, and (ii) the VP remains verbal, as shown by its compatibility with adverbials and negation (e.g., chichi bu shixian ‘delayed not realize’). This paper treats VPs as verbs, following the BCC POS tagging, but notes that neither stance affects the semantic aims of the study. Studies on NP–VP semantic relations (e.g., Chen 1987; Zhang 1993; Wang 2002, 2010; Wu & Guo 2018) show that NPs can be agents or patients, with patient-like NPs more common. Prior accounts of meaning tendencies suggested that only verbs with weak action/strong event enter the VP slot (Zhan 1998; Wang 2002) and that NPs exhibit prominence combining informativity and accessibility (Shen & Wang 2000). However, these claims were not grounded in large-corpus statistical evidence. The present work addresses this gap by identifying significantly attracted NP–VP pairings and clustering their meanings.
Methodology
Corpus: Data come from the corpus of the Beijing Language and Culture University Corpus Center (BCC), a POS-tagged corpus of about 9.5 billion tokens spanning newspapers (~2B), literature (~3B), micro-blogs/film (~0.6B), classical Chinese (~2B), and miscellaneous texts (~1.9B).
Data collection: The query "../n的../v" (two-character noun + de + two-character verb) was used to retrieve relevant instances. Restricting the search to two-character items reduces noise from one-character forms and supports the association-strength computations.
Approaches: (1) Covarying collexeme analysis (Stefanowitsch & Gries) assesses the association strength between an NP-slot item L and a VP-slot item M in the NP de VP construction, using a 2×2 contingency table of co-occurrence and non-co-occurrence counts. The p-values of the Fisher–Yates exact test are transformed by the negative base-10 logarithm into a collostruction strength (Coll.s): Coll.s > 1.301 corresponds to p < 0.05, Coll.s > 2 to p < 0.01, and Coll.s > 3 to p < 0.001, while Inf indicates effectively infinite strength. The coll.analysis script (Gries 2014) in R was used. (2) Hierarchical cluster analysis (hclust in R; Ward’s method) was applied to contingency tables built from the significantly attracted items in order to group lexical items into semantic clusters. VP items were clustered by their covarying NP collexemes, and NP items by their covarying VP collexemes, following Gries & Stefanowitsch (2010).
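The association measure described above can be sketched in a few lines. The study itself used Gries's coll.analysis script in R; the Python below is only an independent illustration using the standard library. The function names (`contingency_table`, `fisher_p_attraction`, `coll_strength`, `significance`) are hypothetical, the counts are invented rather than taken from BCC, and the use of a one-sided (attraction) p-value is an assumption, since the summary does not say which tail the script reports.

```python
from math import comb, log10, inf


def contingency_table(pair, np_total, vp_total, construction_total):
    """Build the 2x2 table for NP item L and VP item M.

    pair: tokens with L in the NP slot and M in the VP slot
    np_total / vp_total: all tokens with L (resp. M) in its slot
    construction_total: all tokens of the construction
    """
    a = pair                       # L together with M
    b = np_total - pair            # L with other verbs
    c = vp_total - pair            # M with other nouns
    d = construction_total - np_total - vp_total + pair  # neither L nor M
    return a, b, c, d


def fisher_p_attraction(a, b, c, d):
    """One-sided Fisher exact p-value: chance of a or more co-occurrences
    if the two slots were filled independently (hypergeometric upper tail)."""
    row1, col1, n = a + b, a + c, a + b + c + d
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, min(row1, col1) + 1))
    return tail / comb(n, col1)


def coll_strength(p):
    """Negative base-10 log of p; p == 0 is reported as infinite strength."""
    return inf if p == 0 else -log10(p)


def significance(strength):
    """Map a strength value to the thresholds quoted in the text."""
    if strength > 3:
        return "p<0.001"
    if strength > 2:
        return "p<0.01"
    if strength > 1.301:
        return "p<0.05"
    return "n.s."


# Invented counts, for illustration only (not BCC figures).
table = contingency_table(pair=30, np_total=40, vp_total=50, construction_total=1000)
strength = coll_strength(fisher_p_attraction(*table))
print(table, round(strength, 2), significance(strength))
```

Read in the other direction, a strength of 1.71 corresponds to p = 10^-1.71, or roughly 0.02, which is why such a pair counts as significant at the 0.05 level but not at 0.01. Exact binomial coefficients are fine for toy counts like these; at the scale of a multi-billion-token corpus a log-space or library implementation would be the more practical choice.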
Key Findings
Covarying collexeme analysis identified 515 instances significantly attracted to the NP de VP construction (the 515th: shuiping de zengjia, Coll.s=1.71, p<0.05); 71 instances reached infinite association strength. Top attracted pairs, with observed frequencies in parentheses, include: mubiao de shixian ‘realization of targets’ (obs. 2344), shijian de yanchang ‘prolonging of time’ (990), zeren de chengdan ‘undertaking of responsibility’ (142), tizhi de jianli ‘establishment of regulation’ (938), tixi de jianli ‘establishment of systems’ (876), kecheng de kaishe ‘setting up of subjects’ (118), wenti de jiejue ‘resolution of issues’ (842), bufa de jiakuai ‘speeding up of steps’ (781), zuoyong de fahui ‘performance of functions’ (751), huiyi de zhaokai ‘convening of meeting’ (750), xingwei de fasheng ‘taking place of behaviors’ (733), guimo de kuoda ‘expanding of scales’ (714), chengji de qude ‘achievement of results’ (666), shigu de fasheng ‘occurrence of accidents’ (1893), yishi de zengqiang ‘increase of awareness’ (652), zhishi de zhangwo ‘command of knowledge’ (619), shuiping de tigao ‘improvement of levels’ (536).
Typical constructional meaning pairings: (a) NP denoting regulations with VP denoting implementation; (b) NP denoting systems with VP denoting establishment; (c) NP denoting results with VP denoting achievement.
VP slot clusters (from hierarchical clustering) predominantly exhibit six semantic patterns: cognition (e.g., juede ‘think’, xihuan ‘like’), augmentation (e.g., kuoda ‘expand’, zengjia ‘increase’, zengqiang ‘enhance’, tigao ‘improve’, jiakuai ‘speed up’, jiada ‘increase’), implementation (e.g., zhiding ‘enact’, guanche ‘implement’, kaizhan ‘carry on’, qianshu ‘sign’, lvxing ‘perform’, xingshi ‘perform’), achievement (e.g., qude ‘achieve’, jieshu ‘finish’, jiejue ‘resolve’, shixian ‘realize’, zhangwo ‘command’, wancheng ‘accomplish’), establishment (e.g., jianli ‘establish’, sheli ‘set up’, kaishe ‘set up’), and report (e.g., baodao ‘report’, tichu ‘propose’, biaoda ‘express’). Some additional verbs (e.g., zhaokai ‘convene’, chengdan ‘undertake’, yingxiang ‘influence’) are important but less easily grouped.
NP slot clusters predominantly exhibit six semantic patterns: internal traits (e.g., nengli ‘ability’, rencai ‘talent’, suzhi ‘quality’, yishi ‘awareness’, zeren ‘responsibility’, zhenxin ‘sincerity’, jineng ‘skill’), medical names (e.g., jibing ‘disease’, xibao ‘cell’, jiyin ‘gene’, bingfazheng ‘syndrome’; also some negative-event nouns like shigu ‘accident’), regulations (e.g., biaozhun ‘standard’, zhengce ‘policy’, zhidu ‘regulation’, quanli ‘rights’, gongneng/zuoyong ‘function’), results (e.g., bufa ‘step’, jiezou ‘rhythm’, chengji ‘result’, chengjiu ‘achievement’, jiazhi ‘value’, jincheng ‘progress’), systems (e.g., jigou ‘organization’, tixi ‘system’, tizhi ‘regulation’, jizhi ‘mechanism’), and business (e.g., dahui/huiyi ‘meeting’, hetong ‘contract’, xieyi ‘agreement’).
Representative co-occurrence tendencies include regulations with implementation verbs (e.g., zhidu with guanche/zhiding), systems with establishment verbs (e.g., tizhi with jianli), and results with achievement verbs (e.g., chengji or jiazhi with qude/shixian).
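To make the clustering step concrete, here is a minimal sketch of Ward-style agglomerative clustering of VP items by their NP profiles. It is an illustration only, not the study's procedure: the study used hclust in R, the summary does not say which Ward variant or distance was chosen, and the counts below are invented. The function names (`to_proportions`, `ward_clusters`) are hypothetical, and converting each row to proportions is an added assumption so that frequent verbs do not dominate the distances.

```python
from itertools import combinations


def to_proportions(counts):
    """Turn a row of raw co-occurrence counts into proportions (assumed step)."""
    total = sum(counts)
    return [x / total for x in counts]


def ward_clusters(profiles, n_clusters):
    """Agglomerative clustering with Ward's minimum-variance criterion:
    repeatedly merge the two clusters whose union increases the
    within-cluster sum of squares the least."""
    # Each cluster is (member names, centroid vector, size).
    clusters = [([name], list(vec), 1) for name, vec in profiles.items()]
    while len(clusters) > n_clusters:
        def merge_cost(pair):
            (_, ca, na), (_, cb, nb) = clusters[pair[0]], clusters[pair[1]]
            sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
            return na * nb / (na + nb) * sq_dist

        i, j = min(combinations(range(len(clusters)), 2), key=merge_cost)
        (ma, ca, na), (mb, cb, nb) = clusters[i], clusters[j]
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        merged = (ma + mb, centroid, na + nb)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return sorted(sorted(members) for members, _, _ in clusters)


# Invented counts of each verb with the nouns zhidu, zhengce, tixi, jizhi
# (in that order); illustrative only, not BCC figures.
raw = {
    "guanche": [40, 35, 2, 1],
    "zhiding": [30, 45, 3, 2],
    "jianli": [3, 2, 50, 40],
    "sheli": [1, 2, 20, 25],
}
profiles = {verb: to_proportions(row) for verb, row in raw.items()}
print(ward_clusters(profiles, n_clusters=2))
```

With these toy rows the verbs that share regulation-type nouns end up in one group and the verbs that share system-type nouns in the other, mirroring the implementation and establishment patterns reported above. NP items could be clustered the same way by swapping rows and columns.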
Discussion
The VP meaning patterns largely align with, but also refine, Zhan’s (1998) claim that VP items entering NP de VP tend to denote weak action/strong event. Among the most strongly attracted verbs (infinite-level significance), approximately 82.5% fit the weak-action/strong-event profile, while about 17.5% display strong action/weak event (e.g., tunshi ‘swallow’, ruqin ‘invade’, qianshu ‘sign’, jianli ‘establish’). Differences from earlier studies stem from: (i) data source (large, real corpus vs. dictionary-based or invented examples), (ii) focus on statistically typical instances via covarying collexeme analysis rather than all possible instances, and (iii) differing goals (this study targets typical meaning patterns and clustering). For NPs, findings support Shen and Wang’s prominence account (informativity and accessibility): many NP clusters (regulations, systems, business, medical names, results) are relatively accessible/specific. Internal-traits nouns, though more abstract (lower accessibility), are contextually informative and thus still prominent. The results exemplify the Principle of Linguistic Meaning Conservation (Mondal 2019): diversity in NP and VP lexical types allows conservation of information through collocational variation, yielding varied but systematic meaning patterns. From a Construction Grammar perspective, the construction shows: compositionality (NP denoting regulations predicts implementation-type VPs; systems predict establishment; results predict achievement), prototypicality (significantly attracted items form core members defining typical meanings), and conventionality (high-frequency, strongly associated NP/VP items reflect entrenched conventional pairings). Pedagogically, core collexemes and their clusters can inform instruction for L1 and L2 Chinese learners, aligning with findings that learners first acquire distinctively associated collexemes.
Conclusion
Using BCC corpus data and covarying collexeme analysis, the study identified 515 significantly attracted NP–VP instances, with 71 at effectively infinite association strength. Hierarchical clustering revealed dominant VP meaning patterns: cognition, augmentation, implementation, achievement, establishment, and report; and dominant NP meaning patterns: internal traits, medical names, regulations, results, systems, and business. These statistically grounded patterns specify the typical meanings of the NP de VP construction and clarify how NP semantics pair with VP semantics. Future research should stratify by genre or text type (e.g., fiction, newspapers, academic prose) to examine whether significantly attracted instances and meaning patterns vary across registers.
Limitations
The dataset aggregates across genres and text types; genre-specific distributions and patterns were not analyzed and may differ. The 2-character constraint on NP/VP items excludes some legitimate 1-character or longer forms. Hierarchical clustering based on covarying collexemes can sometimes group semantically less-related items (e.g., certain outliers within clusters), indicating room for methodological refinement. Reliance on POS tagging and corpus retrieval patterns may introduce tagging or query biases.