Linguistics and Languages
Entity, event, and sensory modalities: An onto-cognitive account of sensory nouns
Y. Zhong, K. Ahrens, et al.
The paper addresses foundational questions about the nature and cognitive motivation of grammatical categories, especially the noun–verb bifurcation, without presupposing prior knowledge of parts of speech. It asks: (1) What basic ontological concept, with minimal prior knowledge, supports the conceptualisation of shared human experience and the emergence of grammatical categories? (2) Does this shared foundation produce a noun–verb bifurcation across languages? (3) What feature of the physical world allows humans, via embodied cognition and without a priori concepts, to form a shared principle that underlies the noun vs. verb contrast? The authors propose that reference to time—formalised as an endurant (time-independent) versus perdurant (time-dependent) ontological bifurcation—provides the minimal conceptual premise for categorisation, offering a non-tautological basis for noun–verb distinctions. Sensory language is chosen as the empirical domain because perception mediates body–world interactions and sensory lexicons encode rich experiential information. Building on Generative Lexicon (GL) qualia structures and formal ontologies (e.g., DOLCE, BFO), the study examines how sensory nouns in Mandarin encode endurant/perdurant properties and whether these ontological properties align with cognitive and linguistic categorial patterns.
The study situates itself within debates on grammatical categories and cognitive grounding of parts of speech. Traditional feature-based accounts ([±N], [±V]; Chomsky 1970; Baker 2003; Haegeman 1994) are argued to be circular when defined via intuitive notions of “nouny” and “verby.” Distributional learning work shows categories can be learned without a priori PoS knowledge (Redington et al., 1995, 1998) and that learned representations can be semantically interpretable (Chersoni et al., 2021). Cognitive proposals link verbs to mutability and events (Gentner, 1978, 1982; Ahrens, 1999; Givón, 2001) and nouns to temporal stability. Strik Lievers and Winter (2018) found lexical-category tendencies across sensory modalities, suggesting a cognitive basis for “eventivity.” The authors adopt Aristotle’s insight that nouns lack reference to time while verbs carry it, connecting this to modern formal ontologies (DOLCE; BFO) that distinguish endurants/continuants vs. perdurants/occurrents. In Chinese, extensive nominal–verbal fluidity and deverbal nominals offer a testing ground without overt derivational morphology. The Generative Lexicon (GL) theory (Pustejovsky 1991, 1995, 2013) provides a formal qualia-based representation linking to ontology and to argument selection via semantic typing rather than grammatical categories. GL’s lexical typing (natural, artefactual, complex/dot types) maps to qualia roles (formal/constitutive vs. telic/agentive), enabling an ontological lens on endurant/perdurant properties. Prior Chinese GL research (e.g., Huang & Ahrens 2003; Song & Huang 2018) supports the approach. The paper also engages with debates on event nouns and deverbal nominals in Chinese (Grimshaw, 1990; Fu, 1994; Wang, 2013; Han, 2010; Deng, 2021), and later discusses nominal tense literature (Nordlinger & Sadler, 2004, 2008; Tonhauser, 2007, 2008; Bertinetto, 2020).
Design: Corpus-based analysis of Mandarin sensory nouns. Generative Lexicon (GL) theory is used to annotate qualia structures, classify lexical typing structures (natural, artefactual, complex), and map them to endurant (time-independent) or perdurant (time-dependent) properties.
Data source: Chinese Web 2011 (zhTenTen11) corpus, accessed via Sketch Engine.
Target constructions: Objects of basic sensory verbs (object-of relation) across five modalities:
- Vision: kàn ‘look/see’, jiàn ‘see’, kàn/jiàn-dào ‘saw’
- Hearing: tīng ‘listen/hear’, tīng-dào ‘heard’
- Taste: cháng ‘taste’, cháng-dào ‘tasted’
- Smell: wén/xiù ‘smell/sniff’, wén/xiù-dào ‘smelt’
- Touch: mō ‘touch’, chū ‘touch’ (as used), gǎnjué ‘feel’, mō/chū-dào, gǎnjué-dào
Procedure:
- Extraction: Use Sketch Engine’s Word Sketch to list nouns in object position of each sensory verb.
- Sense filtering (data cleaning): For each verb, retain only senses tied to perception or integration via that modality (per Chinese WordNet 2.0 definitions). For example, for kàn, keep ‘perceive through sight’ and ‘understand/appreciate through sight’, excluding extended/metaphorical uses.
- Annotation: For each extracted noun (per modality and verb sense), annotate GL qualia structure where relevant and classify lexical typing structure:
  - Natural types (formal/constitutive roles): assumed endurant.
  - Artefactual types (telic/agentive roles): endurant under pure perception uses; perdurant when integration/appreciation/understanding functions are selected via coercion.
  - Complex (dot) types (e.g., [sound-info], [object-sound], [event-sound]): treated as perdurant when time is involved; co-predication allowed.
- Mapping to endurant/perdurant: Based on verb sense and lexical typing, mark nouns as endurant (natural types; artefacts under pure perception) or perdurant (artefacts under information-integration uses; all complex types).
- Analysis: Summarise distributions by modality and sense; compare across senses; relate patterns to the hypothesised ontological motivation for the noun–verb bifurcation.
Tools/Resources: Sketch Engine Word Sketch; Chinese WordNet 2.0 for sense guidance.
Units of analysis: Nouns occurring as objects of sensory verbs in specific senses. Examples and classifier diagnostics (e.g., Chinese event vs. entity classifiers) are used where relevant.
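The mapping step of the procedure above can be read as a small decision rule over lexical type and verb sense. The following Python sketch is illustrative only and is not the authors' annotation tooling: the names `LexType`, `Sense`, and `temporal_class` are hypothetical, and collapsing the appreciation, understanding, and information-receiving senses into a single `INTEGRATION` value is a simplifying assumption.

```python
from enum import Enum


class LexType(Enum):
    """GL lexical typing structures used in the annotation (hypothetical names)."""
    NATURAL = "natural"
    ARTEFACTUAL = "artefactual"
    COMPLEX = "complex"


class Sense(Enum):
    """Verb sense, simplified to two values (an assumption of this sketch)."""
    PERCEPTION = "perception"      # e.g. 'perceive through sight'
    INTEGRATION = "integration"    # e.g. 'understand/appreciate through sight'


def temporal_class(lex_type: LexType, sense: Sense) -> str:
    """Return 'endurant' or 'perdurant' following the mapping described above."""
    if lex_type is LexType.NATURAL:
        return "endurant"          # natural types are assumed endurant
    if lex_type is LexType.COMPLEX:
        return "perdurant"         # all complex types are marked perdurant
    # Artefactual types depend on the sense selected by the verb.
    if sense is Sense.PERCEPTION:
        return "endurant"
    return "perdurant"


# Example nouns and their type/sense pairings are taken from the reported results.
examples = [
    ("tiānkōng 'sky'", LexType.NATURAL, Sense.PERCEPTION),
    ("túpiàn 'picture'", LexType.ARTEFACTUAL, Sense.PERCEPTION),
    ("wénzhāng 'article'", LexType.ARTEFACTUAL, Sense.INTEGRATION),
    ("shū 'book'", LexType.COMPLEX, Sense.INTEGRATION),
]

for noun, lex_type, sense in examples:
    print(f"{noun}: {temporal_class(lex_type, sense)}")
```

The rule makes explicit that only artefactual types are sensitive to verb sense; natural and complex types receive a fixed value regardless of the verb.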
- Vision (kàn ‘to look/see’): 310 nouns in total, across two senses.
  • To perceive through sight (n=160): Natural 124 (77.5%); Artefactual 27 (16.9%); Complex 9 (5.6%). Natural types cover appearance, colour, light, location, and natural scenes (e.g., tiānkōng ‘sky’, fēngjǐng ‘scenery’). Artefactual types cover images and other man-made objects (e.g., túpiàn ‘picture’, yānhuā ‘fireworks’). Complex types are rare.
  • To understand/appreciate through sight (n=150): Artefactual 86 (57.3%); Complex 64 (42.7%); Natural 0. Artefactual types include texts and entertainment items (wénzhāng ‘article’, xīnwén ‘news’, diànyǐng ‘movie’). Complex types include [object/info + function] items (shū ‘book’, bàozhǐ ‘newspaper’, diànshì ‘television’, zhǎnlǎn ‘exhibition’) and some event nouns (e.g., yǎnchànghuì ‘concert’).
  • Overall visual lexical types: Natural 124 (40.0%); Artefactual 113 (36.5%); Complex 73 (23.5%). Vision shows a near-even balance of endurant and perdurant properties.
- Hearing (tīng ‘to listen/hear’): 385 nouns across three senses (perceive sound 26%, appreciate sound 26%, receive information 48.1% of the total). Overall lexical types: Natural 21 (5.5%); Artefactual 105 (27.3%); Complex 259 (67.3%).
  • Perceive sound (n=100): Natural 21 (21%); Complex 79 (79%). Complex types are mainly [sound-event], in three subtypes: event-induced sounds (compound nouns like gēshēng ‘sound of singing’), simple event nouns (fēng ‘wind’, yǔ ‘rain’), and deverbal nouns (xīntiào ‘heartbeat’).
  • Appreciate sound (n=100): Artefactual 17 (17%); Complex 83 (83%). Complex facets include [sound-info] (gēqǔ ‘song’), [object-sound] (yuèqì ‘instrument’), [human-sound] (e.g., ‘Beethoven’ via telic/agentive roles), and [event-sound] (yǎnchàng ‘sing’).
  • Receive information through hearing (n=185): Artefactual 88 (47.6%); Complex 97 (52.4%). Artefactual types exploit telic roles (speaking, listening, communication). Complex types are dominated by [event-info], including event nouns (kè ‘class’, jiǎngzuò ‘seminar’) and deverbal nouns (liáotiān ‘chatting’, tánhuà ‘talking’, huìbào ‘reporting’).
  • Hearing strongly favours complex (perdurant) targets, which points to event and information integration.
- Taste (cháng ‘to taste’): 42 nouns; single sense ‘distinguish/taste flavor’. Natural 8 (19%); Artefactual 34 (81%); Complex 0. Natural types are flavor attributes (zīwèi ‘flavor’, xiāngwèi ‘fragrance’); artefacts are foods/drinks (càiyáo ‘dishes’, měijiǔ ‘fine wine’, xiǎochī ‘snacks’).
- Smell (wén/xiù ‘to smell/sniff’): 52 nouns; all Natural 52 (100%); no artefactual or complex types attested. Odor/odor value lexemes dominate (qìwèi ‘odor’, chòuwèi ‘bad smell’, fāngxiāng ‘fragrance’, qīngxiāng ‘faint scent’).
- Touch (mō ‘to touch’; gǎnjué ‘to feel’): 58 nouns total. By sense: mō 46 (79.3%); gǎnjué 12 (20.7%). Overall lexical types: Natural 47 (81%); Artefactual 10 (17.2%); Complex 1 (1.7%). For mō (touch), natural types include body/body-substance items; artefacts include technological objects (píngmù ‘screen’, jiànpán ‘keyboard’). For gǎnjué (feel), natural types include temperature/bodily feelings (nuǎnyì ‘warmth’, liángyì ‘coolness’).
- Endurant vs. perdurant summary (selected modalities):
  • Vision: Endurant 151 (48.7%); Perdurant 159 (51.3%); Total 310.
  • Hearing: Endurant 21 (5.5%); Perdurant 364 (94.5%); Total 385.
  • Touch: Endurant 57 (98.3%); Perdurant 1 (1.7%); Total 58.
Interpretation: Tactile targets are overwhelmingly endurant (time-independent), auditory targets are overwhelmingly perdurant (time-dependent), and visual targets are balanced. The results align with the hypothesis that time dependency underlies the categorial bifurcation, and with prior English findings (Strik Lievers & Winter, 2018) showing that touch is over-represented among nouns and hearing among verbs.
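The endurant/perdurant totals in the summary can be recomputed from the per-sense type counts reported above. The sketch below is a consistency check, not part of the original study. The `COUNTS` table, the `summarise` function, and the set `PERCEPTION_SENSES` are hypothetical names. For touch, only modality-level type counts are reported, so treating both touch verbs as pure perception is an assumption, inferred from the reported 57/1 split.

```python
# Per-sense lexical type counts as reported in the results.
COUNTS = {
    "vision": {
        "perceive": {"natural": 124, "artefactual": 27, "complex": 9},
        "understand": {"natural": 0, "artefactual": 86, "complex": 64},
    },
    "hearing": {
        "perceive": {"natural": 21, "artefactual": 0, "complex": 79},
        "appreciate": {"natural": 0, "artefactual": 17, "complex": 83},
        "information": {"natural": 0, "artefactual": 88, "complex": 97},
    },
    # Assumption: touch is entered as a single pure-perception sense,
    # because only modality-level totals are given for it.
    "touch": {
        "perceive": {"natural": 47, "artefactual": 10, "complex": 1},
    },
}

# Senses under which artefactual types count as endurant.
PERCEPTION_SENSES = {"perceive"}


def summarise(modality: str) -> tuple[int, int, float]:
    """Return (endurant count, perdurant count, endurant percentage)."""
    endurant = 0
    perdurant = 0
    for sense, types in COUNTS[modality].items():
        endurant += types["natural"]       # natural types: always endurant
        perdurant += types["complex"]      # complex types: always perdurant
        if sense in PERCEPTION_SENSES:
            endurant += types["artefactual"]
        else:
            perdurant += types["artefactual"]
    total = endurant + perdurant
    return endurant, perdurant, round(100 * endurant / total, 1)


for modality in COUNTS:
    endurant, perdurant, share = summarise(modality)
    print(f"{modality}: endurant {endurant}, perdurant {perdurant}, "
          f"endurant share {share}%")
```

Under these assumptions the recomputed figures match the reported summary for vision, hearing, and touch.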
Findings address the core question of whether minimal ontological concepts, specifically time (in)dependence, motivate grammatical categorisation. Using GL’s qualia and lexical typing, the study shows that sensory nouns encode heterogeneous endurant/perdurant properties that vary by modality: touch aligns with endurant properties, hearing with perdurant properties, and vision is versatile. This supports the proposal that a fundamental ontological bifurcation (endurant vs. perdurant) maps onto linguistic encoding strategies associated with noun–verb distinctions, offering a non-circular cognitive basis for categorial systems without presupposing parts of speech. The behaviour of artefactual and complex types under perception vs. information-integration senses demonstrates how verb selection coerces time-dependent interpretations, revealing modality-specific cognitive processing (e.g., music and speech require event/information integration). Parallels with neuro-cognitive evidence (Sanchez et al., 2020) suggest cross-methodological triangulation: somatosensory (touch) processing shows earlier latencies, consistent with instantaneous perception of endurants, whereas hearing and vision show later latencies, consistent with integration over time (perdurants). The study further argues that tense/aspect marking on nouns in some languages does not undermine the endurant status of nouns; rather, such markers highlight variability of the same enduring referent (cf. Bertinetto, 2020; Huang, 2015). Overall, the results substantiate a cognitively and ontologically grounded account of the noun–verb bifurcation via time dependency and demonstrate how sensory lexicon distributions reflect embodied interactions with the world.
The paper establishes, for the first time, a cognitive/ontological motivation for the noun–verb bifurcation without relying on prior PoS knowledge: conceptual reference to time, formalised as an endurant–perdurant distinction, underpins grammatical categories. Through a corpus-based analysis of Mandarin sensory nouns within the Generative Lexicon framework, the study shows modality-specific distributions: tactile nouns are almost entirely endurant, auditory nouns are strongly perdurant, and visual nouns are balanced. These patterns illuminate how perception, cognition, and language interact, place nouns at the centre of how sensory experience is encoded, and corroborate tendencies previously reported for English. Implications extend to studies of linguistic synaesthesia and modality exclusivity norms, suggesting that the time-dependent nature of auditory perception contributes to its abstractness and mapping patterns. Future work should: (1) expand datasets for gustatory and olfactory domains and include verbs encoding information integration in these modalities; (2) test cross-linguistic generality, especially in languages with rich olfactory lexicons; (3) integrate psycholinguistic and neuro-cognitive experiments to directly test processing predictions from endurant/perdurant mappings; and (4) refine computational models linking distributional learning to ontologically interpretable features.
- Data sparsity for gustatory and especially olfactory nouns, and lack of clear information-integration verb senses for these modalities, limit cross-modality comparison.
- Reliance on corpus co-occurrence (object-of relation) and sense filtering may miss less frequent or context-dependent sensory uses; metaphorical extensions were excluded.
- Mapping artefactual and complex types to endurant/perdurant depends on verb-driven coercion assumptions; borderline cases may be sensitive to annotation decisions.
- Findings are based on Mandarin Chinese and may be language-specific; generalisability requires cross-linguistic validation, particularly for modalities with differing cultural codability.

