Register-based distribution of expressions of modality in COCA

J. Zhou and Y. Xia

In this study, Jiangping Zhou and Yanhua Xia examine how modality shapes communication across registers of contemporary American English, revealing nuanced preferences for modal expressions and a tension between objectivity and subjectivity in language use.

Introduction

The study is framed within Systemic Functional Linguistics (SFL), where semantic modal meanings are realized in lexico-grammar either within the clause (e.g., modal operators, adjuncts, nominalizations) or outside the clause via interpersonal metaphor (explicit subjective and explicit objective orientations). Prior research has either focused narrowly on specific modal types or not examined how register formality shapes their distribution. This paper investigates, synchronically, how a fuller set of modality expressions distribute across registers in COCA and how this distribution relates to register formality. Research questions: (1) How are expressions of modality synchronically distributed across registers? (2) What is the relationship between expressions of modality and register formality, and what explains it?

Literature Review

Three strands of prior work are reviewed. (1) Theoretical and descriptive studies treat subtypes separately (e.g., modal auxiliaries, adjectives, adverbs), with limited coverage and insufficient attention to variation across registers or to the dynamic nature of modality (e.g., Nuyts 1993; Halliday 1994; Palmer 2001). Studies of adverbs reveal contextual and functional differences between near-synonyms (e.g., maybe vs perhaps; Suzuki 2018; Suzuki & Fujiwara 2017). (2) Diachronic and synchronic studies of modal verbal operators show a shift from core modals to semi-modals as well as register-based preferences (Leech et al. 2009; Collins 2009; Verhulst et al. 2013), but broader modality types and register effects remain underexplored. (3) Register-based work finds the explicit objective orientation favored in formal writing and the explicit subjective orientation favored in informal speech and fiction (He 2020; Zhou 2023a–c), but it often omits modal nominalizations, does not systematically link the distribution of modality to formal vs informal registers, and in some cases excludes the academic register. The current study fills these gaps by covering expressions at both word and clause ranks and relating their distribution to register formality.

Methodology

Corpus: COCA (1990–2019), excluding the general web, blog, and TV/movie subcorpora, leaving five registers (spoken, fiction, magazines, newspapers, academic) totaling 618,200,644 words. Register formality was operationalized with the Heylighen & Dewaele (1999) formality formula, which prior work has validated, yielding a continuum from least to most formal: spoken (NF=47.21), fiction (49.26), magazines (58.94), newspapers (59.07), academic (64.97).

Data collection used seven COCA search queries (SQ). SQ1 ([vm*] [v?i*]) retrieved modal verbal operators; SQ2 searched an exact list of modal adjuncts (certainly, probably, possibly, surely, essentially, necessarily, obligatorily); SQ3 searched modal nominalizations (certainty|probability|possibility|essentiality|necessity|obligation|likelihood). For interpersonal metaphors, SQ4 retrieved the explicit subjective orientation (I/we + think/assume/believe/guess/suppose), and SQ5 retrieved the explicit objective orientation (it + be + modal adjective + to/that), supplemented by SQ6 (it + modal auxiliary + be + adjective + to/that) and SQ7 (it + be + adverb + adjective + to/that) to capture variant forms.

Raw counts were normalized per million words (PMW) and, where needed, adjusted by equal totality. Statistical analysis was run in R: normality was checked with the Shapiro-Wilk test, group differences with Student’s t-test or the Mann-Whitney U test depending on normality, and correlations with Pearson's or Spearman's coefficient. Visualizations were produced in Excel.
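The two quantitative preprocessing steps above, per-million-word normalization and the formality continuum, can be sketched in a few lines of Python. This is a minimal illustration, not the authors' R workflow: the function names are hypothetical, the score follows the general Heylighen & Dewaele (1999) definition (percentages of nouns, adjectives, prepositions and articles, minus percentages of pronouns, verbs, adverbs and interjections, plus 100, halved), and the paper's exact "NF" computation and part-of-speech tagging may differ. The counts in the usage example are invented and are not COCA figures.

```python
# Sketch only: hypothetical helper names; not the authors' implementation.


def per_million(raw_count: int, corpus_size: int) -> float:
    """Normalize a raw frequency to occurrences per million words (PMW)."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    # Multiply first so integer inputs stay exact until the final division.
    return raw_count * 1_000_000 / corpus_size


# Word classes that raise the F-score (non-deictic) or lower it (deictic),
# following the general Heylighen & Dewaele (1999) formula.
FORMAL_CLASSES = ("noun", "adjective", "preposition", "article")
DEICTIC_CLASSES = ("pronoun", "verb", "adverb", "interjection")


def formality_score(pos_counts: dict[str, int], total_words: int) -> float:
    """F = (noun% + adj% + prep% + art% - pron% - verb% - adv% - intj% + 100) / 2.

    Percentages are taken relative to total_words; absent classes count as 0.
    """
    if total_words <= 0:
        raise ValueError("total_words must be positive")

    def pct(word_class: str) -> float:
        return 100 * pos_counts.get(word_class, 0) / total_words

    formal = sum(pct(c) for c in FORMAL_CLASSES)
    deictic = sum(pct(c) for c in DEICTIC_CLASSES)
    return (formal - deictic + 100) / 2


if __name__ == "__main__":
    # Invented counts for a hypothetical 1,000-word sample (not COCA data).
    sample = {"noun": 250, "adjective": 80, "preposition": 120, "article": 90,
              "pronoun": 60, "verb": 180, "adverb": 50}
    print(formality_score(sample, 1000))  # 62.5
    print(per_million(150, 3_000_000))    # 50.0
```

On this scale a higher score indicates a more formal register, consistent with the ordering reported above (spoken lowest, academic highest).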

Key Findings
  • Across subtypes, low/median-value items are preferentially used to evaluate propositions and entertain alternative voices. Examples: can/could, will/would, may/might; probably and possibly; possibility; I/we think; it is possible.
  • Modal verbal operators: High-value must and shall/should are comparatively dispreferred and remain relatively flat across registers. Low-value can/could is more frequent in informal registers (spoken 1982 PMW; academic 1502 PMW). Will/would declines as formality increases (spoken 2896 PMW; academic 1586 PMW), whereas may/might increases in more formal registers. Variation tests show that most pairs of operators differ significantly; the exception is can/could vs will/would (t=-2.0465, p=0.0901). May/might also differs significantly (t=3.8977, p=0.0055). Will/would and must are negatively correlated (r=-0.9211, p=0.0262).
  • Modal adjuncts: Certainly and probably are frequent in informal registers (spoken: certainly 308 PMW; probably 388 PMW) and much less frequent in academic (certainly 93; probably 123). Essentially and necessarily are disfavored in fiction but preferred in academic. Variation tests show that certainly differs significantly from the other adjuncts (e.g., W=25, p=0.0079 vs possibly/essentially/necessarily), and so does probably (t≈4.17–4.36, p<0.02). Correlations: probably with possibly (r=0.9065, p=0.0338); essentially with necessarily (r=0.9268, p=0.0235).
  • Modal nominalizations: Possibility and obligation are low in fiction (33 and 7 PMW) and peak in academic (95 and 25). Certainty/probability/necessity are lowest in spoken (7/4/5) and much higher in academic (14/46/34). Variation: possibility significantly differs from certainty, probability, necessity, obligation (e.g., t=-4.2787, p=0.0119; W=2.5–24, p≤0.0459). Correlation: possibility and obligation strongly positive (r=0.9937, p=0.0006).
  • Interpersonal metaphors: The explicit subjective orientation is most frequent in informal registers; I/we think peaks in spoken (410.34 PMW) and drops to 13.37 PMW in academic. The individual verbs (think, believe, guess, suppose, assume) differ significantly in frequency (e.g., think vs assume/believe/guess/suppose: W=23–25, p≤0.0318). Items within this subtype are positively correlated (e.g., assume and guess, p=0.0021).
  • Explicit objective: Less frequent in spoken/fiction (fiction PMW: probable 0.19, possible 9.78, essential 0.43, necessary 2.62, likely 1.9; spoken: probable 0.19, possible 13.34, essential 1.24, necessary 3.16, likely 6) and preferred in academic (e.g., it is/was possible 48.83; overall explicit objective 98). Significant pairwise differences include probable vs possible, possible vs essential, probable vs likely, probable vs necessary. Within-subtype correlations indicate similar increasing trends with formality (e.g., essential–likely p=0.0020; possible–necessary p=0.0167).
  • Overall distributions: Modal verbal operators decline from spoken (11359 PMW) to academic (9706). Adjuncts are most frequent in spoken (917) and lower but similar in newspapers (411) and academic (417). Nominalizations are most frequent in academic (1370), compared with spoken (585) and newspapers (490). Explicit objective is flat from spoken (25) to newspapers (23) but surges in academic (98). Variation tests show that verbal operators differ from all other subtypes (p≤0.01), adjuncts differ from explicit subjective and explicit objective, and nominalizations differ from explicit subjective and explicit objective.
  • Cross-subtype correlations: Verbal operators with adjuncts (r=0.8761, p=0.05) and adjuncts with explicit subjective (r=0.9677, p=0.0069) are positively correlated, indicating co-preference in informal registers. Nominalizations with explicit objective correlate strongly (r=0.9816, p=0.003), indicating co-preference in formal registers. Negative, non-significant correlations suggest compensatory patterns: nominalizations rise as operators/adjuncts fall; explicit objective rises as explicit subjective falls.
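Because each subtype is measured in only five registers, every correlation above rests on n = 5 points (3 degrees of freedom), and each Mann-Whitney comparison pits two samples of five against each other. The stdlib-only Python sketch below makes that concrete: for n = 5 the t(3) distribution has a closed-form CDF, and the exact Mann-Whitney null distribution can be enumerated over all 252 ways of assigning ranks. It is an illustrative re-derivation under these assumptions, not the authors' R code; the function names are hypothetical, and the exact Mann-Whitney routine assumes no ties (non-integer statistics such as W=2.5 indicate tied ranks, for which R's wilcox.test falls back to a normal approximation).

```python
"""Stdlib-only checks for statistics computed over five registers (sketch)."""
import math
from itertools import combinations


def pearson_p_five_points(r: float) -> float:
    """Two-sided p-value for a Pearson r computed from exactly n = 5 pairs.

    t = r * sqrt(n - 2) / sqrt(1 - r**2) has n - 2 = 3 degrees of freedom.
    For t(3), writing t = sqrt(3) * tan(theta) gives the closed form
    P(|T| >= |t|) = 1 - (2 / pi) * (theta + sin(theta) * cos(theta)),
    and with n = 5 this theta equals asin(|r|).
    """
    a = abs(r)
    if a >= 1.0:
        return 0.0
    theta = math.asin(a)
    # sin(theta) = a and cos(theta) = sqrt(1 - a**2) on [0, pi/2].
    return 1.0 - (2.0 / math.pi) * (theta + a * math.sqrt(1.0 - a * a))


def mann_whitney_exact_p(w: float, n1: int = 5, n2: int = 5) -> float:
    """Exact two-sided Mann-Whitney p-value for W, assuming no tied values.

    W is sample 1's rank sum minus n1 * (n1 + 1) / 2 (the statistic R's
    wilcox.test reports). Under H0 every choice of n1 ranks out of n1 + n2
    for sample 1 is equally likely, so all choices are enumerated.
    """
    offset = n1 * (n1 + 1) // 2
    null = [sum(ranks) - offset
            for ranks in combinations(range(1, n1 + n2 + 1), n1)]
    lower = sum(u <= w for u in null) / len(null)
    upper = sum(u >= w for u in null) / len(null)
    return min(1.0, 2.0 * min(lower, upper))


if __name__ == "__main__":
    print(pearson_p_five_points(-0.9211))  # will/would vs must
    print(pearson_p_five_points(0.9677))   # adjuncts vs explicit subjective
    print(mann_whitney_exact_p(25))        # certainly vs possibly, n1 = n2 = 5
```

Applied to r=-0.9211, r=0.9677 and W=25, these functions give values that agree closely with the reported p=0.0262, 0.0069 and 0.0079; W=25 is the most extreme value possible for two samples of five, so 2/252 ≈ 0.0079 is the smallest two-sided p this test can produce. The same arithmetic shows why r=0.8761 only reaches p≈0.05: with three degrees of freedom, a correlation has to be very strong to reach conventional significance.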
Discussion

Findings address RQ1 by showing a clear register-sensitive distribution: low/median-value modal expressions dominate, with congruent forms (modal operators and adjuncts) prevalent in informal registers and grammatical metaphor (nominalizations and explicit objective orientations) prevalent in formal registers, especially academic. For RQ2, formal registers favor nominalizations and explicit objective orientations because these conceal the commentator and present propositions as more objective, enhancing the credibility, persuasiveness, and authority expected in academic discourse. Conversely, informal registers favor operators, adjuncts, and explicit subjective patterns, which directly encode interpersonal stance and keep propositions negotiable and open to multiple voices. The correlation analyses reinforce these functional alignments: operators, adjuncts, and explicit subjective patterns co-occur in informal contexts, while nominalizations and explicit objective patterns co-occur in formal contexts. The compensatory trends suggest a systemic balancing: as texts become more formal, congruent clause-internal markers (operators and adjuncts) and explicit subjective orientations recede, while objectified realizations (nominalizations and clause-external explicit objective orientations) increase to foreground objectivity.

Conclusion

The study provides a comprehensive, register-based account of modality in COCA, covering both word-rank and clause-rank realizations. It shows that low/median-value expressions (e.g., can/could, will/would, may/might; probably/possibly; possibility; I/we think; it is possible) are widely used to keep propositions negotiable. In formal registers, increases in modal nominalizations and explicit objective orientations compensate for decreases in verbal operators/adjuncts and explicit subjective orientations, thereby concealing subjectivity and highlighting propositional objectivity. Contributions include (1) a fuller inventory of modality expressions across ranks analyzed synchronically across registers, and (2) theoretical linkage of distributional preferences to interpersonal metaphor of modality. Future work should extend to diachronic trends, include negative formulations of modality, and examine other English varieties.

Limitations
  • Scope is synchronic; diachronic distributions are not analyzed.
  • COCA reflects American English; generalization to other varieties should be cautious.
  • Only positive forms of modality were analyzed; negative formulations (e.g., may not, it is not necessary) were excluded, potentially biasing findings.