Identifying stance in legislative discourse: a corpus-driven study of data protection laws

L. Cheng, X. Liu, et al.

This study by Le Cheng, Xiuli Liu, and Chunlei Si examines stance expressions in legislative discourse on data protection laws in the US, EU, and China. The analysis shows how lawmakers appear overtly neutral while subtly shaping public ideologies through covert stance expressions.

Introduction

The paper investigates how stance is expressed in data protection legislation in the United States, the European Union, and China. Motivated by the growing significance of data and privacy in the digital era and the global proliferation of data protection laws, the study seeks to understand how legislators use stance to articulate legal values, balance stakeholder interests, and signal identity and intentions within legislative texts. It addresses three research questions: (1) What are the divergences and convergences of stance expressions among the U.S., EU, and China? (2) What is the overall stance orientation of data protection laws across these jurisdictions? (3) What rationales underlie the choices and representations of stance across the three? The work is important both theoretically—filling a gap in stance research within legislative discourse and proposing a law-specific stance model—and practically—helping readers navigate legislative texts and guiding legislators to convey values and balances more precisely.

Literature Review

The study draws on multiple frameworks of stance, highlighting Biber and Finegan’s lexical/grammatical stance (epistemic, attitudinal, style-of-speaking), Du Bois’s interactional stance and stance triangle, and Hyland’s stance and engagement framework, adopting the latter as primary. Prior stance research has concentrated on media, politics, and academic writing; legal discourse has received comparatively less attention, and within it, most work focuses on judicial contexts (e.g., courtroom discourse, judgments, judicial opinions, hedging in legal genres). Legislative discourse remains underexplored even though it reflects public ideologies and values. The review underscores the need for systematic stance analysis in legislative texts and motivates the present corpus-driven comparative approach to data protection laws.

Methodology

Design: A corpus-driven approach using Hyland’s stance framework (stance components: evidentiality, affect, presence; devices: hedges, boosters, attitude markers, self-mentions) was employed to identify and compare stance markers.

Corpora: Three comparable sub-corpora totaling 12 legislative texts (169,167 tokens) were compiled based on representativeness and authoritative sources.

  • United States Legislation Corpus (USLC): 7 texts (92,812 tokens): ADPPA (federal), and state laws CPRA, CCPA, CPA, CTDPA, VCDPA, UCPA.
  • European Union Legislation Corpus (EULC): 2 texts (58,902 tokens): GDPR and the Law Enforcement Directive.
  • Chinese Legislation Corpus (CLC): 3 texts (17,453 tokens) from the PKU-law database (English version): Cybersecurity Law (CL), Data Security Law (DSL), Personal Information Protection Law (PIPL).

Procedure: (1) Texts were converted to plain text and loaded into LancsBox. (2) Stance features were searched using Hyland’s (2018) stance list; automated hits were then manually examined to confirm that each functioned as a stance marker (e.g., excluding occurrences where items were part of legal terms). Counts were double-checked by a co-author, with peer consultation as needed. (3) Within Hyland’s stance-and-engagement framework, only stance (writer-oriented) was analysed; engagement features were noted as rare in legislative texts and thus not examined. (4) Concordance analyses of salient items were conducted to interpret contextual usage and functions, combining quantitative frequency analysis with qualitative examination.
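The automated part of step (2) can be pictured with a minimal Python sketch. It is not the LancsBox workflow used in the study: the word list is a small illustrative subset drawn from items named in this summary (not Hyland's full list), the sample text is made up, and `CANDIDATES` and `find_candidates` are hypothetical names. The output is only a list of candidate hits with context; deciding whether each hit really functions as stance remains a manual step.

```python
import re
from collections import Counter

# Illustrative subset only (assumption); the study used Hyland's (2018) full stance list.
CANDIDATES = {
    "hedge": ["may", "should", "possible"],
    "booster": ["establish", "established", "clearly"],
    "attitude_marker": ["appropriate", "important", "essential"],
}

def find_candidates(text, window=5):
    """Return (category, item, left context, right context) for each candidate hit."""
    # Simple alphabetic tokenisation; hyphenated words stay together.
    tokens = re.findall(r"[A-Za-z]+(?:-[A-Za-z]+)*", text)
    lookup = {item: cat for cat, items in CANDIDATES.items() for item in items}
    hits = []
    for i, tok in enumerate(tokens):
        cat = lookup.get(tok.lower())
        if cat is not None:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((cat, tok.lower(), left, right))
    return hits

# Made-up sample text for illustration, not a quotation from any corpus file.
sample = (
    "The controller shall implement appropriate technical and organisational measures. "
    "A covered entity may not transfer sensitive data without consent."
)

hits = find_candidates(sample)
counts = Counter(cat for cat, _, _, _ in hits)

# Concordance-style listing for manual review of each candidate.
for cat, item, left, right in hits:
    print(f"{cat:16} | {left} [{item}] {right}")
```

Keeping the left and right context with every hit mirrors the concordance view in step (4), which is what allows a reviewer to exclude items that are part of a legal term rather than a stance expression.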

Analytic focus: Evidentiality via hedges and boosters (commitment and certainty), affect via attitude markers (positive/negative orientations), and presence via self-mentions. Normalised frequencies were calculated (per 10,000 tokens) for cross-corpus comparison.

Key Findings
  • Overall stance frequency: EULC shows the highest proportion of stance items (~28%) and highest normalised frequency (153.67 per 10,000), followed by USLC (93.05) and CLC (86.50). Total stance items: USLC 863; EULC 905; CLC 151.
  • Device distribution: Hedges are the most prevalent stance device across all corpora; self-mentions are rare, absent in EULC and CLC, and minimal in USLC.
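As a rough check on the normalisation, the per-10,000-token figures can be recomputed from the token counts and total stance items reported in this summary. The sketch below is an illustration, not the authors' script; `per_10k` and the dictionary layout are hypothetical. The recomputed values land within about 0.1 of the reported ones, and the summary does not explain the small differences.

```python
def per_10k(count, tokens):
    """Normalised frequency: occurrences per 10,000 tokens."""
    return count / tokens * 10_000

# Token counts, stance-item totals, and reported frequencies as given in this summary.
corpora = {
    "USLC": {"tokens": 92_812, "stance_items": 863, "reported": 93.05},
    "EULC": {"tokens": 58_902, "stance_items": 905, "reported": 153.67},
    "CLC": {"tokens": 17_453, "stance_items": 151, "reported": 86.50},
}

computed = {
    name: per_10k(c["stance_items"], c["tokens"]) for name, c in corpora.items()
}

for name, c in corpora.items():
    print(f"{name}: computed {computed[name]:.2f}, reported {c['reported']:.2f}")

# Sanity check: the sub-corpora sum to the stated overall corpus size.
total_tokens = sum(c["tokens"] for c in corpora.values())
print(f"Total tokens: {total_tokens}")
```

Normalising in this way is what makes the three sub-corpora comparable despite their very different sizes: the CLC is less than a fifth of the size of the USLC, so raw counts alone would understate its stance density.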

Evidentiality – Hedges:

  • EULC exhibits the highest hedge frequency (105.77 per 10,000), indicating strong emphasis on legal reasoning and prudence.
  • Top hedges: USLC (may 30.92; about 7.54; would 5.71; should 5.28; possible 1.94), EULC (should 41.93; may 35.82; possible 6.69; could 4.41; likely 4.24), CLC (may 30.94; relatively 2.86; assume 1.72; possible 1.15).
  • Deontic modality differentiates jurisdictions: should dominates in EULC (obligational emphasis); may is most frequent in USLC and CLC; constructions like may not in USLC express obligation/prohibition (e.g., ADPPA concordances).
  • Epistemic modality: possible appears in all three; it is notably higher in EULC (6.69) than USLC (1.94) and CLC (1.15), aligning with EU prudence in signalling legal possibilities.

Evidentiality – Boosters:

  • Booster frequencies are relatively similar across corpora, slightly higher in USLC and CLC than EULC.
  • Top boosters include establish/established in all three corpora: USLC establish 3.77, established 3.77; EULC established 9.00, establish 1.70; CLC establish 10.89, finds 4.58. Other recurrent boosters: clear/clearly, known, certain, truly.
  • The prominence of establish(ed) reflects strong legal constructiveness in an emerging domain: rules and principles are emphasised as being established, adapted, and enforced.

Affect – Attitude markers:

  • EULC shows the highest frequency of attitude markers, USLC the lowest (6.59 per 10,000), indicating stronger value signalling in EU texts.
  • Top markers: appropriate is frequent in all (USLC 4.53; EULC 21.73; CLC 7.45); important ranks high in EULC (1.02) and CLC (9.17); essential ranks second in USLC (0.65). Negative-oriented markers like appropriate help project prudence and attenuate overt compulsion in obligations (e.g., “appropriate technical and organisational measures” in GDPR), functioning as stance-softening strategies.
  • Collocational patterns: CLC associates important with data (e.g., important data), aligning with graded data security and national/public interest priorities; EULC links important to public interest; USLC associates essential with goods and services, reflecting consumer-market orientation.

Presence – Self-mentions:

  • Self-mentions are virtually absent in EULC and CLC; USLC shows very low frequency (0.97 per 10,000). In USLC, my appears in consumer-facing labels (“Do Not Sell My Personal Information”), and our appears in prefatory/declaration sections to build identification (“our society”), not within statutory provisions, preserving objectivity.

Cross-cutting orientations:

  • Legislative modesty: pervasive hedging and minimal self-mention suggest a preference for prudence and objectivity.
  • Discursive space: hedging and cautious attitude markers leave interpretive room for judicial application in a fast-evolving, technology-driven domain.

Discussion

The findings show convergences and divergences in legislative stance across jurisdictions. Convergences include a dominant use of hedges and minimal self-mention, indicating a shared orientation toward prudence, neutrality in tone, and maintenance of legislative authority. Divergences align with socio-legal values: the EU’s high hedge and affect marker usage suggests strong commitments to precision, reasoning, and value signalling rooted in fundamental rights protection. China’s relatively lower stance-item proportion is consistent with a strategic, comparatively neutral presentation balancing personal data rights with data flow and national/public interests; stance is often expressed covertly (e.g., via appropriate). The U.S. exhibits a market-and-consumer orientation, with obligations framed through constructions like may not and emphasis on enforcement and access to essential goods/services. Together, these patterns answer RQ2 and RQ3: overall orientations are legislative modesty and provision of discursive space for courts, with jurisdictional differences explained by distinct legal cultures, priorities, and historical trajectories in data protection. The paper also interprets stance choices as reflecting legislative constructiveness in an emerging field, where principles must be explicitly established and adapted.

Conclusion

The study provides a corpus-driven, comparative account of stance in data protection legislation across the U.S., EU, and China using Hyland’s framework. It identifies common reliance on hedging and low self-mention to project prudence and objectivity, while highlighting jurisdictional differences in modality, attitude markers, and boosters that reflect underlying legal values and ideologies. Contributions include: (1) demonstrating the unique legal constructiveness of stance in emerging domains like data protection; (2) showing that legislative discourse prioritises covert stance (e.g., hedging, negative-oriented attitude markers) over overt boosting; and (3) proposing a specialised, law-focused research model of stance (sub-categorising evidentiality, affect, and presence) to aid future legal text analysis. Future research is encouraged to extend stance analysis into judicial domains (court judgments, courtroom discourse) and to explore additional languages and legal systems.

Limitations
  • Scope limited to legislative texts in three jurisdictions (U.S., EU, China) and to 12 documents (169,167 tokens), which may affect generalisability.
  • Analysis focuses on writer-oriented stance; engagement (reader-oriented) features were not examined, as they are rare in legislative texts.
  • Chinese legislative texts were sourced from an English-version database, potentially introducing translation-mediated effects.
  • The study is corpus-driven with manual validation of stance items; despite double-checking, automated search plus manual filtering may miss or overinclude borderline cases.
  • The findings are situated in data protection, an evolving domain; patterns may shift as laws and interpretations develop.