Writing impact case studies: a comparative study of high-scoring and low-scoring case studies from REF2014

Education

B. Reichard, M. S. Reed, et al.

This paper offers insights into the linguistic features of research impact case studies from the 2014 UK Research Excellence Framework (REF2014). Conducted by a team including Bella Reichard and Mark S. Reed, the research examines how writing style relates to scoring, showing that high-scoring case studies communicate impact more clearly and coherently than low-scoring ones.

Introduction
The paper addresses pressures on academics to demonstrate the societal impacts of research and the rise of impact evaluation frameworks such as the UK's Research Excellence Framework (REF). It situates the work within debates over the benefits and drawbacks of impact evaluation, including concerns about market logics, performativity, distortion of research priorities, and subjectivity in assessment. Within REF2014, impact case studies were scored on significance, reach, and attribution. Prior analyses and panellist accounts suggested that implicit criteria linked to narrative, style, and structure may have influenced judgments. The study aims to empirically assess content and linguistic differences between high- and low-scoring REF2014 case studies across Main Panels. Its research questions are: (1) How do high-scoring versus low-scoring case studies articulate and evidence impacts linked to underpinning research? (2) Do high-scoring and low-scoring case studies differ in their linguistic features or styles? (3) Do they show lexical or text-level differences (reading ease, narrative clarity, cohesive devices)?
Literature Review
The introduction synthesizes prior critiques of the UK impact agenda, highlighting concerns about neoliberal market logic, potential dehumanization, and unintended negative outcomes (e.g., "grimpact", where research produces harmful societal effects), as well as worries about distorting research agendas. Countervailing views point to benefits for stakeholder engagement, public legitimacy, and applied research. Specific to REF2014 impact assessment, prior work by Pidd and Broadbent and by Watermeyer and colleagues suggested that persuasive narrative, style, and structure influenced scoring, while Derrick emphasized subjectivity and group dynamics in panel deliberations, the use of proxies for excellence, and potential biases. Together, these studies motivate an empirical analysis of the textual and structural features that distinguish high- from low-scoring case studies.
Methodology
Design: Two complementary studies were conducted: (1) a quantitative linguistic analysis of lexical patterns and readability, and (2) a qualitative thematic analysis of content, structure, and evidence use.

Sampling and score inference: Although individual REF2014 case study scores were not made public, the authors inferred high-scoring (4*) cases in Units of Assessment (UoAs) where institutional submissions received uniform scores; low-scoring (1*/2*) cases were identified where available in the same UoAs. The quantitative sample comprised 124 identifiable high-scoring and 93 identifiable low-scoring case studies across 20 UoAs. Only text-heavy sections were analysed: Section 1 (Summary of the impact), Section 2 (Underpinning research), and Section 4 (Details of the impact). The qualitative sample comprised 85 high-scoring and 90 low-scoring case studies, selected to balance Main Panels and to include UoAs with both high- and low-scoring cases; it overlapped 75% with the quantitative sample.

Quantitative linguistic analysis: The corpus was split into high- and low-scoring subcorpora. Lexical bundles (2-4 word sequences) were extracted using AntConc and compared between subcorpora, overall and by Main Panel (A, C, D). Log Likelihood assessed significance (values above 3.84 approximate p<0.05, with higher thresholds for low expected values), and Log Ratio quantified effect size (|LR| > 0.5). Readability was measured with Coh-Metrix principal components (including deep cohesion and connectivity) and Flesch Reading Ease. Group comparisons used t-tests, with effect sizes reported as Cohen's d (d>0.3 small, >0.5 medium, >0.8 large).

Qualitative thematic analysis: A structured coding framework (Table 3 in the paper) covered impact types, articulation of significance and reach, linkage between research and impact, evidence quality, structure and style, and corroboration. Coding reliability was assessed on 10% of cases, with over 90% intercoder agreement. Thematic synthesis compared patterns in high- versus low-scoring cases, including examples of good and poor practice and of corroboration quality.

Additional measures: Counts of headings and subheadings were compared, and Section 2 word counts were examined with a t-test to assess how much space was allocated to the underpinning research.
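To make the corpus statistics concrete, here is a minimal Python sketch of the measures named above, assuming their standard formulations (Rayson-style Log Likelihood, Hardie's Log Ratio, and the Flesch Reading Ease formula). The token and bundle counts in the example are invented for illustration, not taken from the paper.

import math

def log_likelihood(a, b, c, d):
    """Log Likelihood for a bundle occurring a times in corpus 1
    (c tokens) and b times in corpus 2 (d tokens).
    Values above 3.84 approximate p < 0.05."""
    e1 = c * (a + b) / (c + d)  # expected frequency in corpus 1
    e2 = d * (a + b) / (c + d)  # expected frequency in corpus 2
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll

def log_ratio(a, b, c, d):
    """Log Ratio effect size: each unit is a doubling of relative
    frequency; the study used |LR| > 0.5 as its threshold."""
    a = a or 0.5  # common convention so zero counts stay defined
    b = b or 0.5
    return math.log2((a / c) / (b / d))

def flesch_reading_ease(words, sentences, syllables):
    """Standard Flesch Reading Ease formula; higher = easier."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

# Invented example: a bundle seen 60 times in ~250k tokens of
# high-scoring text and 25 times in ~240k tokens of low-scoring text.
print(log_likelihood(60, 25, 250_000, 240_000))  # ~13.5 > 3.84, significant
print(log_ratio(60, 25, 250_000, 240_000))       # ~1.2 > 0.5, notable effect
print(flesch_reading_ease(1000, 40, 1700))       # ~37.6, a "difficult" text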
Key Findings
Articulation of significance and reach: 84% of high-scoring cases clearly articulated benefits to specific groups, with evidence of significance and reach, versus 32% of low-scoring cases, which more often emphasized pathways and engagement over realized benefits. High-scoring cases gave specific geographic or institutional references (e.g., "in England and", "in the US", "the government's", "to the House of Commons"), whereas low-scoring cases more often used generic terms (e.g., "international", "policy and practice") without specificity.

Attribution between research and impact: High-scoring cases used more attributional bundles (e.g., "cited in", "used to", "resulting in", "led by Professor …"), and these more frequently attributed to impact. Analysis of 564 (high) and 601 (low) attributional phrase instances showed that 37% (n=210) in high-scoring cases established attribution to impact versus 18% (n=104) in low-scoring cases (p<0.0001; small effect, Cramer's V ≈ 0.22). Low-scoring cases more often attributed to research (40% vs 28%; p<0.0001; small effect), while both groups were about equally likely to attribute to pathways (~31-32%). Low-scoring texts used more ambiguous or uncertain phrases (e.g., "a number of", "an impact on") and often implied rather than evidenced causality.

Corroborating evidence: 82% of high-scoring cases had generally high-quality corroboration, versus 7% of low-scoring cases. Conversely, 71% of low-scoring cases had corroboration that was vague and/or poorly linked to claims (vs 11% of high-scoring cases). Among policy-impact cases, 42% of high-scoring cases (11/26) evidenced both policy change and its implementation, compared with 17% of low-scoring cases (5/29).

Readability and cohesion: High-scoring cases had higher Flesch Reading Ease overall (30.9 vs 27.5; p<0.01; d>0.4). By panel, Main Panel C (32.3 vs 27.4; p<0.001; d>0.5) and Main Panel D (32.8 vs 28.3; p<0.05; d>0.3) showed significant differences; Panel A showed no significant difference but was lower overall. Coh-Metrix showed significantly higher deep cohesion and connectivity for high-scoring cases overall (both p<0.001; d>0.5). Panel A had higher deep cohesion than Panels C and D on average; within panels, high-scoring texts consistently used more explicit logical and causal connectives.

Structure and style: High-scoring cases more often used clear subheadings to identify individual impacts (Log Ratio 0.54 overall; especially pronounced in Panel D, Log Ratio 1.53). Qualitatively, high-scoring texts used clear, direct language; avoided filler phrases (e.g., "in terms of", "the way(s) in which", "in relation to"); minimized jargon and acronyms, or explained them; and maintained coherent narratives linking research, pathways, and impacts. Low-scoring cases more often contained long sentences, academese, vague or unsubstantiated adjectives, and formatting that obscured the narrative.

Underpinning research description: Low-scoring cases used more lexical bundles about research outputs and process (e.g., "the paper", "peer-reviewed", "journal of", "et al.", "relationship between", "research into", "the research") rather than about findings or quality, potentially crowding out the space needed to demonstrate eligibility, the relevance of findings, and impact. There was no significant difference in Section 2 word counts (means 579 vs 537; p=0.11), suggesting the issue was focus rather than length.
Impact type patterns: Both high- and low-scoring cases spanned multiple impact types. High-scoring cases often presented specific, high-magnitude, well-evidenced impacts, sometimes with global reach, and effectively contextualized limited geographic reach (e.g., among hard-to-reach groups) to demonstrate significance.
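As a sanity check on the attribution statistics above, here is a short sketch assuming a simple 2x2 reconstruction of the reported counts (attributional phrases that do or do not establish attribution to impact, in high- versus low-scoring cases); the paper's own contingency table may have been structured differently.

import math
from scipy.stats import chi2_contingency

# Hypothetical 2x2 reconstruction from the reported counts:
# 210 of 564 attributional phrases in high-scoring cases established
# attribution to impact, versus 104 of 601 in low-scoring cases.
table = [[210, 564 - 210],   # high-scoring: to impact, not to impact
         [104, 601 - 104]]   # low-scoring:  to impact, not to impact

chi2, p, dof, expected = chi2_contingency(table)
n = sum(map(sum, table))
cramers_v = math.sqrt(chi2 / n)  # min(rows, cols) - 1 = 1 for a 2x2 table

print(f"chi2 = {chi2:.1f}, p = {p:.1e}, Cramer's V = {cramers_v:.2f}")
# -> V of about 0.22, consistent with the small effect reported above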
Discussion
Findings across both analyses address the research questions by demonstrating that high-scoring REF2014 case studies consistently foregrounded the content of impact (specific, evidenced benefits and reach) and explicit causal attribution to impact, supported by high-quality corroboration. Linguistically, they formed a distinct genre: clearer, more direct prose, fewer uncertainty markers and filler phrases, and stronger cohesion/connectivity—features that likely enhanced comprehensibility for multidisciplinary REF panels. Low-scoring cases, by contrast, disproportionately emphasized processes (dissemination, engagement) and research outputs/processes, used generic or ambiguous phrasing, and often provided weaker or poorly linked corroboration. Readability and cohesion results further suggest that not only sentence/word length but also discourse-level explicitness (causal/logical links) differentiated higher-rated submissions. While causality between style and scoring cannot be claimed, the convergence of content, structure, and linguistic evidence implies that implicit stylistic norms may have influenced assessments alongside explicit criteria (significance, reach, attribution). These insights are practically significant for authors and institutions preparing impact case studies and conceptually relevant for the design of impact evaluation frameworks internationally.
Conclusion
The study offers the first large-scale empirical comparison of content and linguistic features in known high- versus low-scoring REF2014 impact case studies across Main Panels. High-scoring cases more often provided specific, high-magnitude, well-evidenced impacts, clear causal links from research to impact, and high-quality corroboration; they also used more attributional language and exhibited higher readability, deep cohesion, and logical connectivity. Low-scoring cases tended to emphasize pathways and research processes, used more vague/ambiguous language and filler phrases, and provided weaker corroboration. The findings suggest that implicit stylistic conventions—clarity, directness, explicit causality—may have contributed to scores, raising questions about how best to balance narrative style and evidence in impact assessment. Future research should test potential mechanisms linking research quality and impact realization, and further examine domains underrepresented here (e.g., pedagogy and public engagement) and evolving expectations in policy-impact assessment (including implementation evidence).
Limitations
The analyses are retrospective and based on a subset (<3%) of REF2014 case studies where scores could be inferred; thus findings may not generalize to all UoAs or panels. Causality between linguistic features and panel judgments cannot be established; multiple factors influence scoring. Expectations for REF2021 and beyond may have shifted (e.g., greater emphasis on implementation and longitudinal benefits). The sample did not permit robust conclusions about case studies primarily focused on public engagement or pedagogy. Some panel/UoA coverage was uneven, and Main Panel B had too few identifiable cases for separate statistical analysis.