Measuring what matters in healthcare: a practical guide to psychometric principles and instrument development

K. Swan, R. Speyer, et al.

This manuscript, authored by Katina Swan, Renee Speyer, Martina Scharitzer, Daniele Farneti, Ted Brown, Virginie Woisard, Reinie Cordier, Alessandro Giuliani, and Franca Crippa, presents a ten-step process for developing and validating measurement instruments. It highlights the essential role of sound measurement in healthcare, illustrates practical applications, and addresses key challenges.

Introduction

The manuscript addresses the central role of measurement in clinical practice and evidence-based healthcare. It defines measurement as the systematic categorization of attributes to create shared standards and emphasizes that robust instrument development and validation underpin sound clinical decision-making. It identifies a gap in psychometric training among many healthcare clinicians and sets the purpose of the paper: to present key psychometric concepts, pathways, and applied knowledge in an accessible manner for clinicians, educators, and researchers. The paper introduces latent variables and hypothetical constructs common in healthcare, differentiates objective measures from underlying constructs, and highlights risks of flawed measurement, including biased clinical and policy decisions.

Literature Review

The paper contextualizes psychometric practice within healthcare by referencing the COSMIN initiative and several empirical reviews that highlight widespread shortcomings in measurement instrument quality across disciplines. It cites examples such as Marshall et al. (2000), where randomized controlled trials using unpublished instruments were 40% more likely to claim treatment effects; and multiple systematic reviews showing limited or poor evidence for validity, reliability, and responsiveness of commonly used instruments (e.g., in multiple sclerosis and narcolepsy). It also describes a prior psychometric review that identified 39 visuoperceptual instruments for swallowing assessment across 45 studies, none with adequate properties for clinical or research use, motivating new instrument development.

Methodology

This is a tutorial and methodological guide rather than a primary empirical study. It introduces core psychometric theories (Classical Test Theory and Item Response Theory), key measurement properties (validity, reliability, responsiveness) per the COSMIN taxonomy, and preferred analytical approaches. It outlines a ten-step instrument development process, illustrated with the Visuoperceptual Measure for Videofluoroscopic swallow studies (VMV):

1) Identify existing instruments and conduct a systematic, critical review.
2) Retrieve data on psychometric properties for existing instruments.
3) Compare retrieved data against predefined quality criteria (COSMIN).
4) Define the construct(s), target population, end-users, and purpose (diagnostic, prognostic, evaluative).
5) Generate an item pool via deductive (literature, existing instruments) and inductive (stakeholder input, e-Delphi, qualitative methods) methods, aiming for comprehensive content coverage and acceptable redundancy.
6) Develop response scales aligned to item type (dichotomous, ordinal, interval), limiting categories to ≤7 and ensuring clarity and appropriateness for the population.
7) Conduct expert review and cognitive interviewing to establish face validity and refine items.
8) Pilot test with a small sample to evaluate comprehensibility, relevance, acceptability, feasibility, and preliminary psychometric behavior.
9) Reduce and revise items, guided by theory (reflective vs formative models), factor analysis, internal consistency, floor/ceiling effects, and user feedback.
10) Trial the revised instrument in larger samples to evaluate structural validity, reliability, measurement error, invariance, and responsiveness, and finalize the instrument.

Throughout, COSMIN guidelines for study design, risk of bias, and reporting are emphasized.
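Step 9 lists floor and ceiling effects among the criteria for item reduction. As a minimal sketch of how such a check might look, the Python snippet below computes the share of respondents at the lowest and highest possible score. The function name `floor_ceiling`, the pilot scores, and the 15% cutoff are illustrative assumptions: the summary states no threshold, and 15% is only a commonly used rule of thumb.

```python
def floor_ceiling(scores, min_score, max_score, threshold=0.15):
    """Share of respondents at the lowest and highest possible score.

    `threshold` is an assumed rule-of-thumb cutoff (15%), not a value
    taken from the summarized paper.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    n = len(scores)
    floor_share = sum(1 for s in scores if s == min_score) / n
    ceiling_share = sum(1 for s in scores if s == max_score) / n
    return {
        "floor_share": floor_share,
        "ceiling_share": ceiling_share,
        "floor_effect": floor_share > threshold,
        "ceiling_effect": ceiling_share > threshold,
    }


# Hypothetical pilot total scores on a 0-10 scale (invented for illustration).
pilot_scores = [0, 0, 0, 0, 1, 2, 2, 3, 3, 4, 4, 5, 6, 7, 8, 9, 10, 10, 3, 5]
result = floor_ceiling(pilot_scores, min_score=0, max_score=10)
print(result)
```

In this invented sample, 4 of 20 respondents sit at the minimum score, so the floor is flagged while the ceiling is not. A flagged floor or ceiling would suggest the instrument cannot distinguish among respondents at that end of the scale, which is one possible reason to revise items or response categories.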

Key Findings

Psychometric rigor is essential for trustworthy healthcare measurement; poor instruments can bias clinical trials and policy. RCTs that used unpublished instruments were 40% more likely to claim treatment effects (Marshall et al., 2000), and about one-third of superiority claims for non-pharmacological treatments would potentially have been negated had published scales been used.
The COSMIN taxonomy organizes measurement properties into reliability, validity (content, structural, cross-cultural, criterion, hypotheses testing), and responsiveness, offering preferred analyses and quality criteria for healthcare instruments.
Differences between CTT and IRT are clarified: CTT evaluates total scores and assumes error properties that can inflate reliability with redundant items; IRT evaluates item-level performance, supports interval scaling, can handle incomplete data, but often requires large samples.
A practical ten-step process for instrument development and validation is provided, from landscape review to final trials, with the VMV as an applied example.
In the review that preceded the VMV, 39 visuoperceptual instruments for videofluoroscopic swallow studies (VFSS) and fiberoptic endoscopic evaluation of swallowing (FEES) were identified across 45 studies; none met adequate psychometric standards for clinical or research use, prompting new development.
Pilot and iterative refinement can substantially reduce item count (e.g., VMV reduced by ~50% from pilot to trial) while improving measurement quality.
Emphasis is placed on validity as the paramount property: reliable but invalid instruments are not clinically useful.
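The point above that CTT reliability can be inflated by redundant items can be illustrated with Cronbach's alpha, a standard CTT internal-consistency coefficient. The sketch below is an illustration only: the item scores are invented, the function name `cronbach_alpha` is hypothetical, and the summary does not say which coefficient the authors use. It computes alpha for three items, then again after adding an exact duplicate of the first item.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items.

    `items` holds one list of scores per item, with respondents in the
    same order in every list. Sample variances are used throughout.
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    n = len(items[0])
    if any(len(item) != n for item in items):
        raise ValueError("all items need the same number of respondents")
    # Total score per respondent across all items.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Invented scores for six respondents on three items.
item1 = [1, 2, 3, 4, 5, 3]
item2 = [2, 1, 4, 3, 5, 4]
item3 = [1, 3, 2, 5, 4, 2]

alpha_base = cronbach_alpha([item1, item2, item3])
# Add a fully redundant item: an exact copy of item1.
alpha_redundant = cronbach_alpha([item1, item2, item3, item1])

print(round(alpha_base, 2), round(alpha_redundant, 2))
```

With these invented data, alpha rises from roughly 0.80 to roughly 0.91 even though the duplicated item adds no new information about the construct. This is one reason a high internal-consistency value alone is weak evidence of measurement quality.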

Discussion

The guide addresses the need for improved psychometric literacy and practice among healthcare professionals by translating COSMIN recommendations into an actionable, stepwise framework. It reinforces that accurate measurement of latent, often multidimensional constructs hinges on well-defined constructs, robust content validity, and appropriate analytic models. The discussion prioritizes validity over reliability in the hierarchy of properties, arguing that consistent measurement is inconsequential if it does not capture the intended construct. The VMV example demonstrates how to operationalize the framework, integrate expert and stakeholder input, and iterate through pilot-to-trial stages to optimize structural validity, reliability, and responsiveness. The guide's significance lies in mitigating the widespread misuse of weak instruments and thereby improving clinical decision-making, research integrity, and the allocation of healthcare resources.

Conclusion

The paper contributes a clear, practical roadmap for developing and validating healthcare measurement instruments grounded in COSMIN standards. It synthesizes psychometric theory (CTT, IRT), clarifies measurement properties and preferred analyses, and operationalizes a ten-step development pathway with an applied VMV example. It calls for broader adoption of psychometric best practices across healthcare disciplines to ensure instruments are valid, reliable, responsive, and feasible. Future work should include continued dissemination and training in COSMIN methodology, rigorous systematic reviews of instrument properties across specialties, and further empirical testing and cross-cultural validation of new and existing measures.

Limitations

The manuscript is a tutorial and narrative guide rather than a primary empirical study; it does not conduct new systematic reviews or full-scale psychometric validations beyond describing the VMV example and referencing prior reviews. Methodological specifics and comprehensive statistical procedures are summarized at a conceptual level, with detailed analyses deferred to COSMIN resources and supplemental materials. The VMV example is illustrative and may not transfer directly to all clinical contexts or constructs without adaptation and further validation.
