Veterinary Science

The identification of effective welfare indicators for laboratory-housed macaques using a Delphi consultation process

M. A. Truelove, J. E. Martin, et al.

This study uses a Delphi consultation process to identify effective welfare indicators for laboratory-housed macaques. Led by Melissa A. Truelove and colleagues, it found that experts judged environment-based indicators valid, reliable, and feasible far more often than animal-based indicators (72% vs 22%). Self-harm behaviours and social enrichment emerged as the top-ranked indicators, and the results support combining environment-based and animal-based measures for accurate welfare monitoring.

Introduction

The study addresses the lack of consensus on which welfare indicators should be used to assess laboratory-housed macaques. Although non-human primates account for a small proportion of the animals used in research, they are widely used and central to key biomedical studies, making accurate welfare assessment critical for both ethical reasons and scientific validity. Existing assessments often focus on environment-based inputs driven by legislation and accreditation, whereas robust welfare evaluation requires quantifiable animal-based outputs reflecting behaviour, physiology, and health. Building on frameworks used in farm animal welfare and previous Delphi applications in other species, the aim was to use a Delphi consultation to identify potential animal- and environment-based indicators for macaque welfare, determine their relative value (validity, reliability, feasibility), and rank their importance for on-site assessment.

Literature Review

Background literature highlights that welfare assessments historically emphasize environmental inputs due to objectivity and ease of measurement, but animal-based outcomes better reflect the individual’s welfare state. Modern farm animal welfare assessment (e.g., Welfare Quality protocols) integrates both input and output measures and conceptualizes welfare as a multidimensional continuum. Previous Delphi consultations have identified indicators for species such as dairy cows, laying hens, pigs, broilers, elephants, dogs, and laboratory mice. For macaques, prior efforts have not collectively evaluated indicators for validity, reliability, and feasibility. The literature underscores the potential negative effects of poor welfare on data validity and the importance of enrichment, especially social housing, in improving welfare and experimental reproducibility. It also notes challenges with validating behavioural indicators commonly used as welfare proxies in NHPs.

Methodology

Design: A modified two-round Delphi consultation. Ethical approval was obtained from the University of Edinburgh Human Ethics Research Committee (HERC_157_17); responses were quasi-anonymous and GDPR-compliant.

Indicator identification: Web of Science searches (Jan 1965–Aug 2017, English) generated 709 unique records using keywords related to macaque welfare, health, wellbeing, alopecia, and quality of life (apes excluded). From this, 115 potential indicators were compiled: 61 environment-based (inputs: enrichment, housing/environment, health/management practices) and 54 animal-based (outputs: appearance/health, behaviour, physiology/genetics). Initial lists are detailed in Tables 7–8.

Panel formation: Purposive and snowball sampling identified 477 experts (veterinary medicine, behavioural management/animal welfare, animal care/husbandry, facility management, research). Inclusion: ≥18 years old, ≥1 year experience with macaques.

Round One (BOS, Jan 17–Feb 7, 2018): 114 responses (111 eligible) from eight countries. Participants rated each of the 115 indicators for validity, reliability, and feasibility (agree/disagree/undecided) under a standardized half-day welfare audit scenario for ~500 macaques in 25 rooms, with mixed housing and some active research involvement. Participants also selected and ranked the ten most important animal-based and ten most important environment-based indicators (no guidance on criteria). Two survey versions counterbalanced item order. Pilot testing (n=12) ensured face and content validity.

Round Two (Excel, Feb 18–Mar 11, 2018): Personalized surveys sent to the 111 Round One respondents; 39 completed. Each included the participant’s Round One responses and group percentage agreement for controlled feedback. Participants could revise validity/reliability/feasibility ratings and re-rank the top ten animal and environment indicators.

Statistics: Ordinal categorical responses were dichotomized (agree vs disagree/undecided). GLMMs with binomial distribution were used, including respondent as random effect; fixed effects included respondent, round, indicator, indicator type, and response type as relevant. Krippendorff’s alpha assessed group stability across rounds. Kendall’s W tested agreement on rankings. Analyses used SPSS v22, GenStat 19th ed., and Excel 2016; significance set at P<0.05. Composite scores were calculated as the mean of validity, reliability, and feasibility agreement percentages for each indicator.
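The scoring steps described in the Statistics paragraph can be illustrated with a minimal Python sketch. It is not the authors' analysis code (they report using SPSS, GenStat, and Excel). The function names, the sample ratings, and the 70% default threshold argument are illustrative assumptions, and the Kendall's W function uses the textbook formula for complete rankings without ties, which is simpler than a real top-ten ranking task would require.

```python
from statistics import mean

# Consensus threshold in percent; the summary reports a 70% cut-off.
CONSENSUS_THRESHOLD = 70.0


def agreement_pct(responses):
    """Dichotomize ratings (agree vs disagree/undecided) and return % agree."""
    agree = sum(1 for r in responses if r == "agree")
    return 100.0 * agree / len(responses)


def composite_score(validity, reliability, feasibility):
    """Mean of the three agreement percentages for one indicator."""
    return mean([validity, reliability, feasibility])


def meets_all_criteria(validity, reliability, feasibility,
                       threshold=CONSENSUS_THRESHOLD):
    """True only if every criterion reaches the consensus threshold."""
    return all(p >= threshold for p in (validity, reliability, feasibility))


def kendalls_w(rankings):
    """Kendall's W for m raters who each fully rank the same n items.

    Assumes complete rankings (each a permutation of 1..n) with no ties.
    """
    m = len(rankings)
    n = len(rankings[0])
    rank_sums = [sum(r[i] for r in rankings) for i in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((rs - mean_sum) ** 2 for rs in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


if __name__ == "__main__":
    # Hypothetical ratings for one indicator from ten panelists.
    ratings = ["agree"] * 8 + ["disagree", "undecided"]
    validity = agreement_pct(ratings)
    print(validity)
    print(composite_score(validity, 70.0, 60.0))
    print(meets_all_criteria(validity, 70.0, 60.0))
    # Two hypothetical raters in perfect agreement over three items.
    print(kendalls_w([[1, 2, 3], [1, 2, 3]]))
```

Treating "undecided" as non-agreement mirrors the dichotomization described above; it makes the agreement percentage a conservative estimate of support for an indicator.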

Key Findings

- Overall consensus across validity, reliability, and feasibility was 67.5% (233/345 items across 115 indicators × 3 criteria). By criterion: validity 73% (84/115), reliability 63% (72/115), feasibility 67% (77/115). Animal-based indicators reached 63% agreement vs 86% for environment-based indicators.
- Indicators meeting ≥70% agreement for all three criteria: 56/115 (49%), comprising 12/54 animal-based and 44/61 environment-based.
- Proportion of indicators judged valid, reliable, and feasible: environment-based 72% (44/61) vs animal-based 22% (12/54).
- Top-ranked indicators for assessing welfare: self-harm behaviours (animal-based) and social enrichment (environment-based). Other high composite-scoring environment-based indicators included health monitoring, humane euthanasia program, food and physical enrichment, ventilation, behavioural management program, hearing other NHPs, cage furniture, and room temperature. High-scoring animal-based indicators included self-harm behaviours, stereotypical behaviours, dyspnoea, blood in waste, huddled posture, appetite, NHP-induced injuries, body weight, and discharge.
- Stability and agreement: GLMM showed significant effects of respondent (F(1, 26906) = 4.71, p = 0.030), round (F(1, 26906) = 10.22, p = 0.001), and indicator (F(1, 26906) = 286.54, p < 0.001). Krippendorff’s alpha indicated high disagreement overall but slight movement toward agreement between rounds (α: 0.1947→0.1358; Δ=0.0589 toward agreement). Kendall’s W for Round Two ranking of top 20 indicators was good (W=0.703, P<0.001).
- Ranking changes: In Round Two, 5/10 animal-based and 9/10 environment-based top indicators from Round One remained valid, reliable, and feasible. Some animal-based candidates (e.g., anxiety behaviours, body condition score, affiliative behaviours, species-typical behaviour at abnormal levels, activity level) dropped due to lower reliability/feasibility or validity.
- Rejected indicators: Acute phase proteins and telomere length were judged less valid, reliable, or feasible for on-site assessment within a half-day visit.
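The counts reported in the findings above can be cross-checked with a few lines of arithmetic. The sketch below only recomputes percentages from the numbers stated in this summary; the variable and function names are illustrative, and no data beyond the reported counts is assumed.

```python
# Counts as reported in the Key Findings.
N_INDICATORS = 115
CRITERIA_CONSENSUS = {"validity": 84, "reliability": 72, "feasibility": 77}
ALL_THREE = {"animal": (12, 54), "environment": (44, 61)}  # (met, total)


def pct(numerator, denominator, digits=1):
    """Percentage rounded to the given number of decimal places."""
    return round(100.0 * numerator / denominator, digits)


total_items = N_INDICATORS * len(CRITERIA_CONSENSUS)    # 115 x 3 criteria
total_consensus = sum(CRITERIA_CONSENSUS.values())       # items reaching consensus
overall = pct(total_consensus, total_items)              # overall consensus, %

by_criterion = {name: pct(count, N_INDICATORS, 0)
                for name, count in CRITERIA_CONSENSUS.items()}

met_all_three = sum(met for met, _ in ALL_THREE.values())
by_type = {kind: pct(met, total, 0) for kind, (met, total) in ALL_THREE.items()}

# Difference between the two reported Krippendorff's alpha values.
alpha_delta = round(0.1947 - 0.1358, 4)

if __name__ == "__main__":
    print(total_consensus, total_items, overall)
    print(by_criterion)
    print(met_all_three, pct(met_all_three, N_INDICATORS, 0), by_type)
    print(alpha_delta)
```

The recomputed values match the reported 67.5% overall consensus, the per-criterion percentages, and the 56 indicators (12 animal-based plus 44 environment-based) that met all three criteria.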

Discussion

The study achieved a consensus of 67.5%, substantial but slightly below the 70% threshold, on which indicators should be used to assess macaque welfare based on validity, reliability, and feasibility. Findings indicate experts consider environment-based measures more suitable for on-site assessments due to greater feasibility and inter-rater reliability, even though animal-based measures are often more directly reflective of individual welfare states. Behavioural indicators like abnormal repetitive behaviours and self-harm are commonly used proxies for welfare but require stronger empirical validation and careful operational definitions to enhance reliability, especially for observer-based ratings susceptible to bias. The results provide a benchmark set of indicators, particularly the 56 that met ≥70% agreement on all three criteria, to guide validation and integration into assessment tools. The convergence on self-harm behaviours and social enrichment as most important underscores the critical role of social housing and the need to prevent and treat self-injury in macaques. The work supports a combined input-output approach, recognizing that while environment-based measures are practical for large colonies and brief audits, animal-based outcomes are necessary to capture individual welfare states. The study also highlights the need for standardized sampling strategies, refined scoring systems, and training to improve reliability, especially for behavioural and health observations.

Conclusion

This Delphi consultation identified and prioritized animal- and environment-based welfare indicators for laboratory-housed macaques, delivering an empirically informed shortlist of measures considered valid, reliable, and feasible for on-site assessment. The highest-priority indicators were self-harm behaviours and provision of social enrichment. The findings emphasize incorporating both environment-based inputs and animal-based outputs in comprehensive welfare tools. Future work should empirically validate the identified indicators across settings (laboratories, zoos, sanctuaries), develop standardized sampling and scoring protocols to enhance reliability, and build integrated welfare assessment tools that include both negative and positive welfare measures.

Limitations

- Overall consensus fell slightly short of the predefined 70% threshold (achieved 67.5%), and Krippendorff’s alpha values indicated substantial disagreement despite movement toward agreement.
- Only two Delphi rounds were conducted; Round Two had reduced participation (n=39), potentially influencing ranking stability and consensus.
- Panel demographics were skewed: 90% North American; many participants worked in behavioural management or animal welfare, which may bias indicator preferences.
- Assessments were based on expert opinion rather than empirical validation; many animal-based indicators still require formal validation.
- The hypothetical on-site audit scenario did not specify optimal sampling sizes or observation periods, which can affect perceived feasibility and reliability of animal-based measures.
- Observer ratings are subject to bias and variability; reliability concerns were evident, particularly for behavioural measures.
