Statistical assessment of reliability of anthropometric measurements in the multi-site South African National Dietary Intake Survey 2022

S. Nel, J. de Man, et al.

This study by Sanja Nel, Jeroen de Man, Louise van den Berg, and Friedeburg Anna Maria Wenhold evaluates the reliability of various anthropometric measurements in a large-scale dietary survey in South Africa. Discover how training and standardized protocols improve data consistency, ultimately emphasizing the need for accuracy in nutrition research.

Introduction

Accurate and reliable anthropometric measurement is challenging in large, multi-site surveys, and poor data quality can bias estimates of nutritional status and obscure trends across studies and over time. Reliability, meaning the repeatability of measurements within and between measurers, can be affected by equipment, protocol standardisation, participant characteristics, and measurer technique. Prior national surveys in South Africa assumed reliability post-training without documenting it, and most pre-survey reliability evidence comes from high-income countries, leaving a gap for sub-Saharan Africa. This study addresses that gap by assessing intra- and inter-rater reliability of key anthropometric measures (weight, length/height, mid-upper arm circumference [MUAC], waist circumference [WC], and calf circumference [CC]) among site leads and fieldworkers preparing for the National Dietary Intake Survey 2022 (NDIS-2022) across all targeted age groups.

Literature Review

Background literature highlights substantial variability in anthropometric data quality within and between countries in Demographic and Health Surveys (DHS), with potential impacts on malnutrition prevalence estimates. Prior work identifies multiple sources of measurement error (equipment, site identification for WC, technique for infant length), emphasizing the need for standardised protocols and training. Published reliability studies largely originate from Europe and North America across age groups, consistently showing higher reliability for weight and height than for circumferences, and higher intra- than inter-rater reliability. There is limited published evidence from low- and middle-income countries, and little on calf circumference reliability outside NHANES references, underscoring the need for context-specific reliability assessment in South Africa.

Methodology

Design and setting: Preparatory standardisation and reliability study for the multi-site South African National Dietary Intake Survey 2022 (NDIS-2022), decentralised to 12 teams across nine provinces (each led by a site lead and coordinator).

Training: A 12‑module training programme and standardised measurement protocols were developed based on WHO child anthropometry guidelines, DHS best-practice guidance, and the FANTA Guide to Anthropometry. Training materials (manual and presentations) were publicly available.

Site lead training: Conducted centrally over two days (January 2022) for 12 site leads and coordinators by study authors. Techniques were demonstrated and practised on infants, children, and adults using newly procured, identical equipment across sites. Reliability assessment for site leads used repeated measurements on volunteers (3 infants 7–14 months, 3–4 children 3–4 years, and 3 adults). Two rounds of measurements were taken on the same volunteers, with forms submitted after the first round to reduce recall bias. Minor protocol adjustments were made after feedback (e.g., minimum volunteer numbers/attributes, same-sex WC measurer).

Fieldworker training: Site leads and coordinators delivered provincial trainings (February 2022) across eight sessions, training 46 two-person anthropometry teams (lead measurer and assistant). Standardisation used volunteers representing ages 0–1, 1–5, and ≥12 years (including overweight/obese adults). Two measurement rounds were performed on the same volunteers. One site required retraining after preliminary analyses; only post-retraining data were included.

Data and analysis: Data were captured/cleaned in Excel and analysed in R. Intra-rater reliability used both rounds; inter-rater reliability used only the first round per measurer. Volunteers measured by only one fieldworker were excluded from inter-rater analyses. Reliability metrics were computed for each parameter (weight, length/height, MUAC, WC, CC), by measurer group (site leads, fieldworkers) and age group (0–<2 y, 2–12 y, >12 y) where relevant: technical error of measurement (TEM, absolute error), %TEM (relative error), coefficient of reliability (R), and intraclass correlation coefficient (ICC) using one-way random-effects, single-measure models (irr and irrNA packages). Lower TEM/%TEM and higher ICC/R indicate better reliability. Differences in TEM between site leads and fieldworkers were compared using F-statistics (F = TEM²(site leads) / TEM²(fieldworkers)) with N−1 df. Bland-Altman plots (blandr package) visualised agreement: for intra-rater reliability, the difference between repeated measures was plotted against their mean; for inter-rater reliability, the difference between a measurer’s first value and the mean of all measurements was plotted against the overall mean. Bias (mean difference) with 95% CI and limits of agreement (±2 SD) were derived.

Ethics: Umbrella ethical approval (University of the Western Cape: BM21/4/12); informed consent/assent obtained.

Sample: Reliability assessments included 15 volunteers for site leads and 75 for fieldworkers; each volunteer was measured by 2–11 measurers (median 4).
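To make the reliability metrics above concrete, here is a minimal Python sketch. It is not the authors' code (the study used R with the irr and irrNA packages). It assumes TEM is computed as the pooled within-volunteer standard deviation, which reduces to the familiar sqrt(Σd² / 2N) when each volunteer has exactly two measurements, and that R = 1 − TEM² / SD², with SD² the variance of all measurements. The ICC uses the textbook one-way ANOVA estimator with an average group size k0 for unbalanced data. All function names and the example values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def _usable(groups):
    """Keep only volunteers with at least two measurements (others carry no error information)."""
    kept = [list(g) for g in groups if len(g) >= 2]
    if not kept:
        raise ValueError("need at least one volunteer with repeated measurements")
    return kept


def tem(groups):
    """Technical error of measurement: pooled within-volunteer standard deviation."""
    ss_within = 0.0
    df_within = 0
    for g in _usable(groups):
        m = mean(g)
        ss_within += sum((x - m) ** 2 for x in g)
        df_within += len(g) - 1
    return sqrt(ss_within / df_within)


def relative_tem(groups):
    """%TEM: TEM as a percentage of the grand mean of all measurements."""
    values = [x for g in _usable(groups) for x in g]
    return 100.0 * tem(groups) / mean(values)


def reliability_coefficient(groups):
    """R = 1 - TEM^2 / SD^2, where SD^2 is the variance of all measurements."""
    values = [x for g in _usable(groups) for x in g]
    return 1.0 - tem(groups) ** 2 / variance(values)


def icc_oneway(groups):
    """One-way random-effects, single-measure ICC from ANOVA mean squares."""
    kept = _usable(groups)
    n = len(kept)
    if n < 2:
        raise ValueError("need at least two volunteers")
    sizes = [len(g) for g in kept]
    total = sum(sizes)
    grand = mean([x for g in kept for x in g])
    ms_between = sum(k * (mean(g) - grand) ** 2 for k, g in zip(sizes, kept)) / (n - 1)
    ms_within = tem(kept) ** 2  # within-volunteer mean square equals TEM squared
    k0 = (total - sum(k * k for k in sizes) / total) / (n - 1)  # average group size
    return (ms_between - ms_within) / (ms_between + (k0 - 1) * ms_within)


def f_ratio(tem_a, tem_b):
    """F statistic comparing two TEMs as a ratio of squared errors."""
    return tem_a ** 2 / tem_b ** 2


# Hypothetical example: three volunteers, each weighed twice (kg).
weights = [[10.0, 10.2], [12.0, 12.0], [15.1, 14.9]]
print(tem(weights), relative_tem(weights), reliability_coefficient(weights), icc_oneway(weights))
```

As a check against the reported results, `f_ratio(0.097, 0.291)` gives about 0.111 for the inter-rater weight comparison. Turning that ratio into a p-value requires the degrees of freedom, which this summary does not report.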

Key Findings
  • Relative reliability by %TEM was best for weight (0.260–0.923%) and length/height (0.434–0.855%), and poorest for MUAC among fieldworkers (2.592–3.199%) and for WC (2.353–2.945%). Fieldworkers’ %TEM was consistently highest (least reliable) in the 0–<2 years group.
  • Site leads vs fieldworkers: Site leads had significantly lower TEM (better reliability) for weight (inter-rater TEM 0.097 vs 0.291 kg, p=0.012; intra-rater TEM 0.075 vs 0.215 kg, p<0.001), for length/height intra-rater (0.478 vs 0.851 cm, p=0.009), and for CC intra-rater (0.363 vs 0.793 cm, p<0.001). Fieldworkers performed better for WC inter-rater (TEM 1.745 cm vs site leads 2.364 cm, p=0.014).
  • ICC and R: In whole-sample analyses, ICC and R exceeded 0.90 for all parameters except site leads’ CC inter-rater (ICC 0.896; R 0.889). By age group, fieldworkers’ inter-rater MUAC in children <2 years also fell below 0.90 (ICC 0.851; R 0.881).
  • Bland-Altman: No significant bias detected except for fieldworkers’ intra-rater length/height in adolescents/adults, with small positive bias (overall 0.194 cm [95% CI 0.058, 0.330], attributable to +0.220 cm [0.042, 0.400] in >12 years). Over 90% of observations lay within limits of agreement across plots.
  • Overall patterns: Reliability was higher for site leads than fieldworkers, higher for intra- than inter-rater comparisons, and higher for weight and length/height than for circumference measures (MUAC, WC, CC).
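The Bland-Altman quantities reported above (bias, its 95% CI, and the limits of agreement) can be sketched for the intra-rater case as follows. This is an illustration, not the blandr implementation the study used. It assumes the difference is taken as round 2 minus round 1 (the summary does not state the direction), uses ±2 SD for the limits of agreement as described in the methodology, and uses a normal approximation for the confidence interval where a t-distribution would be more exact for small samples. The function name and example heights are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def bland_altman(round1, round2, sd_multiplier=2.0, confidence=0.95):
    """Bias, approximate CI of the bias, and limits of agreement for paired measurements."""
    if len(round1) != len(round2) or len(round1) < 2:
        raise ValueError("need two equally long series with at least two pairs")

    # Assumption: difference = second round minus first round.
    diffs = [b - a for a, b in zip(round1, round2)]
    means = [(a + b) / 2 for a, b in zip(round1, round2)]

    bias = mean(diffs)
    sd = stdev(diffs)

    # Normal approximation to the CI of the mean difference (a simplification).
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sd / sqrt(len(diffs))

    lower, upper = bias - sd_multiplier * sd, bias + sd_multiplier * sd
    within = sum(lower <= d <= upper for d in diffs) / len(diffs)

    return {
        "bias": bias,
        "ci": (bias - half_width, bias + half_width),
        "loa": (lower, upper),
        "share_within_loa": within,
        "points": list(zip(means, diffs)),  # x = pair mean, y = difference, for plotting
    }


# Hypothetical example: heights (cm) of four volunteers measured in two rounds.
result = bland_altman([100.0, 100.0, 102.0, 102.0], [100.0, 101.0, 102.0, 103.0])
print(result["bias"], result["ci"], result["loa"])
```

Under this sketch, a bias counts as significant when its confidence interval excludes zero, which matches how the fieldworkers’ length/height result is reported.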
Discussion

The study directly addresses the need to document measurement reliability before a large, multi-site nutrition survey. Findings show excellent reliability overall (ICC and R typically >0.90), supporting the adequacy of the standardised training and protocols. Consistent with prior literature, intra-rater reliability exceeded inter-rater reliability, and weight and length/height were more reliable than circumference measures. Circumference measures, especially MUAC and WC, showed greater variability, reflecting their technical demands and, for CC, the novelty to measurers. Site leads outperformed fieldworkers for most measures, likely reflecting prior training and engagement. Fieldworkers nonetheless performed better for WC inter-rater measurements; because WC site definitions differ across institutions, this underscores the value of rigorous standardisation even for experienced staff. Bland-Altman analyses indicated minimal bias, with a small positive intra-rater bias for height in older participants that is unlikely to be clinically meaningful for population surveillance. The results validate the capability of trained teams to collect reliable anthropometric data for the NDIS-2022 and highlight areas (infant measurements, circumferences) where additional training and ongoing quality checks are beneficial.

Conclusion

This is the first published pre-survey reliability assessment for anthropometric measurements in a large-scale, multicentre South African nutrition survey. Using standardised protocols, uniform equipment, and a structured training programme, both site leads and fieldworkers achieved acceptable to excellent reliability across weight, length/height, MUAC, WC, and CC. Nonetheless, circumference measures and measurements in very young children were less reliable, emphasizing the need for intensified, hands-on training and continuous reliability monitoring during data collection. Future work should incorporate pre- and post-training assessments to quantify training effectiveness, routine in-field reliability checks, and transparent reporting of anthropometric data quality alongside main survey results to enhance comparability over time and across settings.

Limitations
  • Accuracy (validity against a gold standard) was not assessed; highly trained, accredited anthropometrists to provide gold-standard measurements were unavailable.
  • Although equipment was newly purchased and verified daily, systematic technique-related errors cannot be excluded.
  • Some provinces and age groups had limited numbers of trainees and volunteers, increasing statistical variability.
  • Pooling data across sites enabled analysis but may have masked inter-site differences.
  • Two days of central training may be insufficient for inexperienced fieldworkers, indicating a need for more hands-on practice time.