Reliability of body composition assessment using A-mode ultrasound in a heterogeneous sample

M. Miclos-balica, P. Muntean, et al.

This study evaluates how reliably A-mode ultrasound estimates body fat percentage in a heterogeneous group of healthy adults. It reports high reliability and precision, particularly in men and with multi-site formulas, and shows that examiner technique affects the results. Authors include Monica Miclos-Balica and Paul Muntean.

Introduction
The study addresses the need for reliable body composition assessment methods that are practical and accessible in clinical and sports settings. Laboratory techniques such as dual-energy X-ray absorptiometry (DXA), magnetic resonance imaging (MRI), underwater weighing (UWW), and air displacement plethysmography (ADP) are accurate, but they require costly equipment and dedicated space. Field methods such as anthropometry, bioelectrical impedance analysis (BIA), and ultrasound (US) are portable and less expensive, but they must be validated and their reliability characterized across populations. Prior evidence suggests that US can effectively assess body composition, yet findings on validity are mixed, and reliability has been less explored, often in small, homogeneous samples and with a limited set of prediction formulas. The research questions are whether A-mode ultrasound provides reliable estimates of percent body fat (%BF) in a heterogeneous adult sample, how reliability varies by prediction formula, and whether reliability differs by gender.
Literature Review
Multiple studies have examined the validity of ultrasound for body composition, with mixed outcomes. In 89 adults, US combined with anthropometry agreed well with DXA but showed bias versus ADP and BIA. US %BF assessments were accurate in 93 athletes, but other studies using different devices or formulas found significant differences versus DXA. In 70 high school wrestlers, fat-free mass (FFM) by US did not differ from UWW, with negligible Bland–Altman bias. Cross-validation studies reported high correlations between US and BIA (r ≈ 0.86) and between US and ADP (r ≈ 0.87). Compared with a three-compartment model, US underestimated %BF by 4.7% and overestimated FFM by 4.4 kg in overweight/obese subjects; in elite athletes, US overestimated %BF by about 3% compared with ADP. Population-specific equations (e.g., for Brazilian adults) enabled near-zero bias versus ADP, and a study in normal-weight adults found no bias between US and ADP. The accuracy of subcutaneous adipose tissue thickness measurement has been validated on excised tissues and cadavers, with <1 mm error for both A-mode and B-mode. Reliability studies exist, but they typically use small, homogeneous samples and few formulas, which motivates the present comprehensive reliability evaluation with multiple formulas and gender analysis.
Methodology
Design: Reliability study assessing intra- and intertester reliability of A-mode ultrasound-derived %BF using four prediction formulas.
Participants: 144 clinically healthy adults (81 men, 63 women), aged 18–70 years (mean (SD) 30.4 (10.1) y), BMI 24.6 (4.7) kg/m². Participants were recruited via social networks and community flyers. The study was conducted per the Declaration of Helsinki, with ethics approval and informed consent.
Anthropometrics: Body mass was measured to 0.01 kg with a calibrated scale integrated with a BOD POD system; height was measured to 1 mm with a wall-mounted tape measure.
Ultrasound device and protocol: BodyMetrix BX2000 (A-mode, 2.5 MHz). Subcutaneous adipose tissue thickness was measured at 8 anatomical sites: biceps, triceps, chest, scapula, axilla, waist, hip, thigh. BodyView software (v5.7.11043) was used with new client profiles including demographics and an ‘Athletic’ vs ‘Non-Athletic’ designation (BMI <25 as Athletic; no elite athletes). To obtain precision reflective of routine assessments, the automatic algorithm in BodyView was used to identify the fat–muscle interface. A small amount of gel was applied, then the transducer was placed on the skin and slid ~0.5 cm above and below the site for 4–8 s, with slight steady pressure to avoid tissue deformation, enabling local signal averaging.
Measurement schedule and testers: Two testers (≈1 year of experience each) performed triplicate measurements. For each subject, a coin flip determined which tester measured first while the other recorded; roles alternated until each tester had completed three sets. Data entry for the two testers was kept separate to minimize cross-influence.
Formulas: %BF was computed using four BodyView formulas: JP7 (7-site Jackson & Pollock), JP3 (3-site Jackson & Pollock), P3 (3-site Pollock), and BIC (1-point biceps). Each set began with BIC, followed by the JP7 measurements; JP3 and P3 were computed by manually entering the relevant thicknesses from JP7 into BodyView.
Statistical analysis: MATLAB Statistics Toolbox; significance threshold P ≤ 0.05. Agreement was assessed via Bland–Altman analysis (bias and 95% limits of agreement, with the width given as ULA − LLA, the upper minus the lower limit of agreement). Reliability metrics included the intraclass correlation coefficient ICC(2,1) as the relative index and three absolute indices: the technical error of measurement (TEM), the standard error of measurement (SEM = SD·√(1 − ICC)), and the minimal detectable change (MDC = 1.96·√2·SEM). Intratester reliability was examined across all trial pairs (1–2, 1–3, 2–3). Intertester reliability primarily compared trial 1 of Tester 1 (T1) with trial 3 of Tester 2 (T2) to maximize temporal separation. As an additional analysis, S7 (the sum of the 7 site thicknesses) was compared between testers via a paired t-test to probe the source of intertester bias.
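The indices named in the statistical analysis can be sketched in a few lines of code. The following is a minimal illustration in Python using only the standard library, not the authors' MATLAB code. The function names and the six %BF values per tester are hypothetical, and the two-trial TEM formula and the use of the pooled SD of all readings in the SEM are assumptions, since the summary does not state which variants were used.

```python
import math
from statistics import mean, stdev


def bland_altman(a, b):
    """Bias and 95% limits of agreement (LLA, ULA) for paired measurements."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def icc_2_1(data):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    data holds one row per subject; each row has k ratings (trials or testers).
    """
    n, k = len(data), len(data[0])
    grand = mean(v for row in data for v in row)
    row_means = [mean(row) for row in data]
    col_means = [mean(row[j] for row in data) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in data for v in row)
    msr = ss_rows / (n - 1)                      # between-subject mean square
    msc = ss_cols / (k - 1)                      # between-rater mean square
    mse = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def tem(a, b):
    """Technical error of measurement for two trials: sqrt(sum(d^2) / (2n))."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / (2 * len(a)))


def sem(sd, icc):
    """Standard error of measurement: SD * sqrt(1 - ICC)."""
    return sd * math.sqrt(1 - icc)


def mdc95(sem_value):
    """Minimal detectable change at 95% confidence: 1.96 * sqrt(2) * SEM."""
    return 1.96 * math.sqrt(2) * sem_value


# Hypothetical %BF readings for six subjects (illustration only, not study data).
tester1 = [18.2, 25.1, 30.4, 12.7, 22.9, 27.3]
tester2 = [18.9, 25.6, 30.1, 13.5, 23.4, 28.0]

bias, lla, ula = bland_altman(tester1, tester2)
icc = icc_2_1([list(pair) for pair in zip(tester1, tester2)])
pooled_sd = stdev(tester1 + tester2)  # assumption: SD of all readings pooled
s = sem(pooled_sd, icc)

print(f"bias={bias:.2f}, LoA=[{lla:.2f}, {ula:.2f}], width={ula - lla:.2f}")
print(f"ICC(2,1)={icc:.3f}, TEM={tem(tester1, tester2):.2f}")
print(f"SEM={s:.2f}, MDC={mdc95(s):.2f}")
```

Because the SEM shrinks as the ICC approaches 1, a highly reliable protocol yields a small MDC, which is why the two indices are reported together.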
Key Findings
- Reliability by formula: JP7 (most sites) yielded the best intratester reliability, followed by JP3 and P3, with BIC worst.
- Intratester reliability (JP7): ICC = 0.979 (Tester 1), 0.985 (Tester 2); TEM ≈ 1.07% BF (T1), 0.89% BF (T2); SEM ≈ 1.06% (T1), 0.89% (T2); MDC ≈ 2.95% BF (T1), 2.47% BF (T2).
- Intertester reliability (JP7): Bias = −0.5% BF; ICC = 0.972; TEM ≈ 1.24% BF; SEM ≈ 1.24%; MDC ≈ 3.43% BF overall.
- Gender-specific intertester MDC (JP7): 3.24% BF (men), 3.65% BF (women).
- Across formulas (examples): For JP3, intratester ICCs were 0.954 (T1) and 0.960 (T2); MDCs ≈ 4.21% (T1) and 3.92% (T2).
- Learning effects: No consistent trend across trial pairs (1–2, 1–3, 2–3) for precision indices; learning effects absent.
- Tester effect on measurements: S7 (sum of JP7 site thicknesses) was lower for T1 than T2 by −2.4 mm overall (95% CI [−3.3, −1.3] mm; P = 7.7×10⁻⁷); larger underestimation in women (−3.0 mm; 95% CI [−4.7, −1.3]; P = 5.4×10⁻⁷) than men (−1.8 mm; 95% CI [−3.0, −0.6]; P = 4.5×10⁻³).
- Overall, reliability was higher in men than women, and Tester 2 generally showed slightly better reliability than Tester 1.
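As a quick consistency check, the reported JP7 MDC values can be recomputed from the reported SEM values using the standard relation MDC = 1.96·√2·SEM. The sketch below assumes this relation and uses only the rounded figures quoted above; the names `reported` and `mdc_from_sem` are illustrative.

```python
import math


def mdc_from_sem(sem_value):
    """Minimal detectable change at 95% confidence: 1.96 * sqrt(2) * SEM."""
    return 1.96 * math.sqrt(2) * sem_value


# (SEM, MDC) in %BF for JP7, copied from the findings listed above.
reported = {
    "intratester, Tester 1": (1.06, 2.95),
    "intratester, Tester 2": (0.89, 2.47),
    "intertester": (1.24, 3.43),
}

differences = {}
for label, (sem_value, mdc_reported) in reported.items():
    computed = mdc_from_sem(sem_value)
    differences[label] = abs(computed - mdc_reported)
    print(f"{label}: computed {computed:.2f} vs reported {mdc_reported:.2f}")
```

The recomputed values agree with the reported ones to within about 0.02% BF, a gap that is plausibly due to the SEM values being rounded to two decimals.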
Discussion
The study demonstrates that A-mode ultrasound can provide highly reliable estimates of %BF in a heterogeneous adult sample when using multi-site formulas, particularly JP7. The findings address the research question by quantifying both relative reliability (ICC) and absolute reliability (TEM, SEM, MDC) for four commonly used formulas and by stratifying results by gender and tester. The superior performance of the 7-site formula indicates that leveraging more measurement sites reduces random error and improves repeatability. The absence of learning effects across repeated trials suggests that, after approximately one year of experience, examiner performance is stable over the measurement session. Intertester analysis revealed a small but significant bias attributable to systematic underestimation of subcutaneous thicknesses by one examiner, highlighting examiner technique as a meaningful source of variability. Reliability was consistently greater in men than women, which may reflect sex-related differences in subcutaneous fat distribution and measurement challenges. Collectively, these results support the use of A-mode US for monitoring %BF, provided that standardization and examiner training minimize intertester differences.
Conclusion
A-mode ultrasound is a highly reliable field method for %BF assessment, especially when employing the 7-site Jackson and Pollock formula. In this large, heterogeneous sample with more than 50 participants per gender, intratester ICCs were excellent and MDCs for JP7 were approximately 3% BF (≈2.6% men; ≈3.3% women), indicating suitability for tracking moderate changes in body composition. Reliability was higher in men than in women, and examiner performance contributed to systematic differences, underscoring the need for rigorous standardization and training. Future work should focus on refining examiner protocols, exploring device/software improvements (e.g., automated interface detection), and validating reliability across broader populations and clinical contexts.
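In practice, an MDC is used as a threshold: a change between two assessments is treated as real only if it exceeds the MDC. The helper below is a hypothetical illustration of that rule, not part of the study; it uses the overall intertester MDC for JP7 reported in the key findings (3.43% BF) and invented before/after values.

```python
def exceeds_mdc(before, after, mdc):
    """Return True when the absolute change is larger than the MDC."""
    return abs(after - before) > mdc


MDC_JP7_INTERTESTER = 3.43  # %BF, overall intertester value from the findings

# Hypothetical follow-up readings in %BF.
print(exceeds_mdc(24.0, 22.0, MDC_JP7_INTERTESTER))  # change of 2.0: False
print(exceeds_mdc(24.0, 20.0, MDC_JP7_INTERTESTER))  # change of 4.0: True
```

When the same examiner performs both assessments, the smaller intratester MDC would be the more appropriate threshold.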
Limitations
- Intertester variability and small systematic bias were observed, indicating examiner technique influences measurements.
- The sample comprised healthy adults and excluded elite athletes, which may limit generalizability to athletic or clinical extremes.
- Results are specific to one A-mode device (BodyMetrix BX2000) and the BodyView software’s automatic algorithm; findings may not directly generalize to other devices or manual interface detection.
- Reliability was lower in women than men, suggesting potential sex-related measurement challenges that warrant further study.