Discretizing continuous variables in nutrition and obesity research: a practice that needs to be cut short
O. F. Morera, M. I. Dane'el, et al.
The paper addresses the widespread practice in nutrition and obesity research of dichotomizing or discretizing continuous variables (e.g., BMI, waist circumference) to facilitate analyses like ANOVA. The authors review the prevalence of median splits and argue that such practices lead to information loss, distorted effect sizes, and reduced power. The purpose is to demonstrate, using two nutrition-related cross-sectional studies, the negative consequences of dichotomization/discretization and to advocate for analyzing continuous variables with multiple regression, including probing interactions and nonlinear effects. This is important for improving validity, interpretability, and ethical reporting in nutrition and obesity science.
The authors summarize established methodological critiques of dichotomizing continuous variables. Key points include: (1) Effect size attenuation when dichotomizing normal predictors at or away from the mean due to information loss and added measurement error (Cohen, 1983). (2) Reduced reliability of measures and consequent attenuation of relationships (MacCallum et al., 2002). (3) Potential spurious increases in correlations due to sampling error, especially with small true correlations and small samples, occurring frequently even with n up to 300 (MacCallum et al., 2002). (4) Increased risk of spurious main effects or interactions when multiple predictors are dichotomized, particularly as inter-predictor correlations rise (Maxwell & Delaney, 1993). (5) Distorted effect sizes impair statistical power and can bias meta-analyses. (6) Dichotomization precludes modeling nonlinear relationships (e.g., quadratic effects). Leading journals across fields recommend against dichotomization, and alternatives include moderated regression with continuous variables and established tools for probing interactions (e.g., Johnson–Neyman).
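The attenuation described in point (1) can be illustrated with a small simulation. The sketch below is not from the paper: it assumes a bivariate-normal predictor and outcome with a true correlation of 0.30 (an illustrative value), splits the predictor at its sample median, and compares the two correlations. For a normal predictor split at the mean, the expected ratio of the dichotomized to the continuous correlation is sqrt(2/π) ≈ 0.798, so roughly 36% of the explained variance is lost. All function and variable names are hypothetical.

```python
import math
import random
import statistics


def pearson(x, y):
    """Pearson correlation computed from sums of squares and cross-products."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_median_split(rho=0.30, n=200_000, seed=1):
    """Draw bivariate-normal (x, y) with correlation rho, then median-split x.

    rho, n and seed are illustrative assumptions, not values from the paper.
    Returns (r with continuous x, r with dichotomized x).
    """
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [rho * xi + math.sqrt(1 - rho**2) * rng.gauss(0, 1) for xi in x]
    cut = statistics.median(x)
    x_split = [1.0 if xi > cut else 0.0 for xi in x]
    return pearson(x, y), pearson(x_split, y)


r_cont, r_split = simulate_median_split()
ratio = r_split / r_cont  # theory for a normal predictor split at the mean: sqrt(2/pi)
print(f"continuous r={r_cont:.3f}, split r={r_split:.3f}, ratio={ratio:.3f}")
```

The large simulated sample is deliberate: with small samples the ratio fluctuates widely, which is the sampling-error mechanism behind the spurious increases noted in point (3).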
Analytic approach: Two cross-sectional studies were analyzed using parallel approaches: (a) ANOVAs with dichotomized/discretized continuous predictors and (b) multiple regression treating predictors as continuous, including nonlinearity and interactions. Correlations among predictors and with outcomes were reported to contextualize potential attenuation/inflation from dichotomization. Nonlinearity was tested given potential for spurious interactions.
Study 1 (n=612): Examined health literacy and nutrition knowledge predicting nutrition label accuracy. Participants: 71.4% female; mean age 20.26 (SD=3.89); 85.3% Latinx; data collected online (Qualtrics) in 2017–2018; IRB-approved with consent. Measures: Health Literacy Skills Instrument (modified), KR-20=0.68, score range 0–9; dichotomized at ≤6 vs 7–9 for ANOVA. Nutrition knowledge (modified Parmenter & Wardle), KR-20=0.65, score range 0–18; dichotomized at ≤11 vs ≥12. Outcome: Nutrition Label Survey (modified), KR-20=0.77, score range 0–16. Power assumptions indicated adequate power to detect a quadratic effect; study not replicated. Analyses: (1) 2×2 ANOVA with dichotomized predictors and interaction. (2) Multiple regression models: linear terms and interaction; and model including quadratic term for health literacy (not centered because 0 was meaningful); interaction tested but not retained if nonsignificant. Johnson–Neyman technique used to probe the quadratic effect.
Study 2 (n=586): Examined cognitive restraint and BMI predicting fruit/vegetable (F/V) intake. Participants: 66.2% female; 69.3% Latinx; 53.6% income < $50K; mean age 35.5 (SD=14); data from health professionals, nutrition students, and community members (Mar 2018–Jun 2019); IRB-approved with consent. Measures: Cognitive restraint (Three-Factor Eating Questionnaire domain, modified), alpha=0.67, mean 2.61 (SD=0.56), range 1–4; dichotomized at ≤2.60 vs >2.60 for the ANOVAs. BMI measured via stadiometer and body composition analyzers; mean 27.99 (SD=6.05), range 17.0–60.7. BMI discretizations: (a) dichotomized at median 27.1; (b) six CDC categories: underweight, healthy weight, overweight, Class 1/2/3 obesity; (c) four modified CDC categories, merging all obesity classes. Outcome: F/V intake via skin carotenoid levels (VEGGIE METER), range 29–709, mean 275.67 (SD=110.15). Power assumptions indicated adequate power to detect an interaction; study not replicated. Analyses: (1) ANOVAs: 2×2 (cognitive restraint×dichotomized BMI), 2×4 (cognitive restraint×BMI in the four modified CDC categories), 2×6 (cognitive restraint×BMI in the six CDC categories). (2) Multiple regression with continuous predictors and their interaction; predictors were mean-centered because zero is not an attainable value for either. Johnson–Neyman used to probe the interaction. Model fit (R²), conditional effects, and squared partial correlations were reported.
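Johnson–Neyman probing amounts to finding the moderator values at which the conditional effect divided by its standard error equals the critical value. A minimal sketch follows, assuming a model Y = b0 + b1·X + b2·Z + b3·X·Z, a large-sample normal critical value in place of the exact t quantile, and made-up coefficient and covariance values. None of the numbers or names come from the paper; for real analyses an established tool such as PROCESS is the safer choice.

```python
import math
from statistics import NormalDist


def conditional_effect(b1, b3, z):
    """Effect of X on Y at moderator value z."""
    return b1 + b3 * z


def conditional_se(v11, v13, v33, z):
    """Standard error of the conditional effect at z.

    v11 = Var(b1), v33 = Var(b3), v13 = Cov(b1, b3).
    """
    return math.sqrt(v11 + 2 * z * v13 + z * z * v33)


def johnson_neyman_bounds(b1, b3, v11, v13, v33, crit):
    """Moderator values where |effect| / SE equals crit.

    Solves (b1 + b3*z)^2 = crit^2 * (v11 + 2*z*v13 + z^2*v33) for z.
    Returns boundaries only; which side is significant has to be checked
    by evaluating the ratio on either side of each boundary.
    """
    a = b3**2 - crit**2 * v33
    b = 2 * (b1 * b3 - crit**2 * v13)
    c = b1**2 - crit**2 * v11
    if abs(a) < 1e-12:  # degenerate case: the equation is linear in z
        return [-c / b] if b != 0 else []
    disc = b * b - 4 * a * c
    if disc < 0:  # no real boundary
        return []
    root = math.sqrt(disc)
    return sorted([(-b - root) / (2 * a), (-b + root) / (2 * a)])


# Hypothetical inputs chosen only to exercise the function.
B1, B3 = 2.0, -1.0
V11, V13, V33 = 0.25, -0.01, 0.04
CRIT = NormalDist().inv_cdf(0.975)  # about 1.96; normal approximation to t

bounds = johnson_neyman_bounds(B1, B3, V11, V13, V33, CRIT)
for z in bounds:
    ratio = conditional_effect(B1, B3, z) / conditional_se(V11, V13, V33, z)
    print(f"boundary z={z:.3f}, effect/SE={ratio:.3f}")
```

The covariance term matters: without the full coefficient covariance matrix from the fitted model, the boundaries cannot be recovered from coefficients and standard errors alone.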
Study 1:
- Correlations: Health literacy vs label accuracy r=0.30 (8.8% variance); dichotomized health literacy vs label accuracy r=0.14 (2.0%), a 77.2% reduction in explained variance. Nutrition knowledge vs label accuracy r=0.29 (8.3%); dichotomized nutrition knowledge r=0.20 (4.1%). Predictors correlated r=0.30.
- ANOVA (dichotomized predictors): Nutrition knowledge main effect F(1,608)=14.27, P<0.001, partial r²=0.023, d=0.43; means 10.59±0.20 (low) vs 11.66±0.20 (high). Health literacy main effect F(1,608)=8.44, P=0.004, partial r²=0.014, d=0.31; means 10.72±0.16 (low) vs 11.53±0.23 (high). Interaction ns, F(1,608)=3.512, P=0.061; model R²=0.06.
- Multiple regression (with quadratic health literacy): R²=0.17. Nutrition knowledge slope B=0.17 per unit (P<0.001), squared partial r²=0.030. Health literacy simple effect at HL=0: B=2.52 (P<0.001), squared partial r²=0.07. Quadratic term for health literacy (HL²): B=-0.20 (P<0.001), squared partial r²=0.049, indicating that the positive effect of health literacy diminishes as HL increases. Johnson–Neyman probing of the quadratic effect: the local linear effect of HL on accuracy is positive for HL 0–5, not significant for HL between 5.8883 and 7.2203, and negative for HL 8–9.
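The shape of the quadratic effect can be read off the reported coefficients. The sketch below assumes the fitted model has the form accuracy = b0 + 0.17·NK + 2.52·HL − 0.20·HL², so the local slope of health literacy is 2.52 − 0.40·HL. It uses the rounded coefficients from the summary, so the values are approximate point estimates with no significance test attached; the names are hypothetical.

```python
# Rounded coefficients as reported in the summary (Study 1 regression).
B_HL = 2.52      # linear health literacy term (simple effect at HL = 0)
B_HL_SQ = -0.20  # quadratic health literacy term


def local_slope(hl):
    """Derivative of predicted label accuracy with respect to HL at a given HL."""
    return B_HL + 2 * B_HL_SQ * hl


# HL value at which the local slope is zero (peak of the fitted curve).
turning_point = -B_HL / (2 * B_HL_SQ)

for hl in range(10):  # HL scores range from 0 to 9
    print(f"HL={hl}: local slope={local_slope(hl):+.2f}")
print(f"turning point at HL={turning_point:.2f}")
```

The estimated turning point falls inside the interval where the Johnson–Neyman analysis found no significant local effect, with positive slopes below it and negative slopes above it, which matches the reported pattern.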
Study 2:
- Correlations: BMI vs F/V intake r=-0.18 (3.1%); dichotomized BMI r=-0.21 (4.2%), an increase consistent with sampling error around a small true correlation (Z=1.07, P=0.29). Cognitive restraint vs F/V intake r=0.13; dichotomized cognitive restraint r=0.14 (Z=-0.17, P=0.86). Cognitive restraint vs BMI r=0.04, P=0.28.
- ANOVAs:
  - 2×2 (CR×BMI dichotomized): BMI main effect F(1,582)=24.58, P<0.001, partial r²=0.041; means 296.26±6.24 (low BMI) vs 252.42±6.26 (high BMI). Cognitive restraint main effect F(1,582)=12.07, P<0.001, partial r²=0.02; means 258.98±6.50 (low CR) vs 289.70±5.99 (high CR). Interaction ns, F(1,582)=3.34, P=0.062. R²=0.067.
  - 2×4 (CR×CDC BMI): BMI main effect F(3,578)=6.94, P<0.001, partial r²=0.012; healthy weight > overweight and obesity via Bonferroni; CR main effect ns, F(1,578)=3.23, P=0.073; interaction ns, F(3,578)=2.07, P=0.103; R²=0.065.
  - 2×6 (CR×six BMI levels): BMI main effect F(5,574)=4.85, P<0.001, partial r²=0.008; healthy weight > overweight and all obesity classes; CR main effect ns, F(1,574)=1.42, P=0.23; interaction ns, F(5,574)=1.72, P=0.128; R²=0.072.
- Multiple regression (mean-centered): R²=0.059. BMI conditional effect B=-3.75 (SE=0.76), t=-4.95, P<0.001, squared partial r²=0.040. Cognitive restraint conditional effect B=24.22 (SE=8.01), t=3.02, P=0.003, squared partial r²=0.015. Interaction B=-2.84 (SE=1.34), t=-2.12, P=0.034, squared partial r²=0.008. Johnson–Neyman probing: the positive effect of cognitive restraint on F/V intake is significant for BMI values up to 2.24 units above the mean BMI (i.e., BMI < 30.23 given mean 27.99); no significant association for BMI > 30.23. Treating cognitive restraint as the moderator, the simple slope for BMI is significant when cognitive restraint is no more than 0.64 units below its mean (i.e., CR ≥ 1.97 given mean 2.61).
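Because both predictors were mean-centered, each reported conditional effect is the slope of one predictor when the other sits at its mean, and the interaction coefficient says how that slope shifts per unit of the moderator. The sketch below evaluates those slopes from the rounded coefficients in the summary. It yields point estimates only: the significance regions quoted above depend on the coefficient covariance matrix, which the summary does not report. Function names are hypothetical.

```python
# Rounded values as reported in the summary (Study 2 regression).
MEAN_BMI = 27.99
MEAN_CR = 2.61
B_BMI = -3.75  # slope of BMI when cognitive restraint is at its mean
B_CR = 24.22   # slope of cognitive restraint when BMI is at its mean
B_INT = -2.84  # change in either slope per unit of the other (centered) predictor


def cr_slope_at_bmi(bmi):
    """Estimated effect of cognitive restraint on skin carotenoid score at a given BMI."""
    return B_CR + B_INT * (bmi - MEAN_BMI)


def bmi_slope_at_cr(cr):
    """Estimated effect of BMI on skin carotenoid score at a given cognitive restraint."""
    return B_BMI + B_INT * (cr - MEAN_CR)


for bmi in (22.0, MEAN_BMI, 30.23, 35.0):
    print(f"BMI={bmi:5.2f}: CR slope={cr_slope_at_bmi(bmi):7.2f}")

for cr in (1.97, MEAN_CR, 3.5):
    print(f"CR={cr:4.2f}: BMI slope={bmi_slope_at_cr(cr):6.2f}")
```

The estimated slope for cognitive restraint shrinks as BMI rises, which is the moderation pattern that the ANOVAs with discretized BMI did not detect.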
The findings directly address the research question by empirically demonstrating that dichotomizing/discretizing continuous predictors in nutrition research distorts relationships and obscures important effects. In Study 1, dichotomization markedly attenuated correlations, and the ANOVA explained far less variance than the regression (R²=0.06 vs 0.17). Critically, ANOVA could not detect a meaningful quadratic health literacy effect that regression identified and probed. In Study 2, ANOVAs with various BMI discretizations failed to detect the moderating effect of BMI on the relation between cognitive restraint and F/V intake that was evident in moderated regression. These results underscore how dichotomization can yield misleading main effects, reduce power, and preclude examination of nonlinearity and nuanced interactions. Employing multiple regression with continuous variables, mean-centering when appropriate, and probing via Johnson–Neyman provides more accurate, generalizable, and interpretable insights. The work highlights methodological rigor as an ethical imperative in applied nutrition and obesity research, given the implications for cumulative science and policy.
Across two nutrition-related cross-sectional studies, the authors show that dichotomizing or discretizing continuous independent variables attenuates or distorts effect sizes, reduces model fit, and can mask nonlinear and moderating effects. They recommend retaining continuous predictors and using multiple regression (including quadratic terms and interaction probing) to model relationships accurately. When true categorical groupings are theoretically expected, methods such as latent class analysis or taxometric approaches should be used instead of arbitrary splits; if cut points are used, they should be established (not sample-specific). Readily available tools (e.g., PROCESS, quantpsy.org) enable probing interactions via Johnson–Neyman without dichotomization. Future work should continue promoting best practices, apply these methods across diverse nutrition outcomes, and educate researchers/reviewers to discourage dichotomization.
Both studies were cross-sectional and neither was replicated, which may limit generalizability. Samples were predominantly female and Latinx, potentially affecting external validity. The demonstrations focus on analytic contrasts rather than comprehensive modeling of all covariates, and the results illustrate consequences of dichotomization/discretization rather than establishing causal effects.

