Discretizing continuous variables in nutrition and obesity research: a practice that needs to be cut short

O. F. Morera, M. I. Dane'el, et al.

This summary covers the practice of dichotomizing continuous variables in nutrition research, as critiqued by Osvaldo F. Morera and colleagues. Their study shows that categorizing continuous independent variables distorts effect sizes and discards information that the original measurements carry. It argues that multiple regression on the continuous variables is a more robust way to analyze nutrition-related relationships.

Introduction

Studies in nutrition and obesity frequently measure continuous variables such as BMI, waist circumference, and eating indices, then dichotomize them (e.g., by median split) into "high" and "low" groups. The practice is common, as shown by the large number of Google Scholar results and its appearance in leading journals, but it is problematic. The paper documents its prevalence in recent publications and uses the CDC's BMI categories as an example of how arbitrary such divisions can be. The authors critique both dichotomization (splitting into two groups) and discretization (splitting into several groups) of continuous independent variables, arguing that these practices lead to loss of information, reduced statistical power, biased parameter estimates, and an inability to detect nonlinear relationships. They propose multiple regression as a superior alternative, noting that it requires only a basic understanding of statistical software commonly available to researchers. The introduction sets the stage for two empirical studies that demonstrate the detrimental effects of dichotomization and discretization on research findings.
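To make the information loss concrete, here is a minimal Python sketch of the two kinds of categorization discussed above. The cut points 18.5, 25, and 30 are the commonly cited CDC adult BMI thresholds; they are stated here as an assumption of the sketch, not taken from this summary. The function names and the BMI values are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right

# Commonly cited CDC adult BMI cut points (kg/m^2); an assumption of this sketch.
CDC_CUTS = [18.5, 25.0, 30.0]
CDC_LABELS = ["underweight", "healthy weight", "overweight", "obesity"]


def cdc_category(bmi):
    """Return the label of the half-open interval [lower, upper) containing bmi."""
    return CDC_LABELS[bisect_right(CDC_CUTS, bmi)]


def median_split(values):
    """Label each value 'high' if it lies above the sample median, else 'low'."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        median = ordered[mid]
    else:
        median = (ordered[mid - 1] + ordered[mid]) / 2
    return ["high" if v > median else "low" for v in values]


# Hypothetical BMI values placed just below and just above the cut points.
bmis = [24.9, 25.0, 29.9, 30.0]
categories = [cdc_category(b) for b in bmis]
splits = median_split(bmis)

for bmi, cat, split in zip(bmis, categories, splits):
    print(f"BMI {bmi:4.1f} -> {cat:14s} | median split: {split}")
```

In this sketch, 24.9 and 25.0 differ by 0.1 yet land in different categories, while 25.0 and 29.9 differ by 4.9 and share one. After categorization the analysis treats the first pair as different and the second pair as identical, which is the opposite of what the raw measurements say.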

Literature Review

The paper reviews existing literature on the drawbacks of dichotomizing continuous variables. It cites leading journals and research articles from various scientific fields that explicitly advise against the practice. Much of the review concerns median splits, a common method of dichotomization, and references studies showing that they attenuate effect sizes, reduce the reliability of measures, and can produce spurious correlations. The review also covers the consequences of dichotomization for statistical power, for future meta-analyses, and for bias in parameter estimates. The authors extend this work by placing the problem in the context of nutrition and obesity research, spelling out the implications for studies that use continuous health markers and for the validity and reliability of research in this area.
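The attenuation of effect sizes under a median split can be reproduced with a small simulation. The sketch below is not the authors' analysis: it draws hypothetical bivariate normal data with an assumed population correlation of 0.5, then compares the correlation obtained from the continuous predictor with the one obtained after a median split. For a normally distributed predictor, theory gives a shrinkage factor of roughly 0.80 (the square root of 2/π).

```python
import random


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / (var_a * var_b) ** 0.5


rng = random.Random(42)   # fixed seed so the sketch is reproducible
n = 20_000                # hypothetical sample size
rho = 0.5                 # hypothetical population correlation

# y = rho * x + noise, scaled so that corr(x, y) is rho in the population.
x = [rng.gauss(0, 1) for _ in range(n)]
y = [rho * xi + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1) for xi in x]

# Median split: 1.0 for values above the sample median, 0.0 otherwise.
median_x = sorted(x)[n // 2]
x_split = [1.0 if xi > median_x else 0.0 for xi in x]

r_continuous = pearson(x, y)
r_split = pearson(x_split, y)

print(f"continuous predictor: r = {r_continuous:.3f}")
print(f"median-split predictor: r = {r_split:.3f}")
print(f"ratio: {r_split / r_continuous:.3f}")
```

The ratio printed on the last line should fall near 0.80, meaning that about a fifth of the observed correlation is lost purely because of how the predictor was coded.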

Methodology

The paper reports two cross-sectional studies that demonstrate the negative consequences of dichotomizing and discretizing continuous variables.

**Study 1:** This study examined the relationship between health literacy, nutrition knowledge, and nutrition label accuracy in a sample of 612 participants (mostly female and Latinx). Health literacy and nutrition knowledge were measured using established questionnaires, and nutrition label accuracy was assessed using a modified version of the Nutrition Label Survey. Health literacy and nutrition knowledge were dichotomized at arbitrary cut points (6 and 11, respectively). The analyses comprised a 2x2 ANOVA on the dichotomized variables and a multiple regression on the continuous variables, which included a quadratic term for health literacy to capture nonlinear effects.

**Study 2:** This study investigated the association between cognitive restraint, BMI, and fruit and vegetable intake in 586 participants (predominantly female and Latinx). Cognitive restraint was measured using a modified version of the Three-Factor Eating Questionnaire, BMI was calculated from height and weight measurements, and fruit and vegetable intake was assessed through skin carotenoid levels. BMI was dichotomized at the median and also discretized according to CDC guidelines (and a modified version of them). Cognitive restraint was also dichotomized. The analyses comprised several ANOVAs on the dichotomized and discretized versions of BMI and cognitive restraint, and a multiple regression on the mean-centered continuous variables and their interaction.

In both studies, the authors compared the analyses of dichotomized or discretized variables against the analyses of continuous variables, highlighting discrepancies in effect sizes, in the significance of effects, and in the ability to detect interactions and nonlinear relationships.
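The regression models described above combine three ingredients: mean-centered predictors, a product term for the interaction, and a squared term for curvature. The following Python sketch fits such a model to simulated data. Everything in it is an assumption made for illustration: the variable names, the sample size, the coefficients in `TRUE`, and the hand-rolled least-squares solver, which stands in for the statistical package a researcher would normally use. It does not reproduce the authors' data or results.

```python
import random


def center(values):
    """Subtract the sample mean so that zero represents the average case."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def ols(columns, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(vals) for vals in zip(*columns)]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    aug = [
        [sum(row[r] * row[c] for row in rows) for c in range(k)]
        + [sum(row[r] * yi for row, yi in zip(rows, y))]
        for r in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[r][k] / aug[r][r] for r in range(k)]


rng = random.Random(0)
n = 2000  # hypothetical sample size

# Hypothetical predictors, mean-centered before any product terms are formed.
x = center([rng.gauss(10, 3) for _ in range(n)])  # e.g. a questionnaire score
z = center([rng.gauss(27, 5) for _ in range(n)])  # e.g. BMI

# Hypothetical data-generating coefficients (not estimates from the paper).
TRUE = {"intercept": 2.0, "x": 0.5, "z": -0.3, "x*z": 0.04, "x^2": -0.05}
y = [
    TRUE["intercept"] + TRUE["x"] * xi + TRUE["z"] * zi
    + TRUE["x*z"] * xi * zi + TRUE["x^2"] * xi * xi + rng.gauss(0, 1)
    for xi, zi in zip(x, z)
]

# Design columns: main effects, interaction (product), and quadratic term.
xz = [xi * zi for xi, zi in zip(x, z)]
x_sq = [xi * xi for xi in x]
estimates = dict(zip(TRUE, ols([x, z, xz, x_sq], y)))

for name, value in estimates.items():
    print(f"{name:9s} true = {TRUE[name]:6.2f}  estimated = {value:6.3f}")
```

Centering before forming the product and squared terms keeps the main-effect coefficients interpretable as effects at the average level of the other predictor. A 2x2 ANOVA on dichotomized versions of `x` and `z` has no counterpart to the `x^2` column, so curvature of this kind cannot show up in it.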

Key Findings

**Study 1:** Dichotomizing health literacy and nutrition knowledge substantially attenuated effect sizes relative to the regression analysis with continuous variables. The ANOVA on the dichotomized variables showed main effects of both health literacy and nutrition knowledge on nutrition label accuracy, but it detected no significant interaction and, with only two levels per factor, could not reveal the quadratic effect of health literacy. In contrast, multiple regression showed significant main effects for both continuous predictors and a significant quadratic effect for health literacy (indicating a diminishing positive effect at higher levels). The interaction approached significance but was not significant once the quadratic term was included in the model. The multiple regression model explained substantially more variance in nutrition label accuracy than the ANOVA model.

**Study 2:** The ANOVAs showed significant main effects of BMI on fruit and vegetable intake for every discretized BMI grouping. However, none of the ANOVAs detected a significant interaction between BMI and cognitive restraint. In contrast, multiple regression with continuous, mean-centered variables revealed significant main effects for BMI and cognitive restraint, as well as a significant interaction. The regression showed that BMI moderated the effect of cognitive restraint on fruit and vegetable intake: the positive effect of restraint was significant only at lower BMI levels. Every ANOVA model missed this interaction. The regression model thus revealed a more nuanced relationship that the dichotomization and discretization used in the ANOVAs had obscured.
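A moderated effect of this kind is usually read through simple slopes. In a model of the form y = b0 + b1·restraint + b2·BMI + b3·restraint·BMI with centered predictors, the slope of restraint at a given BMI is b1 + b3·BMI. The Python sketch below evaluates that expression at three BMI levels. The coefficients and the BMI standard deviation are hypothetical placeholders chosen only to mimic the pattern described above (a positive effect that shrinks as BMI rises); they are not the authors' estimates.

```python
def simple_slope(b_focal, b_interaction, moderator):
    """Slope of the focal predictor when the centered moderator equals `moderator`."""
    return b_focal + b_interaction * moderator


# Hypothetical values for illustration only (not taken from the paper).
B_RESTRAINT = 0.30     # effect of restraint at mean BMI
B_INTERACTION = -0.05  # change in that effect per BMI unit
BMI_SD = 5.0           # standard deviation of centered BMI

# Evaluate the slope one SD below the mean, at the mean, and one SD above it.
slopes = {
    label: simple_slope(B_RESTRAINT, B_INTERACTION, k * BMI_SD)
    for label, k in [("-1 SD", -1), ("mean", 0), ("+1 SD", 1)]
}

for label, slope in slopes.items():
    print(f"restraint slope at BMI {label}: {slope:.2f}")
```

Whether each simple slope differs significantly from zero depends on its standard error, which this sketch does not compute. The point is only that a continuous interaction term lets the effect of one predictor vary smoothly across the other, which a comparison of "high" and "low" cells cannot express.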

Discussion

The findings from both studies demonstrate the drawbacks of dichotomizing and discretizing continuous variables. The discrepancies between the ANOVA and regression results show how much information is lost, and how far effect sizes are distorted, when continuous data are artificially categorized. The ANOVAs on categorized predictors missed the quadratic effect of health literacy in Study 1 and the interaction between BMI and cognitive restraint in Study 2, which underscores the limits of that approach for complex relationships. The authors argue that multiple regression is the superior approach because it can detect nonlinear effects and interactions that would otherwise remain hidden. The discussion emphasizes that relying on dichotomization can lead to misleading conclusions and hinder the advancement of knowledge in the field. Where distinct underlying groups are genuinely expected, the authors recommend identifying them with techniques such as latent class analysis rather than with arbitrary cut points.

Conclusion

The paper concludes that dichotomizing and discretizing continuous independent variables in nutrition and obesity research should be avoided. The authors' two empirical studies show that the practice distorts effect sizes, misses interactions, and cannot detect nonlinear relationships. Multiple regression is advocated as the more appropriate analytical technique because it gives a more accurate and nuanced picture of the relationships under investigation. Researchers are encouraged to use readily available statistical tools and resources to interpret interactions and model nonlinear effects properly. Future research should focus on promoting best practices in data analysis to improve the reliability and validity of findings in the field.

Limitations

Both studies are cross-sectional, which limits causal inference. The samples, while large enough to give adequate power for the specific models tested, are predominantly female and Latinx, which may limit the generalizability of the findings to other populations. Although the authors acknowledge that continuous variables may contain underlying groupings, they do not test for such groups with latent variable methods such as latent class analysis. The cut points used for dichotomization in both studies were relatively arbitrary, which may have influenced the results. The analyses presented may not apply directly to all types of continuous variables and research designs in the field.
