
Food Science and Technology
Forecasting food trends using demographic pyramid, generational differentiation and SuperLearner
D. Loginova and S. Mann
This study forecasts food consumption patterns across social groups up to 2050 using extensive Swiss household data. Daria Loginova and Stefan Mann examine how generational change may shape what we eat in the coming decades.
Introduction
Accurate food demand forecasting is crucial for managing food stocks, logistics, production, supply chains, disease risks, and environmental impact. Existing studies highlight the importance of generational factors in predicting food trends. This research builds on that work by developing and applying methods to predict food consumption patterns for future decades and for various social groups, incorporating generational shifts. Using extensive Swiss household consumption data (1990-2017) and demographic projections from the Federal Statistical Office, the study extends existing methodologies in consumption research. Five forecasting techniques are employed:
1. **Model A (Reference Scenario):** Annual trend extrapolation.
2. **Model B:** Extrapolation of trends by age and gender, incorporating the demographic pyramid.
3. **Model C:** Forecasting using generational trends.
4. **Model D:** General Linear Model (GLM) prediction with additional factors.
5. **Model E (SuperLearner):** A weighted combination of a logit model, XGBoost, and random forest, applied to normalized consumption data to improve explanatory power.
The paper proceeds with a literature review, data description, detailed explanations of the five methods, results presentation and discussion, and concluding remarks.
Literature Review
Previous approaches to food demand forecasting range from early methods such as partial equilibrium modeling, extrapolation, and linear regression to more recent machine learning techniques such as decision trees, neural networks, and boosting. Reviews suggest that model complexity does not consistently affect the accuracy of estimates of future global kilocalorie demand. This study acknowledges that food preferences are shaped not only by age, income, and gender but also by generational shifts and evolving social groups. Switzerland, where consumption patterns are primarily demand-driven rather than supply-constrained, serves as a suitable case study.
Methodology
The study uses household-level data from the Swiss Federal Statistical Office (1990–2017), covering up to 12,000 households per year with detailed information on household characteristics and monthly purchases in 75 food categories. Preprocessing involved three steps: outlier handling (excluding the top and bottom 1% of households per food item, based on consumption per person); coding of social characteristics (age, gender, income, region, and generation, with generations defined as 10-year birth cohorts); and handling of mixed-generation households, which were excluded from the generational analyses. The data were then processed differently depending on the model.
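A minimal sketch of these preprocessing steps, assuming each household record carries its monthly spending on one food item, its size, and the birth years of its members. The `Household` layout, the helper names, and the choice to start cohorts at calendar decades (1950 to 1959, 1960 to 1969, and so on) are illustrative assumptions, not the study's code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Household:
    # Hypothetical record layout for one household and one food item.
    spending: float         # monthly purchases of the food item
    size: int               # number of persons in the household
    birth_years: list[int]  # birth years of the members (assumed available)


def trim_outliers(households: list[Household], share: float = 0.01) -> list[Household]:
    """Drop the bottom and top `share` of households by consumption per person."""
    ranked = sorted(households, key=lambda h: h.spending / h.size)
    k = int(len(ranked) * share)  # households removed at each tail
    return ranked[k:len(ranked) - k]


def generation(birth_year: int) -> int:
    """10-year birth cohort, assumed here to start at a calendar decade."""
    return birth_year // 10 * 10


def household_generation(h: Household) -> int | None:
    """Shared cohort of all members, or None for a mixed-generation household."""
    cohorts = {generation(y) for y in h.birth_years}
    return cohorts.pop() if len(cohorts) == 1 else None
```

Households for which `household_generation` returns `None` would be dropped before any generational analysis, mirroring the exclusion described above.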
For Models A-C, transforming consumption into growth rates addressed non-stationarity, the linearity assumption, and the risk of unrealistic projections (negative or extremely high values). Models D and E used range normalization to make food items and models comparable.
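The two transformations can be sketched as follows. Year-on-year relative growth is an assumption here (the summary does not state the base period), and range normalization is taken to mean min-max scaling to [0, 1]. Compounding growth rates back into levels keeps every projected level positive as long as each rate stays above -1, which helps explain why the growth-rate form guards against negative forecasts.

```python
from __future__ import annotations


def growth_rates(levels: list[float]) -> list[float]:
    """Year-on-year relative growth: g_t = (c_t - c_{t-1}) / c_{t-1}."""
    return [(cur - prev) / prev for prev, cur in zip(levels, levels[1:])]


def range_normalize(values: list[float]) -> list[float]:
    """Min-max (range) normalization to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant series: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]
```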
**Model A:** Simple linear regression of consumption growth on time.
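Model A can be sketched as an ordinary least-squares line through the observed growth rates, extrapolated over the projection horizon and compounded back into consumption levels. The function names and the compounding rule c_t = c_{t-1}(1 + g_t) are assumptions made for illustration.

```python
from __future__ import annotations


def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def model_a(years: list[int], growth: list[float], last_level: float,
            horizon: range) -> dict[int, float]:
    """Extrapolate the growth trend and compound it back into consumption levels.

    `last_level` is consumption in the final observed year; `horizon` lists the
    projection years that follow it, in order.
    """
    a, b = fit_line([float(t) for t in years], growth)
    level, projected = last_level, {}
    for t in horizon:
        level *= 1.0 + (a + b * t)  # back-transformation: c_t = c_{t-1} * (1 + g_t)
        projected[t] = level
    return projected
```

With a flat 1% growth history and a last observed level of 100, for example, `model_a([2015, 2016, 2017], [0.01] * 3, 100.0, range(2018, 2020))` projects roughly 101 for 2018 and 102.01 for 2019.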
**Model B (Linear extrapolation by age and gender):** Linear regression of consumption growth on time, age, and gender. Extrapolated trends are weighted by age and gender distributions from the demographic pyramid.
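The weighting step of Model B might look like the following, assuming a growth trend (intercept and slope on time) has already been fitted for each age-gender cell and that the demographic pyramid supplies projected head counts per cell for the projection year. Both input layouts are hypothetical.

```python
from __future__ import annotations


def model_b_growth(trends: dict[tuple[str, str], tuple[float, float]],
                   pyramid: dict[tuple[str, str], float], year: int) -> float:
    """Population-weighted growth rate for one projection year.

    trends:  (age group, gender) -> (intercept, slope) of growth on time
    pyramid: (age group, gender) -> projected population in `year`
    """
    total = sum(pyramid.values())
    weighted = 0.0
    for cell, population in pyramid.items():
        intercept, slope = trends[cell]
        weighted += (population / total) * (intercept + slope * year)
    return weighted
```

Because the weights come from the projected pyramid for each year, population ageing alone shifts the aggregate growth rate even when every cell's trend is held fixed.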
**Model C (Forecasting by generational trends):** Linear regression of consumption growth on time and generation. The study introduces a method to handle generations not yet born. Extrapolated trends are weighted by generational shares in the population.
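One simple way to let a generational model reach cohorts that do not yet appear in the data is to code generations as ordered numbers (for example, the first birth year of each cohort) and use that code as a regressor alongside time. The sketch below does exactly that. It is an illustrative assumption, not necessarily the procedure the authors introduce for unborn generations, and the function names are hypothetical.

```python
from __future__ import annotations


def fit_two_regressors(t: list[float], g: list[float],
                       y: list[float]) -> tuple[float, float, float]:
    """OLS for y = a + b_t * t + b_g * g via the centred normal equations."""
    n = len(y)
    mt, mg, my = sum(t) / n, sum(g) / n, sum(y) / n
    stt = sum((ti - mt) ** 2 for ti in t)
    sgg = sum((gi - mg) ** 2 for gi in g)
    stg = sum((ti - mt) * (gi - mg) for ti, gi in zip(t, g))
    sty = sum((ti - mt) * (yi - my) for ti, yi in zip(t, y))
    sgy = sum((gi - mg) * (yi - my) for gi, yi in zip(g, y))
    det = stt * sgg - stg ** 2  # non-zero unless time and generation are collinear
    b_t = (sty * sgg - sgy * stg) / det
    b_g = (sgy * stt - sty * stg) / det
    return my - b_t * mt - b_g * mg, b_t, b_g


def model_c_growth(coef: tuple[float, float, float],
                   shares: dict[int, float], year: int) -> float:
    """Weight generation-specific growth by each cohort's population share.

    Cohorts are ordered numbers, so a cohort absent from the survey data
    (including one not yet born) still receives a prediction.
    """
    a, b_t, b_g = coef
    return sum(share * (a + b_t * year + b_g * cohort) for cohort, share in shares.items())
```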
**Model D (GLM):** A GLM model incorporates age, gender, generation, income, and region as predictors for normalized consumption growth. The model is trained on data from 1990-2017 and applied to the 2018-2050 period.
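Model D can be approximated by a linear model with identity link fitted by least squares, which is the Gaussian special case of a GLM. The sketch assumes the predictors are already numerically encoded (gender and language region as 0/1 dummies, for instance). The family, link, encoding, and helper names are assumptions, since the summary does not specify them.

```python
from __future__ import annotations


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_linear_model(rows: list[list[float]], y: list[float]) -> list[float]:
    """Least-squares coefficients (intercept first) from the normal equations X'X b = X'y."""
    X = [[1.0] + list(r) for r in rows]  # prepend an intercept column
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(beta: list[float], row: list[float]) -> float:
    """Linear predictor for one encoded row of predictors."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))
```

In the study's setting, `rows` would hold observations from 1990 to 2017, and `predict` would be applied to predictor values projected for 2018 to 2050.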
**Model E (SuperLearner):** This ensemble method combines predictions from GLM, XGBoost, and random forest models applied to binarized (above or below median consumption) and normalized consumption data. The model weights are optimized to maximize the area under the ROC curve (AUC).
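The binary SuperLearner step can be sketched in three pieces: splitting consumption at the median, scoring predictions by AUC, and searching for convex weights over the base learners' predicted probabilities. The base-learner predictions are taken as given; in a SuperLearner they would be cross-validated predictions from the GLM, XGBoost, and random forest models. The coarse grid search is a simple stand-in for whatever optimizer a real implementation uses, and all names are hypothetical.

```python
from __future__ import annotations

import itertools
import statistics


def binarize(values: list[float]) -> list[int]:
    """1 if consumption lies above the median, else 0."""
    med = statistics.median(values)
    return [int(v > med) for v in values]


def auc(labels: list[int], scores: list[float]) -> float:
    """Area under the ROC curve as the share of correctly ranked pairs (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def best_weights(labels: list[int], preds: list[list[float]],
                 step: float = 0.1) -> tuple[float, tuple[float, ...]]:
    """Grid-search non-negative weights summing to one that maximise the AUC."""
    grid = [i * step for i in range(round(1 / step) + 1)]
    best_auc, best_w = -1.0, tuple()
    for w in itertools.product(grid, repeat=len(preds)):
        if abs(sum(w) - 1.0) > 1e-9:
            continue  # keep only weights on the simplex
        combined = [sum(wi * p[j] for wi, p in zip(w, preds)) for j in range(len(labels))]
        score = auc(labels, combined)
        if score > best_auc:
            best_auc, best_w = score, w
    return best_auc, best_w
```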
Additional calculations involved determining social group sizes and shares in the population using linear extrapolation, allocating the population to generations, and considering the impact of mixed-generation households. The transformation back to consumption levels was performed before weighting the results according to population shares.
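Allocating the projected population to generations amounts to converting each age in a projection year into a birth year and grouping by 10-year cohort. The sketch assumes the demographic pyramid is given as a mapping from single-year age to head count (a hypothetical layout), and it ignores whether birthdays fall before or after the reference date.

```python
from __future__ import annotations

from collections import defaultdict


def generation_shares(pyramid: dict[int, float], year: int) -> dict[int, float]:
    """Allocate a projected age distribution to 10-year birth cohorts.

    pyramid: age in years -> projected population in `year`
    Returns cohort start year -> share of the total population.
    """
    counts: dict[int, float] = defaultdict(float)
    for age, population in pyramid.items():
        birth_year = year - age  # approximate: ignores birthdays within the year
        counts[birth_year // 10 * 10] += population
    total = sum(counts.values())
    return {cohort: pop / total for cohort, pop in sorted(counts.items())}
```

For a projection year such as 2030, cohorts starting in 2020 have no members in the 1990-2017 survey, which is exactly where a generational model needs a rule for unborn generations.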
Key Findings
The study projects food consumption until 2050 using five different models. Models incorporating generational factors (Models C, D, and E) demonstrate the potential to predict shifts in consumption dynamics. These models often result in convex or concave projection lines due to the weighting by generational population shares. Simple extrapolation models (Model A) show constant trends, indicating no significant change in consumption growth rates.
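The curvature follows from the weighting itself. In the toy example below, each generation's consumption is linear in time, yet the population-weighted average bends because the newer cohort's share keeps growing. The shares and trends are made-up numbers for illustration, not results from the study.

```python
from __future__ import annotations

from typing import Callable


def weighted_projection(years: list[int],
                        share_new: Callable[[int], float],
                        trend_new: Callable[[int], float],
                        trend_old: Callable[[int], float]) -> list[float]:
    """Population-weighted consumption of a newer and an older generation."""
    return [share_new(t) * trend_new(t) + (1.0 - share_new(t)) * trend_old(t)
            for t in years]


def second_differences(series: list[float]) -> list[float]:
    """Discrete curvature: positive values mean convex, negative mean concave."""
    return [a - 2.0 * b + c for a, b, c in zip(series, series[1:], series[2:])]


# Toy inputs: both generation trends are linear in time; the newer cohort's share grows.
path = weighted_projection(list(range(6)),
                           share_new=lambda t: 0.1 * t,
                           trend_new=lambda t: 5.0 + t,
                           trend_old=lambda t: 10.0)
curvature = second_differences(path)  # each value is approximately 0.2
```

Algebraically the aggregate here is 10 - 0.5t + 0.1t², a convex curve that first falls and then rises, even though neither component trend has a turning point.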
Table 2 summarizes the results for all 75 food items, showing the average annual projected consumption changes between 2020 and 2050 for Models B, C, D, and E. The SuperLearner (Model E) substantially improved the forecasts for several food items, avoiding the implausible zero or negative consumption predictions produced by the GLM (Model D). For instance, it yielded plausible projections for items such as canned meat, cream, jam, and milk, where the GLM failed. The generational approach (Model C) generally produced more plausible forecasts than the age-and-gender approach (Model B), especially for items whose consumption is not expected to reach zero.
The dynamics of the forecasts varied across food items. Marked differences between models were observed for items such as bananas, beer, bread, cheese, poultry, wines, and non-alcoholic drinks. Inconsistencies were particularly pronounced for sheep and goat meat, ready meals, bread, pork, sausages, and citrus fruits, largely because of the limitations of the GLM method. Certain products, such as aroma and taste essences, ice cream, and ready meals, showed implausible consumption declines under the age-and-gender projections, pointing to data limitations that call for further investigation of these food categories.
Discussion
The findings show that integrating generational changes into food demand forecasts enhances prediction accuracy. This is particularly important when simple linear models fail to capture turning points in consumption dynamics. The consistent forecasts across various techniques for many food items highlight the influence of generational and socioeconomic characteristics on food consumption. This study contributes to the field by demonstrating a method for predicting the behavior of future generations from past data on previous generations. The introduction of ordered generational trends in consumption analysis offers a novel approach for forecasting food consumption decades into the future, even for generations not yet born. The use of social group forecasting and binary SuperLearner predictions also provides valuable insights for both socioeconomists and demographers.
Conclusion
This research presents a novel methodology for forecasting food demand, incorporating generational changes and employing a robust SuperLearner approach. The findings demonstrate the significant impact of generational shifts and socioeconomic characteristics on long-term food consumption. The study's limitations, such as the exclusion of mixed-generation households and potential biases in data, are noted. Future research should address these limitations and explore alternative models. This methodology offers valuable insights for policymakers, stakeholders, and researchers in the food sector, informing sustainable food system management and policy decisions.
Limitations
Several limitations affect the study's generalizability. The exclusion of mixed-generation and mixed-age households introduces bias. The binary coding of gender and of the regional language variable is a simplification. The assumption of homogeneity within social groups may not fully reflect reality. The quality and detail of the food data could be improved. The linearity assumption in forecasting may not fully capture complex trends. The reliability of the models and of inter-model comparisons could be enhanced by using more models and data. The assumption that factor contributions remain constant over time needs further investigation. The study also assumes that social structures will evolve between 2020 and 2050 in the same way as in prior years.