Forecasting food trends using demographic pyramid, generational differentiation and SuperLearner

Food Science and Technology

D. Loginova and S. Mann

This study predicts food consumption patterns across social groups until 2050 using extensive Swiss household data. Drawing on work by Daria Loginova and Stefan Mann, it shows how generational change may shape future food consumption.

Introduction

Studies on future food demand underscore the importance of realistic projections for stocks, logistics, production, supply chains, disease risk, and environmental footprint. This research develops methods to predict food consumption patterns for future decades and social groups while explicitly incorporating generational change. Leveraging Swiss big data on household food purchases and demographic projections by age and gender, the study extends consumption modelling by introducing applied techniques for long-term forecasting: Model A (reference linear trends), Model B (trends by age and gender combined with the demographic pyramid), Model C (generational trends), Model D (GLM with additional socioeconomic factors), and Model E (a SuperLearner combining logit, XGBoost, and random forest on range-normalised data). The paper details data and processing, methods, results including limitations, and concludes with implications.

Literature Review

Prior approaches to food demand forecasting range from partial equilibrium models, extrapolation, and linear regression to machine learning methods such as decision trees, neural networks, and boosting. Reviews indicate that projected global kilocalorie demand varies widely and that this variation is not consistently tied to model complexity. Food tastes and demand depend on age, income, and gender, and they also change across generations and social groups over time. Switzerland is a suitable case because its consumption patterns are driven primarily by cultural demand rather than by supply constraints. This study aims to account for culturally and demographically driven factors in national-level forecasting, situating its contribution among both traditional econometric and modern ML-based approaches.

Methodology

Data: Household-level data from the Swiss Federal Statistical Office Household Budget Survey (up to ~12,000 households/year, 1990–2017) record monthly household food purchases by weight/volume, plus household characteristics. The database covers 75 food categories and yields ~20 million observations of per-person consumption volumes. Generations are defined as 10-year birth cohorts from Generation 0 (1896–1905) to Generation 15 (2046–2055). Households whose members all belong to the same generation are assigned that generation; mixed-generation households and households with children are excluded from generational analyses. Sampling is broadly balanced across key characteristics (household size, income, age, language region), though older populations are slightly overrepresented after 2010 and females are overweighted. The highest 1% and lowest 1% of per-person consumption observations are excluded as outliers.
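
The cohort definition above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, a household is represented only by its members' birth years, and the separate exclusion of households with children is not modelled here.

```python
from typing import Optional, Sequence

FIRST_BIRTH_YEAR = 1896  # first birth year of Generation 0 (1896-1905)
LAST_BIRTH_YEAR = 2055   # last birth year of Generation 15 (2046-2055)
COHORT_WIDTH = 10        # each generation spans ten birth years


def generation_of(birth_year: int) -> int:
    """Map a birth year to its 10-year generation index (0..15)."""
    if not FIRST_BIRTH_YEAR <= birth_year <= LAST_BIRTH_YEAR:
        raise ValueError(f"birth year {birth_year} is outside the defined cohorts")
    return (birth_year - FIRST_BIRTH_YEAR) // COHORT_WIDTH


def household_generation(birth_years: Sequence[int]) -> Optional[int]:
    """Return the shared generation, or None for a mixed-generation household."""
    generations = {generation_of(year) for year in birth_years}
    return generations.pop() if len(generations) == 1 else None
```

A household with members born in 1950 and 1953 falls entirely into one cohort, while one with members born in 1950 and 1960 returns `None` and would be dropped from the generational analyses.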

Social groups (IDs): Household characteristics are combined into a ‘social ID’ varying by model: Model A uses no social factors; Model B uses age interval and gender; Model C uses generation and gender; Models D/E use age, gender, generation, income, and regional language. For each year, households are grouped by social ID, per-person consumption is averaged by group and food, and group population counts (sum of household members) are used as weights to compute shares of each social group in the total population.
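
As a rough sketch of this grouping step for a single year and a single food, the snippet below averages per-person consumption within each social ID and derives population shares from member counts. The tuple layout and the function name are hypothetical, and the use of an unweighted mean across households is an assumption, since the summary does not say how the group average is formed.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (social_id, number of household members, per-person consumption) - hypothetical layout
Household = Tuple[str, int, float]


def summarise_groups(households: Iterable[Household]) -> Dict[str, Tuple[float, float]]:
    """Return {social_id: (mean per-person consumption, population share)}."""
    consumption_sum: Dict[str, float] = defaultdict(float)
    household_count: Dict[str, int] = defaultdict(int)
    member_count: Dict[str, int] = defaultdict(int)

    for social_id, members, consumption in households:
        consumption_sum[social_id] += consumption
        household_count[social_id] += 1
        member_count[social_id] += members

    population = sum(member_count.values())
    if population == 0:
        raise ValueError("no household members to compute shares from")

    return {
        sid: (consumption_sum[sid] / household_count[sid], member_count[sid] / population)
        for sid in member_count
    }
```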

Population dynamics: Demographic data (actual 1990–2022 and forecast 2023–2050) by age and gender are taken from the Federal Statistical Office’s projections. Ages are mapped to birth years to derive group sizes and shares by age×gender and generation×gender for weighting Models B and C. For Models D/E, detailed social-group sizes are projected from the survey data because official forecasts at that granularity are unavailable.

Addressing forecasting challenges: The models tackle three issues, namely nonstationarity, inappropriate linearity and normality assumptions, and implausible negative or unbounded long-run predictions. For Models A–C, consumption is transformed to growth rates within each social group to promote stationarity and avoid negative projections. Because the raw data have a gap for 1991–2000, constant growth rates are imputed for that period to link 1990 and 2000. For Models D/E, range normalisation is applied across social groups for each food; binary coding at a 0.5 threshold is used for SuperLearner’s binomial models to constrain predictions to a finite range.
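
The transformations described above can be illustrated with a minimal sketch. All function names are hypothetical, the constant rate is assumed to be the geometric (compound) rate linking the two endpoint levels, and treating a value of exactly 0.5 as class 1 is an assumption, since the summary only states the threshold.

```python
from typing import List, Sequence


def growth_rates(levels: Sequence[float]) -> List[float]:
    """Year-on-year growth rates for a series of consumption levels."""
    return [curr / prev - 1.0 for prev, curr in zip(levels, levels[1:])]


def constant_growth_rate(start_level: float, end_level: float, n_years: int) -> float:
    """Constant annual rate that carries start_level to end_level in n_years."""
    if start_level <= 0 or end_level <= 0 or n_years <= 0:
        raise ValueError("levels and n_years must be positive")
    return (end_level / start_level) ** (1.0 / n_years) - 1.0


def range_normalise(values: Sequence[float]) -> List[float]:
    """Rescale values to [0, 1] using the observed minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalise a constant series")
    return [(v - lo) / (hi - lo) for v in values]


def binarise(normalised: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Binary coding of normalised values (ties at the threshold go to 1 here)."""
    return [1 if v >= threshold else 0 for v in normalised]
```

For the 1991–2000 gap, `constant_growth_rate(level_1990, level_2000, 10)` would give the single rate imputed for every missing year under this assumption.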

Model A (reference linear trend): For each food i and time t, regress group-level consumption growth C_{id,t} on t using robust linear models (felm/lm), and extrapolate. No social weighting is applied in the final aggregation.

Model B (age×gender + demographic pyramid): For each food, age group, and gender, estimate linear trends of consumption growth over time. Insignificant slopes and intercepts (p>0.01) are set to zero. Extrapolate to 2050, transform back to levels, and weight by projected shares of age×gender groups from the demographic pyramid to obtain population-level forecasts.
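
A simplified sketch of the Model B pipeline follows: fit a linear trend to growth rates, extrapolate, compound back to levels, and weight by projected group shares. It uses plain ordinary least squares rather than the robust `felm`/`lm` fits mentioned above, and it omits the step that zeroes insignificant coefficients (p>0.01); the function names and data layout are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def fit_trend(years: Sequence[int], growth: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of growth = intercept + slope * year."""
    n = len(years)
    if n < 2 or n != len(growth):
        raise ValueError("need at least two matching observations")
    mean_year = sum(years) / n
    mean_growth = sum(growth) / n
    sxx = sum((y - mean_year) ** 2 for y in years)
    sxy = sum((y - mean_year) * (g - mean_growth) for y, g in zip(years, growth))
    slope = sxy / sxx
    return mean_growth - slope * mean_year, slope


def project_level(last_level: float, last_year: int, target_year: int,
                  intercept: float, slope: float) -> float:
    """Compound the extrapolated growth rates forward to recover a level."""
    level = last_level
    for year in range(last_year + 1, target_year + 1):
        level *= 1.0 + intercept + slope * year
    return level


def population_forecast(levels: Dict[str, float], shares: Dict[str, float]) -> float:
    """Share-weighted sum of group-level forecasts for one food and one year."""
    return sum(levels[group] * shares[group] for group in levels)
```

In this sketch the shares passed to `population_forecast` would come from the demographic pyramid for the target year, so the aggregate can bend even when each group's growth trend is flat.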

Model C (generational trends): For each food, generation, and gender, estimate linear trends in growth. Extrapolate trend parameters across generations to handle unborn cohorts; initialise unborn cohorts at their formation with the average consumption growth of previous generations at the same age. Transform back to levels and weight by projected shares of generation×gender groups.
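
The initialisation rule for unborn cohorts could look roughly like the following. The dictionary layout, the age-interval labels, and the function name are hypothetical, and the plain arithmetic mean over all earlier generations observed at that age is one reading of "average consumption growth of previous generations at the same age".

```python
from statistics import mean
from typing import Dict, Tuple

# {(generation index, age interval label): observed consumption growth} - hypothetical layout
GrowthTable = Dict[Tuple[int, str], float]


def initial_growth_for_unborn(observed: GrowthTable, age_interval: str) -> float:
    """Average growth of earlier generations observed at the given age interval."""
    same_age = [g for (_, age), g in observed.items() if age == age_interval]
    if not same_age:
        raise ValueError(f"no earlier generation was observed at age {age_interval}")
    return mean(same_age)
```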

Model D (GLM with socioeconomic factors): Using range-normalised consumption per person as the dependent variable, estimate a GLM with a time trend and covariates (generation, age interval, gender, income, region). Advance the time span to 2050 and extend generations by three cohorts to predict for all combinations. Predicted values are clipped to [0,1] if necessary (<1% affected). Combine with projected shares of social groups and generations to form population-level forecasts.
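
The clipping step is simple enough to show directly. The sketch below, with a hypothetical function name, bounds predictions on the normalised scale and reports what fraction had to be changed, which is the quantity the text puts below 1%.

```python
from typing import List, Sequence, Tuple


def clip_to_unit_interval(predictions: Sequence[float]) -> Tuple[List[float], float]:
    """Clip predictions to [0, 1] and return the share of values that were altered."""
    if not predictions:
        raise ValueError("no predictions to clip")
    clipped = [min(1.0, max(0.0, p)) for p in predictions]
    n_changed = sum(1 for p, c in zip(predictions, clipped) if p != c)
    return clipped, n_changed / len(predictions)
```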

Model E (SuperLearner): Train an ensemble (logit/GLM, XGBoost, random forest, plus auxiliary learners) on binarised, range-normalised outcomes to maximise ROC. Individual models tested on holdout splits for representative foods (chicken, bread, milk, olive oil) yielded ROC values of 0.6–0.8; SuperLearner weights the best learners to improve predictive performance. Train on all data, extend years to 2050 and generations by three cohorts, obtain predicted probabilities in [0,1], and weight by projected social group and generation shares. The approach assumes that future social groups do not exceed historically observed/predicted maxima on the normalised scale.
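
To make the weighting idea concrete, here is a toy sketch of how out-of-fold predicted probabilities from two base learners might be blended with a convex weight chosen to maximise the area under the ROC curve. It is not the SuperLearner package's algorithm: the grid search, the two-learner restriction, the function names, and the reading of "ROC" as area under the curve are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (ties count one half)."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def best_convex_weight(labels: Sequence[int], preds_a: Sequence[float],
                       preds_b: Sequence[float], steps: int = 20) -> Tuple[float, float]:
    """Grid-search the weight w on learner A (1 - w on learner B) with the highest AUC."""
    best_w, best_auc = 0.0, -1.0
    for k in range(steps + 1):
        w = k / steps
        blended: List[float] = [w * a + (1.0 - w) * b for a, b in zip(preds_a, preds_b)]
        score = auc(labels, blended)
        if score > best_auc:
            best_w, best_auc = w, score
    return best_w, best_auc
```

In practice the inputs would be cross-validated predictions from each learner, so that the weights are not chosen on data the learners were fitted to.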

Additional calculations and weighting: Social-group sizes are extrapolated by linear trends at the group level and converted to shares each year. Shares of generations in the population are derived from demographic data at 10-year frequency; migration can affect cohort sizes. Population-share statistics are unavailable for mixed-generation households and households with children, which is acknowledged as a source of bias. Predictions are transformed back from growth rates to levels prior to weighting. Final population forecasts are weighted sums over age×gender (Model B), generation×gender (Model C), or over social groups and generations jointly (Models D/E).

Key Findings
  • Data and scope: Forecasts for 75 foods to 2050 using ~20 million observations from 46,456 Swiss households (1990–2017), integrating demographic projections.
  • Generational (Model C) and SuperLearner (Model E) approaches provided more plausible and stable long-run forecasts than comparable linear/GLM approaches when many factors were included. Projection lines incorporating generations tend to be convex/concave due to changing generational weights, whereas simple trends often yielded effectively flat projections in growth rates.
  • SuperLearner prevented zero/negative or failed GLM forecasts for at least 16 foods (e.g., canned meat, cream, jam, leafy vegetables, milk, mineral water, mushrooms, nuts, potatoes, stone fruit, sugar, tea and herbs, veal, vegetarian soy products, wild/rabbit meat, yogurt). It also delivered forecasts where GLM failed for apples, butter, coffee, margarine, root vegetables, potatoes, tomatoes.
  • Diversity across models emerged notably for sheep/goat meat, ready meals, bread, pork, sausages, citrus (except lemons), and other foods—often reflecting GLM vulnerability on some items.
  • Illustrative average annual changes (2020–2050) from Table 2: poultry (B 0.01%, D 0.76%, E 0.8%); fish (C 0.06%, D 0.46%, E 0.49%); bananas (B 0.1%, C -0.03%, D 1.29%, E 0.9%); beer (B -0.14%, C 0.08%, D 1.06%, E 0.39%); olive oil (C -0.44%, D 1.47%, E 0.9%); wines (B 0.11%, C -0.02%, D -0.42%, E -0.11%); bread (B 3.47%, C 0.01%, D -1.81%, E -0.07%); nonalcoholic drinks (B -0.99%, C 0.06%, D 0.15%, E -0.07%).
  • Holdout testing of constituent ML models for select foods achieved ROC ~0.6–0.8; SuperLearner combines them to enhance predictive power within constrained outcome ranges.
  • Overall, forecasts generally suggest movement toward healthier patterns (e.g., likely growth in eggs, decline in jam), while defining a relatively narrow plausible interval of future consumption through 2050.
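
To read the average annual changes listed above as cumulative changes over 2020–2050, one can compound them, as in the sketch below. The function name is hypothetical, and compounding is an assumption, since the summary does not state whether Table 2 reports compound or arithmetic averages.

```python
def cumulative_change(avg_annual_pct: float, n_years: int) -> float:
    """Cumulative relative change implied by compounding an average annual rate."""
    return (1.0 + avg_annual_pct / 100.0) ** n_years - 1.0


# Poultry under Model E: 0.8% per year over the 30 years from 2020 to 2050.
poultry_e = cumulative_change(0.8, 30)
```

Under this assumption, the Model E figure for poultry corresponds to an increase of roughly 27% by 2050.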

Discussion

The study addresses the research question by demonstrating that incorporating generational change and detailed social structure meaningfully improves the plausibility and stability of long-horizon food consumption forecasts compared with simple linear extrapolations. Generational weighting captures population turnover, enabling identification of potential turning points in consumption dynamics. The SuperLearner’s ensemble of strong classifiers on range-normalised outcomes reduces implausible predictions (e.g., negative or zero long-run consumption) common in factor-rich linear models, broadening the range of foods with reliable forecasts. These advances are relevant to food system planning and policy, given the dependence of logistics, supply chains, public health risk management, and environmental impacts on demand trajectories. Methodologically, the work demonstrates how big administrative survey data and demographic projections can be combined through growth-rate transformations, social-group weighting, and ensemble learning to improve long-term projections. For demographers, the approach suggests forecasting population structure by generational and socioeconomic groupings as an alternative to purely linear projections. The overall results indicate that many foods exhibit consistent cross-model forecasts, reinforcing the robustness of incorporating generational and socioeconomic differentiations.

Conclusion

This paper contributes: (1) a practical method to integrate generational change into national-level food demand forecasts, allowing anticipation of turning points; (2) a demonstration that ensemble ML (SuperLearner) on range-normalised outcomes can avert implausible long-run predictions when rich factor sets are used; and (3) a framework combining big survey data with demographic projections and social-group weights to produce population-level forecasts across 75 foods. Practically, the forecasts delimit a plausible range of future consumption through 2050, supporting planning across the food sector. Theoretically, the generational scaling concept offers a novel lens for predicting behaviour of not-yet-born cohorts based on ordered slopes of generational trends. Future research should: enrich explanatory variables (e.g., finer cultural factors, non-binary gender, children’s consumption), obtain better data on mixed-generation households, consider evolving (non-constant) factor effects, explore alternative time-series models (e.g., SARIMA, GARCH), extend SuperLearner with additional learners and data, and pursue complementary demographic forecasts of social-group structures. Longer and richer panels would enable more granular and frequent generational forecasting.

Limitations

Key limitations and assumptions include:
  (1) Exclusion of mixed-generation households and households with children (and migrants), potentially biasing population-level estimates; children are implicitly assigned adult per-person consumption.
  (2) Limited explanatory variables (e.g., binary gender, coarse regional language; no smoking status or detailed cultural background), constraining model richness.
  (3) Group homogeneity assumption and near-equal gender shares may overrepresent smaller groups or men in projections.
  (4) Measurement limitations in food data (e.g., pre-processing weights, bones, waste, restaurant/tourist consumption omissions).
  (5) Use of linear forecasts; alternative time-series models (SARIMA, GARCH) might better capture dynamics.
  (6) Model reliability and cross-model comparability could be improved with more learners and data in SuperLearners.
  (7) Assumption that factor contributions remain stable (1990–2050) despite societal changes (e.g., rising inequality, smaller households, ageing); evolving-coefficient methods are not yet implemented.
  (8) Assumption that social structure will evolve post-2017 similarly to the past; dedicated demographic modelling was out of scope.
  (9) Limited ability to capture biological ageing cycles within generations given data span; longer series are needed.
  (10) Constant growth-rate imputations (e.g., 1991–2000) may not reflect true historical change; sensitivity analyses and alternative imputations are warranted.
