Food Science and Technology
Predicting and improving complex beer flavor through machine learning
M. Schreurs, S. Piampongsant, et al.
Predicting and understanding food perception and appreciation is a major challenge in food science. Accurate models could help producers and consumers with quality control, product fingerprinting, counterfeit and spoilage detection, new product development, and pairing, and could supplement or replace tasting panels, which are variable and costly. Despite these applications, predicting flavor and appreciation from chemical properties remains elusive, especially for complex foods and beverages. Prior work often focused on predicting the organoleptic properties of single compounds from their structure, which ignores matrix effects and compound interactions. The classical multivariate statistics commonly used in sensory science require large sample sizes, are sensitive to outliers, are prone to overfitting, and are poorly suited to the non-linear or discontinuous relationships that arise among hundreds of interacting flavor compounds. In this study, the authors combine extensive chemical analyses and sensory data from commercial beers with machine learning to predict taste, smell, mouthfeel, and appreciation from compound concentrations. Beer is a suitable model system because of its chemical complexity and the availability of massive online review datasets. The authors measured over 200 chemical properties of 250 beers across 22 styles, performed descriptive sensory profiling with a trained panel, and drew on more than 180,000 consumer reviews. These datasets made it possible to train multiple machine learning models that predict flavor and appreciation from chemical profiles, to dissect the models to identify key compound drivers, and to validate the predictions by spiking beers to improve appreciation.
The paper highlights that previous studies linked structural properties to biological activities or concentrations of specific compounds to sensory profiles, often focusing on single compounds and overlooking complex interactions within food matrices. Classical statistical approaches (linear models, PLS) prevalent in sensory science can struggle with high-dimensional, non-linear, and interactive effects typical of flavor chemistry, leading to overfitting and sensitivity to outliers. Some prior works attempted to predict beer flavor and popularity based on limited chemical sets, and other food systems (e.g., tomatoes, blueberries) have shown success using gradient boosting approaches for flavor and appreciation prediction. Public consumer review databases can offer large-scale sensory data, though they may include biases (e.g., price, style preference, conformity). This context motivates the use of modern machine learning methods capable of capturing non-linearities and interactions, combined with large-scale chemical and sensory datasets.
- Samples: 250 commercial Belgian beers across 22 styles. Beers within expiration date purchased from retailers. Biological duplicates prepared.
- Chemical analyses: 226 properties measured including ABV, iso-alpha acids, pH, sugars, and >200 flavor compounds. Methods included HS-GC-FID/FPD for higher alcohols, acetaldehyde, esters, 4-vinyl guaiacol, sulfur compounds; HS-SPME-GC-MS for additional volatiles (terpenoids, esters) with identification via AMDIS, NIST libraries, retention indices, peak deconvolution and integration, batch normalization; discrete photometric/enzymatic analyses for acids, ammonia, beta-glucan, iso-alpha acids, color, sugars, glycerol, iron, pH, protein, sulfite; NIR for ethanol. Mean of biological duplicates reported. CO2 calculated from bottle pressure.
- Sensory analyses (trained panel): 16 trained panelists (selected from 30) assessed 50 attributes (aroma, taste, mouthfeel) on 7-point scales per ASBC methods. Beers served blind (black glasses) at controlled temperature (12–16 °C), in sessions organized by style. Scores scaled per taster (z-scores). Panel consistency checked via repeated samples and ANOVA (no significant differences in 95% of cases).
- Public reviews: Scraped 232,288 RateBeer reviews; after filtering (English language by two detectors, reviewers with ≥100 entries), retained 181,025 reviews from >6000 reviewers. Numerical scores (appearance, aroma, taste, palate, overall) scaled per rater and averaged per beer. Text mining: manual labeling of up to 50 reviews per beer to train classifier to categorize sentences; extracted taste/aroma sentences; computed TF-IDF-based enrichment for sensory terms. Beer prices collected from multiple retailers; prices normalized per liter to assess correlation with appreciation.
- Data processing: Chemical properties with log-normal distributions were log-transformed; the 0.1% of values that were missing were replaced by the attribute mean. Dataset split 70/30 into train and test sets, stratified by style. Chemical measurements standardized using the train-set mean and SD.
- Modeling: Constructed regression models to predict (a) trained panel descriptors and (b) RateBeer appreciation scores from chemical profiles. Models: linear regression (LR) with first-order interactions, Lasso with interactions, partial least squares regression (PLSR), AdaBoost, Extra Trees (ET), Gradient Boosting (GBR), Random Forest (RF), XGBoost, support vector regression (SVR), and an artificial neural network (ANN; MLPRegressor). Hyperparameters tuned via 5-fold cross-validated grid search optimizing R²; ANN tuned via Bayesian optimization with TPE (Optuna). Trained both individual-attribute and multi-output models.
- Model evaluation: Test-set R² for multi-output models; average rank across descriptors for individual-attribute models. Additional robustness: 100 iterations for GBR, RF, ET to assess stability of performance and feature importance.
- Model interpretation: Feature importance via impurity-based (mean decrease in impurity) and SHAP values. Partial dependence plots (one- and two-way) for top predictors of appreciation. Explored adding dataset identifier (panel vs RateBeer) and beer style encodings to models.
- Experimental validation: Spiking experiments on a Blond beer and a non/low-alcohol beer by increasing concentrations of top SHAP/feature-importance predictors (e.g., ethyl acetate, ethanol, lactic acid, ethyl phenyl acetate, and correlated compounds ethyl hexanoate, isoamyl acetate, glycerol) to within-style 95th-percentile ethanol-normalized concentrations. Directional difference tests and preference tests conducted by trained panel; statistics via two-sided binomial tests.
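The data-processing step listed above (log transform, mean imputation, style-stratified 70/30 split, scaling by train-set statistics) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the toy concentrations, the helper names (`impute_mean`, `stratified_split`, `standardize`), the order of the steps, and the use of the population SD are all hypothetical choices.

```python
import math
import random
from collections import defaultdict
from statistics import mean, pstdev


def impute_mean(column):
    """Replace missing values (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def stratified_split(styles, train_fraction=0.7, seed=0):
    """Return (train_idx, test_idx), splitting each style separately."""
    by_style = defaultdict(list)
    for idx, style in enumerate(styles):
        by_style[style].append(idx)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for style in sorted(by_style):
        members = by_style[style]
        rng.shuffle(members)
        cut = round(train_fraction * len(members))
        train_idx.extend(members[:cut])
        test_idx.extend(members[cut:])
    return sorted(train_idx), sorted(test_idx)


def standardize(column, train_idx):
    """Z-score every value using the mean and SD of the training rows only."""
    train_values = [column[i] for i in train_idx]
    mu, sd = mean(train_values), pstdev(train_values)
    return [(v - mu) / sd for v in column]


# Toy data (made up): one compound concentration for ten beers of two styles.
styles = ["blond"] * 5 + ["tripel"] * 5
ethyl_acetate = [12.0, 18.5, None, 25.1, 30.2, 40.3, 22.7, 55.9, 35.0, 28.4]

# Assumed order: log-transform observed values, impute, split, then scale.
logged = [None if v is None else math.log(v) for v in ethyl_acetate]
filled = impute_mean(logged)
train_idx, test_idx = stratified_split(styles)
scaled = standardize(filled, train_idx)
```

Computing the mean and SD on the training rows only keeps information from the held-out beers out of the features the models are fitted on, which is the point of scaling by train-set statistics.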
- Dataset and correlations: 250 beers, 226 chemical properties, 50 sensory attributes, and 181,025 filtered public reviews. Expected and novel correlations observed among chemical compounds and sensory attributes. Examples: iso-alpha acids vs bitterness (rho=0.68); ethanol with perceived alcohol (rho=0.82) and glycerol with body (rho=0.57); sweetness and bitterness anti-correlated (rho=-0.48); hop aroma and bitterness strongly correlated; lactic and acetic acid correlated (rho=0.66). Iron anti-correlated with hop aromas/bitterness.
- Public reviews vs trained panel: Moderate correlation between overall appreciation from panel and RateBeer (rho=0.29). Online reviews influenced by price (rho=0.49 with appreciation) while trained panel less so (rho=0.19). Text-mined features from reviews correlate well with panel for basic attributes (acidity, bitterness, sweetness, alcohol, malt, hops) and distinctive notes like 4-vinyl guaiacol; less so for specific aromas (esters, coriander, diacetyl).
- Model performance: Tree-based models outperformed linear models and ANN on tabular data. Gradient Boosting achieved best overall performance, especially for predicting RateBeer metrics (RateBeer multi-output R² ≈ 0.69 best; up to 0.75 for specific features). Linear regression severely overfit (negative test R²). Lasso and PLSR mitigated overfitting but underperformed vs trees. Models predicted taste better than aroma; RateBeer predictions better than trained panel due to averaging across many reviewers (e.g., GBR predicts RateBeer appreciation R²=0.67 vs trained panel appreciation R²=0.09). SVR and ANN showed intermediate performance.
- Feature importance and SHAP: Ethyl acetate identified as top predictor of consumer appreciation; ethanol second; protein level and lactic acid also highly important. Unexpected positive contributions at moderate levels from methanethiol and ethyl phenyl acetate (often considered staling-related). SHAP uncovered drivers that simple correlations would miss (e.g., lactic acid shows bimodal relationship with appreciation by style).
- Robustness: 100 iterations of GBR/RF/ET showed stable performance and consistent top predictors (especially ethanol and ethyl acetate), with more variability in ET due to randomization and co-correlations.
- Added metadata: Combining panel and RateBeer datasets or adding style indicators did not improve performance; dataset identifier dominated when included; style information largely implicit in chemistry and sample sizes per style too small.
- Experimental validation: Spiking Blond beer and a non/low-alcohol beer with mixtures of top predicted compounds significantly increased overall appreciation and intensified ester flavors, sweetness, alcohol perception, and body; improvements remained even without adding ethanol.
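The preference tests in the spiking experiments were evaluated with two-sided binomial tests. As a rough sketch of what such a test computes, the snippet below implements an exact two-sided binomial p-value using only the standard library. The function name `binomial_two_sided_p` and the panel counts are hypothetical and are not taken from the study.

```python
from math import comb


def binomial_two_sided_p(successes, trials, p_null=0.5):
    """Exact two-sided binomial test p-value.

    Sums the probabilities of all outcomes that are no more likely
    than the observed outcome under the null hypothesis.
    """
    pmf = [
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(trials + 1)
    ]
    threshold = pmf[successes] * (1 + 1e-7)  # tolerance for floating-point ties
    return min(1.0, sum(prob for prob in pmf if prob <= threshold))


# Hypothetical counts: how many panelists preferred the spiked beer.
panelists = 16
prefer_spiked = 13
p_value = binomial_two_sided_p(prefer_spiked, panelists)
print(f"two-sided p = {p_value:.4f}")
```

Under the null hypothesis of no preference (`p_null=0.5`), a small p-value indicates that the split between spiked and unspiked beer is unlikely to be due to chance alone.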
The study demonstrates that modern machine learning, particularly tree-based methods like Gradient Boosting, can effectively model complex, non-linear relationships between comprehensive chemical profiles and the sensory perception and appreciation of beer. Models trained on large-scale consumer review data predicted appreciation more accurately than models trained on trained-panel data, highlighting the value and feasibility of using public datasets despite their inherent biases. Taste attributes (e.g., bitterness, acidity, alcohol) are more directly predictable from chemistry than aroma, which often arises from interactions among many volatiles. Model interpretation (feature importance and SHAP) revealed both expected and unexpected chemical drivers of appreciation: ethyl acetate, ethanol, protein, and lactic acid emerged as key factors, and moderate levels of compounds typically considered negative (e.g., methanethiol, ethyl phenyl acetate) may enhance appreciation. The spiking experiments confirmed that model-identified compound mixtures can improve appreciation in both alcoholic and non-alcoholic beers, addressing the research goal of predicting and improving flavor from chemistry. These findings are significant for sensory science and industry: they offer data-driven routes for quality control, recipe optimization, and product development, and they underscore the role of big data and explainable ML in uncovering complex flavor-perception relationships.
This work integrates extensive chemical profiling, trained panel sensory data, and large-scale consumer reviews to build predictive models of beer flavor and appreciation. Tree-based machine learning, particularly Gradient Boosting, outperforms conventional linear methods and neural networks on this tabular, high-dimensional, and interactive problem. Interpretable model dissection identified key chemical drivers of appreciation, including ethyl acetate, ethanol, protein, and lactic acid, and suggested beneficial effects of moderate levels of compounds often viewed negatively. Crucially, model-guided spiking experiments increased appreciation of both a Blond beer and a non/low-alcohol beer, validating the predictive framework. Future work should expand datasets across geographies and styles, include additional hard-to-measure aroma compounds (e.g., thiols), incorporate demographic and contextual factors, gather more poorly rated samples to cover extreme cases, and explore methods to better resolve causal relationships among co-correlated variables. Such advances can further enable computer-aided flavor engineering and quality control across diverse foods and beverages.
- Model-related: Gradient Boosting can favor variables with larger main effects among co-correlated predictors, potentially underestimating the importance of correlated true drivers. Partial dependence plots indicated saturation rather than penalization at high concentrations, likely because the dataset lacks extreme (off-flavor) samples.
- Data scope: Beers limited to Belgian breweries; some global styles or consumer patterns may be underrepresented. Dataset comprises high-quality end-products, lacking poorly appreciated beers, limiting prediction at extremes.
- Chemistry coverage: Not all flavor-active compounds were measured (e.g., certain hop thiols at very low concentrations are difficult to quantify), possibly omitting important drivers.
- Sensory/consumer biases: Consumer reviews are subject to biases (price, style preferences, conformity). Trained panel data are variable and limited in size. No demographic metadata included for modeling.
- Style metadata: Explicit style encoding did not improve models, likely due to implicit chemical signatures and small sample sizes per style; styles are not rigorously defined and may overlap, adding noise.
- Causality: Models identify associations, not causation; co-correlated variables complicate interpretation. Experimental validation is needed to identify true causal compounds.