Designing a monitoring program for aflatoxin B1 in feed products using machine learning

Food Science and Technology


X. Wang, Y. Bouzembrak, et al.

This study by X. Wang, Y. Bouzembrak, A. G. J. M. Oude Lansink, and H. J. van der Fels-Klerx uses machine learning to optimize monitoring programs for aflatoxin B1 in feed products, achieving significant cost reductions while maintaining high accuracy. The research highlights the applicability of this approach beyond food safety hazards.
Introduction

The study addresses the challenge of efficiently monitoring aflatoxin B1 (AFB1) in feed materials. AFB1 is a mycotoxin that poses significant genotoxic and carcinogenic risks and has economic consequences when concentrations exceed legal limits. EU regulations mandate risk-based control programs that prioritize high-risk batches for sampling and analysis (S&A). Traditional approaches have examined either risk-based prioritization or the cost-effectiveness of S&A, but in isolation. The research question is whether machine learning (ML) can be used to design a risk-based monitoring program that both identifies high-risk batches and minimizes overall monitoring costs, integrating accuracy-based and non-accuracy-based (economic) criteria.

Literature Review

Prior work prioritized feed ingredients for AFB1 monitoring and assessed cost-effectiveness of S&A strategies. Van der Fels-Klerx et al. used statistical analyses to identify products at higher risk and developed models to prioritize feed ingredients by health impact. Focker et al. optimized aflatoxin monitoring costs in the maize chain by evaluating S&A strategies. Wang et al. optimized sampling for multiple chemicals (including aflatoxins) in the dairy chain to reduce public health impacts. However, risk-based and cost-effectiveness approaches were previously studied separately. ML has been applied widely in food and medical sciences, and to mycotoxin prediction (e.g., deep neural networks for maize AFB1/fumonisin using weather and cropping factors), but most evaluations relied on accuracy metrics (accuracy, recall, AUC) and rarely considered cost metrics. This study fills the gap by combining risk-based prediction with economic evaluation to design a cost-effective monitoring program and by comparing multiple ML algorithms with both accuracy- and cost-based criteria.

Methodology

Design: A risk-based monitoring program was developed by combining an ML module (to predict high-risk batches exceeding EC legal limits) and an economic module (to estimate monitoring costs from a feed industry perspective). The monitoring point was set before feed materials enter feed factories.

Data: 5605 records (2005–2018) from the Dutch official AFB1 control program (4492) and private industry monitoring (1113), presumed to follow the sampling and analysis rules of Regulation (EC) No 401/2006. Each record represents one batch. Data from 2005–2015 were used for training/testing (internal validation), and data from 2016–2018 for external validation/application.

ML module: Input variables included sampling month, product, product subgroup, product group, country of origin, and country of analysis. The 2005–2015 dataset was split into 90% training and 10% testing subsets. Four algorithms were evaluated: Decision Tree (DT), Logistic Regression (LR), Support Vector Machine (SVM), and Extreme Gradient Boosting (XGB). Models were first run with default parameters and then tuned using cross-validation to address class imbalance and improve performance. Evaluation metrics were AUC, recall, precision, and accuracy. A recall threshold of 0.8 for identifying non-compliant samples was used; models not meeting this threshold were excluded from selection.

Economic module: Costs were linked to predicted outcomes (TP, FP, TN, FN) and follow-up actions. For TP and FP (predicted non-compliant), S&A costs and two days of storage costs are incurred; if the batch is actually non-compliant (TP), it is rejected/returned to the trader. TN (predicted compliant and actually compliant) incurs no cost. FN (predicted compliant but actually non-compliant) incurs recall-related costs and disease burden costs.
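The mapping from a prediction and the true batch status to the four outcomes can be illustrated with a minimal Python sketch. The function names, the `FOLLOW_UP` wording, and the example batches are hypothetical and only illustrate the logic described above; the release of FP batches after analysis is an assumption, since the text only states that TP batches are rejected.

```python
from collections import Counter
from typing import Iterable, Tuple


def classify_outcome(predicted_noncompliant: bool, actual_noncompliant: bool) -> str:
    """Return the confusion-matrix cell for one batch."""
    if predicted_noncompliant:
        return "TP" if actual_noncompliant else "FP"
    return "FN" if actual_noncompliant else "TN"


# Follow-up actions per outcome, paraphrased from the text.
# "release after analysis" for FP is an assumption, not stated in the source.
FOLLOW_UP = {
    "TP": "sample and analyse, store two days, reject/return to trader",
    "FP": "sample and analyse, store two days, release after analysis",
    "TN": "accept without sampling (no cost)",
    "FN": "accepted without sampling; recall and disease burden costs follow",
}


def count_outcomes(pairs: Iterable[Tuple[bool, bool]]) -> Counter:
    """Count outcomes for (predicted_noncompliant, actual_noncompliant) pairs."""
    return Counter(classify_outcome(pred, actual) for pred, actual in pairs)


# Hypothetical batches, for illustration only.
batches = [(True, True), (True, False), (False, False), (False, True), (False, False)]
counts = count_outcomes(batches)
for outcome in ("TP", "FP", "TN", "FN"):
    print(outcome, counts[outcome], "->", FOLLOW_UP[outcome])
```

The resulting counts are the inputs the economic module needs: only the number of batches in each cell matters for the cost calculation.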
Assumptions: one raw material FN batch (100 t) contaminates 10% of a 1000 t compound feed batch at 20 µg/kg AFB1; other organizations detect the contamination downstream; unused contaminated feed is recalled, destroyed, and replaced; the consumed portion contributes to disease burden.

Cost parameters (per batch, unless noted):
  • C_colle = 1000 € (labor for collecting 100 incremental samples)
  • C_analy = 100 € (LC-MS/MS analysis of one aggregate sample)
  • C_stor = 96 € per day (two days of storage are charged per sampled batch)
  • C_recall = 5800 €
  • P_recall = 60% (share of contaminated feed that is recalled)
  • C_destr = 500 €
  • C_price = 22000 €
  • C_burden = 30987 € (0.313 DALYs/100,000 × 99,000 €/DALY)

Formulas:
  • TP_cost = C_colle + C_analy + 2×C_stor
  • FP_cost = TP_cost
  • FN_cost = P_recall×(C_recall + C_destr + C_price)×10 + (1−P_recall)×C_burden
  • Total cost of the designed program: TC_desi = N_FN×FN_cost + N_TP×TP_cost + N_FP×FP_cost
  • Cost of the current official program: TC_curr = (N_FN + N_TP + N_FP + N_TN)×(C_colle + C_analy)
  • Cost reduction = (TC_curr − TC_desi)/TC_curr

Model selection and application: The algorithm with the best combined performance (accuracy metrics and lowest cost) on the 2005–2015 data was selected and then applied to the 2016–2018 data for external validation and cost comparison with the official program. Feature importance was calculated for XGB.
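The cost formulas above can be written out as a short Python sketch. The class and function names are hypothetical; the parameter values are the ones listed in the text. `c_stor` is treated as a per-day rate, an assumption made here because the TP formula multiplies it by 2 and because this reading reproduces the totals reported for 2016–2018.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostParams:
    c_colle: float = 1000.0    # collecting 100 incremental samples (EUR)
    c_analy: float = 100.0     # LC-MS/MS analysis of one aggregate sample (EUR)
    c_stor: float = 96.0       # storage, assumed EUR per day (see lead-in)
    c_recall: float = 5800.0   # recall cost (EUR)
    p_recall: float = 0.60     # recall rate
    c_destr: float = 500.0     # destruction cost (EUR)
    c_price: float = 22000.0   # replacement price (EUR)
    c_burden: float = 30987.0  # disease burden cost (EUR)


def tp_cost(p: CostParams) -> float:
    """Cost of a flagged batch; the same value applies to FP."""
    return p.c_colle + p.c_analy + 2 * p.c_stor


def fn_cost(p: CostParams) -> float:
    """Cost of a missed non-compliant batch."""
    recall_part = p.p_recall * (p.c_recall + p.c_destr + p.c_price) * 10
    burden_part = (1 - p.p_recall) * p.c_burden
    return recall_part + burden_part


def designed_cost(n_tp: int, n_fp: int, n_fn: int, p: CostParams) -> float:
    """TC_desi: TN batches contribute nothing."""
    return n_fn * fn_cost(p) + (n_tp + n_fp) * tp_cost(p)


def current_cost(n_total: int, p: CostParams) -> float:
    """TC_curr: every batch is sampled and analysed."""
    return n_total * (p.c_colle + p.c_analy)


params = CostParams()
# 2016-2018 counts from the text: TP=1, FP=24, TN=816; FN=0 is inferred from 841 = 25 + 816.
designed = designed_cost(n_tp=1, n_fp=24, n_fn=0, p=params)
current = current_cost(n_total=841, p=params)
reduction = (current - designed) / current
print(designed, current, round(reduction, 3))
```

With these inputs the sketch gives 32,300 € for the designed program and 925,100 € for the official program, matching the reported totals, and a reduction of roughly 0.965, in line with the reported 96%. A single FN batch would cost about 182,195 € under these parameters, which is why recall dominates the economic outcome.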

Key Findings
  • With default parameters, average AUC ~0.7, recall ~0.3, precision ~0.4, accuracy ~1.0; recall did not meet the 0.8 threshold.
  • With tuned parameters, AUC ~0.9, recall ~1.0, precision ~0.1, accuracy ~0.9, indicating a trade-off between recall and precision. Tuned XGB achieved AUC 0.99, recall 1.0, precision 0.3, accuracy 0.98 (internal validation).
  • Cost evaluation (internal validation on the training, validation, and test datasets): Tuned XGB yielded the lowest monitoring cost (121,488 €) and the highest cost reduction (~97%) among the ML algorithms. DT and XGB reduced costs substantially relative to the current program across these datasets, whereas LR and SVM increased costs versus the current program.
  • External validation (2016–2018): Tuned XGB maintained best performance with AUC 0.98, recall 1.0, precision 0.04, accuracy 0.97. LR’s comparable external performance was attributed to chance given its lower recall/AUC internally.
  • 2016–2018 application: Of 841 batches, the designed program flagged 25 high-risk batches (TP+FP=1+24) for S&A, accepting 816 as compliant (TN). The official program sampled all 841. Total cost: official 925,100 € vs designed 32,300 €, achieving 96% overall cost reduction (at least 82% per year).
  • Feature importance (XGB): Top features included product rice and barley; product groups groundnut/peanut, palm kernel, coconut; countries of origin Hungary and USA; countries of analysis Egypt and the Netherlands. Groundnuts and maize from specific origins and months were often predicted high-risk (illustrative examples provided).
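The external-validation metrics reported above can be cross-checked against the 2016–2018 batch counts. The sketch below is a plain-Python illustration with hypothetical function names; TP=1, FP=24, and TN=816 come from the text, while FN=0 is inferred from the 841 total and the reported recall of 1.0.

```python
def precision(tp: int, fp: int) -> float:
    """Share of flagged batches that were truly non-compliant."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Share of truly non-compliant batches that were flagged."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Share of all batches classified correctly."""
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


# Counts for the 2016-2018 application (FN=0 is inferred, see lead-in).
tp, fp, tn, fn = 1, 24, 816, 0

print("precision:", round(precision(tp, fp), 2))
print("recall:   ", round(recall(tp, fn), 2))
print("accuracy: ", round(accuracy(tp, fp, tn, fn), 2))
```

The low precision follows directly from the counts: 24 of the 25 flagged batches were compliant. Because a flagged batch costs far less than a missed one, the program remains much cheaper than sampling all 841 batches.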
Discussion

Integrating ML predictions with an economic cost framework effectively addresses the dual objective of risk-based monitoring and cost minimization. High recall ensures capture of non-compliant batches, while the cost module quantifies consequences of false decisions, enabling selection of models that are not only accurate but also economically efficient. The tuned XGB model consistently outperformed others in both internal and external validations and delivered substantial cost savings compared to the official program. This supports the feasibility and value of incorporating non-accuracy-based criteria (monitoring cost) into ML model selection for practical food/feed safety applications. The approach complements existing cost-optimization of S&A for single batches; combining both (predicting which batches to monitor and optimizing S&A within those batches) could further enhance efficiency. Implementing the designed program could free resources for random testing to maintain data quality and for monitoring additional hazards, thereby improving overall surveillance strategies.
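The selection logic discussed here, a recall threshold followed by a cost comparison, can be sketched as follows. This is a simplified reading rather than the authors' exact procedure: the `Candidate` type, the `select_model` function, and all numbers in the example are hypothetical placeholders, not results from the study.

```python
from typing import NamedTuple, Optional, Sequence


class Candidate(NamedTuple):
    name: str
    recall: float       # recall for the non-compliant class
    total_cost: float   # total monitoring cost in EUR


def select_model(candidates: Sequence[Candidate],
                 min_recall: float = 0.8) -> Optional[Candidate]:
    """Drop candidates below the recall threshold, then pick the cheapest."""
    eligible = [c for c in candidates if c.recall >= min_recall]
    if not eligible:
        return None  # no candidate is safe enough to deploy
    return min(eligible, key=lambda c: c.total_cost)


# Hypothetical candidates, for illustration only.
candidates = [
    Candidate("model_a", recall=0.30, total_cost=50_000.0),   # cheapest, fails recall
    Candidate("model_b", recall=0.95, total_cost=400_000.0),
    Candidate("model_c", recall=1.00, total_cost=120_000.0),
]

best = select_model(candidates)
print(best.name if best else "no eligible model")
```

Applying the recall filter first keeps a cheap but unsafe model from winning on cost alone; the cost criterion then separates models that are all acceptable on recall.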

Conclusion

The study demonstrates that ML can be used to design an effective risk-based monitoring program for AFB1 in feed materials by jointly considering predictive performance and economic impact. Among the four evaluated algorithms (DT, LR, SVM, XGB), tuned XGB provided the best balance, achieving high AUC and recall while delivering a 96% overall cost reduction compared to the official monitoring program (2016–2018). This integrated framework offers authorities and industry a practical tool to focus S&A on high-risk batches, reduce costs, and potentially reallocate resources to broaden hazard monitoring. Future work should incorporate additional explanatory variables (e.g., weather and agronomic factors) and combine batch selection with optimized S&A strategies within batches to further improve efficiency and robustness.

Limitations
  • Data and scope: Historical data are from the Netherlands (2005–2018) and may not generalize to other regions or time periods. External validation covered 2016–2018 only.
  • Feature set: Only a limited set of batch descriptors (month, product, subgroup/group, origin, analysis country) was used; important drivers such as weather and agronomy were not included.
  • Class imbalance and data drift: Long-term application focusing on high-risk batches could reduce the number of compliant records in future data, potentially biasing models; random sampling is recommended to mitigate this.
  • Assumptions in the economic model: Fixed batch size of 100 t; the FN scenario assumes 10% of a 1000 t compound feed batch is contaminated at 20 µg/kg; a 60% product recall rate (P_recall, not to be confused with the ML recall metric); 100% downstream detection; disease burden derived from 0.313 DALYs/100,000 and 99,000 €/DALY. These assumptions may overestimate or underestimate real costs.
  • Measurement uncertainty and analytical variance were not incorporated when deciding compliance, potentially affecting borderline cases.
  • Cost parameters sourced from literature and open data may vary across contexts; results depend on these inputs.
  • LR was excluded from some analyses because it did not meet the recall threshold; different thresholds or costs could change the model ranking.