Food Science and Technology
Applying federated learning to combat food fraud in food supply chains
A. Gavai, Y. Bouzembrak, et al.
The study addresses the challenge of ensuring safe and healthy food amid complex, vulnerable supply chains where food fraud occurs. While AI, particularly Bayesian Networks (BNs), has been shown to integrate heterogeneous data and support predictive decision-making for food safety and fraud, real-world impact is limited by impediments to data sharing (ownership, competitive sensitivity, privacy/GDPR, and lack of harmonized ontologies). Federated learning (FL) offers a solution by allowing algorithms to move to the data so raw data never leave owners’ premises, reducing negotiation burdens and privacy risks. The research question is whether an FL-based BN can be trained across multiple data owners’ stations to predict food fraud types with performance comparable to a centralized model trained on pooled data, while preserving privacy and addressing data-sharing constraints. The study develops and evaluates a federated BN framework across three data stations with heterogeneous, imbalanced datasets to test applicability, performance, and potential benefits for decision-making.
Prior work highlights the growing role of AI and data-driven methods in food safety and fraud management, with BNs offering transparent, probabilistic, and interpretable models that can incorporate uncertainty and prior knowledge. Barriers to data sharing in food supply chains include differing stakeholder incentives, ownership and confidentiality concerns, and the need for metadata standards and ontologies; FOODON is one of few relevant public ontologies. FL has been successfully applied in life sciences and healthcare for privacy-preserving analytics (e.g., Personal Health Train), suggesting its potential in domains where data cannot be centralized. However, the food domain has not yet widely explored FL. The literature also notes technical and organizational issues relevant to FL adoption: data democratization and harmonization, limitations in available federated AI models, and efficiency/operationalization challenges with common ML toolchains, motivating exploration of federated infrastructures and compatible data/ontology frameworks.
Federated architecture: The Vantage6 platform (v2.3.4) was used as the FL infrastructure, centrally hosted at Wageningen University & Research for authentication and message brokering. Privacy is enforced by allowing only vetted algorithms to execute at data stations. Collaboration policies and organizations are configured centrally; Dockerized algorithms are published to an approved registry and executed at participating data stations after code validation (hash and registry-based checks). Token-based authentication/authorization controls access. Each data station comprises local data storage (CSV or database), a Docker daemon, an algorithm container (compute node), CLI tools, and configuration files with policies and API keys linking to the federated server. Security considerations include validating Docker images prior to execution.
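The "vetted algorithms only" guarantee can be illustrated with a content-digest check: a station executes a container only if its digest matches an approved registry entry. This is a minimal stdlib sketch of the idea, not vantage6's actual validation API; the image names and whitelist are hypothetical.

```python
import hashlib

# Hypothetical whitelist of approved algorithm-image digests, standing in
# for a vetted Docker registry. Image bytes and names are made up.
APPROVED_DIGESTS = {
    "sha256:" + hashlib.sha256(b"bn-fraud-algorithm:v1").hexdigest(),
}

def image_digest(image_bytes: bytes) -> str:
    """Content-addressed digest of an algorithm image archive."""
    return "sha256:" + hashlib.sha256(image_bytes).hexdigest()

def may_execute(image_bytes: bytes) -> bool:
    """A data station runs a container only if its digest is whitelisted."""
    return image_digest(image_bytes) in APPROVED_DIGESTS
```

Because the digest is recomputed from the image contents at the station, a tampered image fails the check even if it reuses an approved tag.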
Data and stations: Data were sourced from the EU Rapid Alert System for Food and Feed (RASFF) and the US Economically Motivated Adulteration (EMA) databases. Data were partitioned across three stations (geographically located in the Netherlands): STATION-1 (Wageningen): RASFF 2008–2013, 202 observations; STATION-2 (Maastricht): RASFF 2014–2018, 144 observations; STATION-3 (Utrecht): EMA 2008–2017, 95 observations. Variables used for modeling: fraud type (target), product category, year, origin country, and report country; variable metadata and states are defined (e.g., fraud types: Artificial enhancement/Improvement; Smuggling–Mislabeling–Origin Masking; Substitution–Dilution). STATION-3 data were in RDF using FOODON, SIO, NALT, and Wikidata; these were converted to CSV to be consumable by the BN model.
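The station split above follows source and time period. A minimal sketch of that partitioning rule, with hypothetical records (field names and values are illustrative, not the databases' actual schemas):

```python
# Hypothetical records mimicking the paper's setup: RASFF and EMA
# notifications with a year field.
records = [
    {"source": "RASFF", "year": 2010, "fraud_type": "Substitution-Dilution"},
    {"source": "RASFF", "year": 2016, "fraud_type": "Smuggling-Mislabeling-Origin Masking"},
    {"source": "EMA",   "year": 2012, "fraud_type": "Artificial enhancement/Improvement"},
]

def assign_station(rec):
    """Route a record to its station by source and period."""
    if rec["source"] == "EMA":
        return "STATION-3"                                        # EMA 2008-2017
    return "STATION-1" if rec["year"] <= 2013 else "STATION-2"    # RASFF 2008-2013 vs 2014-2018

stations = {}
for rec in records:
    stations.setdefault(assign_station(rec), []).append(rec)
```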
Federated BN model: Implemented in R using the bnlearn package. The algorithm was encapsulated in a Docker image using the Vantage6 R algorithm library for I/O. For each training set, Tree-Augmented Naive Bayes (TAN) learned the BN structure for the target variable "Fraud type." Parameters were estimated via bn.fit with the "Bayes" method. Datasets were split 80/20 into training/test sets. The model used product category, year, origin country, and report country as features to predict fraud type. Outputs were provided in JSON.
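The "Bayes" parameter-estimation method in bnlearn amounts to posterior estimation of conditional probability tables under a uniform Dirichlet prior. A stdlib Python sketch of that idea on a toy two-variable table (this is an analogy in spirit, not the paper's R code; the data, prior weight, and naive structure are assumptions):

```python
import random
from collections import Counter

# Toy dataset of (product_category, fraud_type) pairs.
random.seed(0)
data = [("meat", "Substitution-Dilution")] * 8 + \
       [("spices", "Artificial enhancement/Improvement")] * 12

random.shuffle(data)
split = int(0.8 * len(data))          # 80/20 train/test split, as in the paper
train, test = data[:split], data[split:]

categories = sorted({cat for cat, _ in data})
fraud_types = sorted({ft for _, ft in data})

def cpt(train, alpha=1.0):
    """P(category | fraud_type) with a uniform Dirichlet prior of weight alpha,
    analogous in spirit to bnlearn's method = "bayes" smoothing."""
    joint = Counter(train)
    marg = Counter(ft for _, ft in train)
    return {
        ft: {
            cat: (joint[(cat, ft)] + alpha) / (marg[ft] + alpha * len(categories))
            for cat in categories
        }
        for ft in fraud_types
    }

table = cpt(train)
```

The prior keeps every conditional probability strictly positive, so fraud types that are rare (or absent) at one station still receive non-zero mass, which matters for the imbalanced station datasets described above.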
Experimental design: Experiment 1 trained and evaluated (a) individual BN models per station and (b) a combined BN model trained on the aggregated training data from all stations in a federated manner; the combined model was then tested separately on each station’s test data. Experiment 2 trained a BN on the entire aggregated dataset in a traditional centralized way (random 80/20 split) without FL to compare performance with the federated approach. Performance metrics included AUC (ROC), average sensitivity, and average specificity.
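The reported average sensitivity and specificity can be computed one-vs-rest from a multi-class confusion matrix and then macro-averaged. A sketch under that (assumed) averaging convention, with a made-up two-class confusion matrix:

```python
# Macro-averaged sensitivity and specificity, computed one-vs-rest from a
# confusion matrix conf[true][pred] = count. The macro-averaging convention
# is our assumption; the paper does not spell out its formula here.
def avg_sens_spec(conf, classes):
    sens, spec = [], []
    total = sum(sum(row.values()) for row in conf.values())
    for c in classes:
        tp = conf[c][c]
        fn = sum(conf[c][p] for p in classes) - tp   # missed cases of class c
        fp = sum(conf[t][c] for t in classes) - tp   # other classes predicted as c
        tn = total - tp - fn - fp
        sens.append(tp / (tp + fn) if tp + fn else 0.0)
        spec.append(tn / (tn + fp) if tn + fp else 0.0)
    n = len(classes)
    return sum(sens) / n, sum(spec) / n

# Hypothetical confusion matrix for two fraud types.
classes = ["A", "B"]
conf = {"A": {"A": 8, "B": 2}, "B": {"A": 1, "B": 9}}
sensitivity, specificity = avg_sens_spec(conf, classes)
```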
Data characteristics: The three stations intentionally differed in fraud-type composition and time periods to mimic real-world imbalance and incompleteness (e.g., STATION-1 lacked Artificial enhancement/Improvement cases; STATION-2 included mostly Smuggling–Mislabeling–Origin Masking; STATION-3 included more Substitution–Dilution).
- Federated vs individual station performance (Experiment 1; Table 2):
  - STATION-1: AUC individual 0.96; combined (federated) 0.89; average sensitivity 0.75 (individual) vs 0.75 (combined); average specificity 0.69 (individual) vs 0.63 (combined).
  - STATION-2: AUC individual 0.96; combined 0.99; average sensitivity improved from 0.49 (individual) to 0.78 (combined); average specificity decreased from 0.83 to 0.69.
  - STATION-3: AUC individual 0.72; combined 0.74; average sensitivity 0.62 to 0.66; average specificity 0.58 to 0.59.
  - Interpretation: The combined federated BN broadened knowledge (e.g., added fraud categories absent in some stations) and often improved sensitivity, notably in STATION-2, with some trade-off in specificity.
- Federated combined vs centralized pooled (Experiment 2; Table 3, Fig. 4):
  - Centralized BN (80/20 pooled): AUC 0.86; average sensitivity 0.72; average specificity 0.67.
  - Federated combined model: AUC 0.90; average sensitivity 0.77; average specificity 0.64.
  - Result: Federated training achieved comparable or slightly better AUC and sensitivity relative to centralized pooling, despite data heterogeneity and imbalance.
- Data heterogeneity matters: More data does not guarantee higher accuracy; differences in case distributions (years, fraud types) influenced performance.
- Privacy/security: The federated infrastructure enabled training without moving raw data, addressing GDPR and business sensitivity. Validation of Docker images and token-based access controlled execution and collaboration.
- Practicality: The FL framework reduced data traffic, preserved ownership, and allowed parameter learning from all stations, potentially improving decision-making across the supply chain.
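One reason a federated discrete BN can match centralized pooling is that its parameter estimates depend only on count tables, which are additive across stations: summing per-station counts reproduces exactly the counts a pooled model would see, while only aggregates (never raw records) leave each station. A sketch of that equivalence with hypothetical station data (this illustrates the statistical principle, not vantage6's actual message format):

```python
from collections import Counter

# Hypothetical per-station raw records of (fraud_type, product_category).
station_data = {
    "STATION-1": [("Substitution-Dilution", "meat")] * 5,
    "STATION-2": [("Smuggling-Mislabeling-Origin Masking", "fruit")] * 7,
    "STATION-3": [("Artificial enhancement/Improvement", "spices")] * 4,
}

# Federated: each station reports only its local count table;
# raw records never leave the station.
federated = Counter()
for records in station_data.values():
    federated.update(Counter(records))

# Centralized: raw records are pooled first, then counted.
centralized = Counter(r for records in station_data.values() for r in records)
```

Since `federated == centralized`, any CPT estimator driven by these counts yields identical parameters under either training regime.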
The findings demonstrate that federated BN learning can effectively aggregate knowledge from multiple data owners while preserving privacy and confidentiality, achieving performance on par with or better than a centralized pooled approach. In Experiment 1, the combined federated BN improved sensitivity, particularly in STATION-2, indicating better identification of fraud cases when leveraging cross-station information. Although specificity sometimes decreased, the trade-off may be acceptable in fraud surveillance contexts where missing true frauds carries higher risk. Moreover, federated training enabled stations lacking certain fraud categories (e.g., STATION-1 without Artificial enhancement/Improvement) to benefit from shared parameters, expanding their decision space and supporting more informed choices.
The results confirm that data heterogeneity and imbalance influence model accuracy; the federated approach mitigates some of these issues by integrating diverse evidence without requiring data to be pooled physically. Operationally, the FL infrastructure addresses major data-sharing barriers—ownership, confidentiality, GDPR compliance—and can automate aspects of policy enforcement. The work highlights current ecosystem gaps (data democratization/harmonization, limited federated AI model repertoire, and efficiency/operationalization challenges) and suggests paths forward (ontology-driven data alignment, SPARQL integration, and more efficient compiled-language implementations). Overall, FL offers a viable route to collaborative, privacy-preserving analytics in food fraud surveillance and control.
This study presents a proof-of-concept federated learning framework applying a Bayesian Network to predict food fraud types across three geographically distributed data stations without moving raw data. The federated BN achieved high accuracy and improved sensitivity compared to individual station models and performed comparably or slightly better than a centralized pooled BN. The approach preserves data privacy and ownership, supports GDPR compliance, and can enhance decision-making by sharing knowledge across stakeholders with heterogeneous, imbalanced datasets. The framework is scalable to additional stations and can foster collaboration and trust, potentially reducing data collection costs and increasing efficiency. Future work should expand federated model repertoires (including deep learning adapted for FL), advance data harmonization via ontologies and standards (FOODON, FHIR, OMOP) and semantic layers (SPARQL), and improve operational efficiency by adopting compiled languages (e.g., Go, Rust) for lighter, faster federated deployments.
- Data heterogeneity and imbalance across stations affected performance; while federated learning mitigated some effects, trade-offs between sensitivity and specificity remained.
- Model/data format constraints: The BN implementation required CSV inputs; RDF data from STATION-3 had to be converted, limiting native use of linked data and dynamic schemas.
- Limited federated AI model availability: Few models are currently implemented for FL; complex models (e.g., deep learning) often require redesign for federated settings.
- Data democratization/harmonization: FL typically assumes consistent variable ordering and static tabular schemas; aligning semantics across owners requires ontologies and additional layers (e.g., SPARQL) that are not yet fully integrated.
- Operational efficiency: Dockerized Python/R models can be large and resource-intensive, impacting bandwidth and compute; this constrains scalability and real-time use.
- Security considerations: Executing Docker images at data stations introduces supply-chain risks; robust image validation and governance are needed.
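The RDF-to-CSV conversion required for STATION-3 amounts to pivoting subject-predicate-object triples into one row per subject and one column per predicate. A stdlib sketch with hypothetical triples (real STATION-3 data used FOODON/SIO/NALT/Wikidata IRIs, which would additionally need ontology-aware mapping):

```python
import csv
import io

# Hypothetical triples for two fraud incidents; predicates are made up.
triples = [
    ("inc1", "fraudType", "Substitution-Dilution"),
    ("inc1", "productCategory", "meat"),
    ("inc2", "fraudType", "Artificial enhancement/Improvement"),
    ("inc2", "productCategory", "spices"),
]

# Pivot: one CSV row per subject, one column per predicate.
rows = {}
for s, p, o in triples:
    rows.setdefault(s, {})[p] = o

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["fraudType", "productCategory"])
writer.writeheader()
for s in sorted(rows):
    writer.writerow(rows[s])
csv_text = out.getvalue()
```

The flattening loses the linked-data semantics (IRIs, ontology classes), which is precisely the limitation noted above: the tabular BN input cannot exploit the RDF graph structure natively.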