Introduction
Ensuring food safety is challenging because supply chains are complex and vulnerable to fraud. AI, particularly data-driven Bayesian Networks (BNs), offers promising predictive capabilities for food fraud. Existing AI solutions for food safety and fraud often use BNs because they are transparent and interpretable, handle uncertainty, and can incorporate prior knowledge. However, data sharing and integration across the supply chain remain a significant hurdle: data owners often have conflicting interests and priorities, and food safety and fraud data can be politically sensitive and commercially valuable, which increases the reluctance to share. Federated learning (FL) addresses these challenges by training models without data leaving its owner's database. No raw data is exchanged, data stays local and secure, data traffic is reduced, and parameters can still be learned from all data stations. This study develops and implements an FL infrastructure to predict food fraud types using a federated BN model, aiming to demonstrate that knowledge can be shared without compromising data privacy and security.
Literature Review
The paper reviews existing literature on AI applications in food safety and fraud detection, highlighting the suitability of Bayesian Networks. It emphasizes the challenges of data sharing in food supply chains, referencing studies on reflexive governance, FAIR data principles, and the limited availability of food safety ontologies. The authors then introduce federated learning (FL) as a solution, citing its successful application in life sciences and its potential for the food domain. Several studies on federated learning applications in life sciences and healthcare are mentioned, showcasing the technique's potential for handling sensitive data.
Methodology
The study used the Vantage platform (version 2.3.4) as its federated learning infrastructure. Data came from the EU Rapid Alert System for Food and Feed (RASFF) and the US Economically Motivated Adulteration (EMA) databases. The data was split into three datasets, each hosted on a separate data station at a different location in the Netherlands (Wageningen, Maastricht, and Utrecht). STATION-1 contained RASFF data from 2008-2013 (CSV format), STATION-2 contained RASFF data from 2014-2018 (CSV format), and STATION-3 contained EMA data from 2008-2018 (RDF format, converted to CSV for BN model compatibility). Each station held data on fraud type, product category, year, origin country, and reporting country. The federated architecture consisted of a central server for collaboration management and authentication, plus data stations equipped with local data storage, a Docker daemon, an algorithm container, a CLI, and configuration files. A Bayesian Network (BN) model, implemented with the R package 'bnlearn', was trained using the Tree-Augmented Naive Bayes (TAN) algorithm for structure learning and Bayesian parameter estimation. Each dataset was divided into training (80%) and testing (20%) sets. An individual BN model was built for each station, along with a combined model trained across all three stations. Two experiments were conducted: 1) evaluating the performance of the individual and combined BN models on each station's test data; 2) comparing the combined FL BN model's performance with a BN model trained on the entire dataset without a federated infrastructure. Model performance was assessed using AUC, sensitivity, and specificity.
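The summary does not say how the stations' contributions are combined into one model. As a minimal sketch of one common approach for discrete BNs, each station can send only count tables to the server, which sums them and applies a Bayesian (Dirichlet-smoothed) estimate. This is written in Python rather than R, uses a single parent-child edge instead of a full TAN structure, and every function name, column name, and record below is hypothetical; it is not the Vantage or bnlearn API.

```python
from collections import Counter

def local_counts(records, child, parent):
    """Runs at a data station: count (parent value, child value) pairs.
    Only these aggregate counts leave the station, never the rows."""
    return Counter((row[parent], row[child]) for row in records)

def aggregate(count_tables):
    """Runs at the central server: sum the per-station count tables."""
    total = Counter()
    for table in count_tables:
        total.update(table)
    return total

def bayes_cpt(counts, child_values, alpha=1.0):
    """Posterior-mean estimate of P(child | parent) under a symmetric
    Dirichlet prior (alpha pseudo-counts per cell; alpha is an assumption)."""
    parents = {p for p, _ in counts}
    cpt = {}
    for p in parents:
        denom = sum(counts[(p, c)] for c in child_values) + alpha * len(child_values)
        cpt[p] = {c: (counts[(p, c)] + alpha) / denom for c in child_values}
    return cpt

# Hypothetical toy records standing in for the three stations' data.
station_1 = [{"product": "fish", "fraud": "mislabelling"},
             {"product": "fish", "fraud": "mislabelling"},
             {"product": "honey", "fraud": "adulteration"}]
station_2 = [{"product": "fish", "fraud": "adulteration"},
             {"product": "honey", "fraud": "adulteration"}]
station_3 = [{"product": "honey", "fraud": "adulteration"}]

fraud_types = ["adulteration", "mislabelling"]

# Federated route: stations share counts only, the server aggregates.
shared = [local_counts(s, "fraud", "product") for s in (station_1, station_2, station_3)]
federated_cpt = bayes_cpt(aggregate(shared), fraud_types)

# Centralised route, for comparison: pool the raw rows first.
pooled = station_1 + station_2 + station_3
pooled_cpt = bayes_cpt(local_counts(pooled, "fraud", "product"), fraud_types)

print(federated_cpt["fish"])
print(federated_cpt == pooled_cpt)
```

Because counts are additive, this particular scheme gives the same parameters as pooling the raw rows for a fixed network structure. Learning the structure itself in a federated way is a separate problem that the sketch does not cover.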
Key Findings
Experiment 1 showed high accuracy for the individual BN models at STATION-1 and STATION-2 (AUC = 0.96) but lower accuracy at STATION-3 (AUC = 0.72). The combined BN model reached AUCs of 0.89, 0.99, and 0.74 at STATION-1, -2, and -3, respectively. The drop at STATION-1 (0.96 to 0.89) indicates that more training data does not guarantee higher accuracy when the data are heterogeneous. The combined model showed improved sensitivity, particularly at STATION-2 (0.49 to 0.78), meaning it identified more of the actual fraud cases. However, specificity at STATION-2 decreased (0.83 to 0.69). In Experiment 2, a BN trained on the entire dataset without FL achieved an AUC of 0.86. The combined AUC across all stations in the federated environment (0.90) was close to this value, indicating that the FL approach maintained comparable performance while preserving privacy. The results point to the robustness of the FL infrastructure to data imbalances and its ability to maintain model performance while adhering to privacy and security measures.
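For readers unfamiliar with the three metrics, the following sketch shows how they can be computed in the binary case (one fraud type versus all others). The summary does not state how the multi-class fraud-type predictions were reduced to these metrics, so the binary framing, the 0.5 threshold, and the labels and scores below are all assumptions for illustration, not the study's data.

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

def auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive case is
    scored above a randomly chosen negative case (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical test set: 1 = the fraud type of interest, 0 = any other type.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.6]  # predicted probabilities

threshold = 0.5  # assumed decision threshold
y_pred = [1 if s >= threshold else 0 for s in scores]

sens, spec = sensitivity_specificity(y_true, y_pred)
print(sens, spec)           # 0.75 0.75
print(auc(y_true, scores))  # 0.875
```

AUC is independent of the threshold, while sensitivity and specificity move in opposite directions as the threshold changes. A shift like the one reported for STATION-2 is therefore consistent with the combined model trading false negatives for false positives.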
Discussion
The study successfully demonstrated the feasibility of applying federated learning to food fraud prediction using Bayesian Networks. The results highlight the advantages of FL in addressing data privacy and security concerns while maintaining model accuracy. The close performance of the federated and non-federated models suggests that FL can effectively leverage distributed data without sacrificing predictive power. The improvements in sensitivity in some cases indicate a better ability to identify fraud instances, but the decreased specificity warrants attention to potential false positives. This highlights the importance of careful model calibration and interpretation in the context of food safety.
Conclusion
This research provides a proof-of-concept for a federated learning approach to food fraud detection, using a Bayesian Network model trained across multiple geographically dispersed data stations. The results demonstrated high accuracy and improved sensitivity while respecting data privacy and security, even with heterogeneous and imbalanced data. This approach has the potential to improve food fraud prediction models for all stakeholders in the supply chain, enhancing collaboration and trust while addressing GDPR compliance issues. Future work could explore more sophisticated models within the federated framework and investigate methods to address data heterogeneity and model interpretability further.
Limitations
The study used a limited number of data stations (three). The generalizability of the findings to larger-scale implementations needs further investigation. The choice of the Bayesian Network model and specific parameters may impact the results. Further research could explore other machine learning models and evaluate their performance in a federated setting. The data used, while representative, might not fully capture the complexities of real-world food fraud.