Introduction
The growing use of plant oils in human diets has created a need for accurate identification and quantification of oil types, particularly given how widespread adulteration has become. Although each edible oil has a characteristic fatty acid profile, existing methods struggle to identify oils from this profile alone, especially in mixtures. Adulteration raises food safety and economic concerns and undermines consumer health and trust. This research addresses the problem with a machine learning approach that detects and quantifies adulteration in multi-component oil mixtures. Several reports illustrate how common adulteration is; for instance, the UC Davis Olive Center reported that up to 69% of California olive oil samples failed to meet USDA standards due to adulteration. High rates of adulteration have also been reported for avocado oil in the US and olive oil in Taiwan. Such adulteration degrades quality and, in some cases, causes serious health problems. Current chemometric methods handle complex mixtures poorly: they are often limited to qualitative analysis or lose accuracy when more than two oil types are present. The lack of an end-to-end solution for quantitative analysis of multi-component oil mixtures motivates a robust and generalizable machine learning approach.
Literature Review
Previous research has established that each type of edible oil has a distinct fatty acid profile. However, these profiles are not always easy to tell apart, especially when oils are mixed. Existing methods for detecting oil adulteration, primarily chemometric techniques such as principal component analysis (PCA) and partial least squares (PLS), are often qualitative or lose accuracy when more than two oils are involved. Some studies have built quantitative models for specific two-oil mixtures, but these models do not generalize well enough for real-world use. These limitations of traditional chemometric methods, both in handling complex mixtures and in producing quantitative results, motivated the machine learning approach presented in the paper. The studies it cites document these shortcomings and underline the need for a more robust and versatile solution.
Methodology
This study utilized a large dataset (19,583 samples) encompassing ten edible oil types: groundnut oil (GNO), high-erucic acid rapeseed oil (HERSO), high-oleic acid sunflower oil (HOSFO), low-erucic acid rapeseed oil (LERSO), linseed oil (LNO), low-oleic acid sunflower oil (LOSFO), maize oil (MZO), rice bran oil (RBO), soybean oil (SBO), and sesame oil (SSO). Lipids were extracted and identified as fatty acid methyl esters (FAMEs) using gas chromatography with flame ionization detection (GC-FID). An unsupervised Gaussian Mixture Model (GMM) was employed to identify sub-clusters within each oil type, revealing intra-variability and highlighting specific fatty acid differences. This information was then used to simulate a vast number of oil mixtures (12 million) for training a supervised deep learning model. The deep learning model, a deep neural network, was designed to predict the quantitative composition of unknown oil mixtures based on their fatty acid profiles. The model's performance was evaluated using independent test datasets, including both simulated and real-world oil mixtures. The absolute errors in the predictions were calculated and summarized using percentile statistics (50th, 90th, 95th, and 99th percentiles). Additionally, an online machine learning method was implemented to continuously update the model with new data, improving its generalizability and robustness. PLS2 (Partial Least Squares 2) was used as a comparative method to showcase the superiority of the proposed deep learning model. Data visualization techniques, such as t-distributed stochastic neighbor embedding (t-SNE), were used to explore the data and visualize the results.
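The mixture-simulation step described above can be illustrated with a minimal sketch. It assumes that the fatty acid profile of a blend is the proportion-weighted sum of the profiles of its components, and that blend proportions are drawn uniformly from all proportions summing to 1; the summary does not state either detail, so both are assumptions. The profile values, the five-column layout, and the function names are hypothetical and are not data or code from the study.

```python
import random

# Hypothetical fatty acid profiles (percent of total FAMEs), invented for
# illustration only. Columns: C16:0, C18:0, C18:1, C18:2, C18:3.
PURE_PROFILES = {
    "SBO": [11.0, 4.0, 23.0, 54.0, 8.0],
    "MZO": [12.0, 2.0, 30.0, 55.0, 1.0],
    "HOSFO": [4.0, 3.0, 83.0, 9.0, 1.0],
}


def random_proportions(n, rng):
    """Draw n non-negative proportions that sum to 1.

    Normalizing independent exponential draws gives a uniform sample
    on the simplex (a Dirichlet(1, ..., 1) draw).
    """
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


def simulate_mixture(profiles, oils, rng):
    """Return (proportions by oil, mixed profile) for one random blend.

    Assumes linear mixing: each fatty acid value of the blend is the
    proportion-weighted sum of the component values.
    """
    props = random_proportions(len(oils), rng)
    n_fatty_acids = len(profiles[oils[0]])
    mixed = [
        sum(p * profiles[oil][j] for p, oil in zip(props, oils))
        for j in range(n_fatty_acids)
    ]
    return dict(zip(oils, props)), mixed


rng = random.Random(0)  # fixed seed so the example is repeatable
labels, profile = simulate_mixture(PURE_PROFILES, ["SBO", "MZO", "HOSFO"], rng)
print(labels)
print(profile)
```

Each simulated pair (mixed profile as input, proportions as target) would be one training example for a supervised model. Sampling the component profiles from the GMM sub-clusters rather than from fixed vectors, as the study describes, would add the within-type variability that this sketch leaves out.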
Key Findings
The GMM analysis successfully identified 16 sub-clusters within the ten oil types, revealing intra-variability driven by differences in specific fatty acids. The deep learning model demonstrated excellent performance in predicting the composition of both simulated and real-world oil mixtures. For three-way mixtures, the model achieved a median absolute error of 1.4–1.8% and a 90th percentile absolute error of 4–5.4%. The model significantly outperformed traditional chemometric methods like PLS2, which showed considerably higher error rates. The online-training method effectively improved the model's accuracy when presented with new, geographically diverse oil samples, showcasing its adaptability. In real-life blind tests, the online-updated model displayed a substantial reduction in error rates compared to the model trained solely on the initial dataset. The ability to identify and quantify the components of complex oil mixtures, even when faced with new, previously unseen oils, is a key contribution of this work.
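The percentile summaries of absolute error quoted above can be computed along the following lines. This is a sketch under assumptions: errors are pooled over every component of every sample and expressed in percentage points, and percentiles use linear interpolation between ranks. The study may have aggregated differently, and the sample values below are hypothetical.

```python
import math


def percentile(values, q):
    """q-th percentile (0 to 100) using linear interpolation between ranks."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def absolute_error_summary(true_props, pred_props, qs=(50, 90, 95, 99)):
    """Summarize |true - predicted| proportions, in percentage points.

    true_props and pred_props are lists of per-sample proportion lists
    (fractions between 0 and 1), in the same oil order.
    """
    errors = [
        abs(t - p) * 100.0
        for true, pred in zip(true_props, pred_props)
        for t, p in zip(true, pred)
    ]
    return {q: percentile(errors, q) for q in qs}


# Hypothetical true and predicted compositions for two three-way mixtures.
true_props = [[0.50, 0.30, 0.20], [0.10, 0.60, 0.30]]
pred_props = [[0.48, 0.33, 0.19], [0.12, 0.55, 0.33]]
summary = absolute_error_summary(true_props, pred_props)
print(summary)
```

Reporting upper percentiles alongside the median shows how large the worst predictions are, which a single average error would hide.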
Discussion
The results show that the proposed machine learning approach accurately identifies and quantifies the composition of complex oil mixtures. The deep learning model's lower error rates relative to traditional chemometric methods such as PLS2 show the advantage of deep learning for this task. Incorporating new data through online training improves the model's generalizability and lets it adapt to variation in oil profiles, so it remains useful as oil sources and adulteration techniques change. The methodology is a significant advance for food safety and quality control, enabling more efficient and accurate detection of oil adulteration. Its accuracy, combined with its ability to keep learning, makes it a practical tool for applications across the edible oil industry.
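The idea of online training, updating an existing model one new sample at a time instead of retraining from scratch, can be sketched as follows. This is not the study's model: a plain linear regressor with a squared-error gradient step stands in for the deep neural network, and the function names, learning rate, and data are hypothetical.

```python
def predict(weights, x):
    """Linear prediction: one output per row of weights."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def online_update(weights, x, y, lr=0.1):
    """Apply one stochastic gradient step for a single new (x, y) pair.

    For the loss 0.5 * (prediction - target)^2, the gradient with respect
    to weights[k][j] is (prediction[k] - y[k]) * x[j]. Weights are updated
    in place and also returned.
    """
    pred = predict(weights, x)
    for k, row in enumerate(weights):
        err = pred[k] - y[k]
        for j in range(len(row)):
            row[j] -= lr * err * x[j]
    return weights


# Hypothetical new sample: a two-feature profile (scaled to fractions)
# and its known two-component composition.
new_x = [1.0, 0.5]
new_y = [0.7, 0.3]

weights = [[0.0, 0.0], [0.0, 0.0]]  # stand-in for an already trained model
error_before = [abs(p - t) for p, t in zip(predict(weights, new_x), new_y)]
for _ in range(200):  # repeated exposure to the new sample
    online_update(weights, new_x, new_y)
error_after = [abs(p - t) for p, t in zip(predict(weights, new_x), new_y)]
print(error_before, error_after)
```

Inputs are scaled to fractions rather than percentages so that the fixed learning rate stays stable. The linear stand-in also does not force predicted proportions to be non-negative or to sum to 1, which a real composition model would need to handle.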
Conclusion
This study presents a novel machine learning method for identifying and quantifying the composition of complex edible oil mixtures. The deep learning model, coupled with an online-training approach, offers a robust and adaptable solution for detecting oil adulteration with high accuracy. The findings significantly advance food safety and quality control within the edible oil industry, enabling more effective regulatory measures and improved consumer protection. Future research could focus on expanding the model to include additional oil types, investigating the impact of different processing methods on oil profiles, and exploring the integration of this technology with rapid analytical systems for improved efficiency and reduced waste.
Limitations
The model's performance relies on the accuracy and completeness of the initial dataset. Variations in oil profiles due to factors not captured in the dataset (e.g., specific growing conditions, processing variations beyond those considered) could affect the model's accuracy. The online training method assumes new oils belong to existing categories; completely novel oil types may require additional adaptations. The current study focused on a specific set of oils; the model's generalizability to a broader range of oils remains to be fully explored.