Introduction
Tea is a globally popular beverage whose health benefits are attributed to its bioactive compounds, particularly tea polyphenols and epigallocatechin gallate (EGCG). Accurate and rapid detection of these components is crucial for quality control, product development, and consumer satisfaction, but traditional methods such as the Folin phenol method and HPLC are time-consuming and destructive. This research addresses that limitation by developing a rapid, non-destructive method for simultaneously predicting tea polyphenol and EGCG content. The study uses Fourier transform near-infrared (FT-NIR) spectroscopy, a non-destructive technique that has shown promise in analyzing complex matrices. By combining FT-NIR spectroscopy with machine learning algorithms, the researchers aim to build robust models that accurately predict the concentrations of tea polyphenols and EGCG across tea varieties. Such models could streamline tea breeding by enabling rapid screening of genotypes with high concentrations of these compounds. Previous studies have used spectroscopy in tea quality monitoring, specifically for caffeine content and total polyphenols. However, its application to simultaneously predicting both tea polyphenols and EGCG during tea breeding remains relatively unexplored, and this study aims to fill that gap. A successful predictive model would give tea producers, breeders, and consumers a practical tool for efficient quality control and product development, and could contribute to a better understanding of tea's composition and its associated health benefits.
Literature Review
Existing literature demonstrates the successful application of spectroscopic techniques, particularly near-infrared (NIR) spectroscopy, in the quality assessment of tea. Studies have used NIR spectroscopy coupled with chemometrics to determine caffeine content during green tea processing, with prediction-set determination coefficients (R<sup>2</sup><sub>p</sub>) exceeding 0.834. Other research used NIR spectroscopy and partial least squares regression (PLSR) to rapidly detect total polyphenol content in fresh tea leaves, achieving R<sup>2</sup><sub>p</sub> values greater than 0.95. NIR spectroscopy combined with modified partial least squares (MPLS), principal component regression (PCR), and multiple linear regression (MLR) has also been used to quantify caffeine and several catechin monomers, including EGCG, EGC, and GC, in green tea powder, with R<sup>2</sup><sub>p</sub> values exceeding 0.90 for most catechins. Vis/NIR spectroscopy has been applied to assess tea quality during fermentation, providing accurate predictions of total catechins and theanine content. However, these studies predominantly focus on individual components or specific stages of tea processing, and few address the simultaneous prediction of both tea polyphenols and EGCG during breeding. This study aims to bridge that gap by building models that predict both quantities from the same spectra.
Methodology
This study employed FT-NIR spectroscopy to analyze 84 tea powder samples from four tea tree varieties (A, DC, BD, and W1). Fresh tea leaves were harvested, dried, and ground to produce the powder samples. FT-NIR spectra were collected on a Thermo Fisher Scientific Antaris II spectrometer with an integrating sphere diffuse reflectance sampling module. Each sample was scanned three times at 120° intervals, and the average spectrum was used for analysis. Tea polyphenol content was determined using the Folin phenol (Folin-Ciocalteu) method, while EGCG content was measured using UPLC according to the Chinese national standard GB/T 8313-2018. Before model development, outlier samples were identified and removed using the Monte Carlo cross-validation (MCCV) method, which left 82 samples. These were divided into a calibration set (55 samples) and a prediction set (27 samples) using the Kennard-Stone algorithm. Five spectral preprocessing methods were applied to improve the quality of the spectra: Savitzky-Golay smoothing (SG), standard normal variate (SNV), vector normalization (VN), multiplicative scatter correction (MSC), and first derivative (FD). Partial least squares regression (PLSR) and least squares support vector regression (LS-SVR) models were then built for both tea polyphenol and EGCG prediction using the preprocessed spectra. In addition, competitive adaptive reweighted sampling (CARS) and random forest (RF) algorithms were used for variable selection, reducing the number of spectral variables in the models to improve efficiency and interpretability. Model performance was evaluated using the correlation coefficient (R), root mean square error (RMSE), and residual predictive deviation (RPD). Higher R and RPD values and lower RMSE values indicate better prediction accuracy.
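The scan averaging, SNV preprocessing, and Kennard-Stone split described above can be sketched in a few lines. This is a minimal illustration on synthetic data, not the authors' code: the array shapes (82 samples, 3 scans, 50 spectral variables), the random spectra, and the function names `snv` and `kennard_stone` are assumptions made for the example. The Kennard-Stone variant shown is the common formulation that seeds with the two most distant samples.

```python
import numpy as np


def snv(spectra):
    """Standard normal variate: center and scale each spectrum (row) on its own."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def kennard_stone(X, n_select):
    """Split row indices into (selected, remaining) with the Kennard-Stone rule."""
    X = np.asarray(X, dtype=float)
    # Squared Euclidean distances between all pairs of samples.
    sq = (X ** 2).sum(axis=1)
    d2 = np.clip(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0, None)
    # Seed with the two samples that are farthest apart.
    i, j = np.unravel_index(np.argmax(d2), d2.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(len(X)) if k not in selected]
    while len(selected) < n_select:
        # Distance from each candidate to its nearest already-selected sample.
        nearest = d2[np.ix_(remaining, selected)].min(axis=1)
        # Take the candidate that is farthest from the selected set.
        pick = remaining[int(np.argmax(nearest))]
        selected.append(pick)
        remaining.remove(pick)
    return selected, remaining


# Synthetic stand-in for the real data (assumption): 82 samples, 3 scans each.
rng = np.random.default_rng(0)
scans = rng.normal(loc=1.0, scale=0.1, size=(82, 3, 50))

mean_spectra = scans.mean(axis=1)   # average the three scans per sample
X = snv(mean_spectra)               # one of the five preprocessing options
cal_idx, pred_idx = kennard_stone(X, n_select=55)

print(len(cal_idx), len(pred_idx))  # prints: 55 27
```

Kennard-Stone picks calibration samples that span the spectral space, so the prediction set tends to fall inside the range covered by calibration. That is one reason it is often preferred over a random split for small spectral datasets.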
Key Findings
The study found statistically significant differences in tea polyphenol and EGCG content among the four tea tree varieties. The average spectra of the different varieties showed similar trends but varied in absorbance magnitude, reflecting differences in polyphenol and EGCG content. Outlier removal improved the predictive performance of the PLSR models for both tea polyphenols and EGCG. The best-performing full-spectrum model for tea polyphenol prediction was LS-SVR on SG-smoothed spectra (R<sub>p</sub> = 0.975, RPD = 4.540). For EGCG prediction, the best full-spectrum model was LS-SVR on the original, unprocessed spectra (R<sub>p</sub> = 0.936, RPD = 2.841). Variable selection further improved predictive performance. For tea polyphenols, the LS-SVR model using 30 CARS-selected variables achieved R<sub>p</sub> = 0.978 and RPD = 4.833. For EGCG, the LS-SVR model using 27 RF-selected variables achieved R<sub>p</sub> = 0.944 and RPD = 3.049. The selected wavenumbers were interpreted based on their association with specific chemical functional groups within tea polyphenols and EGCG. The CARS algorithm selected wavenumbers associated with O-H, C-H, and C=O groups in phenolic compounds for tea polyphenol prediction, while the RF algorithm identified wavenumbers related to O-H and C-H groups in phenolics and C=O groups in lipids for EGCG prediction. Reducing the number of spectral variables in this way improved model efficiency without compromising prediction accuracy. The findings highlight the capability of FT-NIR spectroscopy and machine learning to rapidly identify superior tea genotypes with high tea polyphenol and EGCG contents.
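For readers who want to compute the reported metrics themselves, the sketch below defines R, RMSE, and RPD in their usual forms. The function names and the use of the sample standard deviation (`ddof=1`) in RPD are assumptions; the source does not state which convention was used. The last part applies a common approximation that the source does not mention, RPD ≈ 1/sqrt(1 − R²), which holds when prediction bias is negligible. It is included only as a rough consistency check on the reported R<sub>p</sub> and RPD pairs.

```python
import numpy as np


def pearson_r(y_true, y_pred):
    """Correlation coefficient between reference and predicted values."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])


def rmse(y_true, y_pred):
    """Root mean square error of the predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def rpd(y_true, y_pred):
    """Residual predictive deviation: SD of reference values divided by RMSE.

    ddof=1 (sample SD) is an assumed convention, not taken from the source.
    """
    return float(np.std(np.asarray(y_true, dtype=float), ddof=1) / rmse(y_true, y_pred))


def rpd_from_r(r):
    """Approximate RPD implied by R, assuming negligible prediction bias."""
    return float(1.0 / np.sqrt(1.0 - r ** 2))


# (R_p, RPD) pairs as reported in the text above.
reported_pairs = [
    (0.975, 4.540),  # tea polyphenols, LS-SVR, SG-smoothed full spectra
    (0.936, 2.841),  # EGCG, LS-SVR, unprocessed full spectra
    (0.978, 4.833),  # tea polyphenols, LS-SVR, 30 CARS-selected variables
    (0.944, 3.049),  # EGCG, LS-SVR, 27 RF-selected variables
]

for r, reported_rpd in reported_pairs:
    approx = rpd_from_r(r)
    print(f"R_p={r:.3f}  reported RPD={reported_rpd:.3f}  approx RPD={approx:.3f}")
```

The approximated values land within about one percent of the reported RPDs, which is consistent with R<sub>p</sub> having been rounded to three decimals. The comparison says nothing about how the original metrics were actually computed.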
Discussion
The results demonstrate the successful development of rapid and accurate predictive models for tea polyphenols and EGCG content using FT-NIR spectroscopy combined with machine learning. The high R<sub>p</sub> and RPD values achieved, especially after variable selection, indicate excellent predictive performance. The use of variable selection not only improved accuracy and efficiency but also provided insights into the spectral regions most relevant to the prediction of these bioactive compounds. These findings offer a significant advancement over traditional methods, enabling high-throughput screening of tea genotypes for superior quality. This approach is highly valuable for tea breeders and producers, facilitating faster and more efficient cultivar selection. The ability to simultaneously predict both tea polyphenols and EGCG provides a more comprehensive assessment of tea quality compared to methods focusing on individual components. This study's success underscores the potential of FT-NIR spectroscopy and machine learning for rapid quality assessment in other agricultural products.
Conclusion
This study successfully developed robust and efficient models for the simultaneous prediction of tea polyphenol and EGCG content in tea leaves using FT-NIR spectroscopy and machine learning. The application of variable selection algorithms significantly improved the models’ predictive ability and reduced computational complexity. The findings highlight the potential of this approach for high-throughput screening of tea genotypes during breeding, enabling rapid selection of superior cultivars with high levels of these valuable bioactive compounds. Future research could explore the applicability of this method to different tea varieties, processing techniques, and environmental conditions. Investigating the influence of other factors on tea polyphenol and EGCG content and further refining the models could improve their predictive power.
Limitations
The study's sample size (84 samples from four varieties) could be expanded to improve the generalizability of the models to other tea cultivars and growing conditions. The models' performance may be influenced by variations in FT-NIR spectrometer settings and environmental conditions during spectral data acquisition. The Folin-Ciocalteu (Folin phenol) method used for tea polyphenol determination measures total phenolics rather than any single compound, while the UPLC method quantifies EGCG specifically. This difference in the specificity of the reference measurements might influence the relative accuracy of the two models. Further research could use more specific chemical analyses for total polyphenols and other related compounds.