Introduction
The gemstone market, particularly for investment-grade stones, depends on reliably determining a stone's origin and detecting treatments, two factors that significantly influence value. Because gemstones can be worth millions of dollars per carat, accurate origin and treatment determination is crucial for minimizing investment risk and maintaining consumer trust, and a fragmented, opaque supply chain makes it hard to track an individual stone's history. Gemological laboratories currently rely on skilled experts who use optical microscopy to identify features indicative of origin and treatment. This approach is subjective, is hampered by similar features occurring in stones from different sources, and is becoming harder as treatment techniques advance. Modern laboratories supplement visual inspection with analytical instruments (UV-Vis, FTIR, XRF, ICP-MS) to improve reliability, but these are expensive (e.g., an ICP-MS instrument costing $500,000), require highly trained operators, and still leave accurate determination challenging. The resulting workflow is time-consuming, inconsistent, and largely manual, which motivates a robust, automated, and cost-effective alternative. This paper addresses these challenges by proposing GEMTELLIGENCE, a deep learning-based solution for automated origin determination (OD) and treatment detection (TD) that aims to improve efficiency and accuracy while reducing costs.
Literature Review
Existing machine learning techniques in gemology have shown promise in automating certain tasks, such as geotagging and grading gemstones. However, these methods often focus on a single data type (images, spectra, or chemical compositions) and require significant human expertise for feature extraction and algorithm design. A more comprehensive approach is needed, one that integrates multiple data sources and minimizes reliance on manual feature engineering. GEMTELLIGENCE is proposed as a novel multimodal deep learning approach to fill this gap.
Methodology
GEMTELLIGENCE is a multimodal deep learning model designed to process data from FTIR, UV-Vis, XRF, and ICP-MS instruments. The architecture combines convolutional neural networks (CNNs) for the spectral data (FTIR and UV-Vis) with a transformer-based network for the tabular data (XRF and ICP-MS). The CNNs, inspired by previous work (Ho et al.), use modified architectures with larger kernel sizes to capture global features. The transformer component is adapted from existing architectures for tabular data. All components are integrated into a single model that is trained end to end on all modalities, and data sources that are missing for a given stone can be masked during inference.
The model lets users control the trade-off between automation and accuracy through a confidence-thresholding procedure: predictions above a chosen confidence threshold are accepted as automated classifications, while those below the threshold are flagged for expert review.
The study focuses on blue sapphires, a challenging gemstone type. The dataset comprises over 5500 blue sapphire measurement records collected at the Gübelin Gem Lab over seven years, and model performance was evaluated with five-fold cross-validation. Ground truth was established with the rigorous methods used at the Gübelin Gem Lab, which combine visual, spectroscopic, and chemical analyses. Only stones with consistent OD and TD results across multiple expert assessments, and with agreement between ICP-MS results and visual inspection, were included. The evaluation compares GEMTELLIGENCE against human gemologists using various combinations of data sources. For fairness, data sources that were not used in ground truth determination for a given task were excluded from the comparison for that task.
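The confidence-thresholding procedure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that confidence is the highest predicted class probability and that a prediction exactly at the threshold is accepted, neither of which the summary specifies. The names `Decision` and `triage` and the class labels `origin_A`, `origin_B`, and `origin_C` are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Decision:
    label: Optional[str]  # predicted class, or None when deferred to an expert
    confidence: float     # highest class probability (assumed confidence measure)
    automated: bool       # True if the prediction cleared the threshold


def triage(probs: Sequence[float], classes: Sequence[str], threshold: float) -> Decision:
    """Accept the top class if its probability reaches the threshold, else defer."""
    if len(probs) != len(classes) or not probs:
        raise ValueError("probs and classes must be non-empty and the same length")
    best = max(range(len(probs)), key=lambda i: probs[i])
    confidence = probs[best]
    # Assumption: ">=" is used here; the source only says "above a threshold".
    if confidence >= threshold:
        return Decision(classes[best], confidence, True)
    return Decision(None, confidence, False)


# Hypothetical class names and probabilities, for illustration only.
classes = ["origin_A", "origin_B", "origin_C"]
confident = triage([0.97, 0.02, 0.01], classes, threshold=0.95)
uncertain = triage([0.60, 0.30, 0.10], classes, threshold=0.95)
print(confident)  # accepted as an automated classification
print(uncertain)  # flagged for expert review
```

Raising `threshold` sends more stones to expert review and is expected to raise accuracy on the automated subset. Lowering it increases automation.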
Key Findings
GEMTELLIGENCE demonstrates comparable or superior performance to human experts across origin determination (OD) and treatment detection (TD) tasks. The model achieves high accuracy (e.g., >99% for OD and TD in certain configurations), and the confidence threshold allows control over the balance between automation and accuracy. In OD, ICP-MS data yields the highest accuracy, but combining UV-Vis and XRF data provides comparable results, demonstrating the potential for cost reduction. In TD, integrating UV-Vis and FTIR data offers the best performance, but GEMTELLIGENCE also achieves high accuracy with FTIR data alone, suggesting that it remains effective with limited data sources. Ablation studies reveal strong correlations between confidence and accuracy, and predictions remain consistent across multiple evaluations of the same gemstone over time.
Table 1 presents calibration accuracy, number of stones confidently classified, and test accuracy for different operating modes (different thresholds). Figure 2 compares the performance of GEMTELLIGENCE with human experts. Figure 3 illustrates the accuracy-automation trade-off with different data source combinations. Figure 4 demonstrates prediction consistency across multiple evaluations of the same stone over time. Supplementary information provides additional details on specific class performance, data sources, and detailed examples of GEMTELLIGENCE's application.
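The accuracy-automation trade-off can be made concrete by sweeping the confidence threshold over a set of held-out predictions. The sketch below is illustrative only. The function name `tradeoff` is hypothetical, "automation" is assumed to mean the fraction of stones whose confidence reaches the threshold, and the toy numbers are invented for the example and are not results from the paper.

```python
from typing import List, Optional, Sequence, Tuple


def tradeoff(
    confidences: Sequence[float],
    correct: Sequence[bool],
    thresholds: Sequence[float],
) -> List[Tuple[float, float, Optional[float]]]:
    """Return (threshold, automation rate, accuracy on accepted stones) per threshold."""
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("confidences and correct must be non-empty and the same length")
    rows = []
    for t in thresholds:
        # Keep only predictions confident enough to be accepted automatically.
        accepted = [ok for conf, ok in zip(confidences, correct) if conf >= t]
        automation = len(accepted) / len(confidences)
        # Accuracy is undefined when nothing is accepted, so report None.
        accuracy = sum(accepted) / len(accepted) if accepted else None
        rows.append((t, automation, accuracy))
    return rows


# Toy data (made up): one confidence and one correctness flag per stone.
confidences = [0.99, 0.95, 0.90, 0.70, 0.60]
correct = [True, True, True, False, True]

rows = tradeoff(confidences, correct, thresholds=[0.5, 0.8, 0.999])
for threshold, automation, accuracy in rows:
    print(threshold, automation, accuracy)
```

On this toy data, raising the threshold from 0.5 to 0.8 lowers automation from 100% to 60% of stones and raises accuracy on the accepted stones from 80% to 100%. At 0.999 nothing is accepted, so every stone would go to expert review.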
Discussion
GEMTELLIGENCE's success in matching or exceeding the performance of human experts while utilizing less expensive data sources highlights its significant potential for the gemstone industry. The ability to combine multiple data sources improves accuracy and provides a more robust and reliable assessment. The confidence-thresholding mechanism offers a practical way to balance automation and accuracy, adapting to the specific needs and risk tolerance of various applications. The consistent results over time underscore the reliability and stability of the model. The findings indicate that GEMTELLIGENCE can significantly reduce the time and cost associated with gemstone classification, enabling laboratories to process a larger volume of stones efficiently and freeing human experts to focus on more complex tasks. The public availability of the code and part of the dataset facilitates further research and independent validation.
Conclusion
GEMTELLIGENCE provides a highly accurate and cost-effective solution for automated gemstone classification, surpassing or matching human expert capabilities. Its multimodal design and confidence-thresholding procedure offer versatility and control over the level of automation. Future work could focus on expanding the dataset to include more gemstone types and improving the handling of non-metamorphic stones. The model's success suggests potential applications beyond gemology in other material science domains that utilize similar analytical techniques.
Limitations
GEMTELLIGENCE's current version is limited to metamorphic blue sapphires due to dataset limitations. A pre-classifier could be used to filter out non-metamorphic stones before GEMTELLIGENCE processing, but a more generalizable model is desirable. In addition, the ground truth relies on Gübelin Gem Lab's expert assessments, which, while rigorous, may contain some inherent uncertainty. Future research should address these limitations by expanding the dataset and exploring methods to further reduce noise and potential biases in the ground truth labels.