Introduction
Archaeology relies heavily on classifying artifacts to understand past societies. This process is time-consuming and subjective, since it depends on the expertise of individual archaeologists. Machine learning has been applied to archaeology before, but previous attempts using hand-crafted features yielded poor results. Deep learning, specifically Convolutional Neural Networks (CNNs), extracts features from images automatically and has shown promise in a range of computer vision tasks. However, existing applications of CNNs in archaeology have focused on narrow material and contextual ranges and do not address the breadth of archaeological data. This paper aims to develop a CNN model capable of handling the vast temporal and cultural diversity present in archaeological records. Using a large, publicly accessible dataset of artifact photographs from the Israel Antiquities Authority, spanning 1.4 million years of Levantine history, the researchers trained a CNN to classify artifacts by period and site. The study further investigates the model's potential to detect communities (groups of classes with shared characteristics) as a novel method for identifying meaningful connections between sites.
Literature Review
Previous attempts to apply machine learning to archaeological classifications relied on hand-crafted feature extraction, resulting in limited success. More recent approaches have incorporated algorithms for automatic feature extraction, showing some progress in specific tasks such as ceramic classification, lithic assemblage dating, and bone surface modification differentiation. However, these studies typically focused on limited material and contextual ranges, hindering their broader applicability in archaeology. This research builds upon these efforts by using a deep learning approach to tackle the full diversity of archaeological contexts.
Methodology
The study employed a deep convolutional neural network (CNN) based on the pre-trained EfficientNetB3 model, known for its high performance on ImageNet. Transfer learning was used to adapt the pre-trained model to the archaeological task: the original classification layers were removed, and a custom layer was added to classify artifacts into 200 categories (the 200 largest classes in the dataset). To improve robustness, data augmentation (random rotations, spatial shifts, zoom, and horizontal flips) was applied. Five models were trained independently from the same ImageNet initialization, and their feature vectors were combined into a final feature vector. Cosine similarity was used to measure how close two artifacts' feature vectors are. For classification, a k-nearest neighbors approach (k=1) was used, comparing the query image’s features to those in the training set. The resulting confusion matrix served as the basis for community detection using the Louvain algorithm, which identifies communities (clusters of closely related classes) within the network represented by the confusion matrix. The edge weights in this network reflect the degree of confusion between classes. The researchers further refined the community detection by incorporating the ten nearest neighbors for each query image and by selectively applying the algorithm to specific temporal groups. An interactive application was developed to visualize the communities geographically.
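The retrieval and confusion steps described above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' implementation: the summary does not say how the five feature vectors were combined, so concatenation is an assumption here, and the function names, the toy two-model feature vectors, and the class labels "A", "B", "C" are all hypothetical.

```python
import math
from collections import defaultdict

def concat_features(per_model_vectors):
    """Join the vectors several models produced for one image (assumed: concatenation)."""
    combined = []
    for vec in per_model_vectors:
        combined.extend(vec)
    return combined

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def nearest_label(query, train_features, train_labels):
    """k=1 nearest neighbour: label of the most similar training vector."""
    best = max(range(len(train_features)),
               key=lambda i: cosine_similarity(query, train_features[i]))
    return train_labels[best]

def confusion_counts(true_labels, predicted_labels):
    """Count (true class, predicted class) pairs."""
    counts = defaultdict(int)
    for truth, pred in zip(true_labels, predicted_labels):
        counts[(truth, pred)] += 1
    return dict(counts)

def confusion_edges(counts):
    """Undirected edge weights: confusions between two distinct classes,
    summed over both directions. Correct predictions add no edge."""
    edges = defaultdict(int)
    for (truth, pred), n in counts.items():
        if truth != pred:
            edges[tuple(sorted((truth, pred)))] += n
    return dict(edges)

# Toy data: each image has one 2-d vector from each of two hypothetical models.
train_features = [
    concat_features([[1.0, 0.0], [1.0, 0.0]]),
    concat_features([[0.0, 1.0], [0.0, 1.0]]),
    concat_features([[0.6, 0.8], [0.6, 0.8]]),
]
train_labels = ["A", "B", "C"]

query_features = [
    concat_features([[0.9, 0.1], [1.0, 0.0]]),
    concat_features([[0.5, 0.9], [0.5, 0.9]]),
    concat_features([[0.1, 1.0], [0.0, 1.0]]),
]
query_labels = ["A", "B", "C"]

predictions = [nearest_label(q, train_features, train_labels)
               for q in query_features]
counts = confusion_counts(query_labels, predictions)
edges = confusion_edges(counts)
```

In this toy run the classes "B" and "C" are confused with each other in both directions, so they end up joined by the only edge in the graph. Weights of this kind could then be handed to a Louvain implementation (for example `louvain_communities` in NetworkX) to group classes into communities; that step is left out here to keep the sketch free of third-party dependencies.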
Key Findings
The CNN model predicted artifact period and site with accuracy comparable to or exceeding that of archaeologists when all periods are considered. On the validation set, Top-1 accuracy was 58.10% for period-site classification, 63.58% for site, 67.79% for period, 71.03% for fine-period grouping, and 76.36% for rough-period grouping. Top-5 accuracy was substantially higher (67.36%, 71.89%, 77.69%, 81.47%, and 85.41%, respectively). In blind testing, the model outperformed the archaeologists on average accuracy across all periods (69.84% for the model versus 44.44% and 20.63% for the archaeologists). Experiments revealed that including site information during training significantly improved the model's overall performance. The community detection algorithm successfully identified meaningful connections between sites based on the model's confusion matrix. A case study focused on Natufian artifacts revealed communities that aligned with both visual similarity and archaeological significance, although some false positives were noted. The number of outliers within communities depended on the range of periods considered in the community detection.
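To make the Top-1 and Top-5 figures and the different label levels concrete, the sketch below computes Top-k accuracy and shows why accuracy can rise when a fine label is collapsed to a coarser one: a prediction with the wrong site but the right period still counts at the period level. It is a hedged illustration, not the paper's evaluation code. The "period|site" label format, the site names, the helper names, and the idea of deriving the period score by collapsing period-site predictions are all assumptions.

```python
def top_k_accuracy(true_labels, ranked_predictions, k, to_level=lambda label: label):
    """Share of queries whose true label, mapped by to_level, appears among
    the first k ranked predictions after the same mapping."""
    hits = 0
    for truth, ranked in zip(true_labels, ranked_predictions):
        candidates = {to_level(label) for label in ranked[:k]}
        if to_level(truth) in candidates:
            hits += 1
    return hits / len(true_labels)

def period_of(label):
    """Collapse a hypothetical 'period|site' label to its period part."""
    return label.split("|")[0]

# Toy ground truth and ranked predictions (best guess first); site names are made up.
true_labels = [
    "Natufian|site_1",
    "Natufian|site_2",
    "Chalcolithic|site_3",
    "Chalcolithic|site_3",
]
ranked_predictions = [
    ["Natufian|site_1", "Natufian|site_2"],      # exact hit at rank 1
    ["Natufian|site_1", "Natufian|site_2"],      # right period, wrong site at rank 1
    ["Natufian|site_2", "Natufian|site_1"],      # wrong period throughout
    ["Chalcolithic|site_3", "Natufian|site_1"],  # exact hit at rank 1
]

top1_period_site = top_k_accuracy(true_labels, ranked_predictions, k=1)
top2_period_site = top_k_accuracy(true_labels, ranked_predictions, k=2)
top1_period = top_k_accuracy(true_labels, ranked_predictions, k=1,
                             to_level=period_of)
```

On this toy data the second query is a miss at the period-site level for k=1 but a hit both at k=2 and at the period level, which mirrors the pattern in the reported numbers: coarser groupings and larger k both give higher accuracy.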
Discussion
The results demonstrate the potential of deep learning for predictive archaeology. The model's high accuracy in classifying artifacts surpasses the performance of individual archaeologists in some aspects, particularly in recognizing patterns across broad temporal ranges. The ability to detect communities of sites based on artifact similarities opens new avenues for exploring cultural interactions and understanding the spatial distribution of past societies. The model's capacity to identify unexpected connections between sites, even if some are potentially false positives, highlights the potential to uncover previously unrecognized patterns and inspire further investigation by archaeologists. The fact that site information improves the accuracy of period prediction suggests the importance of considering spatial context in chronological studies.
Conclusion
This research successfully developed a deep learning model for classifying archaeological artifacts and identifying meaningful communities. The model's accuracy and capacity to reveal previously unseen connections between sites have far-reaching implications for archaeological research. Future work should focus on expanding the dataset, incorporating more sophisticated community detection methods, and developing tools to aid in interpreting and validating the model's outputs. The model itself and the interactive application developed for visualizing communities represent valuable tools for the archaeological community.
Limitations
The study's reliance on a single geographic region (the Southern Levant) limits the generalizability of the findings. The dataset, while large, may not fully represent the diversity of archaeological materials and contexts worldwide. The accuracy of the model depends on the quality and consistency of the initial data labels, which reflect expert interpretation and may therefore be subjective. Although data augmentation was used, biases introduced during image acquisition might still affect the model's performance. Some community detection results included outliers that require further archaeological interpretation. The model’s focus on visual features might overlook other important characteristics of artifacts that are not visually apparent.