Introduction
Morphological analysis is crucial in evolutionary and developmental biology for understanding the functional role of shape and its history. Traditional landmark-based geometric morphometrics, while widely used, has several limitations: it cannot compare phylogenetically distant species or developmental stages for which homologous landmarks are undefined, it is sensitive to the number of landmarks and to annotation errors, and it is inherently linear. Elliptic Fourier analysis (EFA) offers a landmark-free alternative but shares some of these limitations. Deep neural networks (DNNs) have shown promise, especially in image classification and medical imaging, but their use for morphological feature extraction has been limited by concerns about interpretability. This research proposes a landmark-free method, Morpho-VAE, based on a modified variational autoencoder (VAE). Morpho-VAE integrates a classifier module into a standard VAE architecture. The hybrid model combines unsupervised learning (the VAE, for dimensionality reduction and reconstruction) with supervised learning (the classifier module, for distinguishing labeled classes) in a single training procedure, aiming to extract the morphological features that best differentiate predefined classes while preserving image information during dimensionality reduction. The authors hypothesize that this approach will provide a more powerful and interpretable method for shape analysis, particularly for large image datasets of complex biological shapes.
Literature Review
The paper reviews existing methods for shape analysis, highlighting the strengths and weaknesses of landmark-based geometric morphometrics and elliptic Fourier analysis (EFA). Landmark-based methods depend on expert annotation and are susceptible to errors introduced during manual landmark placement. EFA is presented as a landmark-free alternative, but its limitations are also noted. The authors point to the growing use of deep neural networks (DNNs) in image analysis and medical imaging, emphasizing their potential for handling nonlinear relationships in morphological data, while acknowledging that their lack of interpretability is a limitation. The review concludes by positioning the proposed Morpho-VAE as a novel approach that combines the strengths of VAEs (dimensionality reduction and reconstruction) with supervised learning to improve interpretability and feature extraction for morphometric applications. Prior work on hybrid VAE architectures in other fields, such as dementia classification and multimodal anomaly detection, is cited to place this approach within the broader machine learning landscape.
Methodology
The study used three-dimensional computed tomography (CT) scan data of primate mandibles from several sources (Primate Research Institute, MorphoSource.org, Mammalian Cranial Photographic Archive Second Edition). Phocidae (carnivores) served as an outgroup. To reduce computational demands, the 3D data were projected along three orthogonal axes (x, y, z), generating three 2D images per mandible. Images were size-normalized before analysis. Morpho-VAE, the proposed model (Figure 1b), comprises a VAE module (an encoder and a decoder built from convolutional and deconvolutional neural networks) and a classifier module. The encoder reduces the high-dimensional image data to a low-dimensional latent variable (ζ), the decoder reconstructs the input image from ζ, and the classifier module uses ζ to predict the family label. The total loss function (Etotal) is a weighted sum of the VAE loss (EVAE) and the classification loss (EC), with the weighting controlled by a hyperparameter α. The value of α was determined via cross-validation (Figure 1c) to balance reconstruction accuracy against classification performance. The remaining hyperparameters of Morpho-VAE (number of layers, filters, activation functions, optimizer) were tuned using Optuna, an automated hyperparameter optimization tool. The latent space dimension was fixed at three. A double cross-validation procedure split the dataset into training, validation, and test sets. Cluster separation was quantified with the Cluster Separation Index (CSI). The Steel test was used to compare the classification accuracy of Morpho-VAE against PCA and VAE. The Score-CAM method was used to visualize the image regions most important for classification. Finally, the robustness of reconstruction was evaluated using artificially cropped images.
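The weighted loss described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the text only says that Etotal is a weighted sum controlled by α, so the convex form (1 - α)·EVAE + α·EC used here is an assumption, as are the choices of binary cross-entropy for reconstruction, a standard normal prior for the KL term, and all function names.

```python
import math


def reconstruction_loss(x, x_hat, eps=1e-7):
    """Binary cross-entropy summed over pixels (flat lists with values in [0, 1])."""
    total = 0.0
    for xi, xh in zip(x, x_hat):
        xh = min(max(xh, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= xi * math.log(xh) + (1.0 - xi) * math.log(1.0 - xh)
    return total


def kl_divergence(mu, log_var):
    """KL divergence from N(mu, diag(exp(log_var))) to a standard normal prior."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


def classification_loss(class_probs, true_label, eps=1e-7):
    """Cross-entropy for one sample, given predicted class probabilities."""
    return -math.log(max(class_probs[true_label], eps))


def total_loss(e_vae, e_c, alpha):
    """Assumed weighting: (1 - alpha) * E_VAE + alpha * E_C, with alpha in [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (1.0 - alpha) * e_vae + alpha * e_c


# Toy example: a 4-pixel "image", a 3-dimensional latent code, 3 classes.
x = [0.0, 1.0, 1.0, 0.0]
x_hat = [0.1, 0.9, 0.8, 0.2]
mu, log_var = [0.0, 0.5, -0.5], [0.0, 0.0, 0.0]
probs = [0.7, 0.2, 0.1]

e_vae = reconstruction_loss(x, x_hat) + kl_divergence(mu, log_var)
e_c = classification_loss(probs, true_label=0)
print(total_loss(e_vae, e_c, alpha=0.1))  # alpha here is an arbitrary illustrative value
```

Under this assumed weighting, α = 0 reduces the model to a plain VAE and α = 1 to a pure classifier, which is why α has to be chosen to trade reconstruction quality against class separation.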
Key Findings
Morpho-VAE separated mandible images into their respective families better than PCA and VAE (Figure 2). Clusters in Morpho-VAE's latent space (Figure 2a) were more clearly separated than those obtained with PCA (Figure 2b) or VAE (Figure 2c). Quantitative analysis using CSI and SVM classification accuracy confirmed this advantage (Figure 2d,e, Supplementary Figure 8). Even when trained on six families and tested on a seventh, Morpho-VAE exhibited better cluster separation than PCA and VAE (Supplementary Figure 8). There was no significant correlation between latent space distance and phylogenetic distance (Supplementary Figure 3). Morpho-VAE successfully reconstructed images from its latent space (Figure 3a,c). Reconstruction quality was high, with only a minor drop in classification accuracy relative to the original images (Figure 3b). Score-CAM analysis (Figure 4a) revealed that the x-projection (lateral view) was the most informative, with attention focused on the coronoid and condylar processes, which are crucial for mastication and vary among families (Figure 4a,b). Morpho-VAE was robust in reconstructing missing segments of artificially cropped images, especially for areas around the teeth and the tip of the mandible (Figure 5). However, loss of the coronoid and condylar processes significantly degraded reconstruction (Figure 5, Supplementary Figure 5), which is consistent with the Score-CAM results. Genus-level analysis within the family Cercopithecidae also showed Morpho-VAE's superior performance (Supplementary Figure 7).
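The cropping experiment can be illustrated with a small stdlib-only sketch. The summary does not state how the crops were generated or how reconstruction quality was scored, so masking a rectangle with a fill value and scoring with mean squared error are assumptions made for illustration; `crop_region` and `mean_squared_error` are hypothetical helper names, and the trained model that would reconstruct the cropped image is not shown.

```python
def crop_region(image, top, left, height, width, fill=0.0):
    """Return a copy of a 2D image (list of rows) with a rectangle set to `fill`."""
    n_rows = len(image)
    n_cols = len(image[0]) if n_rows else 0
    cropped = [row[:] for row in image]  # copy so the original stays untouched
    for r in range(max(top, 0), min(top + height, n_rows)):
        for c in range(max(left, 0), min(left + width, n_cols)):
            cropped[r][c] = fill
    return cropped


def mean_squared_error(a, b):
    """Mean per-pixel squared difference between two same-sized 2D images."""
    n_pixels = sum(len(row) for row in a)
    squared = sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return squared / n_pixels


# Toy 4x4 "mandible" image with the top-right 2x2 block removed.
original = [[1.0] * 4 for _ in range(4)]
cropped = crop_region(original, top=0, left=2, height=2, width=2)

# Error of the cropped input itself; a reconstruction from a trained model
# (not shown) would be compared against `original` in the same way.
baseline_error = mean_squared_error(original, cropped)
```

Comparing the error of a model's reconstruction against this baseline, region by region, is one way to express the observation that some regions (teeth, mandible tip) are recovered well while others (coronoid and condylar processes) are not.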
Discussion
The findings demonstrate Morpho-VAE's effectiveness as a landmark-free method for morphological feature extraction and shape analysis. Integrating supervised learning into the VAE architecture allowed the model to extract features specifically relevant to distinguishing the predefined classes (families). The lack of correlation between latent space distance and phylogenetic distance suggests that factors other than evolutionary history, such as dietary adaptations, influence mandible morphology. The robustness of reconstruction from cropped images shows Morpho-VAE's potential for analyzing incomplete or damaged samples. Score-CAM provided a way to interpret the model's decisions by linking specific anatomical features to classification. The success of the analysis at both the family and genus levels further supports the proposed methodology. The method may therefore provide insights into morphological variation beyond purely phylogenetic relationships.
Conclusion
Morpho-VAE offers a powerful and versatile landmark-free method for morphological analysis that outperformed both PCA and a standard VAE in separating the predefined classes. Its ability to extract informative features, reconstruct missing segments, and provide interpretable results makes it a valuable tool for biologists. Future studies should compare Morpho-VAE directly with landmark-based methods, explore its application to 3D data, and investigate its effectiveness on larger datasets and on closely related taxa using advanced embedding techniques such as triplet loss. The approach could also facilitate further investigation of the interplay among genetics, environment, and morphology.
Limitations
The study used a relatively small dataset, which could limit the generalizability of the findings. The reliance on 2D projections of 3D data may have caused a loss of information. Factors such as sex differences might also influence morphology but were not fully addressed, both because the dataset was small and because size normalization can obscure sex-specific differences. The interpretability provided by Score-CAM, while helpful, does not fully explain the complex interactions captured by the deep learning model.