Introduction
Analyzing the three-dimensional social behaviors of freely moving large mammals is valuable in both agricultural and life science research. Pigs, in particular, are important subjects due to their relevance to agricultural production and their increasing use as animal models in biomedical research. Their behaviors, including locomotion, posture, environmental interactions, and social communication (e.g., tail movements reflecting emotional states), provide crucial insights into their well-being and health. Quantitative monitoring of these behaviors is essential for improving pig welfare and optimizing pork production. Furthermore, pigs offer advantages over rodent models in certain neurological disease research due to their closer genetic, anatomical, and physiological similarities to humans. Accurate 3D motion capture is therefore crucial for understanding neurobiological processes in models of movement-related disorders (Huntington's Disease, ALS, Parkinson's Disease) and cognition-related disorders (Alzheimer's Disease, Autism, depression). While 2D pig behavior recognition from monocular videos has been studied, a system for accurate, markerless 3D motion reconstruction and quantitative analysis has been lacking. Existing methods like SLEAP and DeepLabCut, which rely on sparse 2D keypoints, struggle with occlusions common in social interactions. Triangulation methods, while improving 3D reconstruction from multi-view 2D data, are challenged by associating unordered multi-view cues and handling occlusions. Regression-based methods require large, expensive 3D datasets for training. This research addresses these limitations by developing a new system.
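The triangulation methods mentioned above recover a 3D point from its 2D projections in multiple calibrated views. As a generic illustration (not the pipeline of any particular cited method), a minimal Direct Linear Transform sketch with synthetic cameras might look like this:

```python
import numpy as np

def triangulate_dlt(projs, points2d):
    """Triangulate one 3D point from >= 2 views via the Direct Linear Transform.

    projs: list of 3x4 camera projection matrices.
    points2d: list of (x, y) pixel observations, one per view.
    """
    rows = []
    for P, (x, y) in zip(projs, points2d):
        # Each observation contributes two linear constraints on the
        # homogeneous 3D point X: x * (P[2] @ X) = P[0] @ X, and likewise for y.
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    A = np.stack(rows)
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

# Two synthetic cameras observing a known point (toy setup, not real calibration).
X_true = np.array([0.3, -0.2, 4.0])
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])               # camera at the origin
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])  # shifted baseline

def project(P, X):
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

obs = [project(P1, X_true), project(P2, X_true)]
X_hat = triangulate_dlt([P1, P2], obs)
print(np.allclose(X_hat, X_true, atol=1e-6))  # → True
```

With noise-free observations the DLT recovers the point exactly; the harder problems noted above, associating unordered detections across views and coping with occlusion, arise before this step can even be applied.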
Literature Review
The existing literature highlights the importance of understanding pig behavior for both agricultural and biomedical research. Studies using monocular video analysis have demonstrated progress in 2D behavior recognition, focusing on aspects like locomotion, posture, and feeding. However, these methods are limited by their inability to capture the three-dimensional nature of animal interactions and their susceptibility to occlusions. Traditional marker-based motion capture systems are cumbersome and invasive, restricting natural animal behavior. Existing markerless approaches, such as SLEAP and DeepLabCut, focus on 2D keypoint tracking and struggle with the complexities of multi-animal interactions and occlusions. Triangulation techniques attempt to overcome this by integrating information from multiple cameras, but they are computationally intensive and face challenges in correctly matching and reconstructing poses when occlusions are present. Recent regression-based methods have shown promise, but they generally require extensive labeled 3D datasets for training, making them expensive and time-consuming to implement.
Methodology
The researchers developed a novel Multi-Animal Mesh Model Alignment (MAMMAL) system for 3D surface motion capture of multiple freely moving pigs. MAMMAL consists of three stages: detection, detection matching, and mesh fitting. A custom-designed articulated surface mesh model of a pig (the PIG model), with 11,239 vertices and 62 joints, underpins the pipeline.

**Stage 1: Detection.** Two deep neural networks, PointRend (for silhouette detection) and HRNet (for keypoint detection), are trained on a newly created dataset (BamaPig2D) of manually labeled pig images. The system first generates bounding boxes and silhouettes for each pig instance, normalizes image regions to a fixed resolution, and then detects visible keypoints.

**Stage 2: Detection Matching.** This stage associates unordered multi-view 2D cues. A cross-view graph matching algorithm matches the spatially unordered 2D cues at the initial time point; at subsequent time points, the PIG model facilitates tracking by identifying both visible and invisible keypoints.

**Stage 3: Mesh Fitting.** This stage leverages the PIG model's surface information to handle occlusions. Matched 2D cues are aligned to the model to generate 3D keypoints and surface geometry, and occlusion relationships together with the surface information guide the filtering of erroneous data, improving the accuracy of the 3D reconstruction. The system optimizes joint rotations, body scale, and the 6 degrees of freedom that place each individual in 3D space, using the Levenberg-Marquardt algorithm.
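The mesh-fitting stage minimizes the discrepancy between model keypoints and observed 2D cues with Levenberg-Marquardt. A toy sketch of that residual structure, using a planar two-segment "limb" in place of the full 62-joint PIG model (all names and parameters here are hypothetical, for illustration only):

```python
import numpy as np
from scipy.optimize import least_squares

# Illustrative only: a planar two-segment "limb" with a root translation
# (tx, ty) and two joint angles. MAMMAL's actual fit covers 62 joint
# rotations, body scale, and a global 6-DoF pose against multi-view 2D
# cues; this sketch keeps the same residual structure (model keypoints
# minus observed keypoints) at toy scale.
SEG1, SEG2 = 1.0, 0.7  # assumed fixed bone lengths

def keypoints(params):
    """Forward kinematics: root, elbow, and tip positions, flattened."""
    tx, ty, th1, th2 = params
    root = np.array([tx, ty])
    elbow = root + SEG1 * np.array([np.cos(th1), np.sin(th1)])
    tip = elbow + SEG2 * np.array([np.cos(th1 + th2), np.sin(th1 + th2)])
    return np.concatenate([root, elbow, tip])

def residual(params, observed):
    # Model-predicted keypoints minus observed 2D detections.
    return keypoints(params) - observed

true_params = np.array([0.5, -0.3, 0.4, 0.6])
observed = keypoints(true_params)          # noise-free synthetic "detections"

# Levenberg-Marquardt, as used in the paper's mesh-fitting stage.
fit = least_squares(residual, x0=np.zeros(4), args=(observed,), method="lm")
print(np.allclose(fit.x, true_params, atol=1e-4))  # → True
```

The real system solves the same kind of nonlinear least-squares problem at far higher dimensionality, with the model's surface and occlusion reasoning deciding which 2D cues enter the residual at all.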
Key Findings
MAMMAL successfully captured the 3D surface motions of multiple pigs in their natural environment. The system accurately tracked both visible and invisible keypoints, with an average error of less than 5.2 cm; across all keypoints, the average error was 3.44 cm, less than 5% of the pig's body length. MAMMAL was robust to different pig sizes and camera counts, outperforming traditional triangulation methods even with fewer cameras. MAMMAL enabled quantitative analysis of various pig behaviors:

* **Animal-Scene Interaction:** automatic identification of drinking and feeding behaviors based on 3D motion and scene priors.
* **Posture Discovery:** identification of eight distinct pig postures from a large dataset of poses using t-SNE clustering.
* **Social Behavior Recognition:** recognition of both static and dynamic social interactions, including detailed part-level contacts (Head-Head, Head-Body, Head-Limb).

Analysis of tail movements in pigs of different social hierarchies revealed that dominant pigs displayed loosely wagging tails (associated with positive emotions) more frequently than subordinate pigs, while subordinate pigs exhibited more passive hanging (associated with aversion); power spectral density (PSD) analysis confirmed this distinction. Comparisons with existing methods, such as DANNCE (on mice) and VoxelPose (on dogs), demonstrated MAMMAL's competitive performance and generalizability across species and environments.
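PSD analysis separates an oscillatory tail signal (wagging) from a flat one (passive hanging) by where spectral power concentrates. A minimal sketch using Welch's method on synthetic tail-angle traces; the 30 Hz frame rate and ~2 Hz wag frequency are assumptions for illustration, not values from the study:

```python
import numpy as np
from scipy.signal import welch

# Synthetic stand-ins for tail-angle traces. 30 Hz video and a 2 Hz wag
# are assumed here purely for illustration.
fs = 30.0                          # frames per second (assumed)
t = np.arange(0, 60, 1 / fs)       # one minute of "tracking"
rng = np.random.default_rng(0)

wagging = np.sin(2 * np.pi * 2.0 * t) + 0.1 * rng.standard_normal(t.size)
hanging = 0.1 * rng.standard_normal(t.size)   # no oscillatory component

def dominant_freq(signal):
    """Frequency of the largest peak in the Welch power spectral density."""
    freqs, psd = welch(signal, fs=fs, nperseg=256)
    return freqs[np.argmax(psd)]

print(dominant_freq(wagging))  # peaks near the injected 2 Hz wag
print(dominant_freq(hanging))  # broadband noise, no stable peak
```

A sharp low-frequency peak versus a flat spectrum is the kind of signature that lets PSD analysis distinguish loose wagging from passive hanging in the tracked tail motion.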
Discussion
MAMMAL represents a significant advancement in markerless animal motion capture, addressing the limitations of previous methods in handling occlusions and reconstructing 3D poses for multiple interacting animals. Its ability to accurately track both visible and invisible keypoints, coupled with its robustness to variations in camera number and pig size, makes it a powerful tool for studying animal behavior. The quantitative analyses of various pig behaviors, including animal-scene interactions, posture, and social interactions, demonstrate the system's versatility. The findings on tail movements in pigs of different social hierarchies offer valuable insights into social dynamics and emotional states. The successful application of MAMMAL to mice and dogs underscores its generalizability and potential for broader application in animal behavior research across species. The combination of an articulated mesh model and sophisticated algorithms enables accurate reconstruction in complex scenarios where occlusions are common.
Conclusion
MAMMAL is the first system to provide non-invasive 3D surface motion capture of multiple freely moving animals. Its key advantages lie in its ability to handle invisible keypoints and severe occlusions, which are particularly common with large animals. While not yet real-time, MAMMAL provides quantitative behavioral analysis applicable to disease modeling, drug evaluation, and brain circuit research. Its application to pigs, mice, and dogs demonstrates its generalizability and potential for use across animal models. Future improvements could include real-time processing, integration of other sensing modalities, and greater model flexibility.
Limitations
Currently, MAMMAL is not a real-time system. The accuracy of the system relies heavily on the quality of the input data (camera calibration, image quality) and the manual creation of the articulated mesh model for each species. The generalization of the model to other animal species requires adapting and training the system with labeled data for those species, which takes time and expertise. The analysis of complex behaviors might require further refinement and annotation of behavior classes. Finally, the algorithm's performance on very densely packed animals or environments with extreme occlusions may require further optimization.