Enhancing Object Detection Robustness: A Synthetic and Natural Perturbation Approach

N. Premakumara, B. Jalaian, et al.

This study, by Nilantha Premakumara, Brian Jalaian, Niranjan Suri, and Hooman Samani, explores how synthetic perturbations can improve the robustness of object detection models to real-world challenges such as varying lighting and blur, with the aim of building more reliable detection systems.

Introduction
Object detection, a core computer vision task, aims to identify and locate objects within images or videos. It underpins applications such as object tracking, activity recognition, image captioning, image segmentation, and visual question answering. A central difficulty is that objects of the same class can look very different (high intra-class variance) while objects of different classes can look alike (low inter-class variance). Deep learning has greatly advanced object detection, but deployed systems face inconsistent image quality, so evaluating how robust models are to distribution shifts is essential. Existing robustness interventions, such as non-conventional architectures, data augmentation, alternative losses, and optimizers, often target only specific types of shift. This research proposes a new way to assess robustness: comparing model performance under synthetic perturbations with performance under real natural perturbations. Four state-of-the-art pretrained models were evaluated, with the COCO 2017 dataset used for training and the ExDark dataset used to measure robustness to real-world perturbations. The AugLy package was used to simulate natural perturbations synthetically. An ablation study, in which models were retrained on synthetically perturbed data, examined whether the resulting improvements transfer to real-world conditions. The primary contributions are a systematic exploration of the synthetic perturbation levels that best enhance robustness and an ablation study linking synthetic augmentation to real-world robustness.
Literature Review
The robustness of CNN-based object detection to noise is an active research area, particularly for surveillance, where image quality can be poor. Studies of synthetic rain, fog, and other degradations have shown that they substantially reduce classification and detection accuracy. Other work has added realistic snow and fog to datasets to evaluate scene reconstruction, and has assessed classifiers under natural perturbations, revealing substantial accuracy drops and localization errors. Object detection in thermal surveillance under varied weather conditions has also been studied. Semantic adversarial editing is another technique: it generates believable corruptions and highlights challenging data points that can be used to improve model robustness. Challenging datasets such as ExDark, UNIRI-TID, RESIDE, UFDD, and See in the Dark have been developed to cover adverse conditions including low light, bad weather, and occlusion. Data augmentation, including various types of noise, deep artificial image transformations, natural transformations, and simple image transformations, has been shown to improve model resilience. This work uses AugLy, an open-source library offering a large set of augmentations that mimic real-world photo and video manipulations, to perturb images. Overall, existing work highlights the need for deep learning models that handle natural data variance and corruptions. This research builds on these findings by further investigating how synthetic perturbations affect model performance, and how transfer learning and retraining with synthetic perturbations can improve robustness to real perturbations.
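AugLy ships perturbations of this kind as ready-made library functions (for example `augly.image.brightness`, `augly.image.blur`, and `augly.image.pixelization`). To make concrete what such perturbations do to pixel values, here is a minimal, dependency-free sketch that applies comparable operations to a grayscale image stored as a list of rows. It is an illustration under stated assumptions, not AugLy's implementation: the blur is a simple box blur, which may differ from the kernel AugLy uses, and "pixel degradation" is approximated by block-averaging pixelization, since this summary does not say which AugLy function the authors used for it. The helper names are hypothetical.

```python
from typing import List

# Assumption for this sketch: a grayscale image is a list of rows of ints in [0, 255].
Image = List[List[int]]


def clamp(v: float) -> int:
    """Round to the nearest integer and clip to the 8-bit range [0, 255]."""
    return max(0, min(255, int(round(v))))


def brightness(img: Image, factor: float) -> Image:
    """Scale every pixel by `factor`: < 1 darkens, > 1 brightens (saturating at 255)."""
    return [[clamp(p * factor) for p in row] for row in img]


def box_blur(img: Image, radius: int) -> Image:
    """Replace each pixel with the mean of its (2*radius+1)^2 window, truncated at the borders."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            row.append(clamp(sum(window) / len(window)))
        out.append(row)
    return out


def pixelize(img: Image, block: int) -> Image:
    """Coarsen the image: each block x block tile is filled with its mean value."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y0 in range(0, h, block):
        for x0 in range(0, w, block):
            ys = range(y0, min(h, y0 + block))
            xs = range(x0, min(w, x0 + block))
            mean = clamp(sum(img[y][x] for y in ys for x in xs) / (len(ys) * len(xs)))
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out
```

The brightness function follows the same idea as PIL's `ImageEnhance.Brightness`: pixel values are multiplied by a factor. With `factor=0.5` every intensity is halved, while with `factor=2.0` bright regions saturate at 255, so detail in those regions is lost and cannot be recovered by scaling back.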
Methodology
The study evaluated robustness using four pretrained state-of-the-art models (Detr-ResNet-101, Detr-ResNet-50, YOLOv4, and YOLOv4-tiny). The COCO 2017 dataset was used for training and for generating synthetic perturbations, and the ExDark dataset was used to evaluate robustness to real-world perturbations. Three types of synthetic perturbation (blur, brightness, and pixel degradation) were applied with the AugLy package, with their parameters adjusted to simulate natural variations. Models were first evaluated under these synthetic perturbations. The ablation study then focused on brightness: models were retrained on training sets in which 0%, 20%, 50%, or 70% of the images were synthetically perturbed, and their mAP was compared on both the COCO and ExDark datasets to measure how the ratio of original to perturbed images affects performance under real-world perturbations. The goals were to identify the level of synthetic perturbation that best improves robustness and to analyze how well improvements transfer from synthetic to real perturbations. Mean Average Precision (mAP) was the primary performance metric.
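The ablation's training sets can be pictured with a small helper that perturbs a fixed fraction of the images. This is a minimal sketch under two assumptions not stated in the summary: perturbed copies replace their originals (rather than being added alongside them), and the images to perturb are chosen uniformly at random. `build_ablation_split` and its parameters are hypothetical names; in the brightness ablation, `perturb` would be a brightness change.

```python
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def build_ablation_split(images: Sequence[T], fraction: float,
                         perturb: Callable[[T], T], seed: int = 0) -> Tuple[List[T], int]:
    """Return a training list in which `fraction` of the images are perturbed copies.

    Assumption: each perturbed copy replaces its original, so the dataset size
    is unchanged and `fraction` is exactly the share of perturbed images.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    n_perturbed = round(fraction * len(images))
    rng = random.Random(seed)  # fixed seed so every model is trained on the same split
    chosen = set(rng.sample(range(len(images)), n_perturbed))
    mixed = [perturb(img) if i in chosen else img for i, img in enumerate(images)]
    return mixed, n_perturbed
```

Calling it with `fraction` set to 0.0, 0.2, 0.5, and 0.7 yields the four ablation configurations. Replacing rather than appending keeps the training-set size constant, so mAP differences between ratios are not confounded with simply having more data; if the authors appended perturbed copies instead, the effective share of perturbed images would be k/(n+k) rather than k/n.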
Key Findings
The initial evaluation under synthetic perturbations showed that most models were vulnerable to strong brightening, whereas all were more robust to darkening. Detr-ResNet-101 and Detr-ResNet-50 followed similar performance trends, with Detr-ResNet-101 slightly better, and YOLOv4 consistently outperformed YOLOv4-tiny. For each perturbation type, performance degraded once the perturbation level passed a type-specific threshold. Overall, Detr-ResNet-101 was the most robust model. In the ablation study, mAP on real-world (ExDark) perturbations increased as the percentage of synthetically perturbed images in the training data increased. The mAP gap between models also narrowed as that percentage grew, suggesting that synthetic perturbations improved generalization to real-world conditions. Detr-ResNet-101 achieved the best performance at every synthetic perturbation ratio in both the COCO and ExDark evaluations. Table 2 presents the mAP scores of the models under synthetic corruptions, and Table 3 shows the mAP scores after retraining with different synthetic:original ratios. Figure 1 illustrates how confidence scores change with perturbation level, and Figures 2, 3, and 4 show how the models perform across synthetic perturbation levels and dataset ratios.
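Because every result in Tables 2 and 3 is an mAP score, it may help to see how average precision (AP) is computed. This sketch handles a single class: detections are matched greedily in descending confidence to unmatched ground-truth boxes at one IoU threshold, and precision is interpolated over all recall points; mAP is then the mean AP over classes. The function names are hypothetical. COCO's official evaluation (pycocotools `COCOeval`) also averages over IoU thresholds 0.50 to 0.95 in steps of 0.05, interpolates precision at 101 fixed recall points, and handles crowd regions and per-image detection limits, so numbers from this sketch will not exactly match published COCO mAP.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Detection = Tuple[Hashable, float, Box]  # (image_id, confidence, box)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(dets: Sequence[Detection], gts: Dict[Hashable, List[Box]],
                      iou_thr: float = 0.5) -> float:
    """All-point interpolated AP for a single class at one IoU threshold."""
    n_gt = sum(len(boxes) for boxes in gts.values())
    if n_gt == 0:
        return 0.0
    matched = {img: [False] * len(boxes) for img, boxes in gts.items()}
    tp_cum = fp_cum = 0
    precisions: List[float] = []
    recalls: List[float] = []
    # Greedy matching in descending confidence: each ground-truth box is matched at most once.
    for img, _, box in sorted(dets, key=lambda d: d[1], reverse=True):
        best, best_j = 0.0, -1
        for j, gt in enumerate(gts.get(img, [])):
            overlap = iou(box, gt)
            if not matched[img][j] and overlap > best:
                best, best_j = overlap, j
        if best_j >= 0 and best >= iou_thr:
            matched[img][best_j] = True
            tp_cum += 1
        else:
            fp_cum += 1  # no unmatched box overlaps enough (this includes duplicates)
        precisions.append(tp_cum / (tp_cum + fp_cum))
        recalls.append(tp_cum / n_gt)
    # Interpolate: each precision becomes the max precision at this or any later rank.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the stepwise precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def coco_style_ap(dets: Sequence[Detection], gts: Dict[Hashable, List[Box]]) -> float:
    """Average the AP over IoU thresholds 0.50, 0.55, ..., 0.95, as COCO does."""
    thresholds = [0.5 + 0.05 * k for k in range(10)]
    return sum(average_precision(dets, gts, t) for t in thresholds) / len(thresholds)
```

A perturbation that lowers detector confidence on true objects pushes them below false positives in the ranking, which lowers interpolated precision at those recall levels and therefore AP, even if the boxes themselves are still roughly correct.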
Discussion
The findings show that augmenting training data with synthetic perturbations, particularly brightness variations, substantially improves the robustness of object detection models to real-world distribution shifts, especially the challenging lighting conditions found in the ExDark dataset. The systematic exploration of perturbation levels provides practical guidance for choosing data augmentation strategies. The improved ExDark performance after retraining on synthetically perturbed data demonstrates that robustness gains transfer from synthetic to real-world scenarios. Although Detr-ResNet-101 consistently outperformed the other models, the best choice in practice may depend on factors such as computational resources and application requirements. Finally, the narrowing performance gap between models as the share of synthetic training data increases suggests that synthetic augmentation can reduce disparities between models and improve the overall robustness of object detection systems.
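Finding the perturbation level at which performance starts to break down amounts to a sweep over perturbation strength. The sketch below is a minimal version, assuming a hypothetical `evaluate(level)` callback that perturbs the evaluation set at the given strength, runs the detector, and returns its mAP. The relative-drop criterion and the default 10% tolerance are arbitrary illustrative choices, not values reported in the study.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple


def degradation_threshold(
    evaluate: Callable[[float], float],
    levels: Sequence[float],
    clean_map: float,
    max_rel_drop: float = 0.1,
) -> Tuple[Dict[float, float], Optional[float]]:
    """Sweep perturbation levels from mildest to strongest.

    Returns the mAP at every level and the first level whose relative drop
    from the clean baseline exceeds `max_rel_drop` (None if no level does).
    """
    if clean_map <= 0:
        raise ValueError("clean_map must be positive")
    scores: Dict[float, float] = {}
    first_bad: Optional[float] = None
    for level in levels:
        scores[level] = evaluate(level)  # hypothetical callback: perturb, detect, score
        rel_drop = (clean_map - scores[level]) / clean_map
        if first_bad is None and rel_drop > max_rel_drop:
            first_bad = level
    return scores, first_bad
```

For brightness, where a factor of 1.0 leaves the image unchanged, it would make sense to sweep brightening factors (above 1.0) and darkening factors (below 1.0) as two separate sequences, since the models respond differently to each direction. One possible heuristic is to pick augmentation strengths just below the returned threshold, so that training images are challenging but still informative.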
Conclusion
This study demonstrates that data augmentation based on synthetic perturbations improves the robustness of object detection models to real-world distribution shifts. Among the models evaluated, Detr-ResNet-101 was the most robust. Future work should test on datasets with more diverse natural perturbations, evaluate a wider range of models, extend the ablation study to other perturbation types, and explore alternative data augmentation techniques and parameter optimization.
Limitations
The study has several limitations: it relies solely on the ExDark dataset, which mainly covers poor lighting conditions; it evaluates only four pretrained models; its ablation study is limited to brightness adjustments; and its results may depend on the chosen augmentation technique and parameters. Future work should address these limitations by broadening the datasets, models, perturbation types, and augmentation strategies covered.