Delving into High-Quality Synthetic Face Occlusion Segmentation Datasets

K. T. R. Voo, L. Jiang, et al.

Discover the innovative research by Kenny T. R. Voo, Liming Jiang, and Chen Change Loy that delves into occlusion-aware face segmentation. The study introduces groundbreaking techniques and high-resolution datasets that are essential for advancing applications in computer vision.

Introduction
Occlusion-aware face segmentation, the task of extracting face pixels from an occluded face, is critical for various face-related applications like face recognition, face swapping, and facial reconstruction. Existing real-world datasets are often limited in quantity, resolution, or accuracy of labeling. While previous studies have used synthetic data generation, these often lack naturalism. The color, texture, and edges of occluders in synthetic datasets frequently appear unnatural. Existing methods also tend to focus on specific occluders (e.g., sunglasses, masks) placed at fixed locations, limiting their generalizability. The commonly used validation set, COFW, also has limitations as not all faces are occluded. This paper addresses these issues by proposing new synthetic data generation techniques and high-quality real-world datasets to facilitate better model evaluation and benchmark creation.
Literature Review
The paper reviews existing real-world and synthetic occluded face datasets. Real-world datasets like ROF, ISL-UFMD, FMLD, SoF, and ISL-UFHD suffer from limitations such as low quantity, low resolution, inaccurate labels, or lack of annotated masks. CelebAMask-HQ, while large and high-quality, contains approximately 4,000 occluded faces with incorrect annotations. The COFW dataset, while previously used as a benchmark, is no longer fully publicly available. Synthetic datasets generated by previous studies often overlay specific occluders in a non-naturalistic manner, lacking generalizability. Extended Labeled Faces in-the-Wild (ELFW) and a concurrent work, FaceOcc, are mentioned, but their limitations in terms of resolution, quantity, or data availability are discussed.
Methodology
The paper proposes two data generation techniques:

1. **Naturalistic Occlusion Generation (NatOcc):** This method generates naturalistic synthetic occluded faces from CelebAMask-HQ using color transfer (via Sliced Optimal Transport), image harmonization (using RainNet), and super-resolution. Preprocessing steps handle black-pixel imbalances during color transfer. Affine augmentations, image compression, random brightness and contrast adjustments, and Gaussian blurring further enhance naturalism, and particular attention is paid to occluder edges to produce realistic composites.
2. **Random Occlusion Generation (RandOcc):** This method overlays random shapes with random textures (from the Describable Textures Dataset) and random transparency onto faces, making it a more general approach than NatOcc.

**Dataset Preparation:** The authors manually corrected approximately 4,000 incorrectly labeled occluded faces in CelebAMask-HQ and split the dataset into occluded and non-occluded categories. They also draw occluders from the COCO and 11k Hands datasets, manually annotating the hands. Two new validation datasets are introduced: RealOcc (550 aligned and cropped faces) and RealOcc-Wild (270 unaligned, in-the-wild images) for robustness testing. The COFW masks are manually modified to align with the paper's definition of occlusion.

**Experimental Setup:** The authors train PSPNet, DeepLabv3+, and SegFormer models on various combinations of their synthetic and real-world datasets. Evaluation uses mIoU on RealOcc, RealOcc-Wild, and the COFW training set.
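NatOcc's color-transfer step is based on Sliced Optimal Transport, which matches color distributions by repeatedly projecting both pixel clouds onto random 1D directions and aligning the sorted projections. The sketch below is a minimal illustration of that general technique, not the authors' implementation; the function name and parameters are our own.

```python
import numpy as np

def sliced_ot_color_transfer(source, target, n_iters=50, step=1.0, seed=None):
    """Move the colors of `source` toward the color distribution of `target`
    via sliced optimal transport: project both RGB point clouds onto a random
    unit direction and match the sorted 1D projections, repeatedly."""
    rng = np.random.default_rng(seed)
    src = source.reshape(-1, 3).astype(np.float64)
    tgt = target.reshape(-1, 3).astype(np.float64)
    for _ in range(n_iters):
        d = rng.normal(size=3)
        d /= np.linalg.norm(d)                      # random unit direction
        p_src, p_tgt = src @ d, tgt @ d             # 1D projections
        order = np.argsort(p_src)
        # Resample the sorted target projections to one value per source pixel.
        quantiles = np.interp(np.linspace(0, 1, len(src)),
                              np.linspace(0, 1, len(tgt)), np.sort(p_tgt))
        disp = np.empty(len(src))
        disp[order] = quantiles - p_src[order]      # 1D optimal transport step
        src += step * disp[:, None] * d             # move colors along d
    return np.clip(src, 0, 255).reshape(source.shape).astype(source.dtype)
```

In the NatOcc setting, `source` would be the occluder crop and `target` the face image, so the pasted occluder inherits the face photo's color statistics before harmonization.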
Key Findings
The key findings of the paper are:

1. **Effectiveness of NatOcc and RandOcc:** Models trained on the NatOcc dataset (especially with Sliced Optimal Transport color transfer) performed comparably to or better than those trained on the real-world occluded dataset (C-CM), demonstrating the effectiveness of the proposed naturalistic synthetic data generation. RandOcc also improved significantly over occlusion-free training data and performed close to C-CM.
2. **Generalization to Unseen Occlusions:** Models trained on NatOcc and RandOcc generalized well to unseen occlusions (e.g., glasses, face masks) in the RealOcc-Wild dataset, highlighting the robustness of the methods.
3. **Complementary Nature of NatOcc and RandOcc:** A combined dataset (C-WO-Mix) using both NatOcc and RandOcc achieved even higher performance, suggesting the two methods are complementary. The mix was particularly effective on transparent and translucent occluders, which each method struggled with individually.
4. **Impact of Data Quality:** Training on the original, incorrectly annotated CelebAMask-HQ dataset (C-Original) performed significantly worse than even the non-occluded version, highlighting the importance of clean labels. Class imbalance from background pixels was also a significant factor influencing mIoU scores.
5. **Robustness:** Models trained on the synthetic datasets were more robust on real-world occluded faces in the wild, as measured by mIoU on COFW and RealOcc-Wild.
6. **No Negative Impact on Non-occluded Faces:** The improved performance on occluded faces did not reduce segmentation accuracy on non-occluded faces, as shown by tests on the CelebAMask-HQ-WO (Test) dataset.
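The class-imbalance caveat around mIoU is easy to see in code. The sketch below (our illustration, not the paper's evaluation code) computes per-class IoU for binary face/background masks: when the face covers only a small fraction of the image, even a degenerate "all background" prediction keeps a near-perfect background IoU, which props up the mean.

```python
import numpy as np

def miou(pred, gt, n_classes=2):
    """Mean intersection-over-union over the classes present in pred or gt."""
    ious = []
    for c in range(n_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union:                      # skip classes absent from both masks
            ious.append(inter / union)
    return float(np.mean(ious))

# Class-imbalance illustration: the face covers only 4% of the image,
# so predicting "all background" still earns a background IoU of 0.96.
gt = np.zeros((10, 10), dtype=int)
gt[:2, :2] = 1                         # small face region
all_bg = np.zeros((10, 10), dtype=int)
print(miou(all_bg, gt))                # (0.96 + 0.0) / 2 = 0.48
```

A per-class breakdown, or a face-only IoU, makes such degenerate predictions much easier to spot than the averaged score alone.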
Discussion
The results demonstrate that the proposed NatOcc and RandOcc methods are effective in generating high-quality synthetic data for occlusion-aware face segmentation. The ability of the models trained on these synthetic datasets to generalize to unseen occlusions underscores the robustness and generalizability of the proposed techniques. The complementary nature of NatOcc and RandOcc further strengthens the case for using both approaches. The findings highlight the crucial importance of data quality and suggest that class imbalance needs to be addressed in future evaluation metrics. The study provides valuable insights into creating high-quality synthetic datasets for computer vision tasks and the challenges of handling class imbalances.
Conclusion
This paper presents two novel synthetic occlusion generation methods (NatOcc and RandOcc) that produce high-quality, naturalistic occluded face data. The authors contribute corrected annotations and new categories to CelebAMask-HQ and introduce two new real-world datasets (RealOcc and RealOcc-Wild). The results show improved performance and robustness compared to using real-world data alone. Future work could focus on even more advanced synthetic data generation techniques and exploring alternative evaluation metrics that mitigate the impact of class imbalance. Applications of this improved occlusion-aware face segmentation in real-world face-related tasks should be investigated further.
Limitations
The study's limitations include potential biases introduced by the specific datasets used (CelebAMask-HQ, COCO, 11k Hands). The generalization ability to occlusions outside the scope of the training data could be further tested with a more diverse set of real-world occlusions. The paper acknowledges that the mIoU metric can be misleading due to class imbalance, and alternative metrics should be considered. The impact of specific hyperparameter choices in the training process was not explored exhaustively.