Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models

Computer Science

J. Xu, S. Liu, et al.

ODISE is an open-vocabulary panoptic segmentation model that outperforms the previous state of the art on both open-vocabulary panoptic and semantic segmentation. The work, by Jiarui Xu, Sifei Liu, Arash Vahdat, Wonmin Byeon, Xiaolong Wang, and Shalini De Mello, shows how the internal representations of text-to-image diffusion models can supply semantic features for a diverse range of categories.
Abstract
We present ODISE: Open-vocabulary DIffusion-based panoptic SEgmentation, which unifies pre-trained text-image diffusion and discriminative models to perform open-vocabulary panoptic segmentation. Text-to-image diffusion models have the remarkable ability to generate high-quality images with diverse open-vocabulary language descriptions, indicating their internal representation space is highly correlated with open concepts in the real world. Text-image discriminative models like CLIP are good at classifying images into open-vocabulary labels. We leverage the frozen internal representations of both these models to perform panoptic segmentation of any category in the wild. Our approach outperforms the previous state of the art by significant margins on both open-vocabulary panoptic and semantic segmentation tasks. In particular, with COCO training only, our method achieves 23.4 PQ and 30.0 mIoU on the ADE20K dataset, with 8.3 PQ and 7.9 mIoU absolute improvement over the previous state of the art. We open-source our code and models at https://github.com/NVlabs/ODISE.
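The abstract notes that discriminative models like CLIP classify images into open-vocabulary labels. A common way to apply this to segmentation is to compare each predicted mask's embedding against text embeddings of arbitrary category names and take a softmax over the cosine similarities. The following is a minimal sketch of that generic, CLIP-style idea in plain Python; it is not the ODISE implementation. The function names, the toy 3-dimensional embeddings, and the temperature of 0.01 are illustrative assumptions. In a real system the category vectors would come from a text encoder and the mask vector from the segmentation model.

```python
import math
from typing import Dict, List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def classify_mask(
    mask_embedding: Sequence[float],
    category_embeddings: Dict[str, Sequence[float]],
    temperature: float = 0.01,  # illustrative value, not taken from the paper
) -> Dict[str, float]:
    """Return a probability per category name (hypothetical helper).

    Each logit is the cosine similarity between the mask embedding and a
    category's text embedding, divided by a temperature; a softmax then turns
    the logits into probabilities over the supplied vocabulary.
    """
    q = l2_normalize(mask_embedding)
    logits: Dict[str, float] = {}
    for name, emb in category_embeddings.items():
        if len(emb) != len(q):
            raise ValueError(f"dimension mismatch for category {name!r}")
        t = l2_normalize(emb)
        logits[name] = sum(a * b for a, b in zip(q, t)) / temperature

    # Numerically stable softmax: subtract the largest logit before exp().
    m = max(logits.values())
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    z = sum(exps.values())
    return {k: v / z for k, v in exps.items()}


if __name__ == "__main__":
    # Toy stand-ins for text-encoder outputs; real embeddings are much larger.
    vocab = {"cat": [1.0, 0.0, 0.0], "dog": [0.0, 1.0, 0.0]}
    probs = classify_mask([0.9, 0.1, 0.0], vocab)
    print(max(probs, key=probs.get))
```

Because the label set is just a dictionary of text embeddings, adding a new category (for example, `"zebra": [0.0, 0.0, 1.0]`) only requires encoding its name, with no retraining. That property is what makes the classifier open-vocabulary.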
Publisher
arXiv preprint
Published On
Aug 24, 2023
Authors
Jiarui Xu, Sifei Liu, Arash Vahdat, Wonmin Byeon, Xiaolong Wang, Shalini De Mello
Tags
ODISE
open-vocabulary
panoptic segmentation
text-to-image diffusion
semantic representation
machine learning
computer vision