Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models

Computer Science

J. Xu, S. Liu, et al.

ODISE is an open-vocabulary panoptic segmentation model that outperforms prior methods on both panoptic and semantic segmentation tasks. The research, conducted by Jiarui Xu, Sifei Liu, Arash Vahdat, Wonmin Byeon, Xiaolong Wang, and Shalini De Mello, shows that text-to-image diffusion models can supply semantic representations that carry over to diverse categories.

Abstract
This paper introduces ODISE, an open-vocabulary panoptic segmentation model that combines pre-trained text-to-image diffusion models with discriminative models. It uses the frozen internal representations of both to segment any category in the wild. ODISE surpasses prior state-of-the-art results on both open-vocabulary panoptic and semantic segmentation, with significant improvements on ADE20K when trained on COCO. These results indicate that text-to-image diffusion models learn rich semantic representations that are useful for open-vocabulary recognition.
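To make "segmentation of any category" concrete: a common way for open-vocabulary models to label a predicted mask is to compare an embedding of that mask with text embeddings of candidate category names and pick the closest match. The sketch below illustrates only that general idea. It is not ODISE's implementation; the function names and the toy 3-dimensional vectors are invented for illustration, and in a real system the embeddings would come from pre-trained image and text encoders.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def classify_masks(mask_embeddings, text_embeddings):
    """Assign each mask the category whose text embedding is most similar.

    mask_embeddings: list of vectors, one per predicted mask (hypothetical).
    text_embeddings: dict mapping category name -> vector (hypothetical).
    """
    labels = []
    for emb in mask_embeddings:
        best = max(
            text_embeddings,
            key=lambda name: cosine(emb, text_embeddings[name]),
        )
        labels.append(best)
    return labels

# Toy embeddings, made up for this example; they stand in for encoder outputs.
text_embeddings = {
    "cat": [1.0, 0.0, 0.0],
    "sofa": [0.0, 1.0, 0.0],
    "sky": [0.0, 0.0, 1.0],
}
mask_embeddings = [[0.9, 0.1, 0.0], [0.1, 0.2, 0.95]]
labels = classify_masks(mask_embeddings, text_embeddings)
print(labels)
```

Because the category list is just a dictionary of names and vectors, new categories can be added at inference time without retraining, which is the property the term "open-vocabulary" refers to.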
Publisher
arXiv preprint
Published On
Aug 24, 2023
Authors
Jiarui Xu, Sifei Liu, Arash Vahdat, Wonmin Byeon, Xiaolong Wang, Shalini De Mello
Tags
ODISE
open-vocabulary
panoptic segmentation
text-to-image diffusion
semantic representation
machine learning
computer vision