Abstract
This paper introduces ODISE, an open-vocabulary panoptic segmentation model that combines a pre-trained text-to-image diffusion model with a discriminative model. It uses the frozen internal representations of both models to segment any category in the wild. ODISE outperforms the prior state of the art on both open-vocabulary panoptic and semantic segmentation, with notable gains on ADE20K when trained on COCO. These results suggest that text-to-image diffusion models learn rich semantic representations that are useful for open-vocabulary recognition tasks.
Publisher
arXiv preprint
Published On
Aug 24, 2023
Authors
Jiarui Xu, Sifei Liu, Arash Vahdat, Wonmin Byeon, Xiaolong Wang, Shalini De Mello
Tags
ODISE
open-vocabulary
panoptic segmentation
text-to-image diffusion
semantic representation
machine learning
computer vision