Computer Science · arXiv preprint

Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models

J. Xu, S. Liu, et al.

ODISE is an open-vocabulary panoptic segmentation model that outperforms the previous state of the art on both open-vocabulary panoptic and semantic segmentation tasks. The work, by Jiarui Xu, Sifei Liu, Arash Vahdat, Wonmin Byeon, Xiaolong Wang, and Shalini De Mello, uses the internal representations of text-to-image diffusion models, together with a text-image discriminative model such as CLIP, to segment and label diverse categories.


Abstract

We present ODISE: Open-vocabulary DIffusion-based panoptic SEgmentation, which unifies pre-trained text-image diffusion and discriminative models to perform open-vocabulary panoptic segmentation. Text-to-image diffusion models have the remarkable ability to generate high-quality images with diverse open-vocabulary language descriptions, indicating their internal representation space is highly correlated with open concepts in the real world. Text-image discriminative models like CLIP are good at classifying images into open-vocabulary labels. We leverage the frozen internal representations of both these models to perform panoptic segmentation of any category in the wild. Our approach outperforms the previous state of the art by significant margins on both open-vocabulary panoptic and semantic segmentation tasks. In particular, with COCO training only, our method achieves 23.4 PQ and 30.0 mIoU on the ADE20K dataset, with 8.3 PQ and 7.9 mIoU absolute improvement over the previous state of the art. We open-source our code and models at https://github.com/NVlabs/ODISE.
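The abstract does not spell out how a predicted mask receives an open-vocabulary label. As a rough illustration of the general idea behind text-image discriminative classification (not the authors' implementation), the sketch below assigns each mask the category whose text embedding is most similar to the mask's embedding under cosine similarity. The function names, the temperature value, and the 3-dimensional toy embeddings are all assumptions made for illustration; real embeddings would come from the frozen models.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def softmax(scores, temperature=0.1):
    """Numerically stable softmax; the temperature is an assumed value."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def classify_masks(mask_embeddings, text_embeddings, labels):
    """Give each mask the label whose text embedding is most similar to it."""
    results = []
    for emb in mask_embeddings:
        probs = softmax([cosine(emb, t) for t in text_embeddings])
        best = max(range(len(labels)), key=probs.__getitem__)
        results.append((labels[best], probs[best]))
    return results


# Hypothetical toy data: made-up 3-d vectors standing in for model outputs.
labels = ["cat", "sofa", "sky"]
text_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
mask_embeddings = [[0.9, 0.1, 0.0], [0.1, 0.2, 0.95]]

predictions = classify_masks(mask_embeddings, text_embeddings, labels)
print([label for label, _ in predictions])
```

Because the label set is just a list of text embeddings, categories can be swapped at inference time without retraining, which is what makes this style of classification open-vocabulary.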

Publisher

arXiv preprint

Published On

Aug 24, 2023

Authors

Jiarui Xu, Sifei Liu, Arash Vahdat, Wonmin Byeon, Xiaolong Wang, Shalini De Mello

DOI

https://doi.org/10.1109/cvpr52729.2023.00289

