This paper introduces ODISE (Open-vocabulary DIffusion-based panoptic SEgmentation), an open-vocabulary panoptic segmentation model that combines a pre-trained text-to-image diffusion model with a pre-trained discriminative model. Both models are kept frozen, and their internal representations are used to segment and label categories in the wild, including categories not seen during segmentation training. ODISE outperforms the prior state of the art on both open-vocabulary panoptic and open-vocabulary semantic segmentation, with notable gains on ADE20K when trained only on COCO. These results suggest that text-to-image diffusion models learn rich semantic representations that transfer to open-vocabulary recognition tasks.
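The open-vocabulary labeling step described above can be illustrated with a minimal NumPy sketch. It is not ODISE's implementation: the function names `mask_pool` and `classify_masks`, the `temperature` value, and the toy arrays are all assumptions made for illustration. The sketch assumes that per-pixel features from a frozen backbone and text embeddings of candidate category names are already available, and it shows only how each predicted mask could be assigned a label by cosine similarity against those text embeddings.

```python
import numpy as np


def mask_pool(features, masks):
    """Average per-pixel features inside each binary mask.

    features: (H, W, D) array, a stand-in for frozen backbone features.
    masks:    (N, H, W) binary array, one mask per predicted segment.
    Returns an (N, D) array with one embedding per mask.
    """
    masks = masks.astype(features.dtype)
    summed = np.einsum("nhw,hwd->nd", masks, features)
    area = masks.sum(axis=(1, 2))[:, None]
    # Guard against empty masks to avoid division by zero.
    return summed / np.maximum(area, 1.0)


def classify_masks(mask_embeds, text_embeds, temperature=0.07):
    """Score each mask embedding against each category text embedding.

    Uses cosine similarity followed by a softmax over categories.
    The temperature is a hypothetical value chosen for this sketch.
    Returns an (N, C) array of per-mask category probabilities.
    """
    eps = 1e-8
    m = mask_embeds / (np.linalg.norm(mask_embeds, axis=1, keepdims=True) + eps)
    t = text_embeds / (np.linalg.norm(text_embeds, axis=1, keepdims=True) + eps)
    logits = (m @ t.T) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)


# Toy data (hypothetical): a 4x4 feature map whose left and right halves
# carry different features, two masks covering those halves, and three
# placeholder text embeddings standing in for encoded category names.
categories = ["wall", "tree", "sky"]
features = np.zeros((4, 4, 3))
features[:, :2, 0] = 1.0  # left half
features[:, 2:, 1] = 1.0  # right half
masks = np.zeros((2, 4, 4))
masks[0, :, :2] = 1.0
masks[1, :, 2:] = 1.0
text_embeds = np.eye(3)

probs = classify_masks(mask_pool(features, masks), text_embeds)
for i, row in enumerate(probs):
    print(f"mask {i}: {categories[int(row.argmax())]}")
```

Because the category list enters only through the text embeddings, the set of candidate labels can be changed at inference time without retraining, which is the property that makes this kind of classifier open-vocabulary.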