Diffusion for World Modeling: Visual Details Matter in Atari

Computer Science

E. Alonso, A. Jelley, et al.

DIAMOND (DIffusion As a Model Of eNvironment Dreams) trains reinforcement learning agents inside a diffusion-based world model that preserves visual details which discrete latent representations may discard, and this added detail improves agent performance. It achieves a mean human-normalized score of 1.46 on the Atari 100k benchmark, a new best for agents trained entirely within a world model, and its world model can serve as an interactive neural game engine after training on static Counter-Strike: Global Offensive gameplay. Code, agents, videos and playable world models are released at https://diamond-wm.github.io.
Abstract
World models constitute a promising approach for training reinforcement learning agents in a safe and sample-efficient manner. Recent world models predominantly operate on sequences of discrete latent variables to model environment dynamics. However, this compression into a compact discrete representation may ignore visual details that are important for reinforcement learning. Concurrently, diffusion models have become a dominant approach for image generation, challenging well-established methods modeling discrete latents. Motivated by this paradigm shift, we introduce DIAMOND (DIffusion As a Model Of eNvironment Dreams), a reinforcement learning agent trained in a diffusion world model. We analyze the key design choices that are required to make diffusion suitable for world modeling, and demonstrate how improved visual details can lead to improved agent performance. DIAMOND achieves a mean human-normalized score of 1.46 on the competitive Atari 100k benchmark, a new best for agents trained entirely within a world model. We further demonstrate that DIAMOND's diffusion world model can stand alone as an interactive neural game engine by training on static Counter-Strike: Global Offensive gameplay. To foster future research on diffusion for world modeling, we release our code, agents, videos and playable world models at https://diamond-wm.github.io.
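For readers unfamiliar with the metric, a human-normalized score is conventionally computed per game as (agent score - random score) / (human score - random score), then averaged across games. A mean of 1.46 therefore means the agent's improvement over random play is, on average, 1.46 times the human's. The sketch below illustrates this convention only. The function names and all numbers are hypothetical placeholders, not results or code from the paper.

```python
from statistics import mean


def human_normalized_score(agent: float, random: float, human: float) -> float:
    """Return (agent - random) / (human - random) for a single game.

    0.0 corresponds to random play and 1.0 to the human reference score.
    """
    if human == random:
        raise ValueError("human and random scores must differ")
    return (agent - random) / (human - random)


def mean_human_normalized_score(results: dict[str, tuple[float, float, float]]) -> float:
    """Average the per-game normalized scores.

    `results` maps a game name to (agent, random, human) raw scores.
    """
    return mean(human_normalized_score(*scores) for scores in results.values())


# Hypothetical raw scores, chosen only to make the arithmetic easy to follow.
example = {
    "GameA": (30.0, 10.0, 20.0),  # agent doubles the human's margin over random
    "GameB": (15.0, 10.0, 20.0),  # agent reaches half the human's margin
}

if __name__ == "__main__":
    print(mean_human_normalized_score(example))
```

Because the mean is taken over per-game ratios, a few games with very large normalized scores can dominate it, which is why benchmark reports often list per-game scores alongside the mean.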
Publisher
38th Conference on Neural Information Processing Systems (NeurIPS 2024)
Published On
Authors
Eloi Alonso, Adam Jelley, Vincent Micheli, Anssi Kanervisto, Amos Storkey, Tim Pearce, François Fleuret
Tags
diffusion world model
reinforcement learning
world models
visual fidelity
Atari 100k benchmark
discrete latent variables
neural game engine