Generating Code World Models with Large Language Models Guided by Monte Carlo Tree Search

Computer Science


N. Dainese, M. Alakuijala, et al.

This work presents Code World Models, world models generated as Python code by LLMs for model-based RL, together with GIF-MCTS, a new code-generation strategy, and the Code World Models Benchmark (CWMB). GIF-MCTS outperforms all baselines, and the resulting world models enable planning with greatly improved sample efficiency and inference speed. Research conducted by Nicola Dainese, Minttu Alakuijala, Matteo Merler, and Pekka Marttinen.
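
To make the idea concrete, a Code World Model acts as an executable simulator of the environment: given a state and an action, it predicts the next state, the reward, and whether the episode ends, so a planner can roll out candidate actions against the code instead of the real environment. The Python sketch below illustrates this interface for a hypothetical 1-D grid world; it is an assumed example, not code from the paper.

# Illustrative sketch of a Code World Model for a hypothetical 1-D grid
# world (an assumed example, not taken from the paper). The model mirrors
# a Gym-style environment's dynamics as plain Python code.

def step(state: int, action: int) -> tuple[int, float, bool]:
    """Deterministic transition: action 0 moves left, action 1 moves right."""
    next_state = max(0, min(10, state + (1 if action == 1 else -1)))
    reward = 1.0 if next_state == 10 else 0.0  # goal at cell 10
    done = next_state == 10
    return next_state, reward, done

# A planner can simulate many thousands of such calls per second, which is
# what makes the resulting model-based RL agent fast and sample efficient.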

~3 min • Beginner • English
Abstract
In this work we consider Code World Models, world models generated by a Large Language Model (LLM) in the form of Python code for model-based Reinforcement Learning (RL). Calling code instead of LLMs for planning has the potential to be more precise, reliable, interpretable, and extremely efficient. However, writing appropriate Code World Models requires the ability to understand complex instructions, to generate exact code with non-trivial logic, and to self-debug a long program with feedback from unit tests and environment trajectories. To address these challenges, we propose Generate, Improve and Fix with Monte Carlo Tree Search (GIF-MCTS), a new code generation strategy for LLMs. To test our approach in an offline RL setting, we introduce the Code World Models Benchmark (CWMB), a suite of program synthesis and planning tasks comprising 18 diverse RL environments paired with corresponding textual descriptions and curated trajectories. GIF-MCTS surpasses all baselines on the CWMB and two other benchmarks, and we show that the Code World Models synthesized with it can be successfully used for planning, resulting in model-based RL agents with greatly improved sample efficiency and inference speed.
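
The abstract's description of GIF-MCTS can be read as a tree search in which each node holds a candidate program and each expansion calls the LLM with one of three prompts (generate a fresh program, improve the current one, or fix it using execution feedback), with candidates valued by how well they reproduce recorded environment transitions; scoring against offline trajectories matches the paper's offline RL setting, since no new environment interaction is needed to evaluate a candidate. The sketch below is a minimal illustration under those assumptions; llm_propose and the transition-matching score are hypothetical placeholders, not the authors' implementation.

import math
import random

# Minimal sketch of a GIF-MCTS-style search (assumed structure, not the
# authors' code). A candidate world model is assumed to define
# step(state, action) -> (next_state, reward, done).

def score(code: str, trajectories) -> float:
    """Fraction of recorded (s, a, s2, r, done) transitions the candidate
    reproduces; serves as the value signal for the tree search."""
    namespace = {}
    try:
        exec(code, namespace)  # unsafe outside a sandbox; fine for a sketch
        hits = sum(namespace["step"](s, a) == (s2, r, d)
                   for s, a, s2, r, d in trajectories)
        return hits / max(1, len(trajectories))
    except Exception:
        return 0.0  # programs that crash or lack step() get zero value

def gif_mcts(llm_propose, trajectories, iterations: int = 100) -> str:
    """llm_propose(action, code, trajectories) is a hypothetical LLM call
    returning a new program; action is 'generate', 'improve', or 'fix'."""
    root = {"code": "", "children": [], "value": 0.0, "visits": 0}

    def ucb(parent, child, c: float = 1.414):
        # UCB1: balance exploiting high-value candidates with exploring
        # rarely tried ones.
        if child["visits"] == 0:
            return float("inf")
        return (child["value"] / child["visits"]
                + c * math.sqrt(math.log(parent["visits"]) / child["visits"]))

    best_code, best_value = "", -1.0
    for _ in range(iterations):
        # Selection: walk down the tree by UCB until reaching a leaf.
        path, node = [root], root
        while node["children"]:
            node = max(node["children"], key=lambda ch: ucb(node, ch))
            path.append(node)
        # Expansion: one LLM call, choosing among the three GIF actions
        # (chosen uniformly here for simplicity).
        action = random.choice(["generate", "improve", "fix"])
        child_code = llm_propose(action, node["code"], trajectories)
        child = {"code": child_code, "children": [], "value": 0.0, "visits": 0}
        node["children"].append(child)
        path.append(child)
        # Evaluation: run the candidate against the curated trajectories.
        value = score(child_code, trajectories)
        if value > best_value:
            best_code, best_value = child_code, value
        # Backpropagation: update statistics along the visited path.
        for n in path:
            n["visits"] += 1
            n["value"] += value
    return best_code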
Publisher
38th Conference on Neural Information Processing Systems (NeurIPS 2024)
Authors
Nicola Dainese, Minttu Alakuijala, Matteo Merler, Pekka Marttinen
Tags
Code World Models
model-based reinforcement learning
LLM-generated Python code
program synthesis
GIF-MCTS
Code World Models Benchmark (CWMB)
sample efficiency and inference speed