Generating Code World Models with Large Language Models Guided by Monte Carlo Tree Search

Computer Science


N. Dainese, M. Alakuijala, et al.

This work presents Code World Models, world models generated as Python code by LLMs for model-based RL, together with GIF-MCTS, a new code-generation strategy, and the Code World Models Benchmark (CWMB). GIF-MCTS outperforms all baselines, and the resulting world models enable planning with greatly improved sample efficiency and inference speed. Research conducted by Nicola Dainese, Minttu Alakuijala, Matteo Merler, and Pekka Marttinen.
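
To make the idea concrete, a Code World Model acts as an executable simulator of the environment: given a state and an action, it predicts the next state, the reward, and whether the episode ends, so a planner can roll out candidate actions against the code instead of the real environment. The Python sketch below illustrates this interface for a hypothetical 1-D grid world; it is an assumed example, not code from the paper.

# Illustrative sketch of a Code World Model for a hypothetical 1-D grid
# world (an assumed example, not taken from the paper). The model mirrors
# a Gym-style environment's dynamics as plain Python code.

def step(state: int, action: int) -> tuple[int, float, bool]:
    """Deterministic transition: action 0 moves left, action 1 moves right."""
    next_state = max(0, min(10, state + (1 if action == 1 else -1)))
    reward = 1.0 if next_state == 10 else 0.0  # goal at cell 10
    done = next_state == 10
    return next_state, reward, done

# A planner can simulate many thousands of such calls per second, which is
# what makes the resulting model-based RL agent fast and sample efficient.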

~3 min • Beginner • English
Abstract
In this work we consider Code World Models, world models generated by a Large Language Model (LLM) in the form of Python code for model-based Reinforcement Learning (RL). Calling code instead of LLMs for planning has the potential to be more precise, reliable, interpretable, and extremely efficient. However, writing appropriate Code World Models requires the ability to understand complex instructions, to generate exact code with non-trivial logic, and to self-debug a long program with feedback from unit tests and environment trajectories. To address these challenges, we propose Generate, Improve and Fix with Monte Carlo Tree Search (GIF-MCTS), a new code generation strategy for LLMs. To test our approach in an offline RL setting, we introduce the Code World Models Benchmark (CWMB), a suite of program synthesis and planning tasks comprising 18 diverse RL environments paired with corresponding textual descriptions and curated trajectories. GIF-MCTS surpasses all baselines on the CWMB and two other benchmarks, and we show that the Code World Models synthesized with it can be successfully used for planning, resulting in model-based RL agents with greatly improved sample efficiency and inference speed.
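
The abstract's description of GIF-MCTS can be read as a tree search in which each node holds a candidate program and each expansion calls the LLM with one of three prompts (generate a fresh program, improve the current one, or fix it using execution feedback), with candidates valued by how well they reproduce recorded environment transitions; scoring against offline trajectories matches the paper's offline RL setting, since no new environment interaction is needed to evaluate a candidate. The sketch below is a minimal illustration under those assumptions; llm_propose and the transition-matching score are hypothetical placeholders, not the authors' implementation.

import math
import random

# Minimal sketch of a GIF-MCTS-style search (assumed structure, not the
# authors' code). A candidate world model is assumed to define
# step(state, action) -> (next_state, reward, done).

def score(code: str, trajectories) -> float:
    """Fraction of recorded (s, a, s2, r, done) transitions the candidate
    reproduces; serves as the value signal for the tree search."""
    namespace = {}
    try:
        exec(code, namespace)  # unsafe outside a sandbox; fine for a sketch
        hits = sum(namespace["step"](s, a) == (s2, r, d)
                   for s, a, s2, r, d in trajectories)
        return hits / max(1, len(trajectories))
    except Exception:
        return 0.0  # programs that crash or lack step() get zero value

def gif_mcts(llm_propose, trajectories, iterations: int = 100) -> str:
    """llm_propose(action, code, trajectories) is a hypothetical LLM call
    returning a new program; action is 'generate', 'improve', or 'fix'."""
    root = {"code": "", "children": [], "value": 0.0, "visits": 0}

    def ucb(parent, child, c: float = 1.414):
        # UCB1: balance exploiting high-value candidates with exploring
        # rarely tried ones.
        if child["visits"] == 0:
            return float("inf")
        return (child["value"] / child["visits"]
                + c * math.sqrt(math.log(parent["visits"]) / child["visits"]))

    best_code, best_value = "", -1.0
    for _ in range(iterations):
        # Selection: walk down the tree by UCB until reaching a leaf.
        path, node = [root], root
        while node["children"]:
            node = max(node["children"], key=lambda ch: ucb(node, ch))
            path.append(node)
        # Expansion: one LLM call, choosing among the three GIF actions
        # (chosen uniformly here for simplicity).
        action = random.choice(["generate", "improve", "fix"])
        child_code = llm_propose(action, node["code"], trajectories)
        child = {"code": child_code, "children": [], "value": 0.0, "visits": 0}
        node["children"].append(child)
        path.append(child)
        # Evaluation: run the candidate against the curated trajectories.
        value = score(child_code, trajectories)
        if value > best_value:
            best_code, best_value = child_code, value
        # Backpropagation: update statistics along the visited path.
        for n in path:
            n["visits"] += 1
            n["value"] += value
    return best_code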
Publisher
38th Conference on Neural Information Processing Systems (NeurIPS 2024)
Authors
Nicola Dainese, Minttu Alakuijala, Matteo Merler, Pekka Marttinen
Tags
Code World Models
model-based reinforcement learning
LLM-generated Python code
program synthesis
GIF-MCTS
Code World Models Benchmark (CWMB)
sample efficiency and inference speed