Computer Science
NeurIPS 2024 (Conference on Neural Information Processing Systems)
Operator World Models for Reinforcement Learning
P. Novelli, M. Pontil, et al.
Policy Mirror Descent (PMD) is powerful but hard to apply in Reinforcement Learning because it relies on action-value functions that are not directly accessible. This work learns a world model via conditional mean embeddings and, using operator-theoretic matrix operations, derives closed-form action-value estimates. Combining these estimates with PMD yields POWR, an RL algorithm with proven global convergence. The research was conducted by Pietro Novelli, Massimiliano Pontil, Marco Pratticò, and Carlo Ciliberto.
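To make the two ingredients in the summary concrete, here is a minimal tabular sketch. It is not the POWR algorithm: it assumes a small finite MDP whose transition tensor `P` and reward table `r` are known exactly, whereas the paper learns the world model from data via conditional mean embeddings. The function names (`q_closed_form`, `pmd_step`, `run_pmd`) and the KL-based (softmax) form of the mirror descent update are illustrative choices, not taken from the paper. The sketch only shows how a closed-form action-value computation, here a single linear solve, can feed a PMD policy update.

```python
import numpy as np

def q_closed_form(P, r, pi, gamma):
    """Action values of policy pi in a known tabular MDP (assumed setting).

    P: (S, A, S) transition probabilities, r: (S, A) rewards, pi: (S, A) policy.
    Solves the Bellman equation q = r + gamma * P_pi q as a linear system.
    """
    S, A = r.shape
    # P_pi[(s, a), (s', a')] = P[s, a, s'] * pi[s', a']
    P_pi = (P[:, :, :, None] * pi[None, None, :, :]).reshape(S * A, S * A)
    q = np.linalg.solve(np.eye(S * A) - gamma * P_pi, r.reshape(S * A))
    return q.reshape(S, A)

def pmd_step(pi, Q, eta):
    """One mirror descent step with KL divergence: pi_new ∝ pi * exp(eta * Q)."""
    logits = np.log(np.clip(pi, 1e-300, None)) + eta * Q
    logits -= logits.max(axis=1, keepdims=True)  # stabilise the exponentials
    new_pi = np.exp(logits)
    return new_pi / new_pi.sum(axis=1, keepdims=True)

def run_pmd(P, r, gamma=0.9, eta=1.0, iters=200):
    """Alternate closed-form policy evaluation and PMD updates."""
    S, A = r.shape
    pi = np.full((S, A), 1.0 / A)  # start from the uniform policy
    for _ in range(iters):
        Q = q_closed_form(P, r, pi, gamma)
        pi = pmd_step(pi, Q, eta)
    return pi, q_closed_form(P, r, pi, gamma)

if __name__ == "__main__":
    # Toy MDP (hypothetical): transitions ignore the action, action 1 pays reward 1.
    P = np.full((2, 2, 2), 0.5)
    r = np.array([[0.0, 1.0], [0.0, 1.0]])
    pi, Q = run_pmd(P, r)
    print(np.round(pi, 3))
    print(np.round(Q, 3))
```

In the toy example the policy concentrates on the rewarding action, as expected. In the setting described above, the exact `P` would be replaced by an estimated operator, so the linear solve would yield an approximate rather than exact action-value function.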

