Operator World Models for Reinforcement Learning
Computer Science · NeurIPS 2024 (Conference on Neural Information Processing Systems)

P. Novelli, M. Pontil, et al.

Policy Mirror Descent (PMD) is a powerful approach, but it is hard to apply in reinforcement learning (RL) because the action-value functions it relies on are not directly accessible. This work learns a world model via conditional mean embeddings and, using operator-theoretic matrix operations, derives closed-form action-value estimates. Combining these estimates with PMD yields POWR, an RL algorithm with proven global convergence. The research was conducted by Pietro Novelli, Massimiliano Pontil, Marco Pratticò, and Carlo Ciliberto.
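
To make the summary concrete, here is a minimal Python sketch of the two ingredients it names: a closed-form action-value computation by matrix operations, and a PMD policy update. It is not the POWR algorithm. As a loudly stated assumption, the learned conditional-mean-embedding world model is replaced by a small, fully known tabular MDP, and the PMD step uses the standard KL (softmax) mirror map. The names `action_values`, `pmd_step`, `P`, `r`, `eta`, and the toy two-state MDP are all hypothetical and chosen for illustration only.

```python
import numpy as np


def action_values(P, r, pi, gamma):
    """Closed-form Q^pi for a tabular MDP (assumed known, not learned).

    P:  transitions, shape (S, A, S), P[s, a, t] = Pr(t | s, a)
    r:  rewards, shape (S, A)
    pi: policy, shape (S, A), rows sum to 1
    """
    n_states = P.shape[0]
    # State-to-state transition matrix and expected reward under pi.
    P_pi = np.einsum("sa,sat->st", pi, P)
    r_pi = np.sum(pi * r, axis=1)
    # Solve (I - gamma * P_pi) V = r_pi instead of forming the inverse.
    V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
    # Q(s, a) = r(s, a) + gamma * sum_t P(t | s, a) V(t)
    return r + gamma * np.einsum("sat,t->sa", P, V)


def pmd_step(pi, Q, eta):
    """One PMD update with the KL mirror map: pi_new ∝ pi * exp(eta * Q)."""
    logits = np.log(pi) + eta * Q
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    new_pi = np.exp(logits)
    return new_pi / new_pi.sum(axis=1, keepdims=True)


# Toy MDP (illustrative assumption): action 0 stays, action 1 switches state;
# the only reward is 1 for staying in state 1.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = P[0, 1, 1] = P[1, 0, 1] = P[1, 1, 0] = 1.0
r = np.array([[0.0, 0.0], [1.0, 0.0]])
gamma, eta = 0.9, 1.0

pi = np.full((2, 2), 0.5)  # start from the uniform policy
for _ in range(50):
    Q = action_values(P, r, pi, gamma)
    pi = pmd_step(pi, Q, eta)

print(np.round(pi, 3))
```

In this toy problem the iterates concentrate on switching out of state 0 and staying in state 1. In the setting the summary describes, the transition model is not known and the action-value estimates come from the learned world model instead.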
Citation Metrics
Citations: 0
Influential Citations: 0
Reference Count: 45

Note: The citation metrics presented here have been sourced from Semantic Scholar and OpenAlex.
