Jul 17, 2022

Learning Without Simulations? UC Berkeley’s DayDreamer Establishes a Strong Baseline for Real-World Robotic Training

Posted in categories: information science, robotics/AI

Using reinforcement learning (RL) to train robots directly in real-world environments has long been considered impractical because of the enormous amount of trial and error typically required before the agent finally gets it right. Training deep RL agents in simulated environments has thus become the go-to alternative, but this approach is far from ideal: it requires designing simulated tasks and collecting expert demonstrations, and simulations can fail to capture the complexities of real-world environments, are prone to inaccuracies, and yield robot behaviours that do not adapt to real-world environmental changes.

The Dreamer algorithm, proposed by Hafner et al. at ICLR 2020, introduced an RL agent capable of solving long-horizon tasks purely via latent imagination: rather than learning from raw interaction alone, the agent fits a world model and optimizes its behaviour on rollouts predicted in that model's compact latent state space. Although Dreamer has demonstrated its potential for learning from small amounts of interaction, learning accurate real-world models remains challenging, and it was unknown whether Dreamer could enable faster learning on physical robots.
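To make "latent imagination" concrete, the sketch below shows a policy being optimized entirely on rollouts generated inside a learned latent model rather than on a robot. It assumes a PyTorch setting, and the network sizes, the simplified GRU dynamics, and all names are illustrative stand-ins rather than the authors' implementation:

```python
# Minimal sketch of Dreamer-style latent imagination (illustrative, not the authors' code).
import torch
import torch.nn as nn

LATENT, ACTION, HORIZON = 32, 4, 15

class WorldModel(nn.Module):
    """Toy latent dynamics model: predicts the next latent state and its reward."""
    def __init__(self):
        super().__init__()
        self.dynamics = nn.GRUCell(ACTION, LATENT)  # s' = f(s, a)
        self.reward = nn.Linear(LATENT, 1)          # r = g(s')

    def step(self, state, action):
        next_state = self.dynamics(action, state)
        return next_state, self.reward(next_state)

class Actor(nn.Module):
    """Policy that maps latent states to bounded actions."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT, 64), nn.ELU(),
            nn.Linear(64, ACTION), nn.Tanh(),
        )

    def forward(self, state):
        return self.net(state)

def imagined_return(world_model, actor, start_state):
    """Roll the policy forward entirely inside the learned model."""
    state, rewards = start_state, []
    for _ in range(HORIZON):
        action = actor(state)
        state, reward = world_model.step(state, action)
        rewards.append(reward)
    return torch.stack(rewards).sum(0)  # total imagined reward per batch element

world_model, actor = WorldModel(), Actor()
optimizer = torch.optim.Adam(actor.parameters(), lr=3e-4)

# In Dreamer the start states come from encoding real observations;
# zeros keep this sketch self-contained.
start = torch.zeros(16, LATENT)
loss = -imagined_return(world_model, actor, start).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Because the rollout happens inside the model, the imagined return is differentiable and its gradient can flow straight back into the actor, which is a key reason Dreamer can learn from small amounts of real interaction.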

In the new paper DayDreamer: World Models for Physical Robot Learning, Hafner and a research team from the University of California, Berkeley leverage recent advances in the Dreamer world model to enable online RL for robot training without simulators or demonstrations. The novel approach achieves promising results and establishes a strong baseline for efficient real-world robot training.
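A central element of this online setup is that learning and data collection run in parallel: the robot keeps acting while a learner process continuously updates the world model and policy from a replay buffer. Below is a minimal sketch of that actor/learner structure; RealRobotEnv, DreamerAgent, and every method on them are hypothetical stand-ins, not the paper's API:

```python
# Minimal sketch of parallel real-robot data collection and learning
# (all names here are hypothetical stand-ins, not the paper's code).
import random
import threading
import time
from collections import deque

class RealRobotEnv:
    """Hypothetical stand-in for a physical-robot interface."""
    def reset(self): return 0.0
    def step(self, action): return 0.0, 0.0  # (next_obs, reward)

class DreamerAgent:
    """Hypothetical stand-in for the Dreamer learner."""
    def act(self, obs): return 0.0
    def train_world_model(self, batch): pass        # fit latent dynamics to real data
    def train_policy_in_imagination(self): pass     # actor-critic on imagined rollouts

replay_buffer = deque(maxlen=100_000)  # (obs, action, reward, next_obs) tuples
buffer_lock = threading.Lock()
done = threading.Event()

def actor_loop(env, agent, steps=1000):
    """Collect experience on the physical robot; no simulator involved."""
    obs = env.reset()
    for _ in range(steps):
        action = agent.act(obs)  # policy trained purely in imagination
        next_obs, reward = env.step(action)
        with buffer_lock:
            replay_buffer.append((obs, action, reward, next_obs))
        obs = next_obs
    done.set()

def learner_loop(agent, batch_size=32):
    """Keep improving the world model and policy while the robot acts."""
    while not done.is_set():
        with buffer_lock:
            batch = (random.sample(list(replay_buffer), batch_size)
                     if len(replay_buffer) >= batch_size else None)
        if batch is None:
            time.sleep(0.01)
            continue
        agent.train_world_model(batch)
        agent.train_policy_in_imagination()

env, agent = RealRobotEnv(), DreamerAgent()
learner = threading.Thread(target=learner_loop, args=(agent,))
learner.start()
actor_loop(env, agent)  # the robot never pauses to wait for gradient updates
learner.join()
```

Decoupling acting from learning in this way means the robot never has to stop and wait for gradient updates, which matters when every real-world step costs wall-clock time and wear on the hardware.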
