MyDreamerV2

A reimplementation and extension of DreamerV2 with exploration via Plan2Explore and selected improvements from DreamerV3

MyDreamerV2 is a PyTorch reimplementation of DreamerV2 that focuses on understanding and extending world-model–based reinforcement learning. The project includes a faithful reproduction of the original DreamerV2 pipeline, an explicit implementation of Plan2Explore for intrinsic motivation, and selected architectural and training improvements inspired by DreamerV3.

Left: a visualization of latent states learned by the Recurrent State-Space Model (RSSM), illustrating how the world model compresses observations into a compact predictive representation. Right: the environment used for training.
Episode return over training, improving toward an asymptote as the agent learns and plans within the learned world model.

Beyond reproducing the baseline DreamerV2 results, this implementation emphasizes exploration through uncertainty. Plan2Explore is implemented on top of the learned latent dynamics, encouraging the agent to seek trajectories where the world model is uncertain.
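As a rough illustration of the idea (not the repository's actual code), the sketch below trains a small ensemble of one-step latent dynamics heads and uses the variance of their predictions as the intrinsic reward. The names `DisagreementEnsemble`, `latent_dim`, `action_dim`, and `ensemble_size` are illustrative assumptions.

```python
# Minimal Plan2Explore-style sketch: an ensemble of one-step latent dynamics
# predictors is trained on transitions, and their disagreement (prediction
# variance) serves as the exploration bonus. Names are illustrative, not the
# repository's actual modules.
import torch
import torch.nn as nn


class DisagreementEnsemble(nn.Module):
    def __init__(self, latent_dim: int, action_dim: int, ensemble_size: int = 5, hidden: int = 256):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Sequential(
                nn.Linear(latent_dim + action_dim, hidden),
                nn.ELU(),
                nn.Linear(hidden, latent_dim),
            )
            for _ in range(ensemble_size)
        )

    def forward(self, latent: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        # Each head predicts the next latent feature; output shape (K, B, latent_dim).
        x = torch.cat([latent, action], dim=-1)
        return torch.stack([head(x) for head in self.heads])

    def intrinsic_reward(self, latent: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        # Disagreement = variance across ensemble predictions, averaged over
        # latent dimensions. High variance marks transitions the world model
        # is uncertain about, which the exploration policy is rewarded to reach.
        preds = self.forward(latent, action)
        return preds.var(dim=0).mean(dim=-1)

    def training_loss(self, latent: torch.Tensor, action: torch.Tensor, next_latent: torch.Tensor) -> torch.Tensor:
        # Each head regresses the (stop-gradient) next latent feature.
        preds = self.forward(latent, action)
        target = next_latent.detach().unsqueeze(0)
        return ((preds - target) ** 2).mean()
```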


Left: DreamerV2 architecture, consisting of a learned world model (RSSM), actor, and critic trained entirely in latent space. Right: the Plan2Explore module, which adds an intrinsic reward based on ensemble disagreement in latent dynamics to drive efficient exploration.
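To make "trained entirely in latent space" concrete, here is a minimal, hedged sketch of an imagination rollout and λ-return targets for the actor and critic. The `rssm.img_step`, `rssm.get_feat`, `actor`, and value interfaces are hypothetical stand-ins, not the repository's actual API.

```python
# Sketch only: roll the learned dynamics forward without the real environment,
# then compute TD(lambda) targets over the imagined trajectory. Interfaces are
# assumed, not taken from the repository.
import torch


def imagine_rollout(rssm, actor, start_state: dict, horizon: int = 15) -> torch.Tensor:
    """Imagine a trajectory purely in latent space, starting from a posterior state."""
    state, features = start_state, []
    for _ in range(horizon):
        action = actor(rssm.get_feat(state)).rsample()  # assumes actor returns a distribution
        state = rssm.img_step(state, action)            # prior transition, no observation used
        features.append(rssm.get_feat(state))
    return torch.stack(features)                         # (horizon, batch, feat_dim)


def lambda_returns(rewards: torch.Tensor, values: torch.Tensor,
                   discount: float = 0.995, lam: float = 0.95) -> torch.Tensor:
    """TD(lambda) targets; rewards is (H, B), values is (H + 1, B) with a bootstrap at the end."""
    returns, last = [], values[-1]
    for t in reversed(range(rewards.shape[0])):
        last = rewards[t] + discount * ((1 - lam) * values[t + 1] + lam * last)
        returns.insert(0, last)
    return torch.stack(returns)


# Example usage (all heads hypothetical):
#   feats = imagine_rollout(rssm, actor, posterior, horizon=15)
#   rewards, values = reward_head(feats), critic(feats)  # plus a bootstrap value for the final state
#   targets = lambda_returns(rewards, values)
```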

For implementation details, design choices, and experimental results, see the full repository:
https://github.com/PedroTajia/MyDreamerV2

References