Render, Encode, Plan: a simple pipeline for hybrid RL-DL learning inside Unreal Engine

1University of Trento, 2CNIT
Teaser image.

We present REP - Render, Encode, Plan, a novel framework to train different kinds of intelligent agents inside Unreal Engine 5. a) Different kinds of agents can be spawned into arbitrarily large 3D environments, while b) capturing diverse kinds of observations (including visual), which c) are then encoded by a hybrid RL+DL procedure. The encoded observations give rise to d) a set of actions for each agent, which receives rewards to improve over time.

Demonstration videos

Urban environment

Drone navigation

Agent exploration

Abstract

Learning is an iterative process that requires multiple forms of interaction with the environment. During learning, we experience the world through repeated observations and actions, gaining insight into which combinations of these lead to the best results according to our goals. The same paradigm has been applied to traditional reinforcement learning (RL) over the years, with impressive results in 3D navigation and planning. On the other hand, the computer vision community has focused mostly on vision-related tasks (e.g. classification, segmentation, depth estimation) using deep learning (DL).

We present REP: Render, Encode, Plan, a unified framework to train embodied agents of different kinds (humanoids, vehicles, and drones) inside Unreal Engine, showing how a combination of RL and DL can help shape intelligent agents that better sense the surrounding environment. The main advantage of our method is the combination of different sensory modalities, including game-state observations and vision features, which allows the agents to share a similar observation structure while defining separate rewards based on their goals. We demonstrate strong generalization capabilities on large-scale realistic 3D environments and on multiple dynamically changing scenarios, with different goals and rewards.

Architecture

Architecture.

Simplified schematic of our REP architecture. The goal is to map physical and visual observations from multiple kinds of agents in a large open world into a meaningful set of actions. We first encode the visual observations using a DNN encoder. We then concatenate the normalized physical observations (i.e. the game state) with the resulting visual features, and train the agents with the PPO algorithm to optimize a policy that produces the best set of actions for a given reward function. At inference time the same diagram applies, but the visual features are computed in-engine using NNE for optimal performance.
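The observation-building step described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the encoder here is a stand-in linear projection (the actual framework uses a DNN encoder), and all shapes, names, and normalization bounds are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_visual(frame, proj):
    """Stand-in for the DNN visual encoder: flatten the rendered
    frame and project it to a compact feature vector (hypothetical)."""
    return np.tanh(frame.ravel() @ proj)

def normalize(phys, lo, hi):
    """Scale raw game-state observations to [0, 1] per dimension."""
    return (phys - lo) / (hi - lo)

# Hypothetical inputs: a 16x16 grayscale render and 6 physical values
frame = rng.random((16, 16))
proj = rng.standard_normal((256, 32)) * 0.1   # frozen encoder weights
phys = np.array([120.0, 35.0, -4.0, 0.8, 0.1, 0.0])
lo = np.array([0.0, 0.0, -10.0, -1.0, -1.0, -1.0])
hi = np.array([500.0, 100.0, 10.0, 1.0, 1.0, 1.0])

# REP-style joint observation: visual features ++ normalized game state,
# which would then be fed to the PPO policy network.
obs = np.concatenate([encode_visual(frame, proj), normalize(phys, lo, hi)])
print(obs.shape)  # (38,)
```

The concatenated vector is what the shared policy consumes; agent types differ only in which physical observations and rewards they plug into this common structure.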

State-of-the-art comparison

Comparison table.

World

World.

Top and perspective views of a 2.5 km spline for a drone mission scenario. In REP, similar splines can be defined for the humanoid and car agents as well.

Results: drone navigation

Results.

Horizontal and vertical extent of the Spline Navigation scenarios. Vertical bars indicate the local inverse of the dot product between the drone's flying direction and the spline direction.
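A minimal sketch of such an alignment metric, assuming "inverse of the dot product" means 1 - cos(theta) between the two unit-normalized direction vectors; the paper's exact definition may differ, and the function name is hypothetical.

```python
import numpy as np

def misalignment(flight_dir, spline_dir):
    """1 - cos(theta) between the drone's flying direction and the
    local spline tangent: 0 when perfectly aligned, 2 when flying
    exactly backwards along the spline. Hypothetical helper."""
    f = flight_dir / np.linalg.norm(flight_dir)
    s = spline_dir / np.linalg.norm(spline_dir)
    return 1.0 - float(np.dot(f, s))

# Aligned, perpendicular, and opposite flight directions
print(misalignment(np.array([1.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0])))   # 0.0
print(misalignment(np.array([0.0, 1.0, 0.0]), np.array([1.0, 0.0, 0.0])))   # 1.0
print(misalignment(np.array([-1.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0])))  # 2.0
```

A quantity of this shape is a natural candidate for a shaping penalty in the drone's reward, since it grows smoothly as the agent deviates from the spline's direction.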

Results: crowd simulation

Results.

Related Links

Here you can find some related links.

Unreal Engine's Learning Agents plugin allows us to set up the training environment and render all the scenes.

BibTeX

@article{dellapietra2025render,
  title={Render, Encode, Plan: a simple pipeline for hybrid RL-DL learning inside Unreal Engine},
  author={Della Pietra, Daniele and Garau, Nicola},
  journal={Computers \& Graphics},
  year={2025}
}