Environment as Policy: Learning to Race in Unseen Tracks

*equal contribution

Abstract

Reinforcement learning (RL) has achieved outstanding success in complex robot control tasks such as drone racing, where RL agents have outperformed human champions on a known racing track. However, these agents fail on unseen track configurations and require complete retraining whenever they are presented with a new track layout.

This work aims to develop RL agents that generalize to novel track configurations without retraining. The naive solution, training directly on a diverse set of track layouts, can overburden the agent: the added complexity of the environment impairs its ability to learn to fly, resulting in a suboptimal policy.

To improve the generalization of the RL agent, we propose an adaptive environment-shaping framework that dynamically adjusts the training environment based on the agent's performance. A secondary RL policy designs environments that strike a balance between being challenging and achievable, allowing the agent to adapt and improve progressively. With adaptive environment shaping, a single racing policy efficiently learns to race on diverse and challenging tracks. Experiments in both simulation and the real world show that our method enables drones to fly complex, unseen race tracks, outperforming existing environment-shaping techniques.
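One way to turn "challenging but achievable" into a training signal for the environment policy is a reward that peaks at an intermediate success rate of the racing policy. The sketch below assumes such a shaping reward; the function name `env_shaping_reward` and the default target success rate of 0.5 are illustrative choices, not details taken from this work.

```python
def env_shaping_reward(success_rate: float, target: float = 0.5) -> float:
    """Hypothetical reward for the environment policy (an assumption).

    Returns 1.0 when the racing policy succeeds at exactly `target` rate,
    falling linearly to 0.0 when the track is never solved (too hard)
    or always solved (too easy).
    """
    if not 0.0 <= success_rate <= 1.0:
        raise ValueError("success_rate must be in [0, 1]")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be in (0, 1)")
    if success_rate <= target:
        # Too hard: reward grows as the racer starts to succeed.
        return success_rate / target
    # Too easy: reward shrinks as the track becomes trivially solvable.
    return (1.0 - success_rate) / (1.0 - target)


print(env_shaping_reward(0.25), env_shaping_reward(0.5), env_shaping_reward(0.75))
```

A piecewise-linear peak is only one option; any function that penalizes both trivially easy and unsolvable layouts would express the same balance.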

Method

Overview of the proposed method. Every N iterations, the environment policy (left) takes as input the racing policy's evaluation results and the current environments, and generates actions that adjust the gate layout of each parallel environment independently. The racing policy (right), an MLP, uses the drone and gate states from these simulation environments to learn time-optimal drone racing strategies.
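The alternation described in the overview can be sketched as a toy, stdlib-only loop. This is not the authors' implementation: the `TrackEnv` structure, the scalar `skill` and `difficulty`, the heuristic `env_policy_act` (standing in for the learned environment policy), and the 8 m arena half-width are all hypothetical placeholders chosen to make the control flow runnable.

```python
import random
from dataclasses import dataclass

ARENA_HALF_WIDTH = 8.0  # assumed: keeps every layout inside a 16 m wide arena


@dataclass
class TrackEnv:
    """One parallel simulation environment (hypothetical structure)."""
    gates: list               # gate centres as (x, y, z) tuples, in metres
    difficulty: float = 0.0   # toy scalar summarising how hard the layout is


def evaluate_racing_policy(env, skill):
    """Toy stand-in for evaluating the racing policy: success rate in [0, 1]."""
    return max(0.0, min(1.0, 0.5 + skill - env.difficulty))


def env_policy_act(success_rate, target=0.5, step=0.1):
    """Heuristic stand-in for the learned environment policy:
    make the layout harder if the racer succeeds often, easier otherwise."""
    return step if success_rate > target else -step


def apply_action(env, action, rng):
    """Adjust this environment's gates independently of the other envs."""
    env.difficulty = max(0.0, env.difficulty + action)
    scale = abs(action)
    new_gates = []
    for x, y, z in env.gates:
        # Random horizontal perturbation, clamped to the assumed arena bounds.
        x = min(ARENA_HALF_WIDTH, max(-ARENA_HALF_WIDTH, x + rng.uniform(-scale, scale)))
        y = min(ARENA_HALF_WIDTH, max(-ARENA_HALF_WIDTH, y + rng.uniform(-scale, scale)))
        new_gates.append((x, y, z))
    env.gates = new_gates


def train(num_envs=4, num_iters=100, eval_every=10, seed=0):
    rng = random.Random(seed)
    envs = [TrackEnv(gates=[(-6.0, 0.0, 1.5), (-2.0, 2.0, 1.5),
                            (2.0, -2.0, 1.5), (6.0, 0.0, 1.5)])
            for _ in range(num_envs)]
    skill = 0.0
    for it in range(1, num_iters + 1):
        skill += 0.01  # placeholder for one racing-policy update (e.g. an RL step)
        if it % eval_every == 0:  # "every N iterations"
            for env in envs:
                success = evaluate_racing_policy(env, skill)
                apply_action(env, env_policy_act(success), rng)
    return envs, skill


envs, skill = train()
print(round(skill, 3), [round(e.difficulty, 3) for e in envs])
```

In this toy setup, layout difficulty rises in step with the racer's skill, which mirrors the intended effect of keeping environments challenging but achievable. In the actual method, both policies are learned.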

Visualization

Visualization of the drone racing tracks used in the experiments, which vary in complexity. All tracks share a consistent size scale, with widths ranging from 8 m to 16 m.

Real World Results

Figure 8

3D Figure 8

Kidney

3D Big S

Big S

Twist

Simulation

We also animate the dynamic-track racing experiment, in which a group of drones flies through a dynamic Figure 8 track.