Robotics and Perception Group

Welcome to the website of the Robotics and Perception Group led by Prof. Davide Scaramuzza. Our lab was founded in February 2012 and is part of the Department of Informatics at the University of Zurich and of the Department of Neuroinformatics, a joint institute of the University of Zurich and ETH Zurich.

Our mission is to research the fundamental challenges of robotics and computer vision that will benefit all of humanity. Our key interest is to develop autonomous machines that can navigate by themselves using only onboard cameras and computation, without relying on external infrastructure, such as GPS or position tracking systems, or on off-board computing. Our research focuses predominantly on micro drones because they are more challenging and offer more research opportunities than ground robots.

News

July 23, 2025

Low-Latency Event-Based Velocimetry for Quadrotor Control in a Narrow Pipe

Autonomous quadrotor flight in confined spaces such as pipes and tunnels presents significant challenges due to unsteady, self-induced aerodynamic disturbances. In this work, we present the first closed-loop control system that enables quadrotors to hover in narrow pipes by leveraging real-time flow field measurements. We develop a low-latency, event-based smoke velocimetry method that estimates local airflow at high temporal resolution, and we use these estimates to improve control performance. For more details and flow visualizations, check out our latest paper and video.

June 25, 2025

LiDAR Registration with Visual Foundation Models (RSS 2025 Paper)

In this work, we present a robust LiDAR registration method using DINOv2 features extracted from surround-view images as point descriptors. Unlike traditional handcrafted or learned descriptors, this approach effectively handles domain shifts, seasonal changes, and structural variations in point clouds. When combined with standard registration methods like RANSAC or ICP, it achieves accurate 6DoF alignment between LiDAR scans and 3D maps, even across long time gaps. The method is simple, does not require retraining, and works with both sparse and dense data. It significantly outperforms existing techniques, with up to +24.8 and +17.3 gains in registration recall on the NCLT and Oxford Radar RobotCar datasets, respectively. For more details, check out our paper, code, and video!
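To illustrate the generic correspondence-based pipeline that such descriptors plug into, here is a minimal Python sketch. It assumes each LiDAR point already carries a descriptor (for example, a DINOv2 feature sampled from the surround-view image the point projects into); the function names, threshold, and iteration count are hypothetical, and this is not the paper's code. Descriptors are matched by mutual nearest neighbor, then RANSAC over 3-point samples with the Kabsch (SVD) solver estimates the 6DoF transform.

```python
import numpy as np

def mutual_nn_matches(desc_a, desc_b):
    """Return index pairs (i, j) where a_i and b_j are each other's nearest
    neighbor under cosine similarity."""
    a = desc_a / np.linalg.norm(desc_a, axis=1, keepdims=True)
    b = desc_b / np.linalg.norm(desc_b, axis=1, keepdims=True)
    sim = a @ b.T
    best_b = sim.argmax(axis=1)   # nearest b for each a
    best_a = sim.argmax(axis=0)   # nearest a for each b
    idx_a = np.arange(len(a))
    mutual = best_a[best_b] == idx_a
    return idx_a[mutual], best_b[mutual]

def kabsch(src, dst):
    """Least-squares rotation R and translation t with R @ src_i + t ~= dst_i."""
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))      # avoid returning a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, c_dst - R @ c_src

def ransac_register(src, dst, iters=200, inlier_thresh=0.1, seed=0):
    """Robustly estimate (R, t) from putative matches src[k] <-> dst[k].
    iters and inlier_thresh are hypothetical defaults, not tuned values."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        sample = rng.choice(len(src), size=3, replace=False)
        R, t = kabsch(src[sample], dst[sample])
        residuals = np.linalg.norm(src @ R.T + t - dst, axis=1)
        inliers = residuals < inlier_thresh
        if inliers.sum() > best.sum():
            best = inliers
    R, t = kabsch(src[best], dst[best])          # refit on all inliers
    return R, t, best
```

In practice the RANSAC estimate would typically be refined with ICP, as the paragraph above suggests.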

June 20, 2025

Multi-Aerial Robotic System for Power Line Inspection and Maintenance: Comparative Analysis from the AERIAL-CORE Final Experiments (T-FR 2025)

We are excited to share our new paper summarizing the results achieved in the AERIAL-CORE project, in which we collaborated with top European academic and industrial partners to automate the inspection of large, critical infrastructures, such as power lines. Large-scale infrastructures are prone to deterioration due to age, environmental influences, and heavy usage. Ensuring their safety through regular inspections and maintenance is crucial to prevent incidents that can significantly affect public safety and the environment. This paper introduces the first autonomous system that combines various innovative aerial robots. This system is designed for extended-range inspections beyond the visual line of sight, features aerial manipulators for maintenance tasks, and includes support mechanisms for human operators working at elevated heights. For more details, check out our paper and video.

June 16, 2025

Code Release! Egocentric Event-Based Vision for Ping Pong Ball Trajectory Prediction (CVPRW 2025)

We are excited to release the code of our real-time egocentric trajectory prediction system for table tennis using event cameras. Unlike conventional cameras, event cameras offer high temporal resolution and reduced latency, enabling accurate ball trajectory predictions shortly after the opponent's hit. The system uses data from Meta Project Aria glasses, including 3D ball trajectories and eye-gaze information, to implement a foveated vision approach. By focusing only on the viewer's gaze region, the system improves detection performance and reduces computational load by a factor of 10.81. It achieves a worst-case latency of just 4.5 ms, significantly faster than traditional 30 FPS cameras. A trajectory prediction model is then applied to forecast the ball’s future 3D path. This is the first method to use event cameras for egocentric table tennis trajectory prediction. For more details, check out our paper!

June 6, 2025

Sight Guide: A Wearable Assistive Perception and Navigation System for the Vision Assistance Race in the Cybathlon 2024

This work presents Sight Guide, a wearable assistive system developed for the Vision Assistance Race (VIS) at Cybathlon 2024 to help visually impaired individuals navigate complex environments. The system uses RGB and depth cameras with an embedded computer to deliver guidance via vibrations and audio commands. Combining classical robotics and learning-based methods, it supports functions like obstacle avoidance, object detection, OCR, and touchscreen use. Sight Guide achieved a 95.7% task success rate in testing and proved effective during the competition. The paper details the system design, evaluation, and future development directions.

June 5, 2025

New PhD Student

We welcome Nicole Damblon as a new PhD student!

June 5, 2025

Reading in the Dark with Foveated Event Vision (CVPRW 2025 Paper)

In this work, we introduce a novel event-based Optical Character Recognition (OCR) system for smart glasses that overcomes the limitations of traditional RGB cameras in low-light and high-speed conditions. By leveraging event cameras and eye-gaze data to foveate the visual input, the system reduces bandwidth usage by approximately 98% while maintaining performance in challenging environments. It uses deep binary reconstruction trained on synthetic data and integrates multimodal large language models (LLMs) for OCR, outperforming conventional methods. The system can read text in low-light settings where RGB cameras fail, using up to 2,400 times less bandwidth than standard wearable RGB cameras. For more details, check out our paper!

June 5, 2025

Egocentric Event-Based Vision for Ping Pong Ball Trajectory Prediction (CVPRW 2025 Paper)

In this work, we present a real-time egocentric trajectory prediction system for table tennis using event cameras. Unlike conventional cameras, event cameras offer high temporal resolution and reduced latency, enabling accurate ball trajectory predictions shortly after the opponent's hit. The system uses data from Meta Project Aria glasses, including 3D ball trajectories and eye-gaze information, to implement a foveated vision approach. By focusing only on the viewer's gaze region, the system improves detection performance and reduces computational load by a factor of 10.81. It achieves a worst-case latency of just 4.5 ms, significantly faster than traditional 30 FPS cameras. A trajectory prediction model is then applied to forecast the ball’s future 3D path. This is the first method to use event cameras for egocentric table tennis trajectory prediction. For more details, check out our paper and code!
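As a rough illustration of gaze-based foveation, the Python sketch below keeps only the events inside a square window centered on the current gaze point. The square window, its size, and the structured event layout (fields x, y, t, p) are assumptions for illustration; the paper's actual foveation region and ball detector are not reproduced here.

```python
import numpy as np

# Hypothetical event layout: pixel coordinates, timestamp (us), polarity
EVENT_DTYPE = np.dtype([("x", np.uint16), ("y", np.uint16), ("t", np.int64), ("p", np.int8)])

def foveate_events(events, gaze_xy, half_size):
    """Return the events within half_size pixels (Chebyshev distance) of the gaze point.
    The square window shape is an assumption, not the paper's foveation rule."""
    gx, gy = gaze_xy
    # Cast to signed ints so that subtracting the gaze coordinate cannot wrap around
    dx = np.abs(events["x"].astype(np.int64) - gx)
    dy = np.abs(events["y"].astype(np.int64) - gy)
    return events[(dx <= half_size) & (dy <= half_size)]

def reduction_factor(n_all, n_kept):
    """How many times fewer events the downstream detector has to process."""
    return n_all / max(n_kept, 1)
```

The achieved reduction depends on how events are distributed across the sensor; uniformly spread synthetic events would give a much larger factor than a real scene in which most activity is near the ball.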

May 16, 2025

Regularity and Stability Properties of Selective SSMs with Discontinuous Gating (ArXiv 2025 Paper)

Deep Selective State-Space Models (SSMs), characterized by input-dependent, time-varying parameters, offer significant expressive power but pose challenges for stability analysis, especially with discontinuous gating signals. In this paper, we investigate the stability and regularity properties of continuous-time selective SSMs through the lens of passivity and Input-to-State Stability (ISS). We establish that intrinsic energy dissipation guarantees exponential forgetting of past states. Crucially, we prove that the unforced system dynamics possess an underlying minimal quadratic energy function whose defining matrix exhibits robust AUC_loc regularity, accommodating discontinuous gating. Furthermore, assuming a universal quadratic storage function that ensures passivity across all inputs, we derive parametric LMI conditions and kernel constraints that limit gating mechanisms, formalizing the "irreversible forgetting" of recurrent models. Finally, we provide sufficient conditions for global ISS, linking uniform local dissipativity to overall system robustness. Our findings offer a rigorous framework for understanding and designing stable and reliable deep selective SSMs. For more details, check out our paper!
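The idea behind a universal quadratic storage function can be checked numerically in a simplified setting. For a finite set of gating modes with dynamics dx/dt = A_i x, a single positive definite matrix P for which A_i^T P + P A_i is negative definite for every mode makes V(x) = x^T P x decrease along every trajectory, however (and however discontinuously) the gate switches between modes. The Python sketch below only verifies a given candidate P (finding one generally requires an SDP solver); the mode matrices are hypothetical, and this is the textbook common-Lyapunov test rather than the paper's parametric LMI conditions.

```python
import numpy as np

def is_common_lyapunov(P, A_modes, tol=1e-9):
    """Check that P is positive definite and A^T P + P A is negative definite
    for every (hypothetical) gating mode A. Sufficient, not necessary, for
    stability under arbitrary switching."""
    P = 0.5 * (P + P.T)                        # symmetrize before the eigen-check
    if np.linalg.eigvalsh(P).min() <= tol:
        return False
    for A in A_modes:
        M = A.T @ P + P @ A                    # symmetric by construction
        if np.linalg.eigvalsh(M).max() >= -tol:
            return False
    return True
```

For example, gating that only rescales a stable diagonal matrix (A = d * diag(-1, -3) with d > 0) passes with P = I, whereas the stable matrix [[-1, 10], [0, -1]] fails with P = I. This illustrates that the test is sufficient but not necessary, since a different P might still certify that mode.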

May 16, 2025

Maximizing Asynchronicity in Event-based Neural Networks (ArXiv 2025 Paper)

To address the challenge of applying machine learning (ML) to asynchronous event-camera data, we introduce EVA (EVent Asynchronous representation learning), a novel asynchronous-to-synchronous (A2S) framework that learns expressive, generalizable event-by-event representations. Inspired by language models (linear attention) and self-supervised learning, EVA surpasses prior A2S methods on recognition and is the first A2S method to master demanding detection tasks (47.7 mAP on the Gen1 event-based object detection dataset). EVA significantly improves real-time event-based vision. For more details, check out our paper!
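EVA's architecture is not detailed here. The Python sketch below only shows why linear attention suits event-by-event processing: written as a recurrence, it updates a fixed-size state with each incoming event, yet it produces the same outputs as the full causal attention over the sequence. How events are encoded into query, key, and value vectors is assumed to be given, and the elu(x) + 1 feature map is one common choice, not necessarily EVA's.

```python
import numpy as np

def phi(x):
    """Positive feature map elu(x) + 1 (an assumed, commonly used choice)."""
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

class RecurrentLinearAttention:
    """Causal linear attention written as a recurrence over incoming events.

    The state (S, z) has a fixed size, so each event costs O(d_k * d_v)
    regardless of how many events came before it.
    """
    def __init__(self, d_k, d_v):
        self.S = np.zeros((d_k, d_v))   # running sum of phi(k) v^T
        self.z = np.zeros(d_k)          # running sum of phi(k)

    def step(self, q, k, v):
        fk = phi(k)
        self.S += np.outer(fk, v)
        self.z += fk
        fq = phi(q)
        return fq @ self.S / (fq @ self.z)

def batch_causal_linear_attention(Q, K, V):
    """Reference computation over a whole sequence with an explicit causal mask."""
    scores = np.tril(phi(Q) @ phi(K).T)     # (T, T), zero above the diagonal
    return scores @ V / scores.sum(axis=1, keepdims=True)
```

Feeding events one at a time through `step` and stacking the outputs reproduces `batch_causal_linear_attention` up to floating-point error, while only the constant-size state is kept in memory.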

April 17, 2025

GG-SSMs: Graph-Generating State Space Models (CVPR 2025 Highlight Paper)

Graph-Generating State Space Models (GG-SSMs) take SSMs to the next level by dynamically building graphs based on feature relationships, so scanning paths are no longer fixed as in works like VMamba and Vim. Powered by Chazelle's MST algorithm, they adapt to each dataset's structure, efficiently delivering robust feature propagation and capturing complex dependencies. Tested on 11 diverse datasets (including ImageNet, KITTI-15, time-series data, and event-based eye-tracking), GG-SSMs consistently outperform existing approaches. Highlights include 84.9% top-1 accuracy on ImageNet (+1% over prior SSMs), a 2.77% error rate on KITTI-15, and eye-tracking detection improved by up to 0.33%, all with fewer parameters.
For more details, check out our paper and code!
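The graph-generation step rests on a minimum spanning tree (MST) over feature distances. The Python sketch below illustrates that step but substitutes Kruskal's algorithm with union-find for Chazelle's algorithm: Kruskal's is much simpler to implement, though asymptotically slower. Using a complete graph over all features is a simplification for illustration; the paper's graph construction and the subsequent state propagation along the tree are not reproduced.

```python
import numpy as np

def kruskal_mst(features):
    """Minimum spanning tree over a complete graph with Euclidean feature distances.

    Returns a list of (weight, i, j) edges. Kruskal's algorithm stands in for
    Chazelle's MST algorithm here; both return a minimum spanning tree.
    """
    n = len(features)
    edges = sorted(
        (float(np.linalg.norm(features[i] - features[j])), i, j)
        for i in range(n) for j in range(i + 1, n)
    )
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving keeps trees shallow
            a = parent[a]
        return a

    tree = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                       # edge joins two components: keep it
            parent[ri] = rj
            tree.append((w, i, j))
            if len(tree) == n - 1:
                break
    return tree
```

For four 1D features at 0, 1, 3, and 6, the tree keeps the edges of length 1, 2, and 3. A practical model would likely restrict candidate edges (for example, to local neighborhoods) to keep the cost manageable; that is an assumption, not a detail from the paper.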

April 16, 2025

ForesightNav: Learning Scene Imagination for Efficient Exploration (CVPRW 2025 Paper)

In this work, we propose ForesightNav, a novel exploration strategy inspired by human imagination and reasoning. Our approach equips robotic agents with the capability to predict contextual information, such as occupancy and semantic details, for unexplored regions. These predictions enable the robot to efficiently select meaningful long-term navigation goals, significantly enhancing exploration in unseen environments. We validate our imagination-based approach using the Structured3D dataset, demonstrating accurate occupancy prediction and superior performance in anticipating unseen scene geometry. Our experiments show that the imagination module improves exploration efficiency in unseen environments, achieving a 100% completion rate for PointNav and an SPL of 67% for ObjectNav on the Structured3D validation split.
For more details, check out our paper and code!
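For reference, SPL (Success weighted by Path Length) is commonly computed as the mean over episodes of S_i * l_i / max(p_i, l_i), where S_i is 1 for a successful episode and 0 otherwise, l_i is the shortest-path length to the goal, and p_i is the length of the path the agent actually took. A minimal Python version of that standard definition (the function name is hypothetical):

```python
def spl(successes, shortest_lengths, path_lengths):
    """Success weighted by Path Length, averaged over episodes.

    successes: 1/0 per episode; shortest_lengths: geodesic distance to the goal;
    path_lengths: distance the agent actually traveled.
    """
    if not successes:
        raise ValueError("need at least one episode")
    total = 0.0
    for s, l, p in zip(successes, shortest_lengths, path_lengths):
        total += s * l / max(p, l)   # max() caps each episode's credit at 1
    return total / len(successes)
```

A successful episode that travels twice the shortest path contributes 0.5 and a failed one contributes 0, so SPL rewards both reaching the target and doing so efficiently.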

April 16, 2025

Perturbed State Space Feature Encoders for Optical Flow with Event Cameras (CVPRW 2025 Paper)

Perturbed State Space Feature Encoders (P-SSE) introduce a novel approach to multi-frame optical flow estimation for event cameras, combining the broad receptive field of Transformer-inspired architectures with the efficient, linear scaling of state space models (SSMs). A key innovation lies in the perturbation of the state dynamics matrix, leading to improvements in both stability and performance. By incorporating bi-directional flow estimation and recurrent connections, P-SSE captures richer temporal context, with improvements in end-point error (EPE) of 8.48% on DSEC-Flow and 11.86% on MVSEC. These gains establish P-SSE as a new state-of-the-art in accuracy and efficiency for event-based optical flow.
For more details, check out our paper!
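End-point error (EPE) is the mean Euclidean distance between predicted and ground-truth flow vectors. The Python sketch below computes it, together with the relative reduction in EPE. Reading the 8.48% and 11.86% figures as relative reductions with respect to a previous best method is an assumption, since the baseline is not named above; the function names are hypothetical.

```python
import numpy as np

def endpoint_error(flow_pred, flow_gt, valid=None):
    """Mean Euclidean distance between predicted and ground-truth flow vectors.

    flow_pred, flow_gt: arrays of shape (H, W, 2); valid: optional boolean mask (H, W)
    selecting pixels with ground truth.
    """
    err = np.linalg.norm(flow_pred - flow_gt, axis=-1)   # per-pixel error, shape (H, W)
    if valid is not None:
        err = err[valid]
    return float(err.mean())

def relative_improvement(epe_baseline, epe_new):
    """Percentage reduction in EPE relative to a baseline method."""
    return 100.0 * (epe_baseline - epe_new) / epe_baseline
```

For instance, a constant error of (3, 4) pixels at every pixel gives an EPE of 5, and lowering EPE from 2.0 to 1.5 is a 25% improvement.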

April 10, 2025

RPG at 'Wider than the Sky' premiere

RPG attended the premiere of the Swiss-Italian documentary Wider than the Sky by Valerio Jalongo at the 56th Visions du Réel film festival in Nyon. Our lab was deeply involved in the making of this documentary about artificial intelligence, as many scenes were filmed with event cameras. Our drone races against world champions are also prominently featured as an occasion where AI and humans competed in the physical world.

April 2, 2025

Tutorial on Event Cameras available

In this keynote, held at the 2024 International Conference on Computational Photography, Davide Scaramuzza gives a visionary overview of event cameras, bio-inspired vision sensors that outperform conventional cameras with ultra-low latency, high dynamic range, and minimal power consumption. He dives into the motivation behind event cameras, explains how these sensors work, and explores their mathematical modeling and processing frameworks. He highlights cutting-edge applications across computer vision, robotics, autonomous vehicles, virtual reality, and mobile devices, while also addressing the open challenges and future directions shaping this exciting field. Link to the video recording.
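The standard idealized model of how an event camera works fits in a few lines: each pixel independently emits an event, with polarity +1 or -1, whenever its log intensity has changed by a contrast threshold C since that pixel's last event. The Python sketch below simulates one pixel from sampled intensities. The threshold value is illustrative, and real sensors add noise, refractory periods, and pixel-to-pixel threshold variation that are ignored here.

```python
import numpy as np

def generate_events(timestamps, intensities, C=0.2, eps=1e-6):
    """Idealized single-pixel event camera.

    Emits (t, polarity) whenever the log intensity moves by the contrast
    threshold C (illustrative value) away from the reference level set at
    the last event. eps avoids log(0) for dark pixels.
    """
    log_I = np.log(np.asarray(intensities, dtype=float) + eps)
    ref = log_I[0]
    events = []
    for t, L in zip(timestamps[1:], log_I[1:]):
        # A large change between samples can trigger several events at once
        while L - ref >= C:
            ref += C
            events.append((t, +1))
        while ref - L >= C:
            ref -= C
            events.append((t, -1))
    return events
```

For example, raising the intensity by a factor of e^0.5 and then dropping it to e^-0.1 of the original yields two positive events followed by two negative events with C = 0.2. Because events encode relative (log) changes, the same scene produces similar events across very different absolute brightness levels, which underlies the high dynamic range mentioned above.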

April 1, 2025

HDVIO2.0: Wind and Disturbance Estimation with Hybrid Dynamics VIO

We present HDVIO2.0, a hybrid-dynamics Visual-Inertial Odometry (VIO) system that models full 6-DoF (translational and rotational) vehicle dynamics and tightly incorporates them into VIO. HDVIO2.0 proposes a hybrid dynamics quadrotor model that combines a point-mass vehicle model with a learning-based component, which has access to control commands and IMU history, to capture complex aerodynamic effects. HDVIO2.0 leverages the divergence between the actual motion and the motion predicted by the hybrid dynamics model to estimate external forces as well as the robot state. Our system surpasses state-of-the-art methods in experiments on public and new drone dynamics datasets, as well as in real-world flights in winds of up to 25 km/h. For more details, please check out our paper.
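The core idea of estimating disturbances from the gap between actual and predicted motion can be shown with a one-shot calculation. The Python sketch below uses a point-mass quadrotor model (collective thrust along the body z-axis plus gravity) with an optional learned aerodynamic term passed in as a placeholder; the function and argument names are hypothetical. HDVIO2.0 estimates these forces jointly with the state inside its VIO optimization, which this sketch does not attempt.

```python
import numpy as np

GRAVITY = np.array([0.0, 0.0, -9.81])   # world frame, z up [m/s^2]

def estimate_external_force(mass, R_wb, collective_thrust, accel_world, residual_force=None):
    """External force [N] as measured minus model-predicted force (hypothetical helper).

    mass: vehicle mass [kg]; R_wb: body-to-world rotation (3x3);
    collective_thrust: total rotor thrust along body z [N];
    accel_world: estimated acceleration in the world frame [m/s^2];
    residual_force: optional learned aerodynamic term in the world frame [N]
    (a placeholder for the learning-based component).
    """
    predicted = R_wb @ np.array([0.0, 0.0, collective_thrust]) + mass * GRAVITY
    if residual_force is not None:
        predicted = predicted + residual_force
    return mass * np.asarray(accel_world, dtype=float) - predicted
```

A hovering 1 kg vehicle with 9.81 N of thrust and zero acceleration yields zero external force; if it instead accelerates sideways at 0.5 m/s^2, the unexplained 0.5 N is attributed to a disturbance such as wind, minus whatever part the learned term already accounts for.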

March 7, 2025

New PhD Student

We welcome Simone Nascivera as a new PhD student in our lab!

March 6, 2025

Limits of Deep Learning: Sequence Modeling through the Lens of Complexity Theory (ICLR 2025 Paper)

Is Deep Learning Dumb? Despite their remarkable successes, Deep Learning (DL) models such as Transformers and Structured State Space Models (SSMs) face substantial challenges with complex reasoning and function composition tasks. Our research uncovers fundamental theoretical limitations through four key theorems: (a) the inability of SSMs to compose functions, (b) exponential scaling of compute when performing chain-of-thought that still does not fully resolve the given problem, (c) an inability to solve NL-complete problems (unless L=NL), and (d) the equivalence of current DL architectures to Finite State Machines, meaning that the languages recognized by current finite-precision DL architectures lie within the class of regular languages. These proofs highlight fundamental computational barriers in current DL architectures, emphasizing the urgent need for new approaches that enable the reliable multi-step reasoning essential for the ultimate goal of solving important, potentially NP-hard mathematical problems.

For more details, please read our paper.
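The intuition behind result (d) can be made concrete. Any recurrent model whose state is stored in finitely many bits has finitely many reachable states, so tabulating its transitions yields a deterministic finite automaton. The cell below is a hypothetical stand-in with an 8-bit state, not a Transformer or SSM. The same enumeration applies in principle to any finite-precision state, although 2^b states is astronomically large for real models.

```python
from collections import deque

STATE_BITS = 8
N_STATES = 2 ** STATE_BITS

def cell(h, x):
    """Hypothetical recurrent update whose state is an 8-bit integer."""
    return (3 * h + 5 * x + 1) % N_STATES

def build_dfa(start, alphabet):
    """Enumerate every reachable state by BFS and tabulate the transition function."""
    table, seen, queue = {}, {start}, deque([start])
    while queue:
        h = queue.popleft()
        for x in alphabet:
            nxt = cell(h, x)
            table[(h, x)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return table, seen

def run_model(seq, start=0):
    h = start
    for x in seq:
        h = cell(h, x)
    return h

def run_dfa(table, seq, start=0):
    h = start
    for x in seq:
        h = table[(h, x)]   # pure table lookup: no arithmetic on the state
    return h
```

Any yes/no decision read out from the final state simply marks some states as accepting, so the set of accepted input strings is a regular language.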

February 27, 2025

Student-Informed Teacher Training (ICLR 2025 Spotlight Paper)

In imitation learning with a privileged teacher, the teacher can learn behaviors that the student cannot reproduce from its more limited observations. To address this teacher-student asymmetry, we propose a framework for jointly training the teacher and student policies, encouraging the teacher to learn behaviors that the student can imitate despite the latter's limited access to information and its partial observability. We motivate our method with a maze navigation task and demonstrate its effectiveness on complex vision-based quadrotor flight and manipulation tasks. For more details, please check out our paper, project page, and code.

February 27, 2025

Environment as Policy: Learning to Race in Unseen Tracks

To enhance the generalizability of the RL agent, we propose an adaptive environment-shaping framework that dynamically adjusts the training environment based on the agent's performance. We achieve this by leveraging a secondary RL policy to design environments that strike a balance between being challenging and achievable, allowing the agent to adapt and improve progressively. Using our adaptive environment shaping, a single racing policy efficiently learns to race on diverse, challenging tracks. For more details, please check out our paper and project page.

February 24, 2025

New Research Assistant

We welcome Rong Zou as a new Research Assistant in our lab!

February 20, 2025

A Monocular Event-Camera Motion Capture System

Conventional motion-capture systems are challenging to use in confined spaces because they rely on multiple views to triangulate the pose of an object. In this technical report, we describe a monocular event-camera motion capture system that overcomes this limitation and is ideally suited for narrow spaces. Instead of passive markers, it relies on active, blinking LED markers, so that each marker can be uniquely identified by its blinking frequency. We demonstrate that the system can be used for closed-loop control of a small drone. For more details, check out our technical report.
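Identifying a marker by its blinking frequency can be sketched as follows: estimate the period from the timestamps of the ON (positive) events observed at the marker, then pick the closest nominal frequency. The Python sketch below assumes the event bursts have already been grouped so that there is one ON timestamp per blink cycle; the marker frequencies, tolerance, and function names are hypothetical, and the report's actual decoding scheme may differ.

```python
import numpy as np

def estimate_blink_frequency(on_timestamps):
    """Blink frequency [Hz] from ON-event timestamps [s] of one marker.

    Assumes one ON timestamp per blink cycle. The median interval is robust
    to occasional missed or spurious events.
    """
    t = np.sort(np.asarray(on_timestamps, dtype=float))
    return 1.0 / np.median(np.diff(t))

def identify_marker(freq, marker_freqs, tol=0.05):
    """Return the ID of the marker with the closest nominal frequency, or None
    if the relative mismatch exceeds tol (hypothetical tolerance)."""
    ids = list(marker_freqs)
    nominal = np.array([marker_freqs[i] for i in ids], dtype=float)
    k = int(np.argmin(np.abs(nominal - freq)))
    return ids[k] if abs(nominal[k] - freq) / nominal[k] <= tol else None
```

With hypothetical markers at 500, 1000, and 2000 Hz, ON events every millisecond identify the 1000 Hz marker, while a measured 1500 Hz is rejected as ambiguous. Spacing the nominal frequencies widely relative to the timestamp jitter keeps the markers distinguishable.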

February 18, 2025

Dream to Fly: Model-Based Reinforcement Learning for Vision-Based Drone Flight

Can we use Model-based Reinforcement Learning (MBRL) to fly a drone from pixels to commands? In our new paper, we present an approach for training quadrotor navigation policies from scratch, mapping raw onboard camera pixels directly to control commands, much like a human pilot. While model-free methods such as PPO are sample-inefficient and struggle in this setting, we leverage MBRL to train visuomotor policies capable of agile flight through a racetrack using only raw pixel observations. Moreover, because our policies are trained end-to-end directly from pixels, we no longer require the perception-aware reward term used in previous methods. Instead, we show that this behavior naturally emerges, resulting in policies that guide the camera toward feature-rich areas of the observation space. For more details, check out our paper.

February 4, 2025

New Postdoc

We welcome Dr. Rudolf Reiter as a new Postdoc in our lab!

January 28, 2025

Data-driven Feature Tracking for Event Cameras with and without Frames (TPAMI 2025)

In our new TPAMI paper, we introduce a versatile feature tracker that can provide robust feature tracks and sparse disparity estimation while benefiting from the advantages of event cameras. Our tracker works solely with events or in a hybrid mode incorporating both events and frames. Moreover, our network architecture seamlessly extends to sparse disparity estimation in a stereo setup containing an event and a frame camera. For more details, check out our paper and code.

January 27, 2025

Training code release: Low-Latency Automotive Vision with Event Cameras

We are excited to release the training code of our recent Nature paper. We tackle a critical bandwidth-latency trade-off with a hybrid event + 20 Hz RGB camera sensor system, and a novel asynchronous learning method to deliver efficient, high-rate detections of traffic participants. This system achieves the same perceptual latency as a 5,000 Hz camera with the bandwidth of a 45 Hz camera without compromising accuracy, paving the way for efficient and robust perception in edge-case scenarios. Please check out the training code, video and paper for more information.

January 18, 2025

Multi-task Reinforcement Learning for Quadrotors

Our RA-L 2025 paper introduces a multi-task reinforcement learning (MTRL) framework for quadrotor control that leverages shared dynamics and a multi-critic architecture to enable efficient knowledge transfer. Unlike single-task RL policies, our approach allows a single policy to perform diverse maneuvers, including stabilization, velocity tracking, and racing, while significantly improving sample efficiency and task performance in both simulation and real-world tests.
For more details, please check out our paper and video.

January 7, 2025

New Research Assistant

We welcome Daniel Zhai as a new Research Assistant in our lab!

January 6, 2025

FaVoR: Features via Voxel Rendering for Camera Relocalization (WACV 2025)

This work, done in collaboration with the STARS Laboratory at the University of Toronto, introduces a novel approach to camera relocalization, leveraging a globally sparse yet locally dense 3D representation of 2D features for robust pose estimation. FaVoR achieves up to 39% improvement in median translation error for indoor environments and performs efficiently in outdoor scenarios with lower memory and computational costs. For more details, please check the paper, project page, and code.

December 18, 2024

GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control

World models predict future frames from past observations and actions, making them powerful simulators for ego-vision tasks with complex dynamics, such as autonomous driving, human ego-centric activities, and drones. Nonetheless, existing world models for ego-vision have predominantly focused on the driving domain and the ego-vehicle's actions, which limits the complexity and diversity of the generated scenes. In this collaborative project between EPFL, the University of Bern, the Swiss Data Science Center, and ETH, we contribute to the development of GEM, a diffusion-based world model with a generalized control strategy. By leveraging ego-trajectories and general image features, GEM facilitates fine-grained control over ego-motion, enables control over the motion of other objects in the scene, and supports scene composition by inserting new objects. GEM is multimodal, capable of generating both videos and future depth sequences, thus providing rich semantic and spatial output contexts. While the primary focus of GEM remains autonomous driving, we explore its adaptability to other ego-vision domains such as human activity and drone navigation. For more details, please check out our paper, project page, and code.

December 12, 2024

Drift-free Visual SLAM using Digital Twins (RA-L 2024)

Our RA-L 2024 paper introduces a novel approach that achieves accurate and globally consistent localization by aligning the sparse 3D point cloud generated by the VIO/VSLAM system to a digital twin using point-to-plane matching; no visual data association is needed. Our method provides a 6-DoF global measurement tightly integrated into the VIO/VSLAM system. Experiments run on a high-fidelity GPS simulator and on real-world data collected from a drone demonstrate that our approach outperforms state-of-the-art VIO-GPS systems and offers superior robustness against viewpoint changes compared to state-of-the-art Visual SLAM systems. Check out our paper and code.
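Point-to-plane matching penalizes the distance of each transformed point along the normal of its matched plane, r_i = n_i . (R p_i + t - q_i). The Python sketch below performs one classic linearized step (small-angle approximation, as in point-to-plane ICP), assuming correspondences between VIO landmarks and digital-twin planes are already known. The function name is hypothetical, and the paper's tight integration of such measurements into VIO/VSLAM is not reproduced.

```python
import numpy as np

def point_to_plane_step(src, dst, normals):
    """One linearized point-to-plane alignment step (hypothetical helper).

    src: (N,3) points to align; dst: (N,3) matched points on the map planes;
    normals: (N,3) unit plane normals at dst. With R ~ I + [omega]_x,
    n . (R p + t - q) ~ n . (p - q) + (p x n) . omega + n . t, which is linear
    in (omega, t) and solved by least squares.
    """
    J = np.hstack([np.cross(src, normals), normals])          # (N, 6) Jacobian
    b = -np.einsum("ij,ij->i", normals, src - dst)            # (N,) negated residuals
    x, *_ = np.linalg.lstsq(J, b, rcond=None)
    omega, t = x[:3], x[3:]
    # Map the rotation vector back to a rotation matrix (Rodrigues' formula)
    theta = np.linalg.norm(omega)
    if theta < 1e-12:
        R = np.eye(3)
    else:
        k = omega / theta
        K = np.array([[0.0, -k[2], k[1]], [k[2], 0.0, -k[0]], [-k[1], k[0], 0.0]])
        R = np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * K @ K
    return R, t
```

Points on a single plane constrain only three of the six degrees of freedom, so the map must contain planes with diverse orientations (for example, floor and walls) for the 6x6 system to be well-conditioned. In general the step is iterated until convergence; for a pure translation against three orthogonal planes, one step recovers the offset exactly.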

December 12, 2024

Robotics meets Fluid Dynamics: A Characterization of the Induced Airflow around a Quadrotor

Our paper presents a novel, computationally efficient model for estimating the induced flow beneath quadrotors during hover. By leveraging classical turbulent flow theory and analyzing extensive flight data, we demonstrate a unified model applicable across different drone sizes. This model, requiring minimal input parameters, offers a practical solution for safer operations during inspection, mapping, agricultural applications, and many more scenarios where predicting the strong winds generated by the drone's propellers is critical. For more details, check out our latest IEEE RA-L paper and video.
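For a sense of scale, the classical actuator-disk (momentum theory) estimate of the induced velocity at the rotor disk in hover is v_h = sqrt(T / (2 rho A)), where T is the thrust per rotor, rho the air density, and A the disk area. This is not the paper's model, which builds on turbulent flow theory and flight data; it is a standard starting point, sketched below in Python with hypothetical example values.

```python
import math

def hover_induced_velocity(mass_kg, rotor_radius_m, n_rotors=4, air_density=1.225, g=9.81):
    """Ideal induced velocity [m/s] at the rotor disk in hover (actuator-disk theory).

    Assumes the weight is shared equally by n_rotors isolated rotors and ignores
    rotor-rotor interaction, ground effect, and swirl.
    """
    thrust_per_rotor = mass_kg * g / n_rotors
    disk_area = math.pi * rotor_radius_m ** 2
    return math.sqrt(thrust_per_rotor / (2.0 * air_density * disk_area))
```

For a hypothetical 1 kg quadrotor with 10 cm rotor radius, this gives about 5.6 m/s at the disk, and ideal momentum theory predicts twice that in the fully developed slipstream. The velocity grows with the square root of the weight per unit disk area, one reason a model that scales across drone sizes is useful.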

December 5, 2024

RPG Collaborates with NASA JPL to Pioneer Advanced Mars Science Helicopter

In this news article, we talk about our collaboration with NASA JPL on Mars helicopters: UZH news.

December 3, 2024

Actor-Critic Model Predictive Control: Differentiable Optimization meets Reinforcement Learning

Is it possible to combine the benefits of model-free reinforcement learning (RL) - known for its strong task performance and flexibility in optimizing general reward formulations - with the robustness and online replanning capabilities of model predictive control (MPC)? This extension digs deeper into the answer by studying our new framework called Actor-Critic Model Predictive Control. We conduct a deep study that exposes the benefits of the proposed approach: it achieves better out-of-distribution behaviour, better robustness to changes in the dynamics and improved sample efficiency. Additionally, we conduct an empirical analysis that reveals a relationship between the critic's learned value function and the cost function of the differentiable MPC, providing a deeper understanding of the interplay between the critic's value and the MPC cost functions. Our method achieves the same superhuman performance as state-of-the-art model-free RL, showcasing speeds of up to 21 m/s. For more details, check out our latest extension paper and video.

November 18, 2024

Our group welcomed the Unitree G1 humanoid robot on its first visit to Switzerland

The humanoid robot G1, developed by Unitree, was recently presented for the first time in Switzerland at the University of Zurich. Our group is proud to be part of such cutting-edge advancements in robotics, working to address the challenges of perception, planning, and control to unlock the full potential of humanoid robots. A Blick article explores the obstacles robots face in understanding and acting with common sense in daily life. Meanwhile, the NZZ highlights how real-world demonstrations reveal gaps between marketed potential and current robotic capabilities.

November 7, 2024

Empowering the blind at Cybathlon 2024

Our research on vision assistance for blind people is in today's UZH News. Read this interview with our PhD student Giovanni Cioffi to learn how our technology for flying robots was used to design the navigation algorithm of Sight Guide at Cybathlon 2024! Check out the interview and the full video of the performance at Cybathlon.

November 6, 2024

New Research Assistant

We welcome Roberto Pellerito as a new Research Assistant in our lab!

November 5, 2024

Bootstrapping Reinforcement Learning with Imitation for Vision-Based Agile Flight (CoRL 2024)

In this work, we combine the task performance of Reinforcement Learning (RL) with the sample efficiency of Imitation Learning (IL) in the context of vision-based, autonomous drone racing. Our experiments in both simulated and real-world environments demonstrate that our approach achieves better performance and robustness than IL or RL alone when navigating a quadrotor through a racing course using only visual information, without explicit state estimation. For more details, check out our paper, video, and website.

November 4, 2024

Monocular Event-Based Vision for Obstacle Avoidance with a Quadrotor (CoRL 2024)

We introduce a framework for simulation pre-training of learned event-vision control policies, which we hope can enable fast, agile, vision-based robots. Using this method, we present the first demonstrations of static obstacle avoidance with a quadrotor using only a monocular event camera. By leveraging depth prediction as a pretext task, we pre-train a reactive obstacle avoidance events-to-control policy with approximated events across hundreds of simulated, privileged expert rollouts, and then fine-tune a perception network with limited events-and-depth real-world data to achieve obstacle avoidance in indoor, outdoor, dark, and forest settings. For more details, please check out our paper and video.

November 1, 2024

Sight Guide: Vision Assistance for Blind People - Cybathlon 2024

Our team, Sight Guide, competed in the Vision Assistance Race at CYBATHLON 2024, the "cyber Olympics" designed to push the boundaries of assistive technology. In this race, our system guided a blind participant through everyday tasks such as walking along a sidewalk, sorting colors, ordering from a touchscreen, purchasing a box of tea, and navigating a forest, all powered by computer vision assistance! We used two RGB cameras and a depth camera for localization, 3D mapping, and semantic understanding, and a belt for haptic feedback. Our system received the Jury Award for the most innovative and user-friendly solution. Check out our project page.

October 15, 2024

Learning Quadruped Locomotion Using Differentiable Simulation

(CoRL 2024, Oral)

While most recent advancements in legged robot control have been driven by traditional model-free reinforcement learning, we explore the potential of differentiable simulation. Differentiable simulation promises faster convergence and more stable training by computing low-variance first-order gradients using the robot model, but so far, its use for legged robot control has remained limited to simulation. This work represents the first demonstration of using differentiable simulation for controlling a real quadruped robot. Conducted in collaboration with MIT, this work has been accepted for oral presentation at CoRL 2024. For more details, please check out our paper and video.
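What "first-order gradients using the robot model" means can be seen on a toy problem. The Python sketch below simulates a 1D point mass under a feedback gain theta and propagates forward-mode sensitivities through every simulation step, giving the exact gradient of the rollout loss with respect to theta from a single rollout. The dynamics, loss, and parameter values are hypothetical stand-ins, not the paper's quadruped simulator, which relies on automatic differentiation rather than hand-derived sensitivities.

```python
def rollout_loss_and_grad(theta, x0=1.0, v0=0.0, damping=0.5, dt=0.01, steps=500):
    """Toy differentiable simulation (hypothetical example).

    A 1D point mass under u = -theta * x - damping * v, integrated with explicit
    Euler. Returns the accumulated squared position and its exact derivative
    with respect to theta, obtained by differentiating every simulation step.
    """
    x, v = x0, v0
    sx, sv = 0.0, 0.0            # sensitivities dx/dtheta, dv/dtheta
    loss, grad = 0.0, 0.0
    for _ in range(steps):
        a = -theta * x - damping * v
        sa = -x - theta * sx - damping * sv          # d a / d theta (chain rule)
        # Simultaneous update: all right-hand sides use the previous step's values
        x, v, sx, sv = x + dt * v, v + dt * a, sx + dt * sv, sv + dt * sa
        loss += x * x
        grad += 2.0 * x * sx
    return loss, grad
```

The gradient agrees with a central finite-difference estimate, but unlike finite differences or sampling-based policy-gradient estimators it needs only one rollout and carries no sampling noise, which is the source of the low variance mentioned above.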

October 14, 2024

RPG is happy to announce our new IROS 2024 paper: iRonCub!

Check out our joint IROS 2024 paper with the group of Daniele Pucci at the Istituto Italiano di Tecnologia, in which iRonCub (the flying humanoid) learns to walk and fly using adversarial motion priors! This work presents an approach that enables automatic, smooth transitions between legged and aerial locomotion. Leveraging the concept of Adversarial Motion Priors, our method allows the robot to imitate motion datasets and accomplish the desired task without the need for complex reward functions. The robot learns walking patterns from human-like gaits and aerial locomotion patterns from motions obtained using trajectory optimization. Through this process, the robot adapts its locomotion scheme based on environmental feedback using reinforcement learning, with mode-switching behavior emerging spontaneously.
For more details, please check out our paper and video.

October 9, 2024

Davide Scaramuzza keynote speaker at IROS 2024

Davide Scaramuzza will deliver a keynote speech at IROS 2024 in Abu Dhabi on Tuesday, October 15, at 17:00. He will show how our research on agile flight led to real-world impact, from the leading AR/VR headset, Meta Quest, to sustainable agriculture at SUIND, to autonomous inspection of power lines, and wildfire prevention! More info here: Link

October 7, 2024

RPG is happy to announce a new general sequence model: S7!

A key challenge in sequence modeling is efficiently handling long-range dependencies, which recent state-space models (SSMs) address but often with increased complexity. The new model, S7, improves on previous SSMs like Mamba (S6) by introducing input-dependent filtering with stable reparameterization, enabling dynamic state transitions while maintaining simplicity. Two key proofs demonstrate that S7 ensures stability in long-sequence modeling and controls gradient norms, preventing exploding or vanishing gradients. S7 consistently outperforms previous models on benchmarks such as neuromorphic datasets, Long Range Arena, and various physical and biological time series, achieving superior results with fewer inductive biases and a much simpler design.
For more details, please check out our paper.
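S7's exact parameterization is given in the paper and is not reproduced here. The Python sketch below shows the general pattern the paragraph describes: a diagonal state-space recurrence whose step size depends on the input (selectivity) and whose continuous-time poles are reparameterized as A = -exp(a_raw), so every discrete transition exp(delta * A) stays in (0, 1) for any parameter values and any positive step size. The parameter names and the simplified delta * b input term (a common simplification in Mamba-style implementations) are assumptions.

```python
import numpy as np

def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return np.log1p(np.exp(-np.abs(x))) + np.maximum(x, 0.0)

def selective_scan(u, a_raw, w_delta, b, c):
    """Input-dependent (selective) diagonal SSM with a stable reparameterization.

    u: (T,) scalar input sequence; a_raw: (N,) unconstrained parameters;
    w_delta: scalar weight producing the step size; b, c: (N,) input/output maps.
    A = -exp(a_raw) < 0 is an assumed reparameterization, not S7's exact formula.
    """
    A = -np.exp(a_raw)                       # strictly negative continuous-time poles
    h = np.zeros_like(A, dtype=float)
    ys = []
    for u_t in u:
        delta = softplus(w_delta * u_t)      # input-dependent step size > 0
        A_bar = np.exp(delta * A)            # each entry in (0, 1): contractive
        h = A_bar * h + delta * b * u_t      # simplified delta * b input term
        ys.append(float(c @ h))
    return np.array(ys)
```

Because every transition is contractive regardless of the learned values, an impulse fed into the model decays once the input stops, rather than growing without bound. Unconstrained parameters can therefore be trained freely without leaving the stable region.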

October 1, 2024

New Lab Engineer

We welcome Yannick Armati as a new Lab Engineer in our lab!

September 17, 2024

Davide Scaramuzza distinguished speaker at ICRA40

Davide Scaramuzza is one of the distinguished speakers selected for the 40th Anniversary of the IEEE International Conference on Robotics and Automation (ICRA): Link

September 10, 2024

Deep Visual Odometry with Events and Frames (IROS 2024)

Our IROS24 paper presents RAMP-VO, a novel end-to-end learnable visual odometry system tailored for challenging conditions. It seamlessly integrates event-based cameras with traditional frames, utilizing Recurrent, Asynchronous, and Massively Parallel (RAMP) encoders. Despite being trained only in simulation, it outperforms both learning-based and model-based methods, demonstrating its potential for robust space navigation. For more details, check out our paper, video, and code.

September 6, 2024

Structure-Invariant Range-Visual-Inertial Odometry (IROS 2024)

Can drones autonomously land on Mars without the need for a lander? This paper introduces a novel range-visual-inertial odometry system tailored to the challenges of the Mars Science Helicopter (MSH) mission, which aims to deploy the next generation of unmanned helicopters on Mars in mid-air (after atmospheric entry and descent), using a jetpack to slow the helicopter down to within its control envelope. Unlike the Mars 2020 mission, which relied on a state estimation system that assumed planar terrain, MSH requires a novel approach due to the complex topography of the targeted landing sites, with elevation differences of up to 8000 meters. Our system extends the state-of-the-art xVIO framework by fusing consistent range information with visual and inertial measurements. This prevents metric scale drift in the absence of visual-inertial excitation (mono camera and constant-velocity descent) and enables landing on any terrain structure without assuming planar terrain. Check out our latest IROS 2024 paper and video.
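Why does range information stop scale drift? With a single camera and a constant-velocity descent, the metric scale of the visual-inertial estimate is poorly observable. A range sensor, by contrast, measures metric distance directly, so comparing it with the up-to-scale range predicted from the visual estimate pins the scale down. The sketch below shows only this idea, as a closed-form least-squares scale fit over hypothetical paired samples. The actual system fuses range measurements inside the xVIO estimator, which this sketch does not reproduce.

```python
def fit_scale(predicted_ranges, measured_ranges):
    """Least-squares scale s minimizing sum_i (r_i - s * d_i)^2.

    predicted_ranges: up-to-scale ranges d_i along the range-sensor ray,
        computed from the visual estimate (hypothetical inputs).
    measured_ranges: metric ranges r_i reported by the range sensor.
    Closed form: s = sum(d_i * r_i) / sum(d_i^2).
    """
    if len(predicted_ranges) != len(measured_ranges) or not predicted_ranges:
        raise ValueError("need equally sized, non-empty sequences")
    num = sum(d * r for d, r in zip(predicted_ranges, measured_ranges))
    den = sum(d * d for d in predicted_ranges)
    if den == 0.0:
        raise ValueError("predicted ranges are all zero; scale unobservable")
    return num / den
```

For example, `fit_scale([1.0, 2.0, 4.0], [2.5, 5.0, 10.0])` recovers a scale of 2.5.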

August 23, 2024

New Science Robotics paper: Wearable robots with vision technology for people with disabilities

A new era of wearable robots is on the horizon, promising to transform the lives of people with disabilities. Cutting-edge research in Science Robotics reveals that by using computer vision, these robots will soon anticipate and adapt to users' intentions, going beyond basic mechanical functions. Imagine prosthetics that adjust their grip based on what you're holding or exoskeletons that seamlessly support you while climbing stairs. This innovation aims to create a world where disability no longer limits possibilities. Read more about it in this news coverage.

August 14, 2024

We are hiring!

Are you ready to take your research career to the next level? We are seeking highly motivated PhD students and postdoctoral researchers to join our lab and make a significant impact on real-world challenges! Our researchers have earned numerous prestigious awards, and our alumni have gone on to lead teams at top-tier companies, become professors, and establish successful startups. Our work has received global recognition, with coverage in The Guardian, The New York Times, Forbes, The Economist, and BBC, highlighting achievements such as outperforming world champion drone racing pilots and advancing high-speed navigation with event cameras. If you're passionate about advancing research, technology, and innovation, apply now to join a team that is shaping the future. Job descriptions and how to apply: https://rpg.ifi.uzh.ch/positions.html

July 23, 2024

Reinforcement Learning Meets Visual Odometry (ECCV 2024)

Our ECCV24 paper tackles the challenges of visual odometry (VO) by reframing VO as a sequential decision-making task and applying reinforcement learning (RL) to adapt the VO process dynamically. Our approach introduces a learned agent within the VO pipeline to make decisions such as keyframe and grid-size selection. Experimental results using classical VO methods demonstrate improvements in accuracy and robustness while eliminating the need for time-intensive parameter tuning. Check out our paper and code.

July 22, 2024

RSS 2024 Outstanding Demo Paper Award

We are honored to receive the RSS 2024 Outstanding Demo Paper Award for our paper "Demonstrating Agile Flight from Pixels without State Estimation". Big congrats to the team! Check out our paper, narrated video, and live demo at RSS.

July 15, 2024

Kexin Shi receives UZH Semester Award

Congratulations to our former student Kexin Shi for receiving the UZH Semester Award for her outstanding Master's thesis on "Extreme Parkour with Legged Robots". Check out the paper, video, and code.

July 12, 2024

Elia Kaufmann receives European PhD Award on Systems and Control

Congratulations to our former PhD student Elia Kaufmann for receiving the European PhD Award on Systems and Control 2023 for his contribution to Learning Vision-Based Agile Flight: From Simulation to the Real World!

July 1st, 2024

Congratulations to Yunlong Song for successfully defending his Ph.D.

Congratulations to Yunlong Song for successfully defending his Ph.D. in "Learning Robot Control: From Reinforcement Learning to Differentiable Simulation"! Many thanks to the external reviewers Yuke Zhu from the U. of Texas at Austin, Marco Hutter from ETH Zurich, and Martin Riedmiller from Google DeepMind! Yunlong's contributions are:
- To show that Reinforcement Learning (RL) outperforms Optimal Control in autonomous racing because it directly optimizes a non-differentiable task-level objective.
- To propose a policy-search-for-model-predictive-control (MPC) framework, combining RL's ability to optimize high-level task objectives with MPC's precise actuation and constraint handling.
- To introduce a differentiable simulation framework to leverage robot dynamics for more stable and efficient policy training.
- To develop a high-performance drone racing system that outperforms optimal control methods and professional pilots.
- To develop Flightmare, a flexible modular quadrotor simulator for reinforcement learning and vision-based flight.

- Video Recording of the PhD defense
- Yunlong's webpage (publications, source code, slides)
- Google Scholar

Congratulations, Yunlong!

July 1st, 2024

New Research Assistant

We welcome Ivan Alberico as a new Research Assistant in our lab!

June 20, 2024

Demonstrating Agile Flight from Pixels without State Estimation

We present the first vision-based quadrotor system that autonomously navigates through a sequence of gates at high speeds while directly mapping pixels to control commands. Like professional drone-racing pilots, our system does not use explicit state estimation and leverages the same control commands humans use (collective thrust and body rates). We demonstrate agile flight at speeds up to 40 km/h with accelerations up to 2g. This is achieved by training vision-based policies with reinforcement learning (RL). The training is facilitated using an asymmetric actor-critic with access to privileged information. To overcome the computational complexity during image-based RL training, we use the inner edges of the gates as a sensor abstraction. Our approach enables autonomous agile flight with standard, off-the-shelf hardware. Check out our latest RSS 2024 paper and video.

June 18, 2024

Elia Kaufmann receives Ackeret Award for his PhD thesis

Congratulations to our former PhD student Elia Kaufmann for receiving the prestigious Ackeret Award by the Swiss Association of Aeronautical Sciences for his PhD thesis on human-level autonomous drone flight!

June 18, 2024

Davide Scaramuzza receives IEEE Kiyo Tomiyasu Award 2024

Professor Davide Scaramuzza was honored to receive the IEEE Kiyo Tomiyasu Technical Field Award during the ICRA2024 conference 'for contributions to agile visual navigation of micro drones and low-latency robust perception with event cameras!' This is the third time this prestigious IEEE career award has gone to a roboticist since it was established over 20 years ago!
In this interview at UZH, Davide Scaramuzza talks about what this award personally means to him and the contributions that led to it.
List of all past IEEE Kiyo Tomiyasu Award recipients: link
List of all 2024 IEEE Technical Field Award recipients: link

June 17, 2024

Special Issue on Visual SLAM in the IEEE Transactions on Robotics (T-RO)

We are thrilled to announce a Special Issue on Visual SLAM in the IEEE Transactions on Robotics (T-RO). Submission deadline: December 15, 2024. We invite contributions on the latest advancements in topics including, but not limited to: robust SLAM; representations in SLAM (NeRF, Gaussian Splatting, etc.); semantic, object-level, and dynamic SLAM; large-scale SLAM; novel datasets and benchmarks; unconventional vision sensors for SLAM (event cameras, thermal cameras, etc.); SLAM for robot navigation; multi-agent SLAM; visual(-inertial) odometry; learning-based SLAM; and beyond conventional SLAM (large language models, foundation models, etc.). Check out the Special Issue website.

June 14, 2024

MPCC++: Model Predictive Contouring Control for Time-Optimal Flight with Safety Constraints

Is it possible to fly as fast as possible while remaining safe? This paper introduces three key components that enhance the model predictive contouring control (MPCC) approach for drone racing. First, we provide safety guarantees in the form of a constraint and a tunnel-shaped terminal set, which prevent gate collisions. Second, we augment the dynamics with a residual term that captures complex aerodynamic effects and thrust forces learned directly from real-world data. Third, we use Trust Region Bayesian Optimization (TuRBO) to tune the hyperparameters of the MPC controller given a sparse reward based on lap-time minimization. The proposed approach achieves lap times similar to those of the best state-of-the-art RL approach while satisfying constraints, achieving a 100% success rate both in simulation and in the real world. Check out our latest RSS 2024 paper and video.
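A tunnel-shaped safety region around the path through a gate can be pictured as a cylinder of allowed positions. The sketch below checks whether a position lies inside such a cylinder around the segment joining two waypoints. The cylindrical shape, the `radius` parameter, and the function names are illustrative assumptions, not the constraint or terminal set formulated in MPCC++. In an MPC, the same distance expression would enter the optimization as an inequality constraint on the predicted states rather than as a boolean check.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from 3-D point p to the segment from a to b."""
    ab = [bi - ai for ai, bi in zip(a, b)]
    ap = [pi - ai for ai, pi in zip(a, p)]
    ab2 = sum(c * c for c in ab)
    # Project p onto the infinite line through a and b, then clamp to the segment.
    t = 0.0 if ab2 == 0.0 else sum(x * y for x, y in zip(ap, ab)) / ab2
    t = max(0.0, min(1.0, t))
    closest = [ai + t * c for ai, c in zip(a, ab)]
    return math.dist(p, closest)


def inside_tunnel(p, a, b, radius):
    # Hypothetical constraint form: g(p) = dist(p, segment) - radius <= 0.
    return point_segment_distance(p, a, b) - radius <= 0.0
```

With a tunnel of radius 0.5 m along the x-axis from (0, 0, 0) to (10, 0, 0), the point (5, 0.3, 0) satisfies the constraint, while (5, 0.8, 0) violates it.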

May 29, 2024

Low Latency Automotive Vision with Event Cameras

Check out our new paper published in Nature, which addresses a critical bandwidth-latency trade-off observed in automotive vision systems based on frame-based cameras. We tackle this challenge with a hybrid event + 20 Hz RGB camera sensor system, and a novel asynchronous learning method to deliver efficient, high-rate detections of traffic participants. This system achieves the same perceptual latency as a 5,000 Hz camera with the bandwidth of a 45 Hz camera without compromising accuracy, paving the way for efficient and robust perception in edge-case scenarios. For more information, read our paper and check out our code, video, and dataset.

May 16, 2024

Survey on Autonomous Drone Racing

We present our survey on Autonomous Drone Racing, published in the IEEE Transactions on Robotics (T-RO). The survey covers the latest developments in agile flight for both model-based and learning-based approaches. We include extensive coverage of drone racing competitions, simulators, open-source software, and the state-of-the-art approaches for flying autonomous drones at their limits! For more information, see our paper.

May 6, 2024

New Research Assistant

We welcome Maria Krinner as a new Research Assistant in our lab!

May 2, 2024

RSS24 Robust State Estimation Competition

Do you work on state estimation, Visual SLAM, or VIO? Are your algorithms robust enough for agile robotics? Take part in the RSS2024 Robust State Estimation Competition! To enter the competition, submit your trajectory results on the test sequences of the UZH-FPV dataset, which contains visual and inertial data recorded onboard a quadrotor flying at high speeds. This data was used to train our autonomous system that defeated the world-champion drone-racing pilots in several head-to-head races last year. The competition winner will be invited to give a talk at the workshop Towards Safe Autonomy at RSS 2024. Submission deadline: June 23rd, 2024. For more info: dataset page, competition page, and RSS workshop page.
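Trajectory results for state-estimation benchmarks are commonly scored with a metric such as the absolute trajectory error (ATE). The sketch below computes the RMSE of position errors under two simplifying assumptions: the estimated and ground-truth poses are already time-associated, and they are already aligned in a common frame (in practice, VIO trajectories are aligned first, for example with a 4-DoF or similarity transform). It is not the competition's official evaluation script. Please refer to the competition page for the actual metric.

```python
import math


def ate_rmse(estimated, ground_truth):
    """Root-mean-square position error between two aligned trajectories.

    estimated, ground_truth: equal-length lists of (x, y, z) positions,
    assumed to be time-associated and already expressed in the same frame.
    No alignment is performed here.
    """
    if len(estimated) != len(ground_truth) or not estimated:
        raise ValueError("trajectories must be non-empty and equally long")
    sq_errors = [math.dist(p, q) ** 2 for p, q in zip(estimated, ground_truth)]
    return math.sqrt(sum(sq_errors) / len(sq_errors))
```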

April 27, 2024

Daniel Gehrig receives UZH Annual Award for his PhD thesis

Congratulations to our former PhD student Daniel Gehrig for receiving the prestigious UZH Annual Award for his PhD thesis "Efficient, Data-Driven Perception with Event Cameras". The thesis explores the potential of event cameras for quickly and reliably detecting objects such as traffic participants. To this end, he designed novel algorithms that are specifically tailored to these cameras and that learn from examples. In the future, this work could support drivers and increase their safety, especially in dangerous situations.

April 22, 2024

A Hybrid ANN-SNN Architecture for Low-Power and Low-Latency Visual Perception

This work proposes a hybrid model combining Spiking Neural Networks (SNN) and classical Artificial Neural Networks (ANN) to optimize power efficiency and latency in edge devices. The hybrid ANN-SNN model overcomes state transients and state decay issues while maintaining high temporal resolution, low latency, and low power consumption. In the context of 2D and 3D human pose estimation, the method achieves an 88% reduction in power consumption with only a 4% decrease in performance compared to fully ANN counterparts, and a 74% lower error compared to SNNs. For more information, have a look at our paper and code.
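One way to picture the state-initialization idea is a leaky integrate-and-fire (LIF) neuron. Normally it starts from a zero membrane potential and needs a transient before it produces meaningful spikes. In the sketch below, the initial potential `v0` is supplied from outside instead, standing in for a value that an ANN could compute from a dense input. The neuron model, threshold, leak factor, and reset-by-subtraction rule are generic textbook assumptions, not the architecture from the paper.

```python
def lif_run(inputs, v0=0.0, leak=0.9, threshold=1.0):
    """Simulate a discrete-time leaky integrate-and-fire neuron.

    v_t = leak * v_{t-1} + I_t. When v_t >= threshold, the neuron emits a
    spike and the threshold is subtracted from v_t. v0 is the initial
    membrane potential, which in a hybrid setup could be provided by an
    ANN instead of starting at zero (hypothetical interface).
    """
    v = v0
    spikes = []
    for current in inputs:
        v = leak * v + current
        if v >= threshold:
            spikes.append(1)
            v -= threshold  # reset by subtraction
        else:
            spikes.append(0)
    return spikes
```

Starting from zero, `lif_run([0.3, 0.3, 0.3])` stays silent. With `v0=1.0`, the same input produces an immediate spike, so the initialized neuron skips the start-up transient.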

April 17, 2024

Event Cameras Meet SPADs for High-Speed, Low-Bandwidth Imaging

This paper proposes a novel sensor fusion combining Single Photon Avalanche Diodes (SPADs) and event cameras for high-speed low-light imaging. SPAD sensors are celebrated for their single-photon sensitivity, while event cameras offer the ability to measure brightness changes at rates up to 1 MHz with low bandwidth requirements. The fusion of these technologies allows for the improved reconstruction of dynamic scenes in poor lighting conditions, achieving improvements of more than 5 dB PSNR in image quality at a high temporal resolution of 100 kHz. This is a joint collaboration between RPG, the Advanced Quantum Architecture Lab at EPFL led by Prof. Edoardo Charbon, and the Camera Culture Lab at MIT led by Prof. Ramesh Raskar. For more details, check out our paper.
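For reference, PSNR is computed from the mean squared error (MSE) between a reconstruction and a reference image: PSNR = 10 · log10(MAX² / MSE), where MAX is the largest possible pixel value. A gain of 5 dB therefore corresponds to reducing the MSE by a factor of 10^0.5 ≈ 3.16. The sketch below assumes that images are given as flat lists of intensities normalized to [0, 1].

```python
import math


def psnr(reference, reconstruction, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Images are flat sequences of intensities. max_value is the largest
    possible intensity (1.0 for normalized images, 255 for 8-bit images).
    """
    if len(reference) != len(reconstruction) or not reference:
        raise ValueError("images must be non-empty and equally sized")
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstruction)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A uniform error of 0.1 on normalized intensities gives an MSE of 0.01 and thus a PSNR of 20 dB.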

April 10, 2024

State Space Models for Event Cameras (CVPR 2024 Spotlight Paper)

Today, state-of-the-art deep neural networks that process event-camera data first convert a temporal window of events into dense, grid-like input representations. As such, they exhibit poor generalizability when deployed at higher inference frequencies (i.e., smaller temporal windows) than the ones they were trained on. We address this challenge by introducing state-space models (SSMs) to event-based vision. This design adapts to varying frequencies without the need to retrain the network at different frequencies. We comprehensively evaluate our approach against existing methods based on RNN and Transformer architectures across various benchmarks, including Gen1 and 1 Mpx event camera datasets. Our results demonstrate that SSM-based models train 33% faster and also exhibit minimal performance degradation when tested at higher frequencies than the training input. Have a look at our paper, code and video.
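The ability to adapt to different frequencies follows from the fact that an SSM is defined in continuous time and is only discretized at run time with a step size Δ. The sketch below shows zero-order-hold (ZOH) discretization for a single scalar state, x' = a·x + b·u. Running the same continuous model at a different rate only requires changing Δ. This is a generic one-dimensional illustration with made-up parameters, not the layer used in the paper.

```python
import math


def zoh_discretize(a, b, dt):
    """Zero-order-hold discretization of the scalar SSM x' = a*x + b*u.

    Returns (a_bar, b_bar) such that x_{k+1} = a_bar * x_k + b_bar * u_k,
    which is exact when u is constant over each step. Requires a != 0.
    """
    a_bar = math.exp(a * dt)
    b_bar = (a_bar - 1.0) / a * b
    return a_bar, b_bar


def run_ssm(u_seq, a, b, dt, x0=0.0):
    """Apply the discretized recurrence over a whole input sequence."""
    a_bar, b_bar = zoh_discretize(a, b, dt)
    x = x0
    for u in u_seq:
        x = a_bar * x + b_bar * u
    return x
```

With a constant input, `run_ssm([1.0] * 10, -2.0, 1.0, 0.1)` and `run_ssm([1.0] * 20, -2.0, 1.0, 0.05)` cover the same one-second window at two different rates and reach the same state, 0.5 · (1 − e^−2), without any change to the model parameters.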

April 9, 2024

An N-Point Linear Solver for Line and Motion Estimation with Event Cameras (CVPR 2024 Oral Paper)

Our CVPR24 Oral paper presents a novel linear solver for event cameras, leveraging a suitable line parametrization to efficiently recover partial linear velocity and line parameters. Unlike existing solvers, it offers speed and numerical stability without costly root finding, handling both minimal and overdetermined systems with more than 5 events. Additionally, it introduces a velocity averaging scheme for full linear camera velocity recovery, outperforming previous methods in both stability and speed, by a factor of over 600, in extensive experiments. This is a joint collaboration between RPG and the Mobile Perception Lab at ShanghaiTech led by Prof. Laurent Kneip. For more details, check out our paper.

April 2, 2024

New Postdoc and new Research Assistant

We welcome Dr. Elie Aljalbout as a new Postdoc and Johannes Heeg as a new Research Assistant in our lab!

March 28, 2024

Mitigating Motion Blur in Neural Radiance Fields with Events and Frames (CVPR 2024)

Our CVPR24 paper exploits event-based cameras to tackle the problem of reconstructing a sharp radiance field from a set of blurry images. We exploit both model-based and learning-based components. We explicitly model the blur formation process using the event double integral as an additional model-based prior. Moreover, we model the event-pixel response using an end-to-end learnable response function that enables the NeRF to diverge from the model-based solution whenever inaccurate, resulting in higher-quality reconstructions. For more details, check out our video, paper and code.
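The event double integral (EDI) relates a blurry frame to a sharp latent image through the events fired during the exposure. Under the standard event-generation model, the latent intensity at time t is L(t) = L(t_ref) · exp(c · E(t)), where c is the contrast threshold and E(t) is the signed sum of event polarities between t_ref and t. Averaging over the exposure time T gives B = L(t_ref) · (1/T) ∫ exp(c · E(t)) dt. The per-pixel sketch below discretizes that integral and inverts it to recover the sharp value. The contrast threshold and the uniform sampling are assumptions, and the paper's learnable event-pixel response is not modeled here.

```python
import math


def edi_deblur_pixel(blurry_value, polarity_sums, contrast=0.2):
    """Recover the sharp intensity at a reference time t_ref for one pixel.

    polarity_sums: E(t_i) sampled uniformly over the exposure, i.e., the
        signed number of events between t_ref and each sample time t_i
        (negative counts for samples before t_ref).
    contrast: contrast threshold c (assumed known and constant).
    Discretized EDI: B ~= L_ref * mean_i exp(c * E(t_i)).
    """
    if not polarity_sums:
        raise ValueError("need at least one sample of E(t)")
    mean_gain = sum(math.exp(contrast * e) for e in polarity_sums) / len(polarity_sums)
    return blurry_value / mean_gain
```

As a sanity check, a pixel that fires no events (all E(t_i) = 0) is returned unchanged, because a static pixel is not blurred.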

March 22, 2024

Learning Quadruped Locomotion Using Differentiable Simulation

March 18, 2024

Bootstrapping Reinforcement Learning with Imitation for Vision-Based Agile Flight

March 12, 2024

Davide Scaramuzza plenary speaker at ERF 2024

At the European Robotics Forum 2024, Davide Scaramuzza will share how model-based and machine learning methods, united with new, low-latency sensors, are laying the foundation for better productivity and safety of future autonomous aircraft. Click for more information: https://erf2024.eu/.

March 11, 2024

Davide Scaramuzza keynote speaker at NVIDIA GTC 2024

What will it take to fly autonomous drones as agile as human pilots? At GTC 2024, Davide Scaramuzza will share how model-based and machine learning methods, united with new, low-latency sensors, are laying the foundation for better productivity and safety of future autonomous aircraft. Click here for more information.

February 29, 2024

Contrastive Learning for Enhancing Robot Scene Transfer in Vision-based Agile Flight

We introduce an approach to learning an environment-agnostic representation to enhance the robustness of scene transfer for end-to-end vision-based agile flight. We propose an adaptive contrastive loss that dynamically adjusts the contrastive loss weight during training. The learned task-related embedding is similar across different environments and can be used to transfer the policy to unseen environments. Subsequently, we use the learned embedding to train a sensorimotor policy that takes only images and IMU measurements as inputs and directly outputs the control commands. Our vision encoder training strategy outperforms several state-of-the-art methods in terms of task performance. Check out our ICRA 2024 paper.
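Contrastive losses pull the embeddings of matching pairs together and push non-matching ones apart. As a reference point, the sketch below computes the standard InfoNCE loss for a single anchor using cosine similarity and a temperature. The adaptive weighting that the paper applies during training is not reproduced. The `weight` argument is only a hypothetical placeholder showing where such a coefficient would scale the loss.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce(anchor, positive, negatives, temperature=0.1, weight=1.0):
    """InfoNCE loss for one anchor: negative log-softmax score of the positive.

    weight is a hypothetical scalar standing in for an adaptive loss weight.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return weight * (log_sum_exp - logits[0])
```

The loss is close to zero when the positive is the most similar candidate and grows when a negative is closer to the anchor than the positive.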

February 29, 2024

Actor-Critic Model Predictive Control

How can we combine the task performance and reward flexibility of model-free RL with the robustness and online replanning capabilities of MPC? We provide an answer by introducing a new framework called Actor-Critic Model Predictive Control (ACMPC). The key idea is to embed a differentiable MPC within an actor-critic RL framework. For more details, check out our latest ICRA 2024 paper, our video and our ICRA 2024 talk.

February 26, 2024

Contrastive Initial State Buffer for Reinforcement Learning (ICRA 2024)

Our ICRA24 paper introduces a Contrastive Initial State Buffer, which strategically selects states from past experiences and uses them to initialize the agent in the environment in order to guide it toward more informative states. The experiments on drone racing and legged locomotion show that our method achieves higher task performance while also speeding up training convergence. Check out our paper and code.

February 12, 2024

The UZH-FPV Dataset: New standing leaderboard

We are excited to announce a new standing leaderboard for our UZH FPV Drone Racing Dataset. To enter the new leaderboard, participants need to submit the results of their state estimators on the new test dataset. The new test dataset includes sequences containing visual and inertial data recorded onboard a quadrotor flying at speeds of up to 100 km/h. This data was used to train our autonomous system that defeated the world champion drone racing pilots in several head-to-head races (Paper). The data is available for download here. The new standing leaderboard is here. We look forward to your participation in the UZH-FPV standing leaderboard!

February 11, 2024

Dense Continuous-Time Flow from Events and Frames (TPAMI 2024)

Our new TPAMI paper introduces a data-driven method that can estimate per-pixel continuous-time trajectories from event data and frames. If you have wondered how these two modalities can be effectively combined for motion estimation, have a look at our paper and code.
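A per-pixel continuous-time trajectory can be stored compactly as a low-order parametric curve, so the displacement to any time in the window is obtained by evaluating that curve. The sketch below assumes a Bézier parameterization with control points treated as hypothetical per-pixel network outputs. Please refer to the paper for the representation actually used.

```python
from math import comb


def bezier_point(control_points, t):
    """Evaluate a 2-D Bezier curve at normalized time t in [0, 1].

    control_points: list of (x, y) pixel positions. The first one is the
    pixel's position at the start of the time window. In a learned setting,
    these would be predicted per pixel (assumption).
    """
    n = len(control_points) - 1
    x = y = 0.0
    for i, (px, py) in enumerate(control_points):
        w = comb(n, i) * (1.0 - t) ** (n - i) * t ** i  # Bernstein basis weight
        x += w * px
        y += w * py
    return x, y


def displacement(control_points, t):
    """Optical flow from the start of the window to time t."""
    x0, y0 = control_points[0]
    x, y = bezier_point(control_points, t)
    return x - x0, y - y0
```

For the control points (10, 20), (12, 20), (14, 24), the flow halfway through the window is (2, 1) pixels.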

January 15, 2024

Our AI drone SWIFT in the Top 10 UZH News 2023

Our research on the first AI-powered drone to defeat a human pilot made it to the top 10 UZH News 2023! Congratulations to the entire team!

January 12, 2024

AERIAL-CORE: AI-Powered Aerial Robots for Inspection and Maintenance of Electrical Power Infrastructures

January 03, 2024

Our Nature Paper in the Top Robotics Stories 2023

Our Nature paper is featured in the Top Robotics Stories of 2023 by IEEE Spectrum.

January 08, 2024

Seeing behind dynamic occlusion with event cameras

Unwanted camera occlusions, such as debris, dust, raindrops, and snow, can severely degrade the performance of computer-vision systems. Dynamic occlusions are particularly challenging because of their continuously changing pattern. For the first time, our solution combines a traditional camera with an event camera. When an occlusion moves across a background image, it causes intensity changes that trigger events. These events provide additional information on the relative intensity changes between foreground and background at a high temporal resolution, enabling a more faithful reconstruction of the background content. For more details, check out our paper.

December 13, 2023

Our startup SUIND raises 600k USD in seed round

Our startup SUIND raises 600k USD in a seed round led by Sunicon and others: Link

December 11, 2023

Revisiting Token Pruning for Object Detection and Instance Segmentation (WACV 2024)

Our latest paper on efficient Vision Transformers, accepted to WACV 2024, presents a novel token pruning method for object detection and instance segmentation, significantly accelerating inference while minimizing performance loss. Utilizing a lightweight MLP for dynamic token pruning, the method achieves up to 34% faster inference and reduces the performance drop to approximately 0.3 mAP, outperforming previous approaches on the COCO dataset. For more information, have a look at our paper, code and video.
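In its simplest form, MLP-based token pruning scores each token with a small network and keeps only the highest-scoring tokens for the remaining layers. The sketch below uses a two-layer MLP with hypothetical, hand-set weights and a fixed keep ratio. The paper's specific choices (for example, what happens to pruned tokens and how the pruning rate is set) are not reproduced here.

```python
import math


def mlp_score(token, w1, b1, w2, b2):
    """Two-layer MLP: score = w2 . relu(W1 @ token + b1) + b2."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, token)) + b)
              for row, b in zip(w1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2


def prune_tokens(tokens, w1, b1, w2, b2, keep_ratio=0.5):
    """Return the indices of the kept tokens, in their original order."""
    scores = [mlp_score(t, w1, b1, w2, b2) for t in tokens]
    k = max(1, math.ceil(keep_ratio * len(tokens)))
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])
```

Keeping the surviving indices in their original order preserves the spatial layout that downstream detection and segmentation heads rely on.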

December 5, 2023

Our lab is featured on Italian TV La7

Davide Scaramuzza talks about the research of our lab on Italian TV. First, our autonomous drone racing is explained, and then the application of our research to search and rescue missions is presented. Watch the full video here (we are featured from 03:20 to 06:35).

November 14, 2023

Watch Davide Scaramuzza's talk at ETH Zurich

In this talk, hosted on November 2 by the Swiss Association of Aeronautical Sciences at ETH Zurich, Davide Scaramuzza presents an overview of our latest research aimed at achieving human-level performance with autonomous vision-based drones. He shows how integrating deep learning techniques with fast-response sensors, such as event cameras, enables drones to attain remarkable levels of speed and resilience, relying exclusively on onboard computation. Finally, he talks about the evolution of event cameras in academia and industry and their potential for robotics to enable low-latency, high-bandwidth control.

November 12, 2023

Former Postdoc Reza Sabzevari appointed Professor at TU Delft

Our former Postdoc Reza Sabzevari has been appointed professor of Perception and Robotics at the TU Delft Aerospace Engineering Faculty.

November 1, 2023

New Research Assistant

We welcome Christian Sprecher as a new Research Assistant in our lab!

October 23, 2023

We are hiring!

We have multiple openings for PhD students and Postdocs. Job descriptions and how to apply: https://rpg.ifi.uzh.ch/positions.html

October 18, 2023

inControl Podcast features Davide Scaramuzza

Davide Scaramuzza is featured in the inControl podcast. He talks about magic, autonomous vision-based navigation, agile drone racing, and event-based cameras. The podcast is available on all common platforms, including the inControl website, Spotify, Apple Podcasts, Google Podcasts, and YouTube.

October 16, 2023

IEEE Spectrum interviews Davide Scaramuzza and Adam Bry

In the recent IEEE Robotics Podcast, Evan Ackerman hosts Davide Scaramuzza and Adam Bry (CEO of Skydio) to discuss autonomous drones. They delve into how autonomy and computer vision enable superhuman skills and how future drones could evolve.

October 3, 2023

Our work won the IEEE/RSJ IROS Best Paper Award

We are honored that our IEEE/RSJ IROS paper "Autonomous Power Line Inspection with Drones via Perception-Aware MPC" was selected for the Best Paper Award. Congratulations to all collaborators!

PDF Video Code

October 3, 2023

ICCV23 Oral Paper: A 5-Point Minimal Solver for Event Camera Relative Motion Estimation

We propose a novel space-time manifold parametrization to constrain events generated by a line observed under locally constant speed. This first-of-its-kind minimal solver decodes the motion-geometry unknowns and yields a remarkable 100% success rate in linear velocity estimation on open-source datasets, surpassing existing methods. This is a joint collaboration between RPG and the Mobile Perception Lab at ShanghaiTech led by Prof. Laurent Kneip. For more details and materials, check out the video, poster, and project page.

September 28, 2023

Our work selected as an IROS paper award candidate

Congratulations to Jiaxu and Giovanni whose IROS paper "Autonomous Power Line Inspection with Drones via Perception-Aware MPC" is nominated for either the conference best paper or best student paper award! Only 12 papers have been nominated out of 1,096 accepted papers: 1% nomination rate!

September 25, 2023

Our work won the Best Paper Award at IROS23 Workshop Robotic Perception and Mapping

We are happy to announce that our work "HDVIO: Improving Localization and Disturbance Estimation with Hybrid Dynamics VIO" won the best paper award at IROS23 Workshop Robotic Perception and Mapping: Frontier Vision and Learning Techniques. The paper will be presented in a spotlight talk on Thursday, October 5th, in Detroit. Congratulations to all collaborators! Check out the paper and video.

September 25, 2023

Our work in collaboration with ASL, ETH Zurich, won the Best Paper Award at IROS23 Workshop Robotic Perception and Mapping

We are happy to announce that our work in collaboration with ASL, ETH Zurich, "Attending Multiple Visual Tasks for Own Failure Detection" won the best paper award at IROS23 Workshop Robotic Perception and Mapping: Frontier Vision and Learning Techniques. The paper will be presented in a spotlight talk on Thursday, October 5th, in Detroit. Congratulations to all collaborators! Check out the paper.

September 19, 2023

Code Release: Active Camera Exposure Control

We release the code of our camera controller, which automatically adjusts the exposure time and gain of the camera. We propose an active exposure control method to improve the robustness of visual odometry in HDR (high dynamic range) environments. Our method determines the proper exposure time by maximizing a robust gradient-based image quality metric. Check out our paper for more details.
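The idea of selecting exposure by image quality can be illustrated with a simple gradient-based score. A well-exposed image of a textured scene has strong intensity gradients, whereas under- and over-exposed images saturate and lose them. The sketch below scores candidate images by their mean gradient magnitude and picks the best exposure time. It assumes grayscale images stored as nested lists and uses a plain, non-robust metric. The paper uses a robust metric, and the brute-force search over pre-captured candidates shown here is for illustration only.

```python
import math


def mean_gradient_magnitude(img):
    """Average forward-difference gradient magnitude of a 2-D grayscale image."""
    rows, cols = len(img), len(img[0])
    total, count = 0.0, 0
    for r in range(rows - 1):
        for c in range(cols - 1):
            gx = img[r][c + 1] - img[r][c]
            gy = img[r + 1][c] - img[r][c]
            total += math.hypot(gx, gy)
            count += 1
    return total / count if count else 0.0


def best_exposure(candidates):
    """candidates: dict mapping exposure time -> image captured with it."""
    return max(candidates, key=lambda t: mean_gradient_magnitude(candidates[t]))
```

An over-exposed image in which every pixel saturates at 1.0 scores zero, so it can never win against an exposure that preserves texture.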

September 13, 2023

Reinforcement Learning vs. Optimal Control for Drone Racing

Reinforcement Learning (RL) vs. Optimal Control (OC) - why can RL achieve impressive results beyond optimal control for many real-world robotic tasks? We investigate this question in our paper "Reaching the Limit in Autonomous Racing: Optimal Control versus Reinforcement Learning" published today in Science Robotics, available open access here!
Many works have focused on impressive results, but less attention has been paid to the systematic study of fundamental factors that have led to the success of reinforcement learning or have limited optimal control. Our results indicate that RL does not outperform OC because RL optimizes its objective better. Rather, RL outperforms OC because it optimizes a better objective: RL can directly optimize a task-level objective and can leverage domain randomization allowing the discovery of more robust control responses.
Check out our video to see our drone race autonomously with accelerations up to 12g!

September 1, 2023

AI Drone beats Human World Champions Head-to-Head Drone Race

We are thrilled to share our groundbreaking research paper published in Nature titled "Champion-Level Drone Racing using Deep Reinforcement Learning," available open access here!
We introduce "Swift," the first autonomous vision-based drone that won several fair head-to-head races against human world champions! The Swift AI drone combines deep reinforcement learning in simulation with data collected in the physical world. This marks the first time that an autonomous mobile robot has beaten human champions in a real physical sport designed for and by humans. As such, it represents a milestone for mobile robotics, machine intelligence, and beyond, which may inspire the deployment of hybrid learning-based solutions in other physical systems, such as autonomous vehicles, aircraft, and personal robots, across a broad range of applications.
Curious to see "Swift" racing and know more? Check out these two videos from us and from Nature.

September 1, 2023

New PhD Student

We welcome Ismail Geles as a new PhD student in our lab!

August 30, 2023

From Chaos Comes Order: Ordering Event Representations for Object Recognition and Detection

State-of-the-art event-based deep learning methods typically convert raw events into dense input representations before they can be processed by standard networks. However, selecting this representation is very expensive, since it requires training a separate neural network for each representation and comparing the validation scores. In this work, we circumvent this bottleneck by measuring the quality of event representations with the Gromov-Wasserstein Discrepancy, which is 200 times faster to compute. This work opens up a new, unexplored field of explicit representation optimization. For more information, have a look at our paper. The code will be available at this link at the start of the ICCV 2023 conference.

August 25, 2023

IROS2023 Workshop: Learning Robot Super Autonomy

Do not miss our IROS2023 Workshop: Learning Robot Super Autonomy! The workshop features an incredible speaker lineup, and we will have a best paper award with prize money. Check out the agenda and join the presentations at our workshop website. Organized by Giuseppe Loianno and Davide Scaramuzza.

August 15, 2023

Scientifica - come and see our drones!

Our lab will open the doors of its large drone testing arena on August 30th, 14:00h. Bring your family and friends to learn more about drones and watch an autonomous drone race. If you are interested, please register here!

August 14, 2023

New Senior Scientist

We welcome Harmish Khambhaita as our new Senior Scientist. He obtained his Ph.D. in Toulouse and previously worked, among others, for Anybotics as the Autonomy and Perception Lead.

July 28, 2023

Learning Deep Sensorimotor Policies for Vision-based Autonomous Drone Racing

We tackle the vision-based autonomous-drone-racing problem by learning deep sensorimotor policies. We use contrastive learning to extract robust feature representations from the input images and leverage a learning-by-cheating framework for training a neural network policy. For more information, check out our IROS23 paper and video.

July 28, 2023

Our Science Robotics 2021 paper wins prestigious Chinese award!

We are truly honored to receive the prestigious Frontiers of Science Award in the category Robotics Science and Systems, which was presented on July 16th, 2023, at the International Congress of Basic Science in Beijing's Great Hall of the People, for our Science Robotics 2021 paper "Learning High Speed Flight in the Wild"! Congratulations to the entire team: Antonio Loquercio, Elia Kaufmann, Rene Ranftl, Matthias Mueller, Vladlen Koltun. Many thanks to the award committee! Congratulations to the other winners too. Paper, open-source code, and video.

July 04, 2023

Our paper on Authorship Attribution through Deep Learning accepted at PLOS ONE

We are excited to announce that our paper on authorship attribution for research papers has just been published in PLOS ONE. We developed a transformer-based AI that achieves over 70% accuracy on the newly created, largest-to-date, authorship-attribution dataset with over 2000 authors. For more information check out our PDF and open-source code.

July 03, 2023

Video Recordings of the 4th International Workshop on Event-Based Vision at CVPR 2023 available!

The recordings of the 4th International Workshop on Event-Based Vision at CVPR 2023 are available here. The event was co-organized by Guillermo Gallego, Davide Scaramuzza, Kostas Daniilidis, Cornelia Fermüller, Davide Migliore.

June 21, 2023

Microgravity induces overconfidence in perceptual decision-making

We are excited to present our paper on the effects of microgravity on perceptual decision-making published in Nature Scientific Reports.

June 20, 2023

HDVIO: Improving Localization and Disturbance Estimation with Hybrid Dynamics VIO

We are excited to present our new RSS paper on state and disturbance estimation for flying vehicles. We propose a hybrid dynamics model that combines a point-mass vehicle model with a learning-based component that captures complex aerodynamic effects. We include our hybrid dynamics model in an optimization-based VIO system that estimates the external disturbances acting on the robot as well as the robot's state. HDVIO improves the motion and external force estimation compared to the state of the art. For more information, check out our paper and video.

June 13, 2023

Our CVPR Paper is Featured in Computer Vision News

Our CVPR highlight and award-candidate work "Data-driven Feature Tracking for Event Cameras" is featured on Computer Vision News. Find out more and read the complete interview with the authors Nico Messikommer, Mathias Gehrig and Carter Fang here!

June 13, 2023

DSEC-Detection Dataset Release

We release DSEC-Detection, a new dataset for event- and frame-based object detection built on the DSEC dataset, with aligned frames, events, and object tracks. For more details, visit the dataset website.

June 08, 2023

Our PhD student Manasi Muglikar is awarded UZH Candoc Grant

Manasi, a PhD student in our lab, has been awarded the UZH Candoc Grant 2023 for her outstanding research! Congratulations! Check out her latest work on event-based vision here.

May 13, 2023

Training Efficient Controllers via Analytic Policy Gradient

In systems with limited compute, such as aerial vehicles, an accurate controller that is efficient at execution time is imperative. We propose an Analytic Policy Gradient (APG) method to tackle this problem. APG exploits the availability of differentiable simulators by training a controller offline with gradient descent on the tracking error. Our proposed method outperforms both model-based and model-free RL methods in terms of tracking error. At the same time, it achieves performance similar to MPC while requiring more than an order of magnitude less computation time. Our work provides insights into the potential of APG as a promising control method for robotics.
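The APG principle can be illustrated with a minimal sketch under loudly stated assumptions: a hypothetical 1D double integrator stands in for the differentiable quadrotor simulator, and a two-gain linear feedback law u = -k1·(x - ref) - k2·v stands in for the neural-network controller. Because every simulation step is differentiable, the exact gradient of the tracking loss with respect to the gains can be carried through the rollout (here with hand-written forward-mode sensitivities rather than an autodiff framework) and used for offline gradient descent. The small control-effort penalty `rho` is an assumption of this sketch that keeps the optimal gains finite. This is not the authors' implementation.

```python
# Hypothetical toy setup (not from the paper): 1D double integrator,
# linear feedback controller, gradient computed analytically through the rollout.
def rollout_with_grad(k1, k2, x0=1.0, v0=0.0, ref=0.0, dt=0.05, steps=100, rho=0.01):
    """Return the tracking loss and its exact gradient w.r.t. (k1, k2)."""
    x, v = x0, v0
    # Sensitivities dx/dk1, dx/dk2, dv/dk1, dv/dk2 (zero at the initial state).
    dx1 = dx2 = dv1 = dv2 = 0.0
    loss = g1 = g2 = 0.0
    for _ in range(steps):
        e = x - ref
        u = -k1 * e - k2 * v
        # Chain rule through the control law and the current state.
        du1 = -e - k1 * dx1 - k2 * dv1
        du2 = -v - k1 * dx2 - k2 * dv2
        # Control-effort penalty (assumption) and its gradient.
        loss += dt * rho * u * u
        g1 += dt * 2.0 * rho * u * du1
        g2 += dt * 2.0 * rho * u * du2
        # Semi-implicit Euler step; sensitivities follow the same update.
        v = v + dt * u
        dv1 = dv1 + dt * du1
        dv2 = dv2 + dt * du2
        x = x + dt * v
        dx1 = dx1 + dt * dv1
        dx2 = dx2 + dt * dv2
        # Squared tracking error after the step and its gradient.
        e = x - ref
        loss += dt * e * e
        g1 += dt * 2.0 * e * dx1
        g2 += dt * 2.0 * e * dx2
    return loss, (g1, g2)


def train_gains(k1=1.0, k2=0.5, lr=0.1, iters=50):
    """Offline gradient descent on the gains; returns gains and loss history."""
    history = []
    for _ in range(iters):
        loss, (g1, g2) = rollout_with_grad(k1, k2)
        history.append(loss)
        k1 -= lr * g1
        k2 -= lr * g2
    return (k1, k2), history


(k1, k2), history = train_gains()
print(f"gains: k1={k1:.3f}, k2={k2:.3f}; loss {history[0]:.4f} -> {history[-1]:.4f}")
```

A central finite-difference check against `rollout_with_grad` is a cheap way to confirm the hand-derived sensitivities; with a neural-network policy, an autodiff framework would compute the same kind of gradient by backpropagating through the simulator.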

May 10, 2023

We are hiring

We have multiple openings for a Scientific Research Manager, PhD students, and Postdocs in Reinforcement Learning for Agile Vision-based Navigation and in Computer Vision with Standard Cameras and Event Cameras. Job descriptions and how to apply: https://rpg.ifi.uzh.ch/positions.html

May 09, 2023

NCCR Robotics Documentary

Check out this amazing 45-minute documentary on YouTube about twelve years of groundbreaking robotics research by the Swiss National Centre of Competence in Research Robotics (NCCR Robotics). The documentary summarizes all the key achievements, from assistive technologies that allowed patients with completely paralyzed legs to walk again, to legged and flying robots with self-learning capabilities for disaster mitigation, to educational robots used by thousands of children worldwide! Congrats to all NCCR Robotics members who have made this possible! And congratulations to the coordinator, Dario Floreano, and his management team! We are very proud to have been part of this! NCCR Robotics will continue to operate in four different projects. Check out this article to learn more.

May 04, 2023

Code Release: Tightly coupling global position measurements in VIO

We are excited to fully open-source our code to tightly fuse global positional measurements in visual-inertial odometry (VIO)! Our code integrates global positional measurements, for example GPS, in SVO Pro, a sliding-window optimization-based VIO that uses the SVO frontend. We leverage the IMU preintegration theory to efficiently include the global position measurements in the VIO problem formulation. Compared to the loosely coupled approach, our system reduces the absolute trajectory error by up to 50% with a negligible increase in computational cost. For more information, have a look at our paper and code.

May 03, 2023

We win the ICRA Agile Movements Workshop Poster Award

Congratulations to Yunlong Song for winning the ICRA "Agile Movements: Animal Behaviour, Biomechanics, and Robot Devices" workshop poster award with his work "Fly fast with Reinforcement Learning".

April 25, 2023

Our work was selected as a CVPR Award Candidate

We are honored that our 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) paper "Data-driven Feature Tracking for Event Cameras" was selected as an award candidate. Congratulations to all collaborators!

April 17, 2023

Neuromorphic Optical Flow and Real-time Implementation with Event Cameras (CVPRW 2023)

We present a new spiking neural network (SNN) architecture that significantly improves optical flow prediction accuracy while reducing complexity, making it ideal for real-time applications in edge devices and robots. By leveraging event-based vision and SNNs, our solution achieves high-speed optical flow prediction with nearly two orders of magnitude less complexity, without compromising accuracy. This breakthrough paves the way for efficient real-time deployments in various computer vision pipelines. For more information, have a look at our paper.

April 13, 2023

Our Master student Asude Aydin wins the UZH Award for her Master Thesis

Asude Aydin, who did her Master thesis "A Hybrid ANN-SNN Architecture for Low-Power and Low-Latency Visual Perception" at RPG, has received the UZH Award 2023 for her outstanding work. Check out her paper here, which is based on her Master thesis.

April 11, 2023

Event-based Shape from Polarization

We introduce a novel shape-from-polarization technique using an event camera (accepted at CVPR 2023). Our setup consists of a linear polarizer rotating at high speed in front of an event camera. Our method uses the continuous event stream caused by the rotation to reconstruct relative intensities at multiple polarizer angles. Experiments demonstrate that our method outperforms physics-based baselines using frames, reducing the MAE by 25% on synthetic and real-world datasets. For more information, have a look at our paper.
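Why relative intensities are enough can be seen in a minimal sketch, assuming an idealized, noise-free event camera (one event per fixed log-intensity step `contrast`) and a known polarizer angle at every event. Integrating event polarities recovers intensity only up to an unknown global scale, but the degree and angle of linear polarization obtained from the standard fit I(θ) = a + b·cos 2θ + c·sin 2θ do not depend on that scale. All function names and the event model are hypothetical; this illustrates the physics, not the authors' reconstruction pipeline.

```python
import math


def simulate_events(intensity, angles, contrast):
    """Idealized event generation (assumption): emit (angle, polarity) each time
    log intensity moves `contrast` away from the current reference level."""
    ref = math.log(intensity(angles[0]))
    events = []
    for theta in angles[1:]:
        log_i = math.log(intensity(theta))
        while abs(log_i - ref) >= contrast:
            polarity = 1 if log_i > ref else -1
            ref += polarity * contrast
            events.append((theta, polarity))
    return events


def reconstruct(events, contrast):
    """Integrate polarities into intensities relative to the starting level."""
    log_i, samples = 0.0, []
    for theta, polarity in events:
        log_i += polarity * contrast
        samples.append((theta, math.exp(log_i)))
    return samples


def solve3(m, r):
    """Gaussian elimination with partial pivoting for a 3x3 linear system."""
    a = [row[:] + [rv] for row, rv in zip(m, r)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda i: abs(a[i][col]))
        a[col], a[piv] = a[piv], a[col]
        for i in range(col + 1, 3):
            f = a[i][col] / a[col][col]
            for j in range(col, 4):
                a[i][j] -= f * a[col][j]
    x = [0.0] * 3
    for i in (2, 1, 0):
        x[i] = (a[i][3] - sum(a[i][j] * x[j] for j in range(i + 1, 3))) / a[i][i]
    return x


def fit_polarization(samples):
    """Least-squares fit of I = a + b cos(2θ) + c sin(2θ); returns the
    (degree, angle) of linear polarization, both invariant to intensity scale."""
    m = [[0.0] * 3 for _ in range(3)]
    r = [0.0] * 3
    for theta, i in samples:
        f = (1.0, math.cos(2 * theta), math.sin(2 * theta))
        for j in range(3):
            r[j] += f[j] * i
            for k in range(3):
                m[j][k] += f[j] * f[k]
    a, b, c = solve3(m, r)
    return math.hypot(b, c) / a, 0.5 * math.atan2(c, b)


# Hypothetical pixel: degree 0.5, angle 0.4 rad, one full polarizer rotation.
def pixel(theta):
    return 2.0 * (1.0 + 0.5 * math.cos(2.0 * (theta - 0.4)))


angles = [2 * math.pi * k / 4000 for k in range(4001)]
events = simulate_events(pixel, angles, contrast=0.01)
print(fit_polarization(reconstruct(events, contrast=0.01)))
```

The recovered values approach the true degree and angle as the contrast threshold shrinks; real sensors add noise and threshold mismatch that this sketch ignores.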

April 07, 2023

Recurrent Vision Transformers for Object Detection with Event Cameras (CVPR 2023)

We introduce a novel efficient and highly-performant object detection backbone for event-based vision. Through an extensive architecture study, we find that vision transformers can be combined with recurrent neural networks to effectively extract spatio-temporal features for object detection. Our proposed architecture can be trained from scratch on publicly available real-world data to reach state-of-the-art performance while reducing inference time by up to 6 times compared to prior work. For more information, have a look at our paper and code.

April 3, 2023

Data-driven Feature Tracking for Event Cameras

We are excited to announce that our paper on Data-driven Feature Tracking for Event Cameras was accepted at CVPR 2023. In this work, we introduce the first data-driven feature tracker for event cameras, which leverages low-latency events to track features detected in a grayscale frame. Our data-driven tracker outperforms existing approaches in relative feature age by up to 130% while also achieving the lowest latency.
For more information, check out our paper, video and code.

April 3, 2023

Autonomous Power Line Inspection with Drones via Perception-Aware MPC

We are excited to present our new work on autonomous power line inspection with drones using perception-aware model predictive control (MPC). We propose an MPC that tightly couples perception and action. Our controller generates commands that maximize the visibility of the power lines while, at the same time, safely avoiding the power masts. For power line detection, we propose a lightweight learning-based detector that is trained only on synthetic data and is able to transfer zero-shot to real-world power line images. For more information, check out our paper and video.

April 3, 2023

RPG and LINA Project featured in RSI

In a recent news broadcast by RSI, our lab is featured for its efforts in developing and boosting research on civil applications of drones. The LINA project at the Dübendorf airport is making its infrastructure available to researchers and industry to facilitate the testing and development of hardware and software for autonomous flying systems. RSI [IT]

April 1, 2023

New PhD Student

We welcome Nikola Zubić as a new PhD student in our lab!

March 30, 2023

Event-based Agile Object Catching with a Quadrupedal Robot

This work exploits the low-latency advantages of event cameras for agile object catching with a quadrupedal robot. We use the event camera to estimate the trajectory of the object, which is then caught using an RL-trained policy. Our robot catches objects at up to 15 m/s with an 83% success rate. For more information, have a look at our ICRA 2023 paper, video, and open-source code.

March 10, 2023

HILTI-SLAM Challenge 2023

RPG and HILTI are organizing the ICRA 2023 HILTI SLAM Challenge! Instructions here. The HILTI SLAM Challenge dataset is a real-life, multi-sensor dataset with accurate ground truth to advance the state of the art in highly accurate state estimation in challenging environments. Participants will be ranked by the completeness of their trajectories and by the achieved accuracy. HILTI is a multinational company that offers premium products and services for professionals on construction sites around the globe. Behind this vast catalog is a global team of 30,000 team members from 133 different nationalities located in more than 120 countries.

March 09, 2023

LINA Testing Facility at Dübendorf Airport

UZH Magazin published a news article about our research on autonomous drones and our new testing facility at Dübendorf Airport, which enables researchers to develop autonomous systems such as drones and ground-based robots from idea to marketable product. Read the article in English or in German. More information about the LINA project can be found here.

March 7, 2023

Our Master student Fang Nan wins ETH Medal for Best Master Thesis

Fang Nan, who did his Master thesis "Nonlinear MPC for Quadrotor Fault-Tolerant Control" at RPG, has received the ETH Medal 2023 and the Willi Studer Prize for his outstanding work. Check out his RA-L 2022 paper here, which is based on his Master thesis.

March 2, 2023

Learning Perception-Aware Agile Flight in Cluttered Environments

We propose a method to learn neural network policies that achieve perception-aware, minimum-time flight in cluttered environments. Our method combines imitation learning and reinforcement learning by leveraging a privileged learning-by-cheating framework. For more information, check out our ICRA23 paper or this video.

March 2, 2023

Weighted Maximum Likelihood for Controller Tuning

We present our new ICRA23 paper that leverages a probabilistic Policy Search method, Weighted Maximum Likelihood (WML), to automatically learn the optimal objective for Model Predictive Contouring Control (MPCC). The data efficiency provided by a model-based approach in the loop allows us to train directly in a high-fidelity simulator, which in turn enables our approach to transfer zero-shot to the real world. For more information, check out our ICRA23 paper and video.
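The core loop of episodic weighted-maximum-likelihood policy search can be sketched as follows, assuming a diagonal Gaussian search distribution over the objective parameters, weights proportional to exp(reward / beta) with a fixed temperature `beta`, and a cheap hypothetical `toy_reward` standing in for the expensive step of running MPCC in the simulator and scoring the result. The function names, the toy reward, and the weighting details are assumptions; the paper's exact formulation may differ.

```python
import math
import random


def wml_search(reward, mean, std, iters=40, samples=50, beta=2.0, seed=0):
    """Episodic weighted-maximum-likelihood search over a diagonal Gaussian.

    Each iteration samples parameter vectors, scores them with `reward`,
    weights them by exp(reward / beta), and refits the Gaussian by weighted
    maximum likelihood (weighted mean and weighted standard deviation)."""
    rng = random.Random(seed)
    mean, std = list(mean), list(std)
    dim = len(mean)
    for _ in range(iters):
        thetas = [[rng.gauss(mean[d], std[d]) for d in range(dim)] for _ in range(samples)]
        rewards = [reward(t) for t in thetas]
        best = max(rewards)
        # Subtracting the best reward only rescales the weights (avoids overflow).
        weights = [math.exp((r - best) / beta) for r in rewards]
        total = sum(weights)
        mean = [sum(w * t[d] for w, t in zip(weights, thetas)) / total for d in range(dim)]
        # Small floor keeps the search distribution from collapsing entirely.
        std = [
            max(math.sqrt(sum(w * (t[d] - mean[d]) ** 2 for w, t in zip(weights, thetas)) / total), 1e-3)
            for d in range(dim)
        ]
    return mean, std


def toy_reward(theta):
    """Hypothetical stand-in for 'run MPCC in simulation and score the lap'."""
    return -((theta[0] - 3.0) ** 2 + (theta[1] + 1.0) ** 2)


mean, std = wml_search(toy_reward, mean=[0.0, 0.0], std=[2.0, 2.0])
print(mean, std)
```

Because only rollouts and their scores are needed, each iteration maps naturally onto a batch of simulator runs; a smaller `beta` makes the update greedier at the cost of faster variance collapse.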

March 2, 2023

User-Conditioned Neural Control Policies for Mobile Robotics

We present our new paper that leverages a feature-wise linear modulation layer to condition neural control policies for mobile robotics. We demonstrate in simulation and in real-world experiments that a single control policy can achieve close to time-optimal flight performance across the entire performance envelope of the robot, reaching up to 60 km/h and 4.5 g in acceleration. The ability to guide a learned controller during task execution has implications beyond agile quadrotor flight, as conditioning the control policy on human intent helps to safely bring learning-based systems out of the well-defined laboratory environment and into the wild.
For more information, check out our ICRA23 paper and video.
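A feature-wise linear modulation (FiLM) layer scales and shifts each hidden feature with coefficients predicted from a conditioning input: out_i = γ_i(c)·h_i + β_i(c). The sketch below assumes a single scalar user command c (a hypothetical "aggressiveness" setting in [0, 1]), affine maps for γ and β, and tiny hand-picked weights; in a trained policy these parameters would typically be learned together with the rest of the network. Names and values are illustrative only.

```python
def affine(weights, bias, x):
    """Dense layer y = W x + b on plain Python lists."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def film(features, condition, w_gamma, b_gamma, w_beta, b_beta):
    """Feature-wise linear modulation: out_i = gamma_i(c) * h_i + beta_i(c)."""
    gamma = affine(w_gamma, b_gamma, condition)
    beta = affine(w_beta, b_beta, condition)
    return [g * h + b for g, h, b in zip(gamma, features, beta)]


# Hypothetical hand-picked parameters for a 2-dimensional hidden layer.
W_GAMMA, B_GAMMA = [[2.0], [1.0]], [0.0, 0.0]
W_BETA, B_BETA = [[1.0], [-1.0]], [0.0, 0.0]

hidden = [3.0, 4.0]
for c in (0.5, 1.0):
    print(c, film(hidden, [c], W_GAMMA, B_GAMMA, W_BETA, B_BETA))
# 0.5 [3.5, 1.5]
# 1.0 [7.0, 3.0]
```

The same hidden features yield different downstream activations for different user commands, which is what lets one policy cover a range of behaviors without retraining.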

February 28, 2023

Learned Inertial Odometry for Autonomous Drone Racing

We are excited to present our new RA-L paper on state estimation for autonomous drone racing. We propose a learning-based odometry algorithm that uses an inertial measurement unit (IMU) as the only sensor modality for autonomous drone racing tasks. The core idea of our system is to couple a model-based filter, driven by the inertial measurements, with a learning-based module that has access to the control commands. For more information, check out our paper, video, and code.

February 15, 2023

Agilicious: Open-Source and Open-Hardware Agile Quadrotor for Vision-Based Flight

We are excited to present Agilicious, a co-designed hardware and software framework tailored to autonomous, agile quadrotor flight. It is completely open-source and open-hardware and supports both model-based and neural-network-based controllers. Also, it provides high thrust-to-weight and torque-to-inertia ratios for agility, onboard vision sensors, GPU-accelerated compute hardware for real-time perception and neural-network inference, a real-time flight controller, and a versatile software stack. In contrast to existing frameworks, Agilicious offers a unique combination of flexible software stack and high-performance hardware. We compare Agilicious with prior works and demonstrate it on different agile tasks, using both model-based and neural-network-based controllers. Our demonstrators include trajectory tracking at up to 5 g and 70 km/h in a motion-capture system, and vision-based acrobatic flight and obstacle avoidance in both structured and unstructured environments using solely onboard perception. Finally, we demonstrate its use for hardware-in-the-loop simulation in virtual-reality environments. Thanks to its versatility, we believe that Agilicious supports the next generation of scientific and industrial quadrotor research. For more details, check out our paper, video, and webpage.

January 17, 2023

Event-based Shape from Polarization

We introduce a novel shape-from-polarization technique using an event camera. Our setup consists of a linear polarizer rotating at high speed in front of an event camera. Our method uses the continuous event stream caused by the rotation to reconstruct relative intensities at multiple polarizer angles. Experiments demonstrate that our method outperforms physics-based baselines using frames, reducing the MAE by 25% on synthetic and real-world datasets. For more information, have a look at our paper.

January 11, 2023

Survey on Autonomous Drone Racing

We present our survey on Autonomous Drone Racing, which covers the latest developments in agile flight for both model-based and learning-based approaches. We include extensive coverage of drone racing competitions, simulators, open-source software, and the state-of-the-art approaches for flying autonomous drones at their limits! For more information, see our paper.

January 10, 2023

4th International Workshop on Event-Based Vision at CVPR 2023

The event will take place on June 19, 2023 in Vancouver, Canada. The deadline to submit a paper contribution is March 20 via CMT. More info on our website. The event is co-organized by Guillermo Gallego, Davide Scaramuzza, Kostas Daniilidis, Cornelia Fermüller, Davide Migliore.

January 04, 2023

Davide Scaramuzza featured as an IEEE author

We are honored that Davide Scaramuzza is a featured author on the IEEE website.

Older News

Video Highlights

September 13, 2023

The fundamental advantage of reinforcement learning over optimal control lies in its optimization objective.

September 1, 2023

Our AI Drone beats human world champion pilots in drone racing, while only relying on onboard sensing and compute!

December 1, 2022

RPG celebrates its 10th anniversary!

October 28, 2022

The Robotics and Perception Group participated in the parabolic flight campaign of UZH Space Hub to study how gravity affects the decision-making of human drone pilots.

October 14, 2022

The first Data-Efficient Collaborative Decentralized Thermal-Inertial Odometry system has been released as open source, extending the already-public JPL xVIO library. Check out the code and datasets to discover how a drone swarm can collaborate in all types of light conditions.

July 13, 2022

Our lab is featured on the Italian RAI1 TV program SuperQuark. Watch the full video report about our research on autonomous drones, from drone racing to search and rescue, from standard to event cameras. The video is in Italian with English subtitles.

July 1, 2022

We are excited to announce our RA-L paper, which tackles minimum-time flight in cluttered environments using a combination of deep reinforcement learning and classical topological path planning. We show that the approach outperforms the state of the art in both planning quality and the ability to fly without collisions at high speeds. For more details, check out the paper and the YouTube video.

June 26, 2022

For the first time, a time-optimal trajectory can be generated and tracked in real-time, even with moving waypoints and strong unknown disturbances! Read our Time-optimal Online Replanning for Agile Quadrotor Flight paper and watch our IROS talk for further details.

June 13, 2022

We are excited to announce that our paper on Time Lens++ was accepted at CVPR 2022. To learn more about the next generation of event-based frame interpolation, visit our project page. There, we release our new dataset BS-ERGB, recorded with a beam splitter, which features aligned and synchronized events and frames.

October 6, 2021

We train a high-speed navigation policy in simulation and deploy it on real drones in previously unknown, extremely challenging environments at up to 40 km/h (Switzerland is a great location for this!). The approach relies only on onboard vision and computation. Check out our Science Robotics paper Learning High-Speed Flight in the Wild for further details.

September 10, 2021

We propose L1-NMPC, a novel hybrid adaptive NMPC that learns model uncertainties online and immediately compensates for them, drastically improving performance over non-adaptive baselines with minimal computational overhead. Our proposed architecture generalizes to many different environments, among which we evaluate wind, unknown payloads, and highly agile flight conditions. Read our Performance, Precision, and Payloads: Adaptive Nonlinear MPC for Quadrotors paper for further details.

September 9, 2021

In this work, we perform extensive experimental studies to quantitatively compare two state-of-the-art control methods for agile quadrotor flight in terms of trajectory tracking accuracy, robustness, and computational efficiency. Read our A Comparative Study of Nonlinear MPC and Differential-Flatness-Based Control for Quadrotor Agile Flight paper for further details.

September 8, 2021

Thanks to our Model Predictive Contouring Control, the problem of flying through multiple waypoints in minimum time can now be solved in real-time. Read our Model Predictive Contouring Control for Time-Optimal Quadrotor Flight paper for further details.

June 28, 2021

AI Drone faster than Humans? Time-Optimal Planning for Quadrotor Waypoint Flight. Read our Time-optimal planning for quadrotor waypoint flight paper for further details.

June 28, 2021

The Robotics and Perception Group and the University of Zurich present one of the world's largest indoor drone-testing arenas. Equipped with a real-time motion-capture system consisting of 36 Vicon cameras, and with a flight space of over 30x30x8 meters (over 7,000 cubic meters), this large research infrastructure allows us to deploy our most advanced perception, learning, planning, and control algorithms to push vision-based agile drones to speeds over 60 km/h and accelerations over 5g.

June 28, 2021

NeuroBEM is a framework that allows simulation of very aggressive quadrotor flights with unprecedented precision. Learn more about our machine-learning augmented first-principles method at our project page. We also release a dataset that contains high-speed quadrotor flight data.

June 11, 2021

TimeLens is a new event-based video frame interpolation method that generates high-speed video from low-frame-rate RGB frames and asynchronous events. Learn more about TimeLens over at our project page, where you can find code, datasets, and more! We also release a High-Speed Event and RGB dataset, which features complex scenarios like bursting balloons and spinning objects!

April 14, 2021

DSEC is a new stereo event camera dataset: over 400 GB of data, 53 sequences, 2 VGA event cameras, 2 RGB global shutter cameras, day and night, urban and mountain driving, accurate calibration, and disparity ground truth from LiDAR.

March 18, 2021

Watch our quadrotor fly near-time-optimal trajectories in Flightmare and the real world using Reinforcement Learning! Read our preprint for further details.

January 13, 2021

Watch our quadrotor fly after motor failure with only onboard vision sensors! Read our RA-L paper for further details.