Visual and Inertial Odometry

Smart Interest Points


Detecting interest points is a key component of vision-based estimation algorithms, such as visual odometry or visual SLAM. In the context of distributed visual SLAM, we have encountered the need to minimize the amount of data that is sent between robots. For relative pose estimation, this translates into the need to find a minimal set of interest points that is detected reliably enough across viewpoints to support relative pose estimation. We have decided to solve this problem at a fundamental level, that is, at the point detector, using machine learning.

In SIPs, we introduce the succinctness metric, which quantifies the performance of interest point detectors with respect to this goal. At the same time, we propose an unsupervised training method for CNN interest point detectors which requires no labels, only uncalibrated image sequences. The proposed method is able to establish relative poses with a minimum of extracted interest points. However, descriptors still need to be extracted and transmitted to establish these poses.

This problem is addressed in IMIPs, where we propose the first feature matching pipeline that works by implicit matching, without the need for descriptors. In IMIPs, the detector CNN has multiple output channels, and each channel generates a single interest point. Between viewpoints, interest points obtained from the same channel are considered implicitly matched. This allows matching points with as little as 3 bytes per point: the point coordinates in an image of up to 4096 x 4096 pixels.
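The 3-byte figure follows from 4096 = 2^12: each coordinate fits in 12 bits, so an (x, y) pair fits in 24 bits. The sketch below illustrates that arithmetic and the channel-index matching rule. The function names and the bit layout are assumptions made for illustration, not the encoding used in the IMIPs code release.

```python
def pack_point(x, y):
    """Pack pixel coordinates (each in 0..4095) into 3 bytes, 12 bits per axis."""
    if not (0 <= x < 4096 and 0 <= y < 4096):
        raise ValueError("coordinates must lie in [0, 4095]")
    return ((x << 12) | y).to_bytes(3, "big")


def unpack_point(data):
    """Inverse of pack_point: recover (x, y) from 3 bytes."""
    value = int.from_bytes(data, "big")
    return value >> 12, value & 0xFFF


def implicit_matches(points_a, points_b):
    """Match by channel index: the point from channel i in image A is paired
    with the point from channel i in image B, so no descriptor is compared."""
    return list(zip(points_a, points_b))
```

With this layout, a robot would only need to transmit the packed coordinates in channel order; the receiver recovers the matches from the ordering alone.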


References

IMIPs

T. Cieslewski, M. Bloesch, D. Scaramuzza

Matching Features without Descriptors:
Implicitly Matched Interest Points

British Machine Vision Conference (BMVC), Cardiff, 2019.

PDF Poster Code and Data


SIPs: Succinct Interest Points from Unsupervised Inlierness Probability Learning

T. Cieslewski, K. G. Derpanis, D. Scaramuzza

SIPs: Succinct Interest Points from Unsupervised Inlierness Probability Learning

IEEE International Conference on 3D Vision (3DV), 2019.

PDF Poster YouTube Code and Data


Visual-Inertial Odometry of Aerial Robots

Visual-inertial odometry (VIO) is the process of estimating the state (pose and velocity) of an agent (e.g., an aerial robot) by using only the input of one or more cameras plus one or more Inertial Measurement Units (IMUs) attached to it. VIO is the only viable alternative to GPS and lidar-based odometry to achieve accurate state estimation. Since both cameras and IMUs are very cheap, these sensor types are ubiquitous in today's aerial robots.

References

D. Scaramuzza, Z. Zhang

Visual-Inertial Odometry of Aerial Robots

Encyclopedia of Robotics, Springer, 2019

PDF


Probabilistic, Continuous-Time Trajectory Evaluation for SLAM

Trajectory Evaluation
Despite the existence of different error metrics for trajectory evaluation in SLAM, their theoretical justifications and connections are rarely studied, and few methods handle temporal association properly. In this work, we propose to formulate the trajectory evaluation problem in a probabilistic, continuous-time framework. By modeling the ground truth as random variables, the concepts of absolute and relative error are generalized to likelihoods. Moreover, the ground truth is represented as a piecewise Gaussian process in continuous time. Within this framework, we are able to establish theoretical connections between relative and absolute error metrics and handle temporal association in a principled manner.

References

Z. Zhang, D. Scaramuzza

Rethinking Trajectory Evaluation for SLAM: a Probabilistic, Continuous-Time Approach

ICRA19 Workshop on Dataset Generation and Benchmarking of SLAM Algorithms for Robotics and VR/AR

Best Paper Award!

PDF


Visual Inertial Model-based Odometry and Force Estimation

VIMO: Simultaneous Visual Inertial Model-based Odometry and Force Estimation
In recent years, many approaches to Visual Inertial Odometry (VIO) have become available. However, they neither exploit the robot's dynamics and known actuation inputs, nor differentiate between desired motion due to actuation and unwanted perturbation due to external force. For many robotic applications, it is often essential to sense the external force acting on the system due to, for example, interactions, contacts, and disturbances. Adding a motion constraint to an estimator leads to a discrepancy between the model-predicted motion and the actual motion. Our approach exploits this discrepancy and resolves it by simultaneously estimating the motion and the external force. We propose a relative motion constraint combining the robot's dynamics and the external force in a preintegrated residual, resulting in a tightly-coupled, sliding-window estimator exploiting all correlations among all variables. We integrate our Visual Inertial Model-based Odometry (VIMO) system into a state-of-the-art VIO approach and evaluate it against the original pipeline without motion constraints on both simulated and real-world data. The results show that our approach increases the accuracy of the estimator by up to 29% compared to the original VIO, and provides external force estimates at no extra computational cost. To the best of our knowledge, this is the first approach exploiting model dynamics by jointly estimating motion and external force.
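The discrepancy idea can be illustrated with a single-sample sketch. It assumes a rigid body whose accelerometer measures specific force in the body frame, so that mass times the bias-corrected measurement equals the sum of actuation thrust and external force. This is a deliberate simplification for intuition only: it is not the preintegrated residual of the paper, and the function name, argument names, and the use of forces in newtons are assumptions.

```python
def external_force_body(accel_meas, accel_bias, thrust_body, mass):
    """Single-sample external force estimate in the body frame.

    Assumes m * (a_meas - b_a) = T + F_ext, where a_meas is the measured
    specific force, b_a the accelerometer bias, and T the commanded thrust
    force. Sensor noise is ignored in this sketch.
    """
    return tuple(
        mass * (a - b) - t
        for a, b, t in zip(accel_meas, accel_bias, thrust_body)
    )


# Hovering vehicle (1 kg) that is pushed sideways: thrust cancels gravity,
# so the remaining specific force along x is attributed to an external force.
force = external_force_body(
    accel_meas=(1.0, 0.0, 9.81),
    accel_bias=(0.0, 0.0, 0.0),
    thrust_body=(0.0, 0.0, 9.81),
    mass=1.0,
)
```

A single sample like this is very sensitive to noise and bias, which is one motivation for estimating the force jointly with the motion over a window of measurements.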

References

B. Nisar, P. Foehn, D. Falanga, D. Scaramuzza

VIMO: Simultaneous Visual Inertial Model-based Odometry and Force Estimation

Robotics: Science and Systems (RSS), Freiburg, 2019

PDF


Fisher Information Field for Active Visual Localization

For mobile robots to localize robustly, actively considering the perception requirement at the planning stage is essential. In this paper, we propose a novel representation for active visual localization. By formulating the Fisher information and sensor visibility carefully, we are able to summarize the localization information into a discrete grid, namely the Fisher information field. The information for arbitrary poses can then be computed from the field in constant time, without the need to iterate over all the 3D landmarks, which is costly. Experimental results on simulated and real-world data show the great potential of our method in efficient active localization and perception-aware planning. To benefit related research, we release our implementation of the information field to the public.
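The precompute-then-look-up idea can be sketched as follows. The example is heavily simplified and is not the formulation of the paper: it stores only the position information of bearing measurements with isotropic noise, and it ignores sensor visibility and orientation entirely. The class name, parameters, and nearest-cell lookup are assumptions.

```python
import numpy as np


def bearing_information(position, landmarks, sigma=1.0):
    """Position Fisher information summed over all landmarks.

    For a unit bearing b at distance d with isotropic noise sigma, one
    landmark contributes (I - b b^T) / (sigma * d)^2.
    """
    info = np.zeros((3, 3))
    for landmark in landmarks:
        offset = landmark - position
        dist = np.linalg.norm(offset)  # assumed non-zero
        b = offset / dist
        info += (np.eye(3) - np.outer(b, b)) / (sigma * dist) ** 2
    return info


class InformationField:
    """Voxel grid holding one precomputed 3x3 information matrix per cell."""

    def __init__(self, origin, resolution, shape, landmarks):
        self.origin = np.asarray(origin, dtype=float)
        self.resolution = float(resolution)
        self.grid = np.zeros(tuple(shape) + (3, 3))
        # The expensive loop over landmarks runs once per cell, offline.
        for idx in np.ndindex(*shape):
            center = self.origin + (np.array(idx) + 0.5) * self.resolution
            self.grid[idx] = bearing_information(center, landmarks)

    def query(self, position):
        """Constant-time lookup of the cell containing `position`."""
        idx = np.floor((np.asarray(position) - self.origin) / self.resolution)
        idx = np.clip(idx.astype(int), 0, np.array(self.grid.shape[:3]) - 1)
        return self.grid[tuple(idx)]
```

A planner could then score candidate positions by, for example, the determinant or smallest eigenvalue of the returned matrix, without touching the landmark list at query time.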

References

Z. Zhang, D. Scaramuzza

Beyond Point Clouds: Fisher Information Field for Active Visual Localization

IEEE International Conference on Robotics and Automation, 2019.

PDF Video Code (Coming soon)


A Tutorial on Quantitative Trajectory Evaluation for Visual(-Inertial) Odometry

Trajectory Evaluation
In this tutorial, we provide principled methods to quantitatively evaluate the quality of an estimated trajectory from visual(-inertial) odometry (VO/VIO), which is the foundation of benchmarking the accuracy of different algorithms. First, we show how to determine the transformation type to use in trajectory alignment based on the specific sensing modality (i.e., monocular, stereo and visual-inertial). Second, we describe commonly used error metrics (i.e., the absolute trajectory error and the relative error) and their strengths and weaknesses. To make the methodology presented for VO/VIO applicable to other setups, we also generalize our formulation to any given sensing modality. To facilitate the reproducibility of related research, we publicly release our implementation of the methods described in this tutorial.
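As a rough illustration of the alignment-then-error workflow, the sketch below aligns estimated positions to ground truth with a closed-form least-squares fit (Umeyama's method) and reports the absolute trajectory error as an RMSE. It is not the released toolbox: the function names and the modality table are illustrative, and the yaw-only 4-DoF alignment used for visual-inertial systems is named but not implemented.

```python
import numpy as np

# Alignment type per sensing modality; the keys and labels are illustrative.
ALIGNMENT_BY_MODALITY = {
    "monocular": "sim3",                   # scale is unobservable
    "stereo": "se3",                       # rigid-body alignment
    "visual-inertial": "yaw+translation",  # 4 DoF, not implemented below
}


def align_umeyama(est, gt, with_scale):
    """Least-squares s, R, t such that gt is approximately s * R @ est + t."""
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    e, g = est - mu_e, gt - mu_g
    cov = g.T @ e / len(est)
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0  # avoid returning a reflection
    R = U @ S @ Vt
    var_e = (e ** 2).sum() / len(est)
    s = np.trace(np.diag(D) @ S) / var_e if with_scale else 1.0
    t = mu_g - s * R @ mu_e
    return s, R, t


def ate_rmse(est, gt, with_scale=False):
    """Absolute trajectory error (position RMSE) after alignment."""
    s, R, t = align_umeyama(est, gt, with_scale)
    aligned = s * (est @ R.T) + t
    return float(np.sqrt(((aligned - gt) ** 2).sum(axis=1).mean()))
```

Here `est` and `gt` are N x 3 arrays of time-associated positions; a monocular estimate would be evaluated with `with_scale=True`, a stereo estimate with `with_scale=False`.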

References

Trajectory Evaluation

Z. Zhang, D. Scaramuzza

A Tutorial on Quantitative Trajectory Evaluation for Visual(-Inertial) Odometry

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, 2018.

PDF PPT VO/VIO Evaluation Toolbox


On the Comparison of Gauge Freedom Handling in Optimization-based Visual-Inertial State Estimation

Gauge Comparison
It is well known that visual-inertial state estimation is possible up to a four degrees-of-freedom (DoF) transformation (rotation around gravity and translation), and the extra DoFs ("gauge freedom") have to be handled properly. While different approaches for handling the gauge freedom have been used in practice, no previous study has been carried out to systematically analyze their differences. In this paper, we present the first comparative analysis of different methods for handling the gauge freedom in optimization-based visual-inertial state estimation. We experimentally compare three commonly used approaches: fixing the unobservable states to some given values, setting a prior on such states, or letting the states evolve freely during optimization. Specifically, we show that (i) the accuracy and computational time of the three methods are similar, with the free gauge approach being slightly faster; (ii) the covariance estimation from the free gauge approach appears dramatically different, but is actually tightly related to the other approaches. Our findings are validated both in simulation and on real-world datasets and can be useful for designing optimization-based visual-inertial state estimation algorithms.
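The four unobservable degrees of freedom can be made concrete with a small sketch: rotating all states about the gravity axis and translating them leaves gravity and all relative geometry unchanged, so the measurements cannot distinguish the two solutions. The example assumes a world frame whose z-axis is aligned with gravity and represents each state by a position and a yaw angle only. The helper names are assumptions, and `fix_gauge` shows just one possible fixation (first pose at the origin with zero yaw).

```python
import numpy as np

GRAVITY = np.array([0.0, 0.0, -9.81])  # assumed world frame: z is up


def yaw_rotation(yaw):
    """Rotation about the gravity (z) axis."""
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def apply_gauge(positions, yaw, translation):
    """Apply a 4-DoF gauge transformation to all positions (N x 3)."""
    return positions @ yaw_rotation(yaw).T + translation


def fix_gauge(positions, yaws):
    """Gauge fixation: express all states relative to the first one,
    so that the first position is the origin and the first yaw is zero."""
    R = yaw_rotation(-yaws[0])
    return (positions - positions[0]) @ R.T, yaws - yaws[0]
```

Two solutions that differ only by such a transformation map to the same states after `fix_gauge`, which is why comparisons between gauge-handling methods are made after bringing the estimates into a common gauge.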

References

Gauge Comparison

Z. Zhang, G. Gallego, D. Scaramuzza

On the Comparison of Gauge Freedom Handling in Optimization-based Visual-Inertial State Estimation

IEEE Robotics and Automation Letters (RA-L), 2018.

PDF PPT Code


Visual-Inertial Odometry Benchmarking

Flying robots require a combination of accuracy and low latency in their state estimation in order to achieve stable and robust flight. However, due to the power and payload constraints of aerial platforms, state estimation algorithms must provide these qualities under the computational constraints of embedded hardware. Cameras and inertial measurement units (IMUs) satisfy these power and payload constraints, so visual-inertial odometry (VIO) algorithms are popular choices for state estimation in these scenarios, in addition to their ability to operate without external localization from motion capture or global positioning systems. It is not clear from existing results in the literature, however, which VIO algorithms perform well under the accuracy, latency, and computational constraints of a flying robot with onboard state estimation. This paper evaluates an array of publicly-available VIO pipelines (MSCKF, OKVIS, ROVIO, VINS-Mono, SVO+MSF, and SVO+GTSAM) on different hardware configurations, including several single-board computer systems that are typically found on flying robots. The evaluation considers the pose estimation accuracy, per-frame processing time, and CPU and memory load while processing the EuRoC datasets, which contain six degree of freedom (6DoF) trajectories typical of flying robots. We present our complete results as a benchmark for the research community.

References

A Benchmark Comparison of Monocular VIO Algorithms for Flying Robots

J. Delmerico, D. Scaramuzza

A Benchmark Comparison of Monocular Visual-Inertial Odometry Algorithms for Flying Robots

IEEE International Conference on Robotics and Automation (ICRA), 2018.

PDF Video PPT


Active Exposure Control for Robust Visual Odometry in High Dynamic Range (HDR) Environments

In this paper, we propose an active exposure control method to improve the robustness of visual odometry in HDR (high dynamic range) environments. Our method evaluates the proper exposure time by maximizing a robust gradient-based image quality metric. The optimization is achieved by exploiting the photometric response function of the camera. Our exposure control method is evaluated in different real-world environments and outperforms both the built-in auto-exposure function of the camera and a fixed exposure time. To validate the benefit of our approach, we test different state-of-the-art visual odometry pipelines (namely, ORB-SLAM2, DSO, and SVO 2.0) and demonstrate significantly improved performance using our exposure control method in very challenging HDR environments. Datasets and code will be released soon!
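The principle of choosing the exposure that maximizes image gradients can be sketched with a toy model. Two substitutions are made here and should not be mistaken for the method of the paper: the camera response is assumed to be linear with saturation, and the metric is a plain sum of absolute finite differences searched by brute force over candidate exposure times, rather than a robust metric optimized through the photometric response function. All names are illustrative.

```python
import numpy as np


def simulate_image(irradiance, exposure_time):
    """Assumed linear response: intensity = irradiance * exposure, clipped to [0, 1]."""
    return np.clip(irradiance * exposure_time, 0.0, 1.0)


def gradient_metric(image):
    """Sum of absolute finite differences along both image axes."""
    dy = np.abs(np.diff(image, axis=0)).sum()
    dx = np.abs(np.diff(image, axis=1)).sum()
    return float(dx + dy)


def best_exposure(irradiance, candidates):
    """Pick the candidate exposure time whose simulated image has the most gradient."""
    return max(candidates, key=lambda t: gradient_metric(simulate_image(irradiance, t)))
```

In this toy model, a scene with a textured dark region and a textured bright region shows the trade-off: a long exposure reveals the dark texture but saturates the bright one, and the metric selects the exposure that preserves the most gradient overall.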

References

Z. Zhang, C. Forster, D. Scaramuzza

Active Exposure Control for Robust Visual Odometry in HDR Environments

IEEE International Conference on Robotics and Automation (ICRA), 2017.

PDF YouTube


IMU Preintegration on Manifold for Efficient Visual-Inertial Maximum-a-Posteriori Estimation

Recent results in monocular visual-inertial navigation (VIN) have shown that optimization-based approaches outperform filtering methods in terms of accuracy due to their capability to relinearize past states. However, the improvement comes at the cost of increased computational complexity. In this paper, we address this issue by preintegrating inertial measurements between selected keyframes. The preintegration allows us to accurately summarize hundreds of inertial measurements into a single relative motion constraint. Our first contribution is a preintegration theory that properly addresses the manifold structure of the rotation group and carefully deals with uncertainty propagation. The measurements are integrated in a local frame, which eliminates the need to repeat the integration when the linearization point changes while leaving the opportunity for belated bias corrections. The second contribution is to show that the preintegrated IMU model can be seamlessly integrated in a visual-inertial pipeline under the unifying framework of factor graphs. This enables the use of a structureless model for visual measurements, further accelerating the computation. The third contribution is an extensive evaluation of our monocular VIN pipeline: experimental results confirm that our system is very fast and demonstrates superior accuracy with respect to competitive state-of-the-art filtering and optimization algorithms, including off-the-shelf systems such as Google Tango.
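The core of preintegration can be sketched as an accumulation of rotation, velocity, and position increments expressed in the frame of the first keyframe. Because the increments do not depend on the initial state or on gravity, they need not be recomputed when the linearization point changes. The sketch below omits bias handling and uncertainty propagation, which are central parts of the actual theory, and its function names are assumptions.

```python
import numpy as np


def so3_exp(phi):
    """Exponential map from a rotation vector to a rotation matrix (Rodrigues)."""
    theta = np.linalg.norm(phi)
    K = np.array([[0.0, -phi[2], phi[1]],
                  [phi[2], 0.0, -phi[0]],
                  [-phi[1], phi[0], 0.0]])
    if theta < 1e-9:
        return np.eye(3) + K  # first-order approximation near zero
    return (np.eye(3) + np.sin(theta) / theta * K
            + (1.0 - np.cos(theta)) / theta ** 2 * (K @ K))


def preintegrate(gyro, accel, dt):
    """Summarize IMU samples between two keyframes into (dR, dv, dp).

    gyro and accel are sequences of bias-corrected 3-vectors sampled at a
    fixed period dt; the increments are expressed in the first keyframe's
    body frame.
    """
    dR, dv, dp = np.eye(3), np.zeros(3), np.zeros(3)
    for w, a in zip(gyro, accel):
        dp = dp + dv * dt + 0.5 * (dR @ a) * dt ** 2
        dv = dv + (dR @ a) * dt
        dR = dR @ so3_exp(w * dt)
    return dR, dv, dp
```

The triple returned by `preintegrate` plays the role of the single relative motion constraint between two keyframes; gravity and the state at the first keyframe enter only when the constraint is evaluated in the optimizer.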

References

C. Forster, L. Carlone, F. Dellaert, D. Scaramuzza

On-Manifold Preintegration for Real-Time Visual-Inertial Odometry

IEEE Transactions on Robotics, in press, 2016.

PDF YouTube


C. Forster, L. Carlone, F. Dellaert, D. Scaramuzza

IMU Preintegration on Manifold for Efficient Visual-Inertial Maximum-a-Posteriori Estimation

Robotics: Science and Systems (RSS), Rome, 2015.

Best Paper Award Finalist! Oral Presentation: Acceptance Rate 4%

PDF Supplementary material YouTube


SVO: Fast Semi-Direct Monocular Visual Odometry


We propose a semi-direct monocular visual odometry algorithm that is precise, robust, and faster than current state-of-the-art methods. The semi-direct approach eliminates the need for costly feature extraction and robust matching techniques for motion estimation. Our algorithm operates directly on pixel intensities, which results in subpixel precision at high frame-rates. A probabilistic mapping method that explicitly models outlier measurements is used to estimate 3D points, which results in fewer outliers and more reliable points. Precise and high frame-rate motion estimation brings increased robustness in scenes of little, repetitive, and high-frequency texture. The algorithm is applied to micro-aerial-vehicle state estimation in GPS-denied environments and runs at 55 frames per second on the onboard embedded computer and at more than 300 frames per second on a consumer laptop.


This video shows results from a modification of the SVO algorithm that generalizes to a set of rigidly attached (not necessarily overlapping) cameras. Simultaneously, we run a CPU implementation of the REMODE algorithm on the front, left, and right camera. Everything runs in real-time on a laptop computer. Parking garage dataset courtesy of NVIDIA.

References

Christian Forster, Zichao Zhang, Michael Gassner, Manuel Werlberger, Davide Scaramuzza

SVO: Semi-Direct Visual Odometry for Monocular and Multi-Camera Systems

IEEE Transactions on Robotics, Vol. 33, Issue 2, pages 249-265, Apr. 2017.

Includes comparison against ORB-SLAM, LSD-SLAM, and DSO and comparison among Dense, Semi-dense, and Sparse Direct Image Alignment.

PDF YouTube Binaries Download


C. Forster, M. Pizzoli, D. Scaramuzza

SVO: Fast Semi-Direct Monocular Visual Odometry

IEEE International Conference on Robotics and Automation (ICRA), Hong Kong, 2014.

PDF YouTube Software SVO 2.0 Binaries Download


M. Pizzoli, C. Forster, D. Scaramuzza

REMODE: Probabilistic, Monocular Dense Reconstruction in Real Time

IEEE International Conference on Robotics and Automation (ICRA), Hong Kong, 2014.

PDF YouTube


1-point RANSAC

Given a car equipped with an omnidirectional camera, the motion of the vehicle can be recovered purely from salient features tracked over time. We propose the 1-Point RANSAC algorithm, which runs at 800 Hz on a normal laptop. To our knowledge, this is the most efficient visual odometry algorithm.



This video shows the estimation of the vehicle motion from image features. The video demonstrates the approach described in our paper, which uses the 1-point RANSAC algorithm to remove the outliers. Except for the feature extraction process, the outlier removal and the motion estimation steps take less than 1 ms on a normal laptop computer.
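A minimal sketch of the one-point idea, under the non-holonomic assumption named in the references below: if the vehicle moves locally on a circle in a plane, the rotation angle theta is the only unknown and the translation direction is theta/2, so a single correspondence determines the motion up to scale. The sketch assumes unit bearing vectors in a camera frame whose z-axis is perpendicular to the ground plane, and uses the algebraic epipolar residual as the inlier test. Function names, threshold, and iteration count are illustrative choices.

```python
import math
import random


def theta_from_match(p1, p2):
    """Rotation angle implied by one correspondence under planar circular motion."""
    num = p1[1] * p2[2] - p1[2] * p2[1]
    den = p1[0] * p2[2] + p1[2] * p2[0]
    if den == 0.0:
        return math.copysign(math.pi, num)
    return 2.0 * math.atan(num / den)


def epipolar_residual(theta, p1, p2):
    """Absolute algebraic epipolar error of a correspondence for a given theta."""
    half = 0.5 * theta
    return abs(math.sin(half) * (p1[0] * p2[2] + p1[2] * p2[0])
               + math.cos(half) * (p1[2] * p2[1] - p1[1] * p2[2]))


def one_point_ransac(matches, threshold=1e-3, iterations=50, seed=0):
    """Each hypothesis comes from ONE sampled match; keep the largest consensus set."""
    rng = random.Random(seed)
    best_theta, best_inliers = 0.0, []
    for _ in range(iterations):
        theta = theta_from_match(*rng.choice(matches))
        inliers = [m for m in matches if epipolar_residual(theta, *m) < threshold]
        if len(inliers) > len(best_inliers):
            best_theta, best_inliers = theta, inliers
    return best_theta, best_inliers
```

Because each hypothesis needs only one sample, far fewer iterations are required than with five-point or eight-point minimal solvers for the same outlier ratio, which is what makes the outlier-removal step so cheap.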

References

D. Scaramuzza and F. Fraundorfer. Visual Odometry: Part I - The First 30 Years and Fundamentals. IEEE Robotics and Automation Magazine, Volume 18, issue 4, 2011. [ PDF ]
F. Fraundorfer and D. Scaramuzza. Visual odometry: Part II - Matching, robustness, optimization, and applications. IEEE Robotics and Automation Magazine, Volume 19, issue 2, 2012. [ PDF ]
D. Scaramuzza. 1-Point-RANSAC Structure from Motion for Vehicle-Mounted Cameras by Exploiting Non-holonomic Constraints. International Journal of Computer Vision, Volume 95, Issue 1, 2011. [ PDF ]
D. Scaramuzza. Performance Evaluation of 1-Point-RANSAC Visual Odometry. Journal of Field Robotics, Volume 28, issue 5, 2011. [ PDF ]
D. Scaramuzza, A. Censi, K. Daniilidis. Exploiting Motion Priors in Visual Odometry for Vehicle-Mounted Cameras with Non-holonomic Constraints. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2011), San Francisco, September, 2011. [ PDF ]
L. Kneip, D. Scaramuzza, R. Siegwart. A Novel Parameterization of the Perspective-Three-Point Problem for a Direct Computation of Absolute Camera Position and Orientation. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Colorado Springs, USA, 2011. [ PDF ] [C/C++ code]
L. Kneip, A. Martinelli, S. Weiss, D. Scaramuzza, R. Siegwart. A Closed-Form Solution for Absolute Scale Velocity Determination Combining Inertial Measurements and a Single Feature Correspondence. IEEE International Conference on Robotics and Automation (ICRA 2011), Shanghai, 2011. [ PDF ]
D. Scaramuzza, F. Fraundorfer, and M. Pollefeys. Closing the Loop in Appearance-Guided Omnidirectional Visual Odometry by Using Vocabulary Trees. Robotics and Autonomous Systems Journal (Elsevier), Volume 58, issue 6, June, 2010. [ PDF ]
L. Kneip, D. Scaramuzza, R. Siegwart. On the Initialization of Statistical Optimum Filters with Application to Motion Estimation. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2010), Taipei, October, 2010. [ PDF ]
F. Fraundorfer, D. Scaramuzza, M. Pollefeys. A Constricted Bundle Adjustment Parameterization for Relative Scale Estimation in Visual Odometry. IEEE International Conference on Robotics and Automation (ICRA 2010), Anchorage, Alaska, May, 2010. [ PDF ]
D. Scaramuzza, L. Spinello, R. Triebel, R. Siegwart. Key Technologies for Intelligent and Safer Cars from Motion Estimation to Predictive Motion Planning. IEEE International Conference on Industrial Electronics, Bari, Italy, July, 2010. [ PDF ]
D. Sabatta, D. Scaramuzza, R. Siegwart. Improved Appearance-Based Matching in Similar and Dynamic Environments Using a Vocabulary Tree. IEEE International Conference on Robotics and Automation (ICRA 2010), Anchorage, Alaska, May, 2010. [ PDF ]
D. Scaramuzza, F. Fraundorfer, M. Pollefeys, R. Siegwart. Absolute Scale in Structure from Motion from a Single Vehicle Mounted Camera by Exploiting Nonholonomic Constraints. IEEE International Conference on Computer Vision (ICCV 2009), Kyoto, September-October, 2009. [ PDF ]
D. Scaramuzza, F. Fraundorfer, R. Siegwart. Real-Time Monocular Visual Odometry for On-Road Vehicles with 1-Point RANSAC. IEEE International Conference on Robotics and Automation (ICRA 2009), Kobe, Japan, May, 2009. [ PDF ]
D. Scaramuzza, R. Siegwart. Appearance-Guided Monocular Omnidirectional Visual Odometry for Outdoor Ground Vehicles. IEEE Transactions on Robotics, Volume 24, issue 5, October 2008. [ PDF ]

Robot Localization Using Soft Object Detection

Most of the work done in localization, mapping, and navigation for both ground and aerial vehicles has been done by means of point landmarks or occupancy grids, using vision or laser range finders. However, to make these robots one day able to cooperate with humans in complex scenarios, we need to build semantic maps of the environment. In this work we address map-based localization using "soft" object detection. Soft object detection differs from "hard" object detection in that we do not extract an "affirmative/negative" response about the presence of the object but rather we compute, for each pixel in the current frame, the probability that the object under consideration is there. This gives rise to many false positives (see the multiple peaks in the object "heat-map") that are disambiguated during motion by the particle filter.
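How motion disambiguates false positives can be sketched with the measurement update of a particle filter. Each particle is a pose hypothesis, and its weight is multiplied by the heat-map probability found where that hypothesis predicts the object to appear. The function name and the likelihood values below are made up for illustration; the motion model and resampling steps are omitted.

```python
def update_weights(weights, likelihoods):
    """Bayesian weight update: multiply by the soft detection score, then normalize."""
    posterior = [w * l for w, l in zip(weights, likelihoods)]
    total = sum(posterior)
    if total == 0.0:
        # No hypothesis explains the observation: fall back to uniform weights.
        return [1.0 / len(weights)] * len(weights)
    return [p / total for p in posterior]


# Three pose hypotheses. In the first frame the heat-map has two equally
# strong peaks, so hypotheses 0 and 1 cannot be told apart.
weights = update_weights([1 / 3, 1 / 3, 1 / 3], [0.8, 0.8, 0.1])
# After the robot moves, only hypothesis 0 still predicts a strong response.
weights = update_weights(weights, [0.9, 0.2, 0.1])
```

No single frame has to produce a hard decision: a false peak only needs to be inconsistent with the heat-maps seen from later viewpoints for its particles to lose weight.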

References

R. Anati, D. Scaramuzza, K. Derpanis, K. Daniilidis.

Robot Localization Using Soft Object Detection

IEEE International Conference on Robotics and Automation (ICRA), St. Paul, 2012.

PDF