Deep Learning

Deep learning is a branch of machine learning based on a set of algorithms that attempt to model high-level abstractions in data. In our research, we apply deep learning to mobile robot navigation problems such as depth estimation, end-to-end navigation, and classification.


Place Recognition in Semi-Dense Maps: Geometric and Learning-Based Approaches

For robotics and augmented reality systems operating in large and dynamic environments, place recognition and tracking using vision are very challenging tasks. When these systems also need to operate reliably over very long time periods, such as months or years, further challenges are introduced by severe environmental changes that can significantly alter the visual appearance of a scene. Thus, to unlock long-term, large-scale visual place recognition, it is necessary to develop new methodologies for improving localization under difficult conditions. As shown in previous work, gains in robustness can be achieved by exploiting the 3D structural information of a scene. This information, extracted from image sequences, in fact carries more discriminative cues than individual images alone. In this paper, we propose to represent a scene's structure with semi-dense point clouds, because they are highly informative and simple to generate with mature visual odometry and SLAM systems. We then cast place recognition as an instance of pose retrieval and evaluate several techniques, including recent learning-based approaches, for producing discriminative descriptors of semi-dense point clouds. Our proposed methodology, evaluated on the recently published and challenging Oxford RobotCar Dataset, is shown to outperform image-based place recognition, with improvements of up to 30% in precision across strong appearance changes. To the best of our knowledge, we are the first to propose place recognition in semi-dense maps.
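The retrieval formulation mentioned in this abstract can be illustrated with a minimal sketch. It assumes each place is already summarized by a fixed-length descriptor vector and that the best match is the database entry with the highest cosine similarity. The function names, the three-dimensional toy descriptors, and the choice of cosine similarity are all assumptions made for illustration; they are not taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector matches nothing
    return dot / (norm_a * norm_b)


def retrieve_place(query, database):
    """Return (place_id, similarity) of the most similar database descriptor.

    `database` maps a place identifier to its descriptor (hypothetical layout).
    """
    best_id, best_sim = None, -1.0
    for place_id, descriptor in database.items():
        sim = cosine_similarity(query, descriptor)
        if sim > best_sim:
            best_id, best_sim = place_id, sim
    return best_id, best_sim


# Toy descriptors, invented for this example only.
database = {
    "place_a": [1.0, 0.0, 0.0],
    "place_b": [0.0, 1.0, 0.0],
    "place_c": [0.0, 0.0, 1.0],
}
query = [0.1, 0.9, 0.2]
best_id, best_sim = retrieve_place(query, database)
print(best_id)
```

In a real system the descriptors would be computed from the semi-dense point clouds and the linear scan would typically be replaced by an approximate nearest-neighbor index.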


References

BMVC17_Ye

Y. Ye, T. Cieslewski, A. Loquercio, D. Scaramuzza

Place Recognition in Semi-Dense Maps: Geometric and Learning-Based Approaches

British Machine Vision Conference (BMVC), London, 2017.

PDF


Learning-based Image Enhancement for Visual Odometry in Challenging HDR Environments

One of the main open challenges in visual odometry (VO) is robustness to difficult illumination conditions or high dynamic range (HDR) environments. The main difficulties in these situations come from both the limitations of the sensors and the inability to successfully track interest points because of the bold assumptions made in VO, such as brightness constancy. We address this problem from a deep learning perspective: we first fine-tune a Deep Neural Network (DNN) to obtain enhanced representations of the sequences for VO. Then, we demonstrate how the insertion of Long Short-Term Memory (LSTM) allows us to obtain temporally consistent sequences, as the estimation depends on previous states. However, very deep networks cannot be inserted into a real-time VO framework; therefore, we also propose a Convolutional Neural Network (CNN) of reduced size that runs faster. Finally, we validate the enhanced representations by evaluating the sequences produced by the two architectures in several state-of-the-art VO algorithms, such as ORB-SLAM and DSO.


References

arXiv17_Gomez

R. Gomez-Ojeda, Z. Zhang, J. Gonzalez-Jimenez, D. Scaramuzza

Learning-based Image Enhancement for Visual Odometry in Challenging HDR Environments

(Under review)

PDF (arXiv) YouTube


Towards Domain Independence for Learning-Based Monocular Depth Estimation

Most state-of-the-art learning-based monocular depth estimators do not consider generalization and benchmark their performance on publicly available datasets only after specific fine-tuning. Generalization can be achieved by training on several heterogeneous datasets, but their collection and labeling is costly. In this work, we propose two Deep Neural Networks (one based on CNN and one on LSTM) for monocular depth estimation, which we train on heterogeneous synthetic datasets (forest and urban scenarios) generated using Unreal Engine. We show that, although trained only on synthetic data, the networks are able to generalize well across different, unseen real-world scenarios (KITTI and newly collected datasets from Zurich, Switzerland, and Perugia, Italy) without any fine-tuning, achieving performance comparable to state-of-the-art methods. Additionally, we show that the LSTM network is able to estimate the absolute scale well with low additional computational overhead. We release the Unreal Engine 3D models and all the collected datasets (from Switzerland and Italy) freely to the public.


References

RAL17_Mancini

M. Mancini, G. Costante, P. Valigi, T.A. Ciarfuglia, J. Delmerico, D. Scaramuzza

Towards Domain Independence for Learning-Based Monocular Depth Estimation

IEEE Robotics and Automation Letters (RA-L), 2017.

PDF YouTube Dataset and Unreal-Engine 3D models


A Deep Learning Approach for Automatic Recognition and Following of Forest Trails with Drones

We study the problem of perceiving forest or mountain trails from a single monocular image acquired from the viewpoint of a robot traveling on the trail itself. Previous literature focused on trail segmentation and used low-level features such as image saliency or appearance contrast; we propose a different approach based on a Deep Neural Network used as a supervised image classifier. By operating on the whole image at once, our system outputs the main direction of the trail relative to the viewing direction. Qualitative and quantitative results computed on a large real-world dataset (which we provide for download) show that our approach outperforms alternatives and yields an accuracy comparable to that of humans tested on the same image classification task. Preliminary results on using this information for quadrotor control on unseen trails are reported. To the best of our knowledge, this is the first paper that describes an approach to perceive forest trails which is demonstrated on a quadrotor micro aerial vehicle.
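As a rough illustration of how a classifier output of this kind could feed a controller, the sketch below turns class probabilities into a yaw-rate command. It assumes three classes (trail to the left of, aligned with, or to the right of the viewing direction), a positive yaw rate meaning a left turn, and a hypothetical `max_yaw_rate` gain. None of these details are stated in the abstract, so this should be read as one possible mapping rather than the controller used in the paper.

```python
def yaw_rate_from_probabilities(p_left, p_center, p_right, max_yaw_rate=1.0):
    """Map class probabilities to a yaw-rate command (assumed convention).

    p_left / p_center / p_right: classifier scores for the trail lying to the
    left of, along, or to the right of the viewing direction.
    Returns a value in [-max_yaw_rate, max_yaw_rate]; positive means turn left.
    """
    total = p_left + p_center + p_right
    if total <= 0.0:
        raise ValueError("probabilities must sum to a positive value")
    # Normalize so the command is well defined even for unnormalized scores.
    return max_yaw_rate * (p_left - p_right) / total


# Example scores, invented for illustration only.
turn_left_cmd = yaw_rate_from_probabilities(0.7, 0.2, 0.1)
straight_cmd = yaw_rate_from_probabilities(0.1, 0.8, 0.1)
print(turn_left_cmd, straight_cmd)
```

Using the difference of the two side probabilities gives a command that shrinks smoothly toward zero as the classifier becomes more confident that the trail is straight ahead.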

References

RAL16_Giusti

A. Giusti, J. Guzzi, D.C. Ciresan, F. He, J.P. Rodríguez, F. Fontana, M. Faessler, C. Forster, J. Schmidhuber, G. Di Caro, D. Scaramuzza, L.M. Gambardella

A Machine Learning Approach to Visual Perception of Forest Trails for Mobile Robots

IEEE Robotics and Automation Letters (RA-L), pages 661 - 667, 2016

Nominated for AAAI Best Video Award!

PDF Project Webpage and Datasets DOI YouTube


"On-the-spot Training" for Terrain Classification in Autonomous Air-Ground Collaborative Teams

We consider the problem of performing rapid training of a terrain classifier in the context of a collaborative robotic search and rescue system. Our system uses a vision-based flying robot to guide a ground robot through unknown terrain to a goal location by building a map of terrain class and elevation. However, due to the unknown environments present in search and rescue scenarios, our system requires a terrain classifier that can be trained and deployed quickly, based on data collected on the spot. We investigate the relationship of training set size and complexity on training time and accuracy, for both feature-based and convolutional neural network classifiers in this scenario. Our goal is to minimize the deployment time of the classifier in our terrain mapping system within acceptable classification accuracy tolerances. So we are not concerned with training a classifier that generalizes well, only one that works well for this particular environment. We demonstrate that we can launch our aerial robot, gather data, train a classifier, and begin building a terrain map after only 60 seconds of flight.
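The trade-off described here, the fastest training configuration that still meets an accuracy tolerance, can be sketched as a simple selection step. The function name, the tuple layout, and the numbers below are hypothetical placeholders chosen for illustration; they are not measurements from the paper.

```python
def pick_fastest_acceptable(trials, min_accuracy):
    """Pick the trial with the shortest training time that meets min_accuracy.

    trials: list of (num_samples, train_seconds, accuracy) tuples.
    Returns the chosen tuple, or None if no trial is accurate enough.
    """
    acceptable = [t for t in trials if t[2] >= min_accuracy]
    if not acceptable:
        return None
    return min(acceptable, key=lambda t: t[1])


# Placeholder values for illustration only; NOT results from the paper.
trials = [
    (100, 5.0, 0.80),
    (500, 12.0, 0.91),
    (2000, 40.0, 0.93),
]
choice = pick_fastest_acceptable(trials, min_accuracy=0.90)
print(choice)
```

Because the classifier only needs to work in the current environment, a selection rule of this kind can favor the smallest training set that clears the tolerance rather than the most accurate one.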

References

On-the-spot training

J. Delmerico, A. Giusti, E. Mueggler, L.M. Gambardella, D. Scaramuzza

"On-the-spot Training" for Terrain Classification in Autonomous Air-Ground Collaborative Teams

International Symposium on Experimental Robotics (ISER), Tokyo, 2016.

PDF YouTube