Unsupervised Moving Object Detection via Contextual Information Separation

We propose an adversarial contextual model for detecting moving objects in images. A deep neural network is trained to predict the optical flow in a region using information from everywhere else but that region (context), while another network attempts to make such context as uninformative as possible. The result is a model where hypotheses naturally compete with no need for explicit regularization or hyper-parameter tuning. Although our method requires no supervision whatsoever, it outperforms several methods that are pre-trained on large annotated datasets. Our model can be thought of as a generalization of classical variational generative region-based segmentation, but in a way that avoids explicit regularization or solution of partial differential equations at run-time.



Y. Yang*, A. Loquercio*, D. Scaramuzza, S. Soatto

Unsupervised Moving Object Detection via Contextual Information Separation

Conference on Computer Vision and Pattern Recognition (CVPR), 2019.



We formulate motion segmentation adversarially: because the motion of an object and the motion of the background are uncorrelated, neither can be recovered from the other.

Our training procedure involves two actors, a generator G and an inpainter I. The generator G computes a mask over the optical flow such that the inpainter I does the worst possible job of recovering the original optical flow from the masked one.
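The competition between G and I can be illustrated with a toy example. The sketch below is not the released training code: the function names are hypothetical, the inpainter is replaced by a trivial stand-in that fills the hidden region with the mean of the visible flow, and the error is a plain mean squared error rather than the loss used in the paper. It only shows why a mask that covers an independently moving object is the hardest one for an inpainter to fill in from context.

```python
import numpy as np

def inpainter_error(flow, mask, inpaint_fn):
    """Mean squared flow error of the inpainter inside the masked region.

    flow: (H, W, 2) optical flow; mask: (H, W), 1 = hidden from the inpainter.
    The inpainter would minimize this value; the generator would maximize it.
    """
    context = flow * (1.0 - mask)[..., None]   # flow visible to the inpainter
    predicted = inpaint_fn(context, mask)      # guess for the full flow field
    sq_err = np.sum((predicted - flow) ** 2, axis=-1)
    return np.sum(mask * sq_err) / (np.sum(mask) + 1e-8)

def mean_context_inpainter(context, mask):
    """Toy stand-in for I (assumption): predict the mean visible flow everywhere."""
    n_visible = np.sum(1.0 - mask) + 1e-8
    mean_flow = np.sum(context, axis=(0, 1)) / n_visible
    return np.broadcast_to(mean_flow, context.shape)

# Synthetic scene: background moves with flow (1, 0), a 3x3 object with (0, 3).
flow = np.zeros((8, 8, 2))
flow[..., 0] = 1.0
flow[2:5, 2:5] = (0.0, 3.0)

object_mask = np.zeros((8, 8))
object_mask[2:5, 2:5] = 1.0        # hides exactly the moving object

background_mask = np.zeros((8, 8))
background_mask[5:8, 5:8] = 1.0    # hides a background patch of equal size

err_object = inpainter_error(flow, object_mask, mean_context_inpainter)
err_background = inpainter_error(flow, background_mask, mean_context_inpainter)
print(err_object, err_background)
```

In this toy scene the error is much larger when the mask covers the object than when it covers a background patch, since the background flow says nothing about the object's motion. In the actual method both G and I are neural networks trained jointly.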


Project Code

We publicly release our training and testing code. Documentation is available in the GitHub repository.

Precomputed Results [4MB]

This archive contains the object detection masks produced by our method, after post-processing, for the DAVIS 2016, FBMS59, and SegTrackV2 datasets.

Pretrained Models [215MB]

This archive contains the trained weights of the models we used for our experiments.