Model-Based Self-Supervision for Fine-Grained Image Understanding
To enable robots that perform tasks capably, efficiently, and safely in dynamic environments, we will need vision systems that are adaptable and can reliably provide detailed and accurate information about the world around the robot. Generative model-based methods are well suited to these demands, as generative models are portable (they are equally valid in many different environments), modular (they can be dynamically combined with other models), and interpretable (via likelihood functions and explicitly parameterized state spaces). For robotics applications, these properties give model-based vision an advantage over the currently dominant deep learning paradigm in computer vision. However, generative models are limited by the curse of dimensionality and data association ambiguity, and the resulting lack of robustness is an impediment to successful deployment in robotic systems. This dissertation advances the argument that the strengths and weaknesses of generative model-based vision and discriminative learning-based vision are largely complementary, and can therefore be used to mutual advantage. Specifically, we show that generative models can be employed to automate the process of labeling data for supervised training of deep neural networks, demonstrating the benefit that model-based vision offers deep learning. We then show that deep neural networks trained to recognize parts of generative models can help resolve ambiguous data association and thus enhance the robustness of state estimation, demonstrating the benefit that deep learning can bring back to model-based vision. Overall, we lay out a vision for future development of highly capable robot perception systems in which machine learning expands the envelope of situations in which generative model-based techniques are reliably applicable, and the generative model-based techniques return the favor by providing labels for further training of the network.