Learning Robust Tractable Models for Vision
Gens, Robert Corey
Human vision is a demanding computation that acts on and learns from billions of moving measurements every second. Computer vision requires models that are both tractable for real-time learning and inference and robust to the transformations of the visual world. For a vision system to benefit an embodied agent, it must be able to (a) learn tractable models discriminatively so that it does not waste computation on nonessential questions, (b) learn model structure so that computation is added only where needed, (c) learn from images subject to transformations, and (d) learn new concepts quickly. In this dissertation we tackle these four desiderata, weaving together sum-product networks (SPNs), neural networks, kernel machines, and symmetry group theory. First, we extend sum-product networks so that they can be trained discriminatively. This expands the space of SPN architectures and allows feature functions, making them a compelling tractable alternative to conditional random fields. We show that discriminative SPNs can be competitive with deep models on image classification. Second, we present an algorithm to learn the structure of sum-product networks. The top-down recursive algorithm builds a product if it can decompose the variables, and otherwise builds a sum that clusters instances. Surprisingly, this algorithm learns SPNs with superior inference accuracy and speed compared to probabilistic graphical models on a large number of datasets. Third, we introduce deep symmetry networks that can learn representations over arbitrary Lie groups. We present techniques to scale these networks to high-dimensional symmetries. We show that deep symmetry networks can classify 2D and 3D transformed objects with higher accuracy and less training data than convolutional neural networks. Finally, we propose compositional kernel machines (CKMs), an instance-based learner that has the symmetry and compositionality of convolutional neural networks but is significantly easier to train. We combat the curse of dimensionality by effectively summing over an exponential set of constructed virtual training instances using a sum-product function. As a result, CKMs outperform standard instance-based learners both on image classification and at generalizing to unseen compositions and symmetries.
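The idea of summing over an exponential set with a sum-product function can be illustrated with a minimal sketch. This is not the CKM architecture from the dissertation; it only shows the distributive-law identity that such functions rely on, where a product of small sums equals a sum over every combination of per-position choices. The function names and the numeric weights below are hypothetical and chosen purely for illustration.

```python
from itertools import product
from math import isclose, prod


def factored_sum(choices):
    """Sum over all combinations, computed as a product of per-position sums.

    Cost is linear in the total number of options.
    """
    return prod(sum(options) for options in choices)


def brute_force_sum(choices):
    """Same quantity computed by enumerating every combination explicitly.

    Cost is exponential in the number of positions.
    """
    return sum(prod(combo) for combo in product(*choices))


# Hypothetical weights: 4 positions with 2 options each, so 2**4 = 16 combinations.
choices = [[0.2, 0.5], [0.1, 0.3], [0.4, 0.6], [0.7, 0.9]]

fast = factored_sum(choices)
slow = brute_force_sum(choices)

# Both routes agree up to floating-point rounding.
print(isclose(fast, slow))
```

With n positions and k options each, the enumerated version touches k**n terms while the factored version touches only n*k values, which is the sense in which an exponential set can be summed tractably.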