The Sum-Product Theorem and its Applications
Models in artificial intelligence (AI) and machine learning (ML) must be expressive enough to accurately capture the state of the world, yet tractable enough that reasoning and inference within them are feasible. However, many standard models are incapable of capturing sufficiently complex phenomena when constrained to be tractable. In this dissertation, I study the cause of this inexpressiveness and its relationship to inference complexity, and I use the resulting insights to develop more efficient and expressive models and algorithms for many problems in AI and ML, including nonconvex optimization, computer vision, and deep learning.

I first identify and prove the sum-product theorem, which states that, in any semiring, for inference to be tractable it suffices that the factors of every product have disjoint scopes, i.e., that the products are decomposable. I show that this simple condition unifies and extends many results in the literature and enables the definition of highly expressive model classes that are tractable and learnable for many of the most important problems in AI and ML.

Second, I develop RDIS, a novel nonconvex optimization algorithm based on the sum-product theorem. I show both analytically and empirically that RDIS can be exponentially faster than standard approaches because it finds and exploits local decomposability.

Third, I combine decomposability with submodularity to define submodular field grammars (SFGs), a novel class of probabilistic models that extends both sum-product networks and submodular Markov random fields. SFGs define a novel stochastic image grammar in which each object can have arbitrary region shape yet approximate MAP inference remains tractable, making SFGs the first image grammar formulation for which this is possible.

Finally, I demonstrate the applicability of decomposability to deep learning. I present feasible target propagation, a novel algorithm for learning deep neural networks with hard-threshold activations, which cannot be trained with standard backpropagation-based methods, and show that it learns more accurate models than competing methods.
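To make the decomposability condition concrete, the following minimal sketch (not from the dissertation; the factor values are illustrative) shows why disjoint scopes make summation tractable in the sum-of-products semiring: when two factors share no variables, the sum distributes over their product, so a sum that is exponential over joint assignments in general reduces to a product of independent sums, linear in the number of factors.

```python
from itertools import product

# Two factors with disjoint scopes over binary variables x1 and x2
# (values chosen arbitrarily for illustration).
f = {0: 0.4, 1: 0.6}  # factor over x1
g = {0: 0.9, 1: 0.1}  # factor over x2

# Brute force: sum f(x1) * g(x2) over all joint assignments.
# In general this is exponential in the number of variables.
brute = sum(f[x1] * g[x2] for x1, x2 in product([0, 1], repeat=2))

# Decomposable evaluation: because the scopes are disjoint, the sum
# distributes over the product -- one pass per factor.
factored = sum(f.values()) * sum(g.values())

print(brute, factored)  # the two evaluations agree
```

The same distributive step applies in any semiring (e.g., max-product for MAP inference), which is what lets the sum-product theorem unify tractability results across inference problems.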