Representation Learning for Partitioning Problems
This dissertation addresses representation learning for partitioning problems. Clustering a set of data points and segmenting a time series of data points are two classical partitioning problems. Nonparametric methods such as kernel-based methods assume knowledge of a mapping into a feature space. Their statistical performance can, however, be impeded if this mapping, usually called a feature representation, is improperly specified or simply unknown. With larger datasets now available, we can consider jointly learning a feature representation and predicting clustering or segmentation labels. The feature representations we consider here take the form of a nonlinear mapping built as a composition of basic modules, a computational skeleton commonly referred to as a deep network. The parameters of each module are learned from data using automatic differentiation and gradient-based optimization algorithms. Combining these tools allows us to tackle partitioning problems through end-to-end learning with gradient-based training. As a consequence, the proposed methods can improve upon classical kernel-based methods in statistical performance as more data becomes available. The first half of this dissertation demonstrates this approach by learning feature representations for image categorization and clustering with convolutional kernel-based methods. The second half develops methods that broaden the scope of multiple change-point estimation in time series of data points, using a feature representation built from either a data-dependent kernel mapping or a neural network.

Chapter 2 explores the relationship between convolutional neural networks and related kernel-based convolutional networks. We show how to transform a neural network into a kernel network, providing a detailed mathematical description of each component of a network.
We explore this comparison both analytically and empirically on milestone convolutional network architectures for image categorization, and highlight the similarities and differences between these two families of methods. Along the way, we propose a gradient-based optimization method for training both neural and kernel networks.

Chapter 3 turns to learning feature representations in the presence of unlabeled data. We propose a single objective function that transitions between the supervised and unsupervised settings depending on the ratio of labeled to unlabeled data, recovering discriminative clustering at one end and supervised classification at the other. We situate the proposed method within the broader area of nonparametric similarity-based clustering methods and motivate the proposed objective for end-to-end learning. We propose to perform the cluster assignment using an entropy-regularized optimal transport algorithm. A numerical evaluation on several datasets demonstrates the merits of the approach.

Inspired by Chapter 3, Chapter 4 pivots to learning feature representations for change-point estimation. As in Chapter 3, we develop a single objective function that can handle any mix of labeled, partially labeled, and unlabeled sequences. We propose two methods for optimizing the objective, one based on non-smooth optimization and the other on smooth optimization. The numerical evaluation on synthetic and real-world data demonstrates the benefits of learning the feature representations for multiple change-point estimation compared to using fixed, pre-defined feature representations.

Finally, Chapter 5 proposes a change-point estimation method for data consisting of sequences of point clouds. We connect the method to distances between probability distributions and show how to scale up the approach to thousands of point clouds, each with thousands of points.
This work is motivated by an oceanographic application in which flow cytometry point cloud data on phytoplankton is collected underway during research cruises. We illustrate the utility of the proposed method on a flow cytometry dataset, as well as the potential to estimate the number of change points using auxiliary data.
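Chapter 3 performs the cluster assignment with an entropy-regularized optimal transport algorithm. The sketch below shows the generic Sinkhorn iteration for that kind of assignment; it is not the dissertation's code. It assumes a hypothetical score matrix `scores` (n points by k clusters, e.g. produced by a learned feature map) and equal-size clusters as the column marginals; the function name and the defaults for `eps` and `n_iters` are illustrative choices.

```python
import numpy as np


def sinkhorn_assignment(scores, eps=0.1, n_iters=200):
    """Soft cluster assignment via entropy-regularized optimal transport.

    scores: (n, k) array of point-to-cluster similarity scores (hypothetical input).
    Returns an (n, k) transport plan P with rows summing (approximately) to 1/n
    and columns summing to 1/k, i.e. balanced clusters (an assumption here).
    """
    n, k = scores.shape
    r = np.full(n, 1.0 / n)  # row marginals: each point carries equal mass
    c = np.full(k, 1.0 / k)  # column marginals: equal-size clusters (assumed)
    # Gibbs kernel exp(scores / eps); subtracting the max only rescales K,
    # which the scaling vectors u and v absorb, and avoids overflow.
    K = np.exp((scores - scores.max()) / eps)
    u = np.ones(n)
    v = np.ones(k)
    for _ in range(n_iters):
        u = r / (K @ v)    # match row marginals
        v = c / (K.T @ u)  # match column marginals
    return u[:, None] * K * v[None, :]
```

Hard labels can be read off with `P.argmax(axis=1)`. The fixed column marginals are one common reason to use optimal transport for assignment: they prevent the trivial solution in which every point falls into a single cluster.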
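Chapters 4 and 5 build on multiple change-point estimation in a feature space. As a point of reference, the sketch below implements the classical kernel change-point approach with a fixed, pre-defined kernel (rather than a learned one), applied to a sequence of point clouds. Assumptions, none taken from the dissertation: each point cloud is summarized by its empirical kernel mean embedding under a Gaussian kernel with a hand-picked `bandwidth`, the number of change points `n_cp` is known, and segments are found by exact dynamic programming on the within-segment kernel scatter. The function names are hypothetical, and this naive version costs O(T^2 n^2) kernel evaluations, so it does not include the scaling techniques the dissertation develops.

```python
import numpy as np


def mean_embedding_gram(clouds, bandwidth=1.0):
    """G[i, j] = <mu_i, mu_j>: inner product of empirical Gaussian-kernel mean embeddings."""
    T = len(clouds)
    G = np.empty((T, T))
    for i in range(T):
        for j in range(i, T):
            diff = clouds[i][:, None, :] - clouds[j][None, :, :]
            sq = (diff ** 2).sum(axis=-1)  # pairwise squared distances
            G[i, j] = G[j, i] = np.exp(-sq / (2 * bandwidth ** 2)).mean()
    return G


def segment_costs(G):
    """cost[s, e]: kernel scatter of segment s..e-1 around its mean embedding."""
    T = G.shape[0]
    P = np.zeros((T + 1, T + 1))
    P[1:, 1:] = G.cumsum(0).cumsum(1)  # 2D prefix sums for O(1) block sums
    d = np.concatenate([[0.0], np.cumsum(np.diag(G))])
    cost = np.full((T + 1, T + 1), np.inf)
    for s in range(T):
        for e in range(s + 1, T + 1):
            block = P[e, e] - P[s, e] - P[e, s] + P[s, s]
            cost[s, e] = (d[e] - d[s]) - block / (e - s)
    return cost


def kernel_changepoints(G, n_cp):
    """Exact dynamic programming for n_cp change points minimizing total scatter."""
    T = G.shape[0]
    cost = segment_costs(G)
    best = np.full((n_cp + 1, T + 1), np.inf)  # best[m, e]: m+1 segments over 0..e-1
    arg = np.zeros((n_cp + 1, T + 1), dtype=int)
    best[0, 1:] = cost[0, 1:]
    for m in range(1, n_cp + 1):
        for e in range(m + 1, T + 1):
            cands = best[m - 1, m:e] + cost[m:e, e]  # last segment starts at m..e-1
            j = int(np.argmin(cands))
            best[m, e] = cands[j]
            arg[m, e] = m + j
    cps, e = [], T
    for m in range(n_cp, 0, -1):  # backtrack the start of each last segment
        e = int(arg[m, e])
        cps.append(e)
    return sorted(cps)
```

With this Gram matrix, the (biased) squared maximum mean discrepancy between clouds i and j is `G[i, i] + G[j, j] - 2 * G[i, j]`, which illustrates one way a point-cloud change-point method can be tied to distances between probability distributions.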
- Statistics