Deep Learning with Abstention: Algorithms for Robust Training and Predictive Uncertainty
Authors
Thulasidasan, Sunil
Abstract
Machine learning using deep neural networks -- also called ``Deep Learning'' -- has been at the center of practically every major advance in artificial intelligence over the last several years, revolutionizing the fields of computer vision, speech recognition and machine translation. Spurred by these successes, there is now tremendous interest in using deep learning-based systems in all domains where computers can be used to make predictions after being trained on large quantities of data. In this thesis, we tackle two important practical challenges that arise when using deep neural networks (DNNs) for classification. \textit{First, we study the problem of how to robustly train deep models when the training data itself is unreliable.} This is a common occurrence in real-world deep learning, where large quantities of data can be collected easily but, due to the enormous size of the datasets required, perfectly labeling them is usually infeasible; when such label noise is significant, the performance of the resulting classifier can be severely affected. To tackle this, we devise a novel algorithmic framework for training deep models using an \textit{abstention}-based approach. We show how such a ``deep abstaining classifier'' can improve robustness to different types of label noise and, unlike other existing approaches, also \textit{learns features} that are indicative of unreliable labels. State-of-the-art performance is achieved for noise-robust learning on a number of standard benchmarks; further gains in noise robustness are then shown by combining abstention with techniques from classical control and semi-supervised learning. \textit{The second challenge we consider is improving the predictive uncertainty of DNNs.} In many applications, and especially in high-risk ones, it is critical to have a reliable measure of confidence in the model's prediction.
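To make the abstention idea concrete, here is a minimal NumPy sketch of an abstention-style loss of the kind the thesis develops: the network is given an extra, (k+1)-th ``abstain'' output, and a weight alpha trades off the cost of abstaining against the usual cross-entropy on the labeled class. The exact loss and its analysis appear in the thesis body; the formulation below is an illustrative sketch under that assumption, not the definitive implementation.

```python
import numpy as np

def dac_loss(logits, labels, alpha):
    """Sketch of an abstention loss: the last column is the abstain class.

    Per example, with p_abs the abstention probability and p_true the
    probability of the labeled class, the loss is
        (1 - p_abs) * -log(p_true / (1 - p_abs)) + alpha * -log(1 - p_abs).
    When p_abs -> 0 this reduces to ordinary cross-entropy.
    """
    z = logits - logits.max(axis=1, keepdims=True)        # numerically stable softmax
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    eps = 1e-12
    p_abs = p[:, -1]                                      # abstention probability
    p_true = p[np.arange(len(labels)), labels]            # prob of the labeled class
    cross = -np.log(np.clip(p_true / (1.0 - p_abs + eps), eps, 1.0))
    penalty = -np.log(np.clip(1.0 - p_abs, eps, 1.0))
    return float(((1.0 - p_abs) * cross + alpha * penalty).mean())
```

A large alpha makes abstention expensive, so the loss behaves like plain cross-entropy; a small alpha makes it cheap to abstain on hard (possibly mislabeled) examples, which is what lets the abstention output absorb label noise during training.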
DNNs, however, often exhibit pathological \textit{overconfidence}, rendering standard methods of confidence-based thresholding ineffective. We first take a closer look at the roots of overconfidence in DNNs, finding that introducing uncertainty into the training labels significantly reduces overconfidence and improves the probability estimates associated with predicted outcomes; using this observation, we show how threshold-based abstention can be made to work again. We then consider the problem of open set detection -- where the classifier is presented with an instance from an unknown category -- and demonstrate that our abstention framework can serve as a very reliable open set detector. The contributions of this thesis are thus two-fold, presented in two parts -- part one on robust training, and part two on predictive uncertainty -- but unified by a common theme: abstention. Our work demonstrates that such an approach is highly effective both while training and while deploying DNNs for classification, and is hence a useful addition to a real-world deep learning pipeline.
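The label-uncertainty observation can be illustrated with label smoothing -- one standard way of injecting uncertainty into training targets -- combined with simple confidence thresholding at prediction time. The sketch below is illustrative only: the smoothing fraction and threshold values are hypothetical choices, not figures from the thesis.

```python
import numpy as np

def smooth_labels(labels, num_classes, eps=0.1):
    """Label smoothing: move a fraction eps of each one-hot target's mass
    uniformly onto all classes, injecting uncertainty into the labels."""
    onehot = np.eye(num_classes)[labels]
    return onehot * (1.0 - eps) + eps / num_classes

def predict_with_abstention(probs, threshold=0.9):
    """Threshold-based abstention: return the argmax class when the top
    softmax probability clears the threshold, and -1 (abstain) otherwise."""
    conf = probs.max(axis=1)
    preds = probs.argmax(axis=1)
    return np.where(conf >= threshold, preds, -1)
```

The point of pairing the two is that a model trained on softened targets produces less saturated softmax outputs, so a fixed confidence threshold separates sure from unsure predictions far more usefully than it does for an overconfident model.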
Description
Thesis (Ph.D.)--University of Washington, 2020
