Towards Efficient and Generalizable Natural Language Processing
Authors
Peng, Hao
Abstract
Natural language processing (NLP) is undergoing a paradigm shift, in which scaling up model and data sizes plays an increasingly important role. Despite remarkable empirical gains on many well-defined NLP tasks, the ability to generalize to complex, unseen scenarios remains elusive. Moreover, growing computational requirements have raised the barrier to entry for NLP research. The inquiry of this thesis can be broadly divided into two research questions: What algorithms can integrate symbolic structures into NLP models and improve their generalization? Can we build neural architectures that are both efficient and accurate? We first focus on the synergy between modern deep learning models and classical linguistic structures. We explore a surrogate gradient method that allows discrete structured prediction to be incorporated as intermediate layers of neural networks, making it possible to train structured NLP pipelines end-to-end. Further, we augment neural language models with attention modules that can be trained to syntactically inform the representations, or that can induce syntactic structures in an unsupervised manner. We experiment with a variety of linguistic structures, both syntactic and semantic, and apply them to real-world NLP tasks, including structured prediction, language modeling, and text classification. In the second part, we aim to improve the efficiency of state-of-the-art deep learning architectures. We discuss two efficient attention models that reduce the overhead of transformers from quadratic to linear in input length. On language modeling, text classification, and machine translation, they substantially improve the efficiency of state-of-the-art models without losing accuracy. Finally, we present a formal analysis of recurrent neural networks that connects them to classical automata, offering a flexible way to devise more efficient and interpretable neural architectures imbued with desired inductive biases.
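To make the abstract's main technical threads concrete, here is a minimal sketch of a surrogate-gradient trick in the spirit of the first part: the forward pass makes a hard, discrete decision (a simple argmax here, standing in for a full structured decoder), while the backward pass substitutes a smooth surrogate so gradients can flow through the discrete layer. This is an illustrative straight-through-style estimator, not the thesis's exact method; all names are ours.

```python
import torch

class StraightThroughArgmax(torch.autograd.Function):
    """Forward: a hard one-hot argmax, standing in for a discrete
    structured decision. Backward: treat the forward pass as the
    identity so upstream gradients reach the scores (a surrogate)."""

    @staticmethod
    def forward(ctx, scores):
        # scores: (batch, num_choices), real-valued
        hard = torch.zeros_like(scores)
        hard.scatter_(-1, scores.argmax(dim=-1, keepdim=True), 1.0)
        return hard

    @staticmethod
    def backward(ctx, grad_output):
        # Surrogate gradient: identity with respect to the scores.
        return grad_output

scores = torch.randn(2, 5, requires_grad=True)
one_hot = StraightThroughArgmax.apply(scores)
loss = (one_hot * torch.randn(2, 5)).sum()
loss.backward()  # gradients reach `scores` despite the hard argmax
```

For the second part, linear-time attention generally rests on one algebraic observation: if the softmax kernel is replaced by a dot product of feature maps, phi(q) . phi(k), the attention output can be computed as phi(Q) (phi(K)^T V), so the (length x length) attention matrix is never materialized. The sketch below uses a simple elu-plus-one feature map as a placeholder; the thesis's models use different feature maps (e.g., random features), and this is a schematic illustration rather than their implementation.

```python
import torch
import torch.nn.functional as F

def linear_attention(q, k, v):
    # q, k: (batch, n, d); v: (batch, n, e)
    # softmax(QK^T)V costs O(n^2); with a feature map phi,
    # phi(Q) @ (phi(K)^T V) costs O(n) in sequence length.
    q, k = F.elu(q) + 1, F.elu(k) + 1            # placeholder feature map
    kv = torch.einsum('bnd,bne->bde', k, v)      # summarize keys/values once
    z = 1.0 / (torch.einsum('bnd,bd->bn', q, k.sum(dim=1)) + 1e-6)
    return torch.einsum('bnd,bde,bn->bne', q, kv, z)

q = k = v = torch.randn(2, 1024, 64)
out = linear_attention(q, k, v)  # (2, 1024, 64), linear in the 1024
```

Finally, the connection between recurrent networks and automata can be previewed with a single gated recurrence: unrolled, it computes a weighted sum over input positions whose weights are products of gate values, the kind of path-weighted score a small weighted finite automaton accumulates while scanning a string. Again, a schematic sketch with hypothetical names:

```python
import torch

def gated_scan(x, forget, inp):
    # Elementwise recurrence c_t = forget_t * c_{t-1} + inp_t * x_t.
    # Unrolled: c_T = sum_t (prod_{s>t} forget_s) * inp_t * x_t,
    # a path-weighted sum like the one a weighted automaton computes.
    c = torch.zeros_like(x[:, 0])
    for t in range(x.size(1)):
        c = forget[:, t] * c + inp[:, t] * x[:, t]
    return c
```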
Description
Thesis (Ph.D.)--University of Washington, 2022
