Evaluating Transformer's Ability to Learn Mildly Context-Sensitive Languages

Authors

Wang, Shunjie

Abstract

Transformer models perform well on NLP tasks, but recent theoretical studies suggest that their ability to model certain regular and context-free languages is limited. This is at odds with their success in modeling natural language, which is hypothesized to be mildly context-sensitive. We complement previous work on transformers and formal languages by relating them to mildly context-sensitive grammar formalisms with varying degrees of weak generative capacity. We test simple vanilla transformer models on learning copying, crossing, and multiple agreement languages, and find that they generalize well to unseen in-domain data, perform comparably to LSTMs, and learn highly interpretable self-attention patterns. However, these transformers cannot consistently recognize strings longer than those seen during training, and are often outperformed by LSTMs in this setting. We present initial evidence suggesting that this limitation stems from the vanilla sinusoidal positional encoding.
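
For concreteness, the sketch below shows the kinds of string sets typically used for these three tasks, along with the standard sinusoidal positional encoding implicated in the length-generalization failure. The exact languages, alphabets, and function names are illustrative assumptions, not details taken from the thesis.

```python
# Illustrative sketch (not from the thesis): canonical examples of the three
# language families named in the abstract, plus the vanilla sinusoidal
# positional encoding whose length generalization is at issue.
import math

def copy_language(w: str) -> str:
    """Copying: strings of the form w w (e.g. 'ab' -> 'abab')."""
    return w + w

def crossing(n: int, m: int) -> str:
    """Crossing agreement: a^n b^m c^n d^m (crossing serial dependencies)."""
    return "a" * n + "b" * m + "c" * n + "d" * m

def multiple_agreement(n: int) -> str:
    """Multiple agreement: a^n b^n c^n."""
    return "a" * n + "b" * n + "c" * n

def sinusoidal_pe(pos: int, i: int, d_model: int) -> float:
    """Vanilla sinusoidal positional encoding (Vaswani et al., 2017):
    PE(pos, 2k) = sin(pos / 10000^(2k/d_model)),
    PE(pos, 2k+1) = cos(pos / 10000^(2k/d_model))."""
    angle = pos / (10000 ** ((i // 2 * 2) / d_model))
    return math.sin(angle) if i % 2 == 0 else math.cos(angle)

print(copy_language("ab"))        # abab
print(crossing(2, 3))             # aabbbccddd
print(multiple_agreement(3))      # aaabbbccc
```

Copying (w w), crossing dependencies (a^n b^m c^n d^m), and multiple agreement (a^n b^n c^n) are the canonical patterns beyond context-free power that mildly context-sensitive formalisms such as TAG are designed to capture.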

Description

Thesis (Master's)--University of Washington, 2021
