Evaluating Transformer's Ability to Learn Mildly Context-Sensitive Languages

dc.contributor.advisor: Steinert-Threlkeld, Shane
dc.contributor.author: Wang, Shunjie
dc.date.accessioned: 2021-10-29T16:22:07Z
dc.date.available: 2021-10-29T16:22:07Z
dc.date.issued: 2021-10-29
dc.date.submitted: 2021
dc.description: Thesis (Master's)--University of Washington, 2021
dc.description.abstract: Transformer models perform well on NLP tasks, but recent theoretical studies suggest their ability to model certain regular and context-free languages is limited. This creates a disparity given their success in modeling natural language strings, which are hypothesized to be mildly context-sensitive. We complement previous work on transformers and formal languages by relating them to mildly context-sensitive grammar formalisms with varying degrees of weak generative capacity. We test simple vanilla transformer models' ability to learn copying, crossing, and multiple agreement languages, and find that they generalize well to unseen in-domain data, perform comparably to LSTMs, and learn highly interpretable self-attention patterns. However, such transformers cannot consistently recognize strings from these languages that are longer than the ones seen during training, and they are often outperformed by LSTMs in this setting. We present initial evidence suggesting this is due to a limitation of the vanilla sinusoidal positional encoding.
dc.embargo.terms: Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: Wang_washington_0250O_23376.pdf
dc.identifier.uri: http://hdl.handle.net/1773/48053
dc.language.iso: en_US
dc.rights: none
dc.subject: mildly context-sensitive
dc.subject: transformer
dc.subject: Linguistics
dc.subject.other: Linguistics
dc.title: Evaluating Transformer's Ability to Learn Mildly Context-Sensitive Languages
dc.type: Thesis
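For reference, the sinusoidal positional encoding that the abstract identifies as a likely source of the length-generalization failure is the standard scheme from "Attention Is All You Need" (Vaswani et al., 2017): each position is encoded with sine and cosine waves of geometrically spaced frequencies, so positions beyond the training length receive encodings the model has never observed. A minimal dependency-free sketch (not code from the thesis):

```python
import math

def sinusoidal_positional_encoding(max_len: int, d_model: int) -> list[list[float]]:
    """Return a (max_len x d_model) table of sinusoidal positional encodings.

    PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    """
    pe = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)      # even dimensions use sine
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)  # odd dimensions use cosine
    return pe
```

Because the table is a fixed function of position, a model trained only on positions 0..N-1 must extrapolate to unseen encoding vectors at test positions >= N, which is consistent with the abstract's observation that recognition degrades on strings longer than those seen in training.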

Files

Original bundle

Name: Wang_washington_0250O_23376.pdf
Size: 2.54 MB
Format: Adobe Portable Document Format