Linguistics
Recent Submissions
An Investigation Into Supervision for Seq2Seq Techniques for Natural Language to Code Translation
This thesis examines the role of supervised data using small-scale datasets for the natural language to code task. The primary angles of inquiry come from analyzing the balance between unsupervised learning and supervised ...
Sociolinguistic and Phonetic Perception of Second Language Mandarin Chinese
Perception of second language (L2) speakers and their speech is known to be influenced both by phonetic and by sociolinguistic factors. The existing body of scholarly research on L2 speech perception, however, is overwhelmingly ...
Simplifying Multimodal Emotion Recognition with Single Eye Movement Modality
Multimodal emotion recognition has long been a popular topic in affective computing because it significantly improves performance compared with a single modality. Among these approaches, the combination of electroencephalography ...
Automatically Inferring Grammar Specifications for Adnominal Possession from Interlinear Glossed Text
This thesis presents an update to the AGGREGATION grammar inference project: namely, the ability to automatically infer information about adnominal possession for a given language. Specifically, I contribute code that ...
Ethnic History and Language Typology in Western China: The Cases of Xining, Daohua and Bai
The following dissertation examines the language history of areas historically lying along the China-Tibet frontier, namely Amdo, Kham and the Dali region of northwest Yunnan. It draws from a wide and diverse literature ...
Modals in Natural Language Optimize the Simplicity/Informativeness Trade-Off
The meanings expressed by the world’s languages have been argued to support efficient communication. Evidence for this hypothesis has drawn on cross-linguistic analyses of vocabulary in semantic domains of both content ...
"Obama never said that": Evaluating fact-checks for topical consistency and quality
This thesis examines topical consistency between claims and fact-checks in the Birdwatch dataset published by Twitter. The dataset has tweets (the claims), notes (context-adding annotations written by Birdwatch users), and ...
Resourceful at Any Size: A Predictive Methodology Using Linguistic Corpus Metrics for Multi-Source Training in Neural Dependency Parsing
Multilingual modeling comes up in natural language processing at any scale. High-resource language corpora train high-performing models and can be combined with other language corpora of all sizes to make better models ...
Comparing Methods for Automatic Identification of Mislabeled Data
This thesis compares three methods for identifying mislabeled examples in datasets: Dataset Cartography (Swayamdipta et al. [2020]), Cleanlab (Northcutt et al. [2021b]), and Ensembling (Brodley and Friedl [1999], Reiss ...
Considerations for the social impact of natural language processing
Natural language processing (NLP) technologies have transformed how people access information and communicate with one another. It has thus become critical to take stock of the social impact of natural language processing ...
Latent Compositional Representations for English Function Word Comprehension
This paper investigates whether biasing natural language models toward tree-compositional structure and systematic token representation can improve performance on tasks that require the use of function words. The method ...
The Spatiality of Perceptual Dialectology
A criticism that has been leveled against modern sociolinguistic research is that “space [has been] carefully controlled out of” studies and that “spatial variation [... is] not examined” (Britain 2010b, p. 3). This ...
Classifying COVID-19 News on Sina Weibo
This thesis addresses the classification of Sina Weibo news related to the COVID-19 pandemic by using sentiment analysis. The design is a comparison study, involving four different systems. The systems were chosen after ...
Evaluating Transformer's Ability to Learn Mildly Context-Sensitive Languages
Transformer models perform well on NLP tasks, but recent theoretical studies suggest that their ability to model certain regular and context-free languages is limited. This creates a disparity given their success in modeling ...
Extracting and Inferring Personal Attributes from Dialogue
Personal attributes represent structured information about a person, such as their hobbies, pets, family, likes and dislikes. In this work, we introduce the tasks of extracting and inferring personal attributes from ...
Challenges in Automated Debiasing for Toxic Language Detection
Biased associations have been a challenge in the development of classifiers for detecting toxic language, hindering both fairness and accuracy. As potential solutions, we investigate recently introduced debiasing methods ...
Improving Turkish Spelling Correction with Wikipedia Edit History Data
Spelling correction is a well-established NLP application, but the quality for English spelling correction tends to be significantly higher than for other languages. One significant issue for minority languages in NLP is ...
Semantic Universals in Bayesian Learning of Quantifiers
Languages undoubtedly exhibit many surface differences; however, past works such as Goddard and Wierzbicka [1994] and von Fintel and Matthewson [2008] have identified semantic properties that are evident in a vast number ...
Exploring Applications of Rootedness in Sociolinguistic Research in Southern Oregon
The present dissertation discusses the importance of rootedness, defined as orientation towards place, and how it factors into sociolinguistic studies. Although rootedness is not a new concept in sociolinguistics, it has ...
Toward the Emergence of Quantifiers
This thesis explores factors influencing the emergence of quantifiers in a signaling game. It includes a few novel contributions. First is a new signaling game, called the Quantifier Game, designed to provide a setting ...