Linguistics
Recent Submissions
-
"Obama never said that": Evaluating fact-checks for topical consistency and quality
This thesis examines topical consistency between claims and fact-checks in the Birdwatch dataset published by Twitter. The dataset has tweets (the claims), notes (context-adding annotations written by Birdwatch users), and ... -
Resourceful at Any Size: A Predictive Methodology Using Linguistic Corpus Metrics for Multi-Source Training in Neural Dependency Parsing
Multilingual modeling comes up in natural language processing at any scale. High-resource language corpora train high-performing models, and can be combined with other language corpora of all sizes to make better models ... -
Comparing Methods for Automatic Identification of Mislabeled Data
This thesis compares three methods for identifying mislabeled examples in datasets: Dataset Cartography (Swayamdipta et al. [2020]), Cleanlab (Northcutt et al. [2021b]), and Ensembling (Brodley and Friedl [1999], Reiss ... -
Considerations for the social impact of natural language processing
Natural language processing (NLP) technologies have transformed how people access information and communicate with one another. It has thus become critical to take stock of the social impact of natural language processing ... -
Latent Compositional Representations for English Function Word Comprehension
This paper investigates whether biasing natural language models toward tree-compositional structure and systematic token representation can improve performance on tasks that require the use of function words. The method ... -
The Spatiality of Perceptual Dialectology
A criticism that has been leveled against modern sociolinguistic research is that “space [has been] carefully controlled out of” studies and that “spatial variation [... is] not examined” (Britain 2010b, p. 3). This ... -
Classifying COVID-19 News on Sina Weibo
This thesis addresses the classification of Sina Weibo news related to the COVID-19 pandemic by using sentiment analysis. The design is a comparison study, involving four different systems. The systems were chosen after ... -
Evaluating Transformer's Ability to Learn Mildly Context-Sensitive Languages
Transformer models perform well on NLP tasks, but recent theoretical studies suggest their ability to model certain regular and context-free languages is limited. This creates a disparity given their success in modeling ... -
Extracting and Inferring Personal Attributes from Dialogue
Personal attributes represent structured information about a person, such as their hobbies, pets, family, likes and dislikes. In this work, we introduce the tasks of extracting and inferring personal attributes from ... -
Improving Turkish Spelling Correction with Wikipedia Edit History Data
Spelling correction is a well-established NLP application, but the quality for English spelling correction tends to be significantly higher than for other languages. One significant issue for minority languages in NLP is ... -
Challenges in Automated Debiasing for Toxic Language Detection
Biased associations have been a challenge in the development of classifiers for detecting toxic language, hindering both fairness and accuracy. As potential solutions, we investigate recently introduced debiasing methods ... -
Semantic Universals in Bayesian Learning of Quantifiers
Languages undoubtedly exhibit many surface differences; however, past works such as Goddard and Wierzbicka [1994] and von Fintel and Matthewson [2008] have identified semantic properties that are evident in a vast number ... -
Exploring Applications of Rootedness in Sociolinguistic Research in Southern Oregon
The present dissertation discusses the importance of rootedness, defined as orientation towards place, and how it factors into sociolinguistic studies. Although rootedness is not a new concept in sociolinguistics, it has ... -
Toward the Emergence of Quantifiers
This thesis explores factors influencing the emergence of quantifiers in a signaling game. It includes a few novel contributions. First is a new signaling game, called the Quantifier Game, designed to provide a setting ... -
Assembling Syntax: Modeling Constituent Questions in a Grammar Engineering Framework
This dissertation is dedicated to a cross-linguistic account of constituent (aka wh-) questions as part of a grammar engineering toolkit, the Grammar Matrix, couched in the Head-driven Phrase Structure Grammar formalism ... -
ASR and Human Recognition Errors: Predictability and Lexical Factors
Considering the complexity of speech communication, it is unsurprising that a listener occasionally misrecognizes an utterance. However, by examining patterns across many recognition errors, researchers ... -
Tracing and Reducing Lexical Ambiguity in Automatically Inferred Grammars
While the automated creation of machine-readable grammars is a valuable resource for linguists who wish to work with these grammars for linguistic hypothesis testing, the complexity of developing a system capable of creating ... -
The Suitability of Generative Adversarial Training for BERT Natural Language Generation
This thesis presents a study that was designed to test the effect of generative adversarial network (GAN) training on the quality of natural language generation (NLG) using a pre-trained language model architecture: ... -
Dialogical Signals of Stance Taking in Spontaneous Conversation
This is one of the first computational studies to investigate dialogical aspects of stance taking in spontaneous, spoken dialogue with a focus on lexical similarities. In any dialogic interaction, each speaker influences ... -
An Analysis of Gender Bias in K-12 Assigned Literature Through Comparison of Non-Contextual Word Embedding Models
Word embeddings are mathematical representations of words computed from a group of texts that a machine learning model is trained on. Generally, words that are similar to each other semantically will be closer together ...