An Analysis of Gender Bias in K-12 Assigned Literature Through Comparison of Non-Contextual Word Embedding Models

Mohan, Preeti

dc.contributor.advisor	Bender, Emily M
dc.contributor.author	Mohan, Preeti
dc.date.accessioned	2021-03-19T22:56:07Z
dc.date.available	2021-03-19T22:56:07Z
dc.date.issued	2021-03-19
dc.date.submitted	2021
dc.description	Thesis (Master's)--University of Washington, 2021
dc.description.abstract	Word embeddings are mathematical representations of words computed from a group of texts that a machine learning model is trained on. Generally, words that are semantically similar to each other will be closer together in the vector space created by the embedding model. The distance between words can be analyzed to understand which words tend to be used in the same contexts in a given group of texts. In this thesis, I use three different non-contextual methods of training word embedding models, Word2Vec (Mikolov et al., 2013), FastText (Bojanowski et al., 2017), and GloVe (Pennington et al., 2014), on a corpus of literature assigned to students in grades K-12 in the United States to answer three questions: (1) It has been shown that children are particularly prone to internalize biases in the content they read and watch (Railsback, 1993; Jacobs, 2003; Slater, 2003). What biases are present in literature assigned to children in grades K-12 in the United States? (2) Are different kinds of non-contextual word embeddings sensitive to bias in different ways? (3) Is the text from one book enough to detect bias using non-contextual word embedding models? I find that GloVe embeddings are more sensitive to biases in smaller corpora, while Word2Vec and FastText are more sensitive to biases in large corpora. When looking at the word embeddings from a single book, I see variations in the strength of the words that are the “most gendered”: a book that had stronger gender biases (determined through literary critiques) had words that were more strongly gendered than a book that subverted gender biases (also determined through literary critique).
dc.embargo.terms	Open Access
dc.format.mimetype	application/pdf
dc.identifier.other	Mohan_washington_0250O_22483.pdf
dc.identifier.uri	http://hdl.handle.net/1773/46827
dc.language.iso	en_US
dc.rights	CC BY
dc.subject	Children's Literature
dc.subject	Literary Bias
dc.subject	Machine Learning Bias
dc.subject	Natural Language Processing
dc.subject	Social Bias
dc.subject	Word Embeddings
dc.subject	Social research
dc.subject	Computer science
dc.subject	Linguistics
dc.subject.other	Linguistics
dc.title	An Analysis of Gender Bias in K-12 Assigned Literature Through Comparison of Non-Contextual Word Embedding Models
dc.type	Thesis
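
The abstract's idea of measuring how close words sit in an embedding space can be illustrated with a minimal sketch. This is not the thesis's method or data: the three-dimensional vectors below are hypothetical toy values invented for illustration, and `gender_score` is an assumed helper that simply compares a word's cosine similarity to a female anchor word against its similarity to a male anchor word.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def gender_score(word_vec, female_vec, male_vec):
    """Hypothetical helper: positive means closer to the female anchor,
    negative means closer to the male anchor."""
    return cosine_similarity(word_vec, female_vec) - cosine_similarity(word_vec, male_vec)


# Toy vectors, made up for illustration only (not trained embeddings).
toy_vectors = {
    "she": [1.0, 0.0, 0.0],
    "he": [0.0, 1.0, 0.0],
    "nurse": [0.8, 0.2, 0.1],
    "captain": [0.1, 0.9, 0.2],
}

scores = {
    word: gender_score(toy_vectors[word], toy_vectors["she"], toy_vectors["he"])
    for word in ("nurse", "captain")
}

for word, score in scores.items():
    print(f"{word}: {score:+.3f}")
```

With real embeddings, the toy dictionary would be replaced by vectors taken from a trained Word2Vec, FastText, or GloVe model; the comparison itself stays the same.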

Files

Original bundle

Name: Mohan_washington_0250O_22483.pdf
Size: 1.58 MB
Format: Adobe Portable Document Format

Collections

Linguistics