An Analysis of Gender Bias in K-12 Assigned Literature Through Comparison of Non-Contextual Word Embedding Models

dc.contributor.advisor: Bender, Emily M
dc.contributor.author: Mohan, Preeti
dc.date.accessioned: 2021-03-19T22:56:07Z
dc.date.available: 2021-03-19T22:56:07Z
dc.date.issued: 2021-03-19
dc.date.submitted: 2021
dc.description: Thesis (Master's)--University of Washington, 2021
dc.description.abstract: Word embeddings are mathematical representations of words computed from the group of texts that a machine learning model is trained on. Generally, words that are semantically similar to each other will be closer together in the vector space created by the embedding model. The distance between words can be analyzed to understand which words tend to be used in the same contexts in a given group of texts. In this thesis, I use three different non-contextual methods of training word embedding models, Word2Vec (Mikolov et al., 2013), FastText (Bojanowski et al., 2017), and GloVe (Pennington et al., 2014), on a corpus of literature assigned to students in grades K-12 in the United States to answer three questions: (1) It has been shown that children are particularly prone to internalizing biases in the content they read and watch (Railsback, 1993; Jacobs, 2003; Slater, 2003). What biases are present in literature assigned to children in grades K-12 in the United States? (2) Are different kinds of non-contextual word embeddings sensitive to bias in different ways? (3) Is the text from one book enough to detect bias using non-contextual word embedding models? I find that GloVe embeddings are more sensitive to biases in smaller corpora, while Word2Vec and FastText are more sensitive to biases in large corpora. When looking at the word embeddings from a single book, I see variations in the strength of the words that are the “most gendered”: a book that had stronger gender biases (determined through literary critique) had words that were more strongly gendered than a book that subverted gender biases (also determined through literary critique).
dc.embargo.terms: Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: Mohan_washington_0250O_22483.pdf
dc.identifier.uri: http://hdl.handle.net/1773/46827
dc.language.iso: en_US
dc.rights: CC BY
dc.subject: Children's Literature
dc.subject: Literary Bias
dc.subject: Machine Learning Bias
dc.subject: Natural Language Processing
dc.subject: Social Bias
dc.subject: Word Embeddings
dc.subject: Social research
dc.subject: Computer science
dc.subject: Linguistics
dc.subject.other: Linguistics
dc.title: An Analysis of Gender Bias in K-12 Assigned Literature Through Comparison of Non-Contextual Word Embedding Models
dc.type: Thesis
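
Illustrative sketch (not taken from the thesis): the abstract describes analyzing the distance between words in an embedding space. One common way to do this is to compare cosine similarities, and the Python below shows that idea on hand-made toy vectors. The function names, the anchor words "she" and "he", the scoring rule, and every number in TOY_VECTORS are assumptions made for illustration only; they are not the author's method, data, or results. In practice the vectors would come from a trained Word2Vec, FastText, or GloVe model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def gender_lean(word, vectors, female="she", male="he"):
    """Hypothetical score: similarity to the female anchor minus
    similarity to the male anchor. Positive means the word sits
    closer to the female anchor, negative closer to the male one."""
    w = vectors[word]
    return (cosine_similarity(w, vectors[female])
            - cosine_similarity(w, vectors[male]))


def rank_by_gender_lean(vectors, anchors=("she", "he")):
    """Sort all non-anchor words from most female-leaning to most male-leaning."""
    scores = {w: gender_lean(w, vectors) for w in vectors if w not in anchors}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy 3-dimensional vectors, invented for this example (assumption).
TOY_VECTORS = {
    "she": [1.0, 0.0, 0.2],
    "he": [0.0, 1.0, 0.2],
    "nurse": [0.9, 0.1, 0.3],
    "captain": [0.1, 0.9, 0.3],
    "book": [0.5, 0.5, 0.7],
}

if __name__ == "__main__":
    for word, score in rank_by_gender_lean(TOY_VECTORS):
        print(f"{word:10s} {score:+.3f}")
```

With these toy vectors, "nurse" scores positive, "captain" negative, and "book" near zero, which only reflects how the vectors were constructed. A score of this kind is one possible way to rank the "most gendered" words in a corpus; the thesis itself may define its measure differently.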

Files

Original bundle

Name: Mohan_washington_0250O_22483.pdf
Size: 1.58 MB
Format: Adobe Portable Document Format
