
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | Teredesai, Ankur M | en_US |
| dc.contributor.author | Bindra, Ashish | en_US |
| dc.date.accessioned | 2012-08-10T20:48:13Z | |
| dc.date.available | 2013-08-11T11:05:14Z | |
| dc.date.issued | 2012-08-10 | |
| dc.date.submitted | 2012 | en_US |
| dc.identifier.other | Bindra_washington_0250O_10121.pdf | en_US |
| dc.identifier.uri | http://hdl.handle.net/1773/20301 | |
| dc.description | Thesis (Master's)--University of Washington, 2012 | en_US |
| dc.description.abstract | Topical categorization of blogs, documents, or other objects that can be tagged with text improves the experience for end users. Latent Dirichlet allocation (LDA) is a well-studied algorithm that discovers latent topics in a corpus of documents so that the documents can then be assigned automatically to appropriate topics. New documents can also be classified into topics based on these latent topics. However, when the set of documents is very large and varies significantly from user to user, computing a single global LDA topic model, or an individual topic model for each and every user, can become very expensive in large-scale internet settings. The problem is further compounded by the need to periodically update the model to keep up with the relatively dynamic nature of data in online social networks such as Facebook, Twitter, and FriendFeed. In this work we show that the computational cost of using LDA for a large number of users connected via a social network can be reduced, without compromising the quality of the LDA model, by taking into account the social connections among the users in the network. Instead of a single global model based on every document in the network, we propose a model built from the messages authored and received by a fixed number of the most influential users. We use PageRank as the influence measure and show that this Social LDA model is effective because it reduces the number of documents to process, thereby reducing the cost of computing LDA. Such a model can be used both for categorizing a user's incoming document stream and for inferring a user's interests from the documents that user has authored. It also helps with the cold-start problem, where a user's own messages are insufficient to build a good LDA model. | en_US |
| dc.format.mimetype | application/pdf | en_US |
| dc.language.iso | en_US | en_US |
| dc.rights | Copyright is held by the individual authors. | en_US |
| dc.subject | LDA; Machine Learning; Social Networks; Text Mining; Topic Modeling | en_US |
| dc.subject.other | Computer science | en_US |
| dc.subject.other | Computing and software systems | en_US |
| dc.title | SocialLDA: Scalable Topic Modeling in Social Networks | en_US |
| dc.type | Thesis | en_US |
| dc.embargo.terms | Delay release for 1 year -- then make Open Access | en_US |
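
The abstract's central idea, training LDA only on the messages authored or received by the top-k users ranked by PageRank, can be sketched as below. This is a minimal illustration, not the thesis's implementation, and it rests on assumptions the record does not state: the social graph is a list of directed (follower, followed) edges, messages are dicts with hypothetical `id`, `author`, and `recipients` keys (a real record would also carry the text that LDA consumes), and PageRank is computed by plain power iteration with damping 0.85. The LDA training step itself is omitted; the selected subset would be handed to whatever LDA implementation is in use.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, iterations=100, tol=1e-10):
    """Power-iteration PageRank over directed (src, dst) edges.

    Assumption: an edge src -> dst means src follows (endorses) dst.
    Rank held by dangling nodes (no out-edges) is spread uniformly.
    """
    nodes = sorted({u for edge in edges for u in edge})
    n = len(nodes)
    if n == 0:
        return {}
    out_links = defaultdict(list)
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in nodes}
        for src in nodes:
            targets = out_links[src]
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


def select_training_messages(messages, edges, k):
    """Return the top-k users by PageRank and the messages they authored or received.

    `messages` are dicts with hypothetical keys "id", "author", "recipients".
    """
    rank = pagerank(edges)
    # Sort by descending rank; break ties by user name so the result is deterministic.
    top_users = set(sorted(rank, key=lambda u: (-rank[u], u))[:k])
    kept = [
        m for m in messages
        if m["author"] in top_users or top_users.intersection(m["recipients"])
    ]
    return top_users, kept


# Toy example: everyone follows alice, and alice follows bob.
EDGES = [("bob", "alice"), ("carol", "alice"), ("dave", "alice"),
         ("eve", "alice"), ("alice", "bob")]
MESSAGES = [
    {"id": "m1", "author": "carol", "recipients": ["alice"]},
    {"id": "m2", "author": "dave", "recipients": ["eve"]},
    {"id": "m3", "author": "alice", "recipients": []},
    {"id": "m4", "author": "eve", "recipients": ["bob"]},
]

top, kept = select_training_messages(MESSAGES, EDGES, k=2)
print(f"top users: {sorted(top)}; kept {len(kept)} of {len(MESSAGES)} messages")
# prints: top users: ['alice', 'bob']; kept 3 of 4 messages
```

In this toy graph only the message exchanged between two low-influence users (`m2`) is dropped. The saving the abstract describes comes from k being small relative to the total number of users; the value of k, the edge semantics, and whether received messages should count the same as authored ones are open parameters of this sketch.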

