SocialLDA: Scalable Topic Modeling in Social Networks

dc.contributor.advisor Teredesai, Ankur M en_US
dc.contributor.author Bindra, Ashish en_US
dc.date.accessioned 2012-08-10T20:48:13Z
dc.date.available 2013-08-11T11:05:14Z
dc.date.issued 2012-08-10
dc.date.submitted 2012 en_US
dc.identifier.other Bindra_washington_0250O_10121.pdf en_US
dc.identifier.uri http://hdl.handle.net/1773/20301
dc.description Thesis (Master's)--University of Washington, 2012 en_US
dc.description.abstract Topical categorization of blogs, documents, or other objects that can be tagged with text improves the experience for end users. Latent Dirichlet allocation (LDA) is a well-studied algorithm that discovers latent topics from a corpus of documents so that the documents can then be assigned automatically to appropriate topics. New documents can also be classified based on these latent topics. However, when the set of documents is very large and varies significantly from user to user, calculating a single global LDA topic model, or an individual topic model for every user, can become very expensive in large-scale internet settings. The problem is compounded by the need to periodically update the model to keep up with the relatively dynamic nature of data in online social networks such as Facebook, Twitter, and FriendFeed. In this work we show that the computational cost of using LDA for a large number of users connected via a social network can be reduced, without compromising the quality of the LDA model, by taking into account the social connections among the users in the network. Instead of a single global model based on every document in the network, we propose to use a model created from messages that are authored by and received by a fixed number of the most influential users. We use PageRank as the influence measure and show that this Social LDA model is effective because it reduces the number of documents to process, thereby reducing the cost of computing the LDA model. Such a model can be used both for categorizing a user's incoming document stream and for finding user interests based on the user's authored documents. Further, it helps with the cold start problem, where a user's own messages are insufficient to create a good LDA model. en_US
dc.format.mimetype application/pdf en_US
dc.language.iso en_US en_US
dc.subject LDA; Machine Learning; Social Networks; Text Mining; Topic Modeling en_US
dc.subject.other Computer science en_US
dc.subject.other Computing and software systems en_US
dc.title SocialLDA: Scalable Topic Modeling in Social Networks en_US
dc.type Thesis en_US
dc.embargo.terms Delay release for 1 year -- then make Open Access en_US
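
The document-selection step described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation, which this record does not include. The graph representation (a dict mapping each user to the users they link to), the message tuples, the function names `pagerank` and `select_documents`, and the damping factor of 0.85 are all assumptions made for illustration. Fitting the LDA model on the selected documents is left out.

```python
def pagerank(follows, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {user: [users they link to]}.

    The damping factor and iteration count are illustrative assumptions.
    """
    nodes = set(follows)
    for targets in follows.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        # Rank held by users with no outgoing links is spread evenly.
        dangling = sum(rank[u] for u in nodes if not follows.get(u))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {u: base for u in nodes}
        for u, targets in follows.items():
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new_rank[v] += share
        rank = new_rank
    return rank


def select_documents(follows, messages, k):
    """Keep messages authored or received by the k highest-ranked users.

    messages is a list of (author, recipients, text) tuples (hypothetical
    layout). Ties in rank are broken by user name for determinism.
    """
    rank = pagerank(follows)
    top = set(sorted(rank, key=lambda u: (-rank[u], u))[:k])
    return [
        text
        for author, recipients, text in messages
        if author in top or top.intersection(recipients)
    ]


if __name__ == "__main__":
    follows = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    messages = [
        ("a", ["c"], "m1"),
        ("c", ["a"], "m2"),
        ("b", ["d"], "m3"),
    ]
    # Only messages touching the single most influential user are kept.
    print(select_documents(follows, messages, k=1))
```

The reduced corpus returned by `select_documents` would then be passed to an LDA implementation in place of the full set of documents in the network.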


Files in this item

File | Size | Format
Bindra_washington_0250O_10121.pdf | 882.7Kb | PDF
