ResearchWorks Archive

SocialLDA: Scalable Topic Modeling in Social Networks

dc.contributor.advisor Teredesai, Ankur M en_US
dc.contributor.author Bindra, Ashish en_US
dc.date.accessioned 2012-08-10T20:48:13Z
dc.date.available 2013-08-11T11:05:14Z
dc.date.issued 2012-08-10
dc.date.submitted 2012 en_US
dc.identifier.other Bindra_washington_0250O_10121.pdf en_US
dc.description Thesis (Master's)--University of Washington, 2012 en_US
dc.description.abstract Topical categorization of blogs, documents, or other objects that can be tagged with text improves the experience for end users. Latent Dirichlet allocation (LDA) is a well-studied algorithm that discovers latent topics in a corpus of documents so that the documents can then be assigned automatically to appropriate topics. New documents can also be classified based on these latent topics. However, when the set of documents is very large and varies significantly from user to user, calculating a single global LDA topic model, or an individual topic model for each user, can become very expensive in large-scale internet settings. The problem is compounded by the need to update the model periodically to keep up with the relatively dynamic nature of data in online social networks such as Facebook, Twitter, and FriendFeed. In this work we show that the computational cost of using LDA for a large number of users connected via a social network can be reduced, without compromising the quality of the LDA model, by taking into account the social connections among the users in the network. Instead of a single global model based on every document in the network, we propose a model created from messages that are authored by and received by a fixed number of the most influential users. We use PageRank as the influence measure and show that this Social LDA model is effective because it reduces the number of documents to process, thereby reducing the cost of computing the LDA model. Such a model can be used both for categorizing a user's incoming document stream and for finding user interests based on the user's authored documents. It also helps with the cold-start problem, where a user's own messages are insufficient to create a good LDA model. en_US
dc.format.mimetype application/pdf en_US
dc.language.iso en_US en_US
dc.rights Copyright is held by the individual authors. en_US
dc.subject LDA; Machine Learning; Social Networks; Text Mining; Topic Modeling en_US
dc.subject.other Computer science en_US
dc.subject.other Computing and software systems en_US
dc.title SocialLDA: Scalable Topic Modeling in Social Networks en_US
dc.type Thesis en_US
dc.embargo.terms Delay release for 1 year -- then make Open Access en_US
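
The corpus-reduction step described in the abstract (rank users by PageRank, keep only messages authored or received by the top-ranked users, then fit LDA on that subset) can be sketched roughly as follows. This is an illustrative sketch, not the thesis's implementation: the function names, the message fields (author, recipients, text), the edge direction (follower to followee), and the damping factor and iteration count are all assumptions made for the example. The LDA fit itself is left out; the returned texts would be handed to whatever LDA implementation is in use.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a directed graph.

    edges: list of (follower, followee) pairs; rank flows to the followee.
    The damping factor and iteration count are illustrative defaults.
    """
    nodes = sorted({user for edge in edges for user in edge})
    if not nodes:
        return {}
    out_links = defaultdict(list)
    for src, dst in edges:
        out_links[src].append(dst)
    n = len(nodes)
    rank = {user: 1.0 / n for user in nodes}
    for _ in range(iterations):
        # Rank held by users with no outgoing links is spread evenly.
        dangling = sum(rank[u] for u in nodes if not out_links[u])
        new_rank = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


def select_corpus(messages, edges, k):
    """Keep texts authored by or sent to the k highest-ranked users.

    messages: list of dicts with hypothetical keys
    'author', 'recipients', and 'text'.
    """
    rank = pagerank(edges)
    # Ties are broken by user name so the selection is deterministic.
    top = set(sorted(rank, key=lambda u: (-rank[u], u))[:k])
    return [
        m["text"]
        for m in messages
        if m["author"] in top or top & set(m["recipients"])
    ]


# Toy network: three users follow alice, and alice follows bob.
edges = [("bob", "alice"), ("carol", "alice"), ("dave", "alice"), ("alice", "bob")]
messages = [
    {"author": "alice", "recipients": ["bob"], "text": "m1"},
    {"author": "carol", "recipients": ["dave"], "text": "m2"},
    {"author": "bob", "recipients": ["alice"], "text": "m3"},
]
reduced = select_corpus(messages, edges, k=1)
```

With k=1 in this toy network only alice is selected, so the message exchanged between carol and dave is dropped before any topic model is fit. Choosing k trades the cost of fitting LDA against how much of the network's content the model sees.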
