SocialLDA:Scalable Topic Modeling in Social Networks

Title: SocialLDA:Scalable Topic Modeling in Social Networks
Author: Bindra, Ashish
Abstract: Topical categorization of blogs, documents, or other objects that can be tagged with text improves the experience for end users. Latent Dirichlet allocation (LDA) is a well-studied algorithm that discovers latent topics in a corpus of documents so that the documents can be assigned automatically to appropriate topics. New documents can also be classified against these latent topics. However, when the set of documents is very large and varies significantly from user to user, computing a single global LDA topic model, or an individual topic model for every user, can become very expensive in large-scale internet settings. The problem is compounded by the need to update the model periodically to keep up with the relatively dynamic nature of data in online social networks such as Facebook, Twitter, and FriendFeed. In this work we show that the computational cost of using LDA for a large number of users connected via a social network can be reduced, without compromising the quality of the LDA model, by taking into account the social connections among the users in the network. Instead of a single global model based on every document in the network, we propose a model built from the messages authored and received by a fixed number of the most influential users. We use PageRank as the influence measure and show that this Social LDA model is effective because it reduces the number of documents to process, thereby reducing the cost of computing the LDA model. Such a model can be used both for categorizing a user's incoming document stream and for finding user interests based on the user's authored documents. It also helps with the cold-start problem, where a user's own messages are insufficient to create a good LDA model.
Description: Thesis (Master's)--University of Washington, 2012
URI: http://hdl.handle.net/1773/20301
Author requested restriction: Delay release for 1 year -- then make Open Access
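
The document-selection step described in the abstract (rank users by PageRank, keep a fixed number of the most influential, and collect the messages they authored or received) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the function names, the (follower, followee) edge format, the message dictionaries, and the damping factor of 0.85 are all hypothetical choices made here. Training the LDA model on the selected documents is left out.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over directed (src, dst) edges.

    The damping factor and iteration count are conventional defaults,
    not values taken from the thesis.
    """
    nodes = sorted({node for edge in edges for node in edge})
    if not nodes:
        return {}
    out_links = defaultdict(list)
    for src, dst in edges:
        out_links[src].append(dst)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        for src in nodes:
            targets = out_links[src]
            for dst in targets:
                new_rank[dst] += damping * rank[src] / len(targets)
        rank = new_rank
    return rank


def select_documents(messages, follows, k):
    """Return the k highest-ranked users and the texts they authored or received.

    Assumed (hypothetical) shapes: `follows` holds (follower, followee) pairs,
    and each message is a dict with "author", "recipients", and "text" keys.
    """
    rank = pagerank(follows)
    # Sort by descending rank, breaking ties by user name for determinism.
    top_users = set(sorted(rank, key=lambda user: (-rank[user], user))[:k])
    selected = [
        message["text"]
        for message in messages
        if message["author"] in top_users
        or top_users & set(message["recipients"])
    ]
    return top_users, selected
```

The returned list of texts is the reduced corpus; it could then be passed to whichever LDA implementation is in use, in place of the full set of documents in the network.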

Files in this item

Bindra_washington_0250O_10121.pdf (PDF, 882.7Kb)
