Leveraging Usage History to Enhance Database Usability

dc.contributor.advisor: Balazinska, Magdalena (en_US)
dc.contributor.author: Khoussainova, Nodira (en_US)
dc.date.accessioned: 2013-02-25T18:01:31Z
dc.date.available: 2013-02-25T18:01:31Z
dc.date.issued: 2013-02-25
dc.date.submitted: 2012 (en_US)
dc.description: Thesis (Ph.D.)--University of Washington, 2012 (en_US)
dc.description.abstract: More so than ever before, large datasets are being collected and analyzed across a variety of disciplines. Examples include social networking data, software logs, scientific data, web clickstreams, and sensor network data. As a result, a wide range of users interact with these large datasets, from scientists and data analysts to sociologists and market researchers. These users are experts in their domains and understand their data extensively, but they are not database experts. Database systems are scalable and efficient, but are notoriously difficult to use. In this work, we aim to address this challenge by leveraging usage history. From usage history, we can extract knowledge about the multitude of users' experiences with the database. This knowledge, in turn, allows us to build smarter systems that better cater to users' needs. We address different aspects of the database usability problem and develop three complementary systems. First, we aim to ease the query formulation process. We build the SnipSuggest system, an autocompletion tool for SQL queries that provides on-the-go, context-aware assistance during query composition. The second challenge we address is query debugging. Query debugging is a painful process, in part because executing queries directly over a large database is slow, while manually creating small test databases is burdensome to users. We present the second contribution of this dissertation: SIQ (Sample-based Interactive Querying). SIQ is a system for automatically selecting a 'good' small sample of the underlying input database so that queries execute in real time, thus supporting interactive query debugging. Third, once a user has successfully constructed the right query, they must execute it. However, executing and understanding the performance of a query on a large-scale, parallel database system can be difficult even for experts. Our third contribution, PerfXplain, is a tool for explaining the performance of a MapReduce job running on a shared-nothing cluster. Namely, it aims to answer the question of why one job was slower than another. PerfXplain analyzes the MapReduce log files from past runs to better understand the correlation between different properties of pairs of jobs and their relative runtimes. (en_US)
dc.embargo.terms: No embargo (en_US)
dc.format.mimetype: application/pdf (en_US)
dc.identifier.other: Khoussainova_washington_0250E_11006.pdf (en_US)
dc.identifier.uri: http://hdl.handle.net/1773/22014
dc.language.iso: en_US (en_US)
dc.rights: Copyright is held by the individual authors. (en_US)
dc.subject: databases; queries; usability (en_US)
dc.subject.other: Computer science (en_US)
dc.subject.other: Computer science and engineering (en_US)
dc.title: Leveraging Usage History to Enhance Database Usability (en_US)
dc.type: Thesis (en_US)

Files

Original bundle

Name: Khoussainova_washington_0250E_11006.pdf
Size: 3.86 MB
Format: Adobe Portable Document Format