Augmenting Exploratory Data Analysis with Visualization Recommendation
Exploratory data analysis is one of the key activities for understanding and discovering new insights from data. As exploratory data analysis can involve both open-ended exploration and focused question answering, analysis tools should facilitate both exploration breadth and analysis depth. However, existing data exploration tools typically require manual chart specification, which can be tedious and prevent analysts from rapidly exploring different aspects of the data. Moreover, analysts may be blindsided by their own cognitive biases and prematurely fixate on specific questions or hypotheses. Without discipline and time, analysts may overlook important insights in the data, such as potentially confounding factors and data quality issues, and produce inaccurate results in their analyses. To help analysts perform rapid and systematic data exploration, this dissertation presents the design of mixed-initiative systems that complement manual chart specification with chart recommendation. To better understand the practice and challenges of exploratory data analysis, we first conduct an interview study with 18 data analysts. From the interview data, we characterize the goals, process, and challenges of exploratory data analysis. We then identify design opportunities for exploratory analysis tools. One major opportunity is facilitating rapid and systematic exploration with automation and guidance. The rest of the dissertation addresses this opportunity by contributing a stack of systems to augment exploratory analysis tools with chart recommendation. At the foundation of this stack, we introduce new formal languages for chart specification and recommendation. The Vega-Lite visualization grammar provides a formal representation for specifying and reasoning about charts.
Building on Vega-Lite, the CompassQL query language combines partial chart specification with recommendation directives to provide a generalizable framework for chart recommendation via queries over the space of visualizations. Based on these foundations, we use an iterative design process to develop and study new recommendation-powered visual data exploration tools. Voyager enables data exploration via browsing of recommended charts, while allowing users to steer the recommendations by selecting data fields and transformations. Our user study, which compares Voyager with a traditional chart authoring tool, indicates the complementary benefits of manual authoring and recommendation browsing. Inspired by the study results, Voyager 2 blends manual and automated chart authoring in a single tool to facilitate rapid and systematic data exploration while preserving users' flexibility to directly author a broad range of charts. All of these systems have been released as open-source projects and adopted by both research and professional data science communities.
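
The idea of a partial chart specification can be illustrated with a small sketch. The code below is not CompassQL or Vega-Lite itself; it is a minimal illustration, assuming a simplified chart description with a mark and a field-per-channel encoding. The "?" placeholder, the function name enumerate_charts, and the field names (horsepower, mpg, weight) are all hypothetical and chosen only for this example. It shows how leaving parts of a chart unspecified defines a space of candidate charts that a recommender could then rank.

```python
from itertools import product

# Hypothetical marker for "let the system choose this part of the chart".
WILDCARD = "?"


def enumerate_charts(partial, fields, marks):
    """Expand every wildcard in a partial chart spec into concrete candidates.

    `partial` is a simplified, illustrative spec of the form
    {"mark": ..., "encoding": {channel: field}}; it is not a real
    Vega-Lite or CompassQL object.
    """
    mark_options = marks if partial["mark"] == WILDCARD else [partial["mark"]]

    channels = list(partial["encoding"])
    channel_options = [
        fields if partial["encoding"][ch] == WILDCARD else [partial["encoding"][ch]]
        for ch in channels
    ]

    charts = []
    for mark in mark_options:
        for combo in product(*channel_options):
            # Skip candidates that map the same field to two channels.
            if len(set(combo)) < len(combo):
                continue
            charts.append({"mark": mark, "encoding": dict(zip(channels, combo))})
    return charts


# The user fixes the x field and leaves the mark and the y field open.
partial = {"mark": WILDCARD, "encoding": {"x": "horsepower", "y": WILDCARD}}
candidates = enumerate_charts(
    partial,
    fields=["horsepower", "mpg", "weight"],
    marks=["point", "bar"],
)

for chart in candidates:
    print(chart)
```

In this sketch the user's fixed choices steer the enumeration, while the wildcards leave room for the system to suggest alternatives. A real recommender would additionally rank or filter the candidates, which this example omits.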