Empowering Data Analysis with Program Synthesis

dc.contributor.advisor: Bodik, Rastislav
dc.contributor.advisor: Cheung, Alvin
dc.contributor.author: Wang, Chenglong
dc.date.accessioned: 2021-10-29T16:19:58Z
dc.date.available: 2021-10-29T16:19:58Z
dc.date.issued: 2021-10-29
dc.date.submitted: 2021
dc.description: Thesis (Ph.D.)--University of Washington, 2021
dc.description.abstract: Data manipulation and visualization support data scientists' efforts to explore and understand data throughout the exploratory analysis process. Nowadays, experienced data scientists can use programming languages like SQL and R to achieve efficient and flexible analysis, and inexperienced users can easily learn and use interactive tools to accomplish simple analysis tasks. However, the lack of tools between interactive tools and programming systems leads to a programmability gap that prevents inexperienced users from conducting the expressive analysis that users with programming experience can achieve. To help end users traverse this gap, we apply program synthesis to build tools that can synthesize programs from examples. We first introduce Falx, a visualization-by-example tool that lets the user create expressive visualizations using demonstrations of how a few data points are mapped to the canvas. Falx's compositional algorithm design lets it synthesize both data transformation and visualization programs directly from end-to-end demonstrations. We next introduce Scythe, a SQL query synthesizer that lets the user author advanced SQL queries using input-output examples. Using a language of abstract queries, Scythe can prune families of infeasible queries to achieve synthesis efficiency. To help inexperienced users tell apart complex synthesized queries, we developed a symbolic engine that computes a distinguishing input on which two queries return different outputs. Finally, we summarize our synthesizer-building experience in a framework, Kopis, that illustrates how to build an efficient relational query synthesizer using value-preserving abstractions. Together, these three contributions demonstrate the value of using program synthesizers to empower future data science, and offer guidance on how to build such synthesis-powered tools efficiently for new domains.
dc.embargo.terms: Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: Wang_washington_0250E_23517.pdf
dc.identifier.uri: http://hdl.handle.net/1773/47987
dc.language.iso: en_US
dc.rights: CC BY-ND
dc.subject: Data Analysis
dc.subject: Program Synthesis
dc.subject: Computer science
dc.subject.other: Computer science and engineering
dc.title: Empowering Data Analysis with Program Synthesis
dc.type: Thesis

Files

Original bundle

Name: Wang_washington_0250E_23517.pdf
Size: 7.44 MB
Format: Adobe Portable Document Format