Understanding Challenges in the Data Pipeline for Development Data

dc.contributor.advisor: Anderson, Richard
dc.contributor.author: Pervaiz, Fahad
dc.date.accessioned: 2019-08-14T22:31:38Z
dc.date.available: 2019-08-14T22:31:38Z
dc.date.issued: 2019-08-14
dc.date.submitted: 2019
dc.description: Thesis (Ph.D.)--University of Washington, 2019
dc.description.abstract: The developing world increasingly relies on data-driven policies. Numerous development agencies have pushed for on-the-ground data collection to support their development work, and many governments have launched efforts to gather information more frequently. Overall, the volume of data collected is tremendous, yet we face significant barriers to doing useful analysis. Most of these barriers involve data cleaning and merging, and they require a data engineer to support parts of the analysis. This thesis aims to understand the pain points of cleaning development data. It also proposes solutions that harness a data engineer's thought process to reduce the manual workload of this tedious cleaning process. To achieve these goals, two research areas are critical: (1) discerning current data usage patterns and building a taxonomy of data cleaning in the developing world; and (2) creating algorithms that support automated data cleaning, targeting selected problems including matching transliterated names. With these goals, this thesis will empower regular data users to easily perform the data cleaning and scrubbing needed for analysis.
dc.embargo.terms: Open Access
dc.format.mimetype: application/pdf
dc.identifier.other: Pervaiz_washington_0250E_19880.pdf
dc.identifier.uri: http://hdl.handle.net/1773/44145
dc.language.iso: en_US
dc.rights: CC BY
dc.subject: Data Cleaning
dc.subject: Data Pipeline
dc.subject: Data Processing
dc.subject: Computer science
dc.subject.other: Computer science and engineering
dc.title: Understanding Challenges in the Data Pipeline for Development Data
dc.type: Thesis

Files

Original bundle

Name: Pervaiz_washington_0250E_19880.pdf
Size: 3.22 MB
Format: Adobe Portable Document Format