Algorithmic Approaches to Detecting Interviewer Fabrication in Surveys
Surveys are one of the principal means of gathering critical data from low-income regions. Bad data, however, may be no better than no data at all, and can be worse. Interviewer data fabrication, one cause of bad data, is an ongoing concern for survey organizations and a constant threat to data quality. In my dissertation work, I build software that automatically identifies interviewer fabrication so that supervisors can act to reduce it. To do so, I draw on two tool sets from computer science, one algorithmic and the other technological. On the algorithmic side, I use two families of machine learning techniques, supervised classification and anomaly detection, to automatically identify interviewer fabrication. On the technological side, I modify data collection software running on mobile electronic devices to record user traces that can help to identify fabrication. I show, based on the results of two empirical studies, that the combination of these approaches makes it possible to accurately and robustly identify interviewer fabrication, even when interviewers are aware that the algorithms are being used, have some knowledge of how they work, and are incentivized to avoid detection.
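To give a flavor of the anomaly-detection side of this approach, the sketch below flags interviewers whose mean per-question answer time (one plausible trace feature) deviates sharply from the group, using a robust median-based score. The feature, data, and threshold are illustrative assumptions, not the dissertation's actual method or features.

```python
from statistics import median

def flag_anomalous_interviewers(trace_times, threshold=3.5):
    """Flag interviewers whose mean answer duration is a robust outlier.

    trace_times: dict mapping interviewer id -> list of per-question
    answer durations (seconds) recorded by the data-collection software.
    Returns a sorted list of flagged interviewer ids.
    """
    # Mean answer duration per interviewer.
    means = {iid: sum(ts) / len(ts) for iid, ts in trace_times.items()}
    med = median(means.values())
    # Median absolute deviation: robust to the very outliers we seek.
    mad = median(abs(m - med) for m in means.values())
    if mad == 0:
        return []
    # Robust z-score; 0.6745 scales MAD to standard-deviation units.
    return sorted(iid for iid, m in means.items()
                  if 0.6745 * abs(m - med) / mad > threshold)

# Hypothetical traces: interviewer "D" answers implausibly fast,
# a pattern consistent with fabricating responses.
traces = {
    "A": [12.0, 15.1, 13.4],
    "B": [11.5, 14.2, 12.8],
    "C": [12.9, 13.7, 14.0],
    "D": [2.1, 1.8, 2.4],
}
print(flag_anomalous_interviewers(traces))  # -> ['D']
```

The supervised-classification side works analogously but requires labeled examples of known-fabricated interviews to train on, whereas an unsupervised score like this one needs no labels.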