A Framework for Understanding and Addressing Bias and Sparsity in Mobile Location-Based Traffic Data

Henrickson, Kristian

Files

Henrickson_washington_0250E_19478.pdf (2.79 MB)

Date

2019-02-22

Authors

Henrickson, Kristian

Abstract

Traffic data derived from Global Positioning System (GPS) traces of individual travelers is achieving widespread adoption in transportation engineering and planning practice and research. Currently, the majority of such data is obtained from commercial sources, which provide little information about the processes and quality control methods that have been applied to address informative missing data patterns and sampling bias. With a future of connected and autonomous vehicles approaching, in which fixed mechanical sensing will likely be a thing of the past, there is a growing need to highlight this issue and develop methods to address bias in a principled way. To do this, it is necessary to understand the sampling mechanisms and their impact on missing data and bias. The goal of this work is to describe the mechanisms leading to bias, inaccuracy, and missing data in GPS-based probe vehicle data, and to quantify their impact. Commercial probe vehicle data is most often collected from multiple traveler subpopulations, each with a distinct driving profile, data collection technology, and penetration rate. Thus, this work develops a framework for estimating the impact of these factors on data completeness and bias under heterogeneous driver populations and data collection technologies. This framework is validated using microscopic traffic simulation software under a range of sampling and traffic conditions. The implications of the estimation framework are investigated with respect to real-world probe vehicle datasets and transportation applications. The primary contributions of this work are as follows. First, this work develops a mathematical framework for describing the relationship between observed data and the true on-road traffic conditions under different sampling parameters and mixed vehicle populations.
Second, this work presents an in-depth analysis of the impact of sampling and traffic parameters on the statistical representation of real-world probe vehicle data. Finally, a set of case studies is presented illustrating how the proposed framework can be used to improve probe vehicle data quality and fidelity, including the development of a methodology for addressing sampling bias. The methods and guidance provided in this work will be of significant value to public agencies wishing to use probe vehicle data for various forms of transportation analysis, and will inform experimental design and data acquisition agreements for future data collection efforts. Further, this work will support future research in missing data imputation and quality assessment.