Smart Card Data Mining and Inference for Transit System Optimization and Performance Improvement
The U.S. Energy Information Administration reports that more than 50% of commuters drive their own cars to work. This implies that traffic congestion could be mitigated if public transit took a larger share of commuting trips. However, a commuter's choice depends on the utility associated with each available mode, so transit service must be improved to increase its utility and attract more riders. To improve customer satisfaction and reduce operating costs, transit authorities have been striving to monitor their service quality and identify the key factors that attract riders. Traditional manual data collection methods cannot meet the requirements of transit system optimization and performance measurement because they are expensive and labor-intensive. The recent advent of passive data collection techniques (e.g., Automatic Fare Collection (AFC) and Automatic Vehicle Location) has turned a data-poor environment into a data-rich one and has given transit agencies the opportunity to measure transit system performance comprehensively. Although highly valuable information can be collected from ubiquitous transit data, data usability and accessibility remain difficult to improve for three reasons: (1) Most AFC systems are not designed for transit performance monitoring, so additional passenger trip information cannot be retrieved directly. (2) Each passive data collection method has its own inherent limitations and requires additional domain knowledge to process; interoperating and mining heterogeneous datasets would enhance both the depth and breadth of transit-related studies. (3) The volume of data keeps growing, and traditional data processing applications may be unable to handle it efficiently. Such data barriers hinder the development of a large-scale transit performance monitoring system.
This study attempts to fill these research gaps by developing a series of data mining algorithms that extract transit riders' origin and destination information from transit Smart Card (SC) data. The primary data source is the AFC system in Beijing, where a passenger's boarding stop (origin) and alighting stop (destination) on a flat-rate bus are not recorded on the check-in and check-out scan. A Markov-chain-based Bayesian decision tree algorithm is proposed to mine passengers' origin information from SC data. In addition, this study proposes an integrated data mining procedure that models the travel patterns and regularities of transit riders. This procedure incorporates transit riders' trip chains based on their temporal and spatial characteristics and captures their historical travel patterns efficiently. Then, on the basis of the identified travel patterns, individual-level destinations can be estimated through transfer analysis over multi-day observations. Finally, to remove data accessibility barriers, facilitate data sharing and visualization, and support online data analysis for transit performance measurement, a transportation e-science platform named TransitNet is developed. TransitNet enables connections and interoperability among heterogeneous transit datasets, including SC data, GPS data, and Geographic Information System (GIS) data. The platform not only serves as a data-rich visualization tool for monitoring transit network performance in planning and operations, but also aims to take advantage of e-science developments for data-driven transportation research and applications.
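To make the trip-chain idea concrete, the following is a minimal sketch of a generic trip-chaining heuristic. It is not the procedure developed in this study. It assumes that boarding stops are already known, that a rider alights at the downstream stop closest to where he or she boards next, and that the last trip of the day returns toward the first boarding stop. It also works on a single day rather than on multi-day observations. The function name, record layout, routes, and coordinates are all hypothetical.

```python
from collections import defaultdict
from math import hypot


def infer_destinations(trips, route_stops, stop_xy):
    """Estimate an alighting stop for each boarding with a trip-chaining heuristic.

    trips:       list of (card_id, day, time, route, board_stop) tuples (hypothetical layout)
    route_stops: route -> list of stops in travel order
    stop_xy:     stop -> (x, y) planar coordinates
    Returns a dict keyed by (card_id, day, time) holding the estimated stop or None.
    """
    # Group each card's boardings by day so they can be chained in time order.
    by_card_day = defaultdict(list)
    for card_id, day, time, route, board_stop in trips:
        by_card_day[(card_id, day)].append((time, route, board_stop))

    estimates = {}
    for (card_id, day), chain in by_card_day.items():
        chain.sort()
        for i, (time, route, board_stop) in enumerate(chain):
            key = (card_id, day, time)
            if len(chain) < 2:
                # A single boarding gives no following trip to chain to.
                estimates[key] = None
                continue
            # Target is the next boarding stop; the last trip wraps to the first.
            _, _, target = chain[(i + 1) % len(chain)]
            stops = route_stops[route]
            downstream = stops[stops.index(board_stop) + 1:]
            if not downstream:
                estimates[key] = None
                continue
            tx, ty = stop_xy[target]
            # Pick the downstream stop nearest to the target boarding stop.
            estimates[key] = min(
                downstream,
                key=lambda s: hypot(stop_xy[s][0] - tx, stop_xy[s][1] - ty),
            )
    return estimates


# Toy data: one route in each direction along a straight line.
route_stops = {"R1": ["A", "B", "C", "D"], "R2": ["D", "C", "B", "A"]}
stop_xy = {"A": (0, 0), "B": (1, 0), "C": (2, 0), "D": (3, 0)}
trips = [
    ("card1", 1, 800, "R1", "A"),
    ("card1", 1, 1700, "R2", "C"),
    ("card2", 1, 900, "R1", "B"),
]
estimates = infer_destinations(trips, route_stops, stop_xy)
```

In this toy example the morning trip of `card1` is chained to the evening boarding at stop C, and the evening trip is chained back to the morning boarding at stop A. `card2` has a single boarding, so no destination is estimated for it.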
- Civil engineering