On limits of travel time predictions: Insights from a New York City case study
Abstract
The proliferation of location sensors has made historical location and time data widely available. A prominent use of such data is to build models that accurately estimate travel times between arbitrary points in a city. Travel-time estimation and prediction have been well studied, with proposed techniques spanning a spectrum of statistical methods such as k-nearest neighbors, Gaussian regression, artificial neural networks, and support vector machines. In this paper, we show empirically that, contrary to popular intuition, simple travel-time predictors come very close to the fundamental error bounds achievable in delay prediction. We derive these bounds by estimating the entropy that remains in travel-time distributions even after all spatio-temporal delay-influencing factors have been accounted for. Our results are based on an analysis of cab traces from New York City covering 15 million trips. While we cannot claim generalizability to other cities, the results suggest diminishing returns from complex travel-time predictors, owing to the inherent uncertainty in trip delays. We demonstrate a simple travel-time predictor whose error approaches the uncertainty bound: it predicts delay based only on total distance traveled and time of day, and is close to the optimal solution.
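To make the two ideas above concrete, the following is a minimal sketch, not the implementation used in this study. It assumes each trip is a tuple of (distance in km, hour of day, travel time in seconds), groups trips into buckets by distance bin and hour, predicts the bucket median, and computes the empirical conditional entropy of discretized travel times within those buckets as a rough proxy for the remaining uncertainty. The function names, the 1 km distance bin, the 60 s time bin, and the use of the median are all illustrative assumptions.

```python
import math
from collections import Counter, defaultdict
from statistics import median


def bucket_key(distance_km, hour, bin_km=1.0):
    # Hypothetical bucketing: distance bin index plus hour of day.
    return (int(distance_km // bin_km), hour)


def fit_predictor(trips, bin_km=1.0):
    """Build a lookup table of median travel time per (distance bin, hour)."""
    trips = list(trips)
    buckets = defaultdict(list)
    for distance_km, hour, travel_time_s in trips:
        buckets[bucket_key(distance_km, hour, bin_km)].append(travel_time_s)
    table = {key: median(times) for key, times in buckets.items()}
    # Global median, used when a query falls into an unseen bucket.
    fallback = median(t for _, _, t in trips)
    return table, fallback, bin_km


def predict(model, distance_km, hour):
    """Predict travel time in seconds from distance and hour of day only."""
    table, fallback, bin_km = model
    return table.get(bucket_key(distance_km, hour, bin_km), fallback)


def residual_entropy_bits(trips, bin_km=1.0, time_bin_s=60.0):
    """Empirical entropy (bits) of binned travel time, conditioned on bucket."""
    trips = list(trips)
    buckets = defaultdict(Counter)
    for distance_km, hour, travel_time_s in trips:
        key = bucket_key(distance_km, hour, bin_km)
        buckets[key][int(travel_time_s // time_bin_s)] += 1
    total = len(trips)
    entropy = 0.0
    for counts in buckets.values():
        n = sum(counts.values())
        h_bucket = -sum((c / n) * math.log2(c / n) for c in counts.values())
        entropy += (n / total) * h_bucket  # weight by bucket frequency
    return entropy


# Toy data for illustration only; these are not trips from the data set.
toy_trips = [
    (2.3, 8, 600), (2.7, 8, 720), (2.1, 8, 660),
    (5.5, 14, 900), (5.2, 14, 900),
]
model = fit_predictor(toy_trips)
print(predict(model, 2.5, 8))
print(residual_entropy_bits(toy_trips))
```

The entropy value depends on the chosen bin widths, so a sketch like this indicates how much spread remains within a bucket rather than giving an absolute bound.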