Carnegie Mellon University

Inferring demand and supply characteristics of large-scale transportation networks through multi-source system-level data

posted on 2023-09-12, 19:45 authored by Pablo Guarda

This dissertation develops new algorithms that integrate some of the tools developed in the travel behavior and network modeling community to better infer large-scale transportation networks’ demand and supply characteristics. The algorithms are data-driven and are able to leverage massive amounts of spatio-temporal multi-source system-level data that is passively generated in transportation systems. The datasets include traffic speeds and traffic counts collected from probe vehicles and automatic traffic counters that have a high spatio-temporal resolution, historical records of traffic incidents, and sociodemographic information from the US Census, among others. The large size and heterogeneous structure of these datasets require enhancing existing models developed in the transportation community through modern machine-learning techniques that make model training scalable and efficient in real-world applications.

The first study extends classical bi-level formulations to estimate travelers’ utility functions with multiple attributes using system-level data. Such data tends to be less subject to sampling bias than individual-level data, is cheaper to collect, and has become increasingly diverse and available. To leverage system-level data, a methodology grounded in non-linear least squares is formulated to statistically infer travelers’ utility function in the network context using traffic counts, traffic speeds, the number of traffic incidents, and sociodemographic information obtained from the US Census, among other attributes. The analysis of the mathematical properties of the optimization problem and its pseudo-convexity motivates the use of normalized gradient descent, an algorithm developed in the machine learning community that is suitable for pseudo-convex programs. More importantly, a hypothesis-testing framework is developed to examine the statistical properties of the coefficients attached to utility terms and to perform attribute selection. Experiments on synthetic data show that the coefficients of the travelers’ utility function can be consistently recovered and that hypothesis tests reliably identify which attributes are determinants of travelers’ route choices. In addition, a series of Monte Carlo experiments shows that statistical inference is robust to noise in the Origin-Destination (O-D) matrix and the traffic count measurements and to various levels of sensor coverage. The methodology is also deployed at a large scale using real-world multi-source data in Fresno, CA, collected before and during the COVID-19 outbreak.
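
As a rough illustration of the estimation idea in the first study, the sketch below fits two utility coefficients to simulated traffic counts by non-linear least squares, using normalized gradient descent (each step moves a fixed length along the gradient direction). Everything in it is an assumption made for illustration and is not taken from the dissertation: the two-route binary-logit setting, the toy observations, the "true" coefficients, the decaying step size, and all function names.

```python
import math

# Hypothetical data: one row per O-D pair served by two routes.
# (demand, travel-time difference route1 - route2, incident-count difference)
OBSERVATIONS = [
    (100.0, 2.0, 0.0),
    (100.0, -1.0, 1.0),
    (100.0, 0.0, -1.0),
    (100.0, 1.0, 1.0),
]
TRUE_THETA = (-0.5, -1.0)  # assumed coefficients used only to simulate counts


def route1_flow(theta, demand, d_time, d_incidents):
    """Binary-logit flow on route 1 given the attribute differences."""
    v = theta[0] * d_time + theta[1] * d_incidents
    return demand / (1.0 + math.exp(-v))


# Noise-free synthetic traffic counts on route 1 of each O-D pair.
COUNTS = [route1_flow(TRUE_THETA, *obs) for obs in OBSERVATIONS]


def loss_and_gradient(theta):
    """Sum of squared count residuals and its gradient w.r.t. theta."""
    loss, grad = 0.0, [0.0, 0.0]
    for (demand, d_time, d_inc), count in zip(OBSERVATIONS, COUNTS):
        flow = route1_flow(theta, demand, d_time, d_inc)
        p = flow / demand
        residual = flow - count
        loss += residual ** 2
        # Chain rule: d(flow)/d(theta_j) = demand * p * (1 - p) * attribute_j
        scale = 2.0 * residual * demand * p * (1.0 - p)
        grad[0] += scale * d_time
        grad[1] += scale * d_inc
    return loss, grad


def normalized_gradient_descent(theta0, step0=0.5, iterations=3000):
    """Step along grad / ||grad|| with a decaying step; keep the best iterate."""
    theta = list(theta0)
    best_theta, best_loss = list(theta), loss_and_gradient(theta)[0]
    for k in range(iterations):
        loss, grad = loss_and_gradient(theta)
        if loss < best_loss:
            best_theta, best_loss = list(theta), loss
        norm = math.hypot(grad[0], grad[1])
        if norm == 0.0:  # exact stationary point
            break
        step = step0 / math.sqrt(k + 1)  # assumed step-size schedule
        theta = [t - step * g / norm for t, g in zip(theta, grad)]
    return best_theta, best_loss


theta_hat, final_loss = normalized_gradient_descent([0.0, 0.0])
print("estimated coefficients:", theta_hat, "loss:", final_loss)
```

Because the update uses only the direction of the gradient, the step length does not shrink where the objective is flat, which is one reason this kind of method is often paired with pseudo-convex objectives. The network-level problem in the dissertation involves route choice over a full network and is considerably richer than this toy.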

The second study leverages computational graphs and multi-source system-level data to estimate network flow and travel behavior under recurrent traffic conditions. The model solves a single-level optimization problem consistent with stochastic user equilibrium under logit assignment (SUELOGIT) and learns time-specific O-D matrices and utility functions, as well as network flow parameters such as link flows, path flows, and travel times. To increase the model’s representational capacity for reproducing observed link flows and travel times, the parameters of the link performance functions are assumed to be link-specific. More importantly, the utility function in the route choice model is enriched with (i) link-specific parameters to capture the effect of unobserved attributes on route choices and (ii) period-specific parameters that weight the observed features in the utility function to capture the heterogeneity of travelers’ preferences among periods of the day. Experiments on synthetic data show that the parameters of the model can be consistently recovered and that the solution of the model satisfies the SUELOGIT conditions with high accuracy. The estimation procedure is also robust to random noise in the observed traffic flow and travel time, and it requires little hyperparameter tuning. Subsequently, the algorithm is deployed at a large scale using real-world multi-source data in Fresno, CA, with hourly data collected during the morning and afternoon peak periods of October 2019. The utility function includes link-specific effects and attributes such as travel time, the standard deviation of travel time, the number of traffic incidents, and socio-demographic information obtained from the US Census. The model provides estimates of the total trips and travelers’ utility function by hour of the day, and of the average values of the link performance parameters, that are reasonable and informative about the demand and supply characteristics of the transportation network.
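
To make the SUELOGIT condition concrete, the following sketch computes a logit stochastic user equilibrium on a toy network of two parallel links and checks the fixed-point condition (link flows equal the logit loading at the travel times those flows produce). It does not reproduce the dissertation's computational-graph approach: the method of successive averages is swapped in as a simple classical solver, and the BPR-style link performance function, all parameter values, and all names are assumptions chosen for illustration.

```python
import math

# Hypothetical two-link network connecting a single O-D pair.
FREE_FLOW = [10.0, 12.0]    # free-flow travel times (minutes)
CAPACITY = [600.0, 800.0]   # link capacities (vehicles per hour)
ALPHA, BETA = 0.15, 4.0     # assumed BPR-style performance parameters
DEMAND = 1000.0             # total trips for the O-D pair
THETA = -0.5                # assumed travel-time coefficient in the utility


def travel_times(flows):
    """Link performance function: travel time increases with link flow."""
    return [t0 * (1.0 + ALPHA * (x / c) ** BETA)
            for t0, c, x in zip(FREE_FLOW, CAPACITY, flows)]


def logit_loading(times):
    """Split the demand across links with multinomial-logit probabilities."""
    utilities = [THETA * t for t in times]
    shift = max(utilities)  # subtract the max for numerical stability
    weights = [math.exp(u - shift) for u in utilities]
    total = sum(weights)
    return [DEMAND * w / total for w in weights]


def solve_sue(iterations=500):
    """Method of successive averages: average toward the logit loading."""
    flows = [DEMAND / len(FREE_FLOW)] * len(FREE_FLOW)
    for k in range(1, iterations + 1):
        target = logit_loading(travel_times(flows))
        flows = [x + (y - x) / (k + 1) for x, y in zip(flows, target)]
    return flows


flows = solve_sue()
# Equilibrium gap: distance between the flows and their own logit loading.
residual = max(abs(x - y)
               for x, y in zip(flows, logit_loading(travel_times(flows))))
print("link flows:", flows, "equilibrium gap:", residual)
```

A gap close to zero is the kind of check that the phrase "satisfies the SUELOGIT conditions" refers to. In the dissertation, the equilibrium is handled within a single-level optimization problem rather than by an iterative averaging scheme like this one.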

The third study enhances the model developed in the second study to predict future traffic flow and travel time, including in links without historical observations of either. The model predictions also comply with some basic physical constraints of network flow, including flow conservation in adjacent links and the monotonically increasing relationship between traffic flow and travel times. The model also leverages neural networks and polynomial kernel functions to increase its representational capacity to map traffic flow into travel times. This approach avoids the need to pre-specify a class of performance functions, and it takes full advantage of the flexibility of computational graphs to embed a neural network in the model. In contrast to standard O-D estimation models, the proposed methodology incorporates a trip generation stage and models travelers’ destination choices to derive an O-D matrix. As a result, the model can leverage historical data about the number of generated trips, which is typically more accessible than historical O-D matrices. All these features set this model apart from data-driven approaches that lack model interpretability and focus on predictive accuracy only. Through a novel validation strategy, experiments on synthetic data show that the model can make accurate predictions of travel time and traffic flow in links that have no historical data. Similar results are observed when the model is deployed at a large scale using real-world multi-source data in Fresno, CA.
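
The step from trip generation and destination choice to an O-D matrix can be sketched as follows. The sketch assumes a multinomial-logit destination choice, which the abstract does not specify, and the zone counts, utilities, and function name are hypothetical.

```python
import math


def od_matrix(generated_trips, destination_utilities):
    """Distribute each origin's generated trips across destinations.

    generated_trips[o] is the number of trips produced at origin o, and
    destination_utilities[o][d] is the (assumed) utility of destination d
    for travelers starting at origin o.
    """
    matrix = []
    for trips, utilities in zip(generated_trips, destination_utilities):
        shift = max(utilities)  # subtract the max for numerical stability
        weights = [math.exp(u - shift) for u in utilities]
        total = sum(weights)
        matrix.append([trips * w / total for w in weights])
    return matrix


# Hypothetical example: two origins, three destinations.
generated = [1200.0, 800.0]
utilities = [[0.0, -0.5, -1.0],
             [-0.3, 0.0, -0.3]]
Q = od_matrix(generated, utilities)
for origin, row in enumerate(Q):
    print("origin", origin, "->", [round(q, 1) for q in row])
```

By construction, each row of the resulting matrix sums to the number of trips generated at that origin, so only the generated trips and the destination utilities need to be known or learned. This is consistent with the point above that generated-trip data is easier to obtain than full historical O-D matrices.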




Degree Type

  • Dissertation


Department

  • Civil and Environmental Engineering

Degree Name

  • Doctor of Philosophy (PhD)


Advisor(s)

  • Sean Qian