Deep Understanding of Urban Mobility from Cityscape Webcams
Deep understanding of urban mobility is of great significance for many real-world applications, such as urban traffic management and autonomous driving. This thesis develops deep learning methodologies to extract vehicle counts from streaming real-time video captured by multiple low-resolution web cameras and to construct maps of traffic density in a city environment; in particular, we focus on cameras installed in the Manhattan borough of NYC. The large-scale videos from these web cameras exhibit low spatial and temporal resolution, severe occlusion, large perspective distortion, and variable environmental conditions, rendering most existing methods ineffective. To overcome these challenges, the thesis develops several techniques: 1. a block-level regression model with a rank constraint that maps dense image features to vehicle densities; 2. a deep multi-task learning framework based on fully convolutional neural networks that jointly learns vehicle density and vehicle count; 3. deep spatio-temporal networks for vehicle counting that incorporate temporal information from the traffic flow; and 4. multi-source domain adaptation mechanisms with adversarial learning that adapt the deep counting model to multiple cameras. To train and validate the proposed system, we have collected a large-scale webcam traffic dataset, CityCam, which contains 60 million frames from 212 webcams installed at key intersections of NYC. Of these, 60,000 frames have been annotated with rich information, yielding about 900,000 annotated objects. To the best of our knowledge, CityCam is the first and largest webcam traffic dataset with such a large number of detailed annotations. The proposed methods are integrated into the CityScapeEye system, which has been extensively evaluated and compared against existing techniques on different counting tasks and datasets, with experimental results demonstrating the effectiveness and robustness of CityScapeEye.
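To illustrate the density-map formulation that underlies the counting methods summarized above, here is a minimal NumPy sketch of the standard construction used in counting work: each annotated vehicle center contributes a normalized 2-D Gaussian, so integrating (summing) the density map recovers the vehicle count. The kernel width, frame size, and center coordinates below are illustrative assumptions, not values taken from the thesis.

```python
import numpy as np

def gaussian_density_map(points, shape, sigma=2.0):
    """Build a ground-truth density map from annotated object centers.

    Each vehicle center adds one normalized 2-D Gaussian, so the sum of
    the map equals the number of annotated vehicles -- the quantity a
    density-based counting network is trained to reproduce.
    """
    h, w = shape
    density = np.zeros((h, w), dtype=np.float64)
    ys, xs = np.mgrid[0:h, 0:w]
    for cx, cy in points:
        g = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
        g /= g.sum()  # normalize so each object contributes exactly 1
        density += g
    return density

# Three hypothetical annotated vehicle centers in a 32x32 frame.
centers = [(8, 8), (20, 12), (25, 25)]
dmap = gaussian_density_map(centers, (32, 32))
print(round(dmap.sum(), 3))  # summing the density map recovers the count: 3.0
```

Because each Gaussian is renormalized over the image grid, the count is preserved exactly even for objects near the frame boundary, which is why density regression and count regression can be learned jointly as in method 2.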