Graph Signal Processing: Structure and Scalability to Massive Data Sets
In order to distinguish essays and pre-prints from academic theses, we have a separate category. These are often much longer text based documents than a paper.
Large-scale networks are becoming more prevalent, with applications in healthcare systems, financial networks, social networks, and traffic systems. The detection of normal and abnormal behaviors (signals) in these systems presents a challenging problem. State-of-the-art approaches such as principal component analysis and graph signal processing address this problem using signal projections onto a space determined by an eigendecomposition or singular value decomposition. When a graph is directed, however, applying methods based on the graph Laplacian or singular value decomposition causes information from unidirectional edges to be lost. Here we present a novel formulation and graph signal processing framework that addresses this issue and that is well suited for application to extremely large, directed, sparse networks. In this thesis, we develop and demonstrate a graph Fourier transform for which the spectral components are the Jordan subspaces of the adjacency matrix. In addition to admitting a generalized Parseval’s identity, this transform yields graph equivalence classes that can simplify the computation of the graph Fourier transform over certain networks. Exploration of these equivalence classes provides the intuition for an inexact graph Fourier transform method that dramatically reduces computation time over real-world networks with nontrivial Jordan subspaces. We apply our inexact method to four years of New York City taxi trajectories (61 GB after preprocessing) over the NYC road network (6,400 nodes, 14,000 directed edges). We discuss optimization strategies that reduce the computation time of taxi trajectories from raw data by orders of magnitude: from 3,000 days to less than one day. Our method yields a fine-grained analysis that pinpoints the same locations as the original method while reducing computation time and decreasing energy dispersal among spectral components. This capability to rapidly reduce raw traffic data to meaningful features has important ramifications for city planning and emergency vehicle routing.