bhooi_MachineLearning_2019.pdf (9.42 MB)

Anomaly Detection in Graphs and Time Series: Algorithms and Applications

Download (9.42 MB)
thesis
posted on 27.06.2019, 15:59 by Bryan Hooi
How can we detect fraudsters in large online review networks, or power grid failures using electrical sensor data? With the increasing availablility of web-scale graphs and high-frequency sensor data, anomaly detection in massive datasets has seen growing focus. Social networks such as Facebook and Twitter contain up to billions of users. Similarly, large scale sensor data includes networks of traffic
speed detectors that span freeway systems in major metropolitan areas, networks of voltage sensors spanning the electrical power grid, as well as numerous types
of industrial, weather and environmental sensors. These datasets have created an increasing need for scalable algorithms that can automatically analyze this data and
flag users or events which are anomalous or of interest.
This thesis focuses on these problems, by developing scalable, principled algorithms that detect unusual behavior or events. We focus on the use of connectivity and temporal information, which allow us to detect anomalies in large graphs such as social networks, as well as for sensor datasets such as traffic data, which contain multiple sensors varying over time, that are also arranged on a graph.
First,we focus on static graphs, in which only connectivity information is present. For example, how can we detect fraudsters in a user-product review graph, or a Twitter follower-followee graph? We first propose a probabilistic approach that evaluates how surprising a dense subgraph is, while avoiding Erdos-Renyi assumptions of existing methods. We then consider how to detect dense subgraphs in a way that prevents anomalous users from evading detection by manipulating their features. Our approach improves detection accuracy by up to 70% F-measure over
comparable baselines, and detects a Twitter subgraph of more than 4000 accounts, a majority of which used follower-buying services. Next,we consider time series: for example, howcanwe detect anomalous events, such as electrical component failures in power grid time series? We develop algorithms for modelling and detecting anomalies in discrete time-series data, such as ratings, where a set of users rate a set of products; and real-valued power grid data, in which we use physics-based circuit models to accurately model and detect anomalies. Then, for mixed categorical, numeric and ordinal data, we propose an online nonparametric anomaly detection approach, that detects anomalies with
61% higher F-measure than related baselines. Finally, merging graphs and time series, we consider graphs with sensors. Consider a set of sensors arranged in a graph, each collecting data over time: for example, traffic speed sensors, which are arranged on a road network, or voltage sensors on a power grid network. We develop algorithms for detecting anomalous events or large changes happening on a subset of the graph nodes, such as traffic accidents
or power line failures. Additionally, we propose algorithms for near-optimally selecting locations for new sensors to be placed on a power grid graph, improving the detection of electrical component failures by 59% or more F-measure.

History

Date

18/04/2019

Degree Type

Dissertation

Department

Machine Learning

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Christos Faloutsos

Exports

Exports