Carnegie Mellon University
Browse

Explainable Mining of Graphs and Time Series: Algorithms and Applications

Download (9.41 MB)
thesis
posted on 2025-07-21, 19:57 authored by Meng-Chieh Lee
<p dir="ltr">Given a social network graph, how can we predict connections between users and determine whether they are based on shared hobbies or common friends? Given a database containing molecular graphs, how can we determine whether the graphs inhibit HIV replication based on substructures they frequently share? Similarly, in time series data from EEG recording, how can we identify seizures and explain why they are considered abnormal? Although recent machine learn ing methods have shown improved performance, many remain black-box models, making explainability challenging. This leads us to explainable artificial intelli gence (XAI), which offers valuable insights through its explanations and is more practical for deployment in real-world applications. </p><p dir="ltr">In this thesis, we focus on developing explainable machine learning methods tailored for graphs and time series. Each method we propose is either inherently explainable, or designed to automatically provide data analysis or justification for its decisions. In each part, we present effective and general algorithms, and explore a broad range of applications. </p><p dir="ltr">In the first part, we focus on node-level graph mining. We propose algorithms to analyze various types of information in graphs, e.g., the network effects of the graph structure, and the usable information in the node features. Our proposed linear methods are not only inherently interpretable and fast, but also outperform baselines in solving node classification and link prediction tasks. In node classifi cation, our method improves the accuracy by 10.3% over the second-best baseline, while being2.5 times faster. In link prediction, our method achieves an average rank 1.1, outperforming baselines on 11 out of 12 real-world datasets. In the application of graph retrieval-augmented generation, our agentic method achieves an average relative improvement of 51%. </p><p dir="ltr">In the second part, we focus on graph-level graph mining. We discover frequent substructures using the minimum description length (MDL) principle and learnable graph kernels. In graph anomaly detection, our MDL-based method is 58 times faster than the second-best baseline, while achieving 1.3 times higher average precision. In graph regression, our method with learnable graph kernels improves the meanabsolute error by 14.3%. In the application of human trafficking detection, our method detects human trafficking ads with 84% precision, while requiring only 8 hours to process 4 million documents. </p><p dir="ltr">In the third part, we focus on time series mining, with an emphasis on time series anomaly detection. Our self-supervised method effectively identifies the ground truth hyperparameters of anomalies in time series data, resulting in an av erage rank 2.2 compared to the baselines. In the application involving medical EEG signals, unlike traditional point anomaly detection methods, we focus on identify ing group anomalies that occur within a short period and exhibit similar abnormal patterns. Our method is fast and scalable, and discovers and ranks both point and group anomalies in 2 minutes for 1 million data points on a stock machine.</p>

Funding

Collaborative Research: CPS: Medium: EbeeVet: An Electronic Bee-Veterinarian Framework to Secure Honey Bees

National Institute of Food and Agriculture

Find out more...

History

Date

2025-06-05

Degree Type

  • Dissertation

Thesis Department

  • Computer Science

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Christos Faloutsos Leman Akoglu