Carnegie Mellon University


Reason: Author Request





until file(s) become available

High-dimensional Change Point Analysis for Categorical Data

posted on 2023-05-17, 19:22 authored by Wanshan LiWanshan Li

Statistical change point analysis is concerned with identifying abrupt changes in the data, generally observed as a time series, that are due to changes in the underlying distribution and not to random fluctuations. Though the statistical literature on change point analysis is vast, with a multitude of theories and methods, the traditional, prevailing framework for change point problems is inadequate to capture the complexity of modern high-dimensional data. Motivated by the increasing demands in applications and by the gap in theoretical and methodological research, in this thesis we develop novel procedures and theoretical results for change point detection with high-dimensional categorical data. The thesis consists of three major parts. 

First, we establish high probability bounds for localization error of change point estimators under the general setting of multiple change points in categorical responses with high-dimensional covariates. Our theory includes popular models, such as Poisson models for count response, logit models for nominal responses, and Bradley-Terry models for ranking. As an application, we study novel change point problems for high-dimensional ranking models. 

Second, we develop a novel, general, and computationally efficient framework called Divide and Conquer Dynamic Programming (DCDP) for localizing change points in time series data with highdimensional features. DCDP is applicable to a broad variety of high-dimensional statistical models and can enjoy almost linear computational complexity. We investigate the performance of DCDP in three commonly studied change point settings in high dimensions: the mean model, the Gaussian graphical model, and the linear regression model. In all three cases, we derive non-asymptotic bounds for the accuracy of the DCDP change point estimators, which match existing bounds established in the literature and are, in some cases, minimax rate optimal. 

Last, we study the change point localization problem for directed multilayer random dot product networks with latent positions drawn at each time point. We deploy a state-of-the-art tensor algorithm to provide an estimator of the expected value of the adjacency matrix that provably outperforms existing procedures. We further develop a novel online nonparametric change point detection algorithm based on kernel density estimators and bound its detection delay. Our bound depends explicitly on all the model parameters, including the number of nodes, the magnitude and time of the change, the number of layers, and the dimension of the latent position. 




Degree Type

  • Dissertation


  • Statistics and Data Science

Degree Name

  • Doctor of Philosophy (PhD)


Alessandro Rinaldo, Daren Wang

Usage metrics