Timeline: A Dynamic Hierarchical Dirichlet Process Model for Recovering Birth/Death and Evolution of Topics in Text Stream
Topic models have proven to be a useful tool for discovering latent structures in document collections. However, most document collections come as temporal streams, and thus several aspects of the latent structure, such as the number of topics and the topics' distribution and popularity, are time-evolving. Several existing models capture the evolution of some, but not all, of these aspects. In this paper we introduce infinite dynamic topic models, iDTM, that can accommodate the evolution of all the aforementioned aspects. Our model assumes that documents are organized into epochs, where the documents within each epoch are exchangeable but the order between the documents is maintained across epochs. iDTM allows for an unbounded number of topics: topics can die or be born at any epoch, and the representation of each topic can evolve according to Markovian dynamics. We use iDTM to analyze the birth and evolution of topics in the NIPS community and evaluate the efficacy of our model on both simulated and real datasets, with favorable outcomes.
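
The modeling assumptions stated above (exchangeable documents within an epoch, ordered epochs, topics that can be born or die at any epoch, and Markovian evolution of each topic's representation) can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the iDTM model or its inference procedure: the Chinese-restaurant-style weighting, the one-epoch popularity memory, the Gaussian random walk on log word weights, and the names `simulate_epochs`, `softmax`, `alpha`, and `drift` are all hypothetical choices made for illustration.

```python
import math
import random


def softmax(weights):
    """Turn unnormalized log weights into a probability distribution."""
    m = max(weights)
    exps = [math.exp(w - m) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]


def simulate_epochs(num_epochs=5, docs_per_epoch=20, alpha=1.0,
                    vocab_size=6, drift=0.1, seed=0):
    """Toy simulation of topics being born, evolving, and dying over epochs.

    Assumptions (illustrative only): each document picks one topic; a topic's
    weight is its document count in the previous epoch plus its count so far
    in the current epoch; a new topic has weight `alpha`.
    """
    rng = random.Random(seed)
    prev_counts = {}  # topic id -> document count in the previous epoch
    params = {}       # topic id -> unnormalized log word weights
    next_id = 0
    history = []
    for _ in range(num_epochs):
        # Markovian evolution: each surviving topic takes a small Gaussian
        # step from its representation in the previous epoch.
        for k in prev_counts:
            params[k] = [w + rng.gauss(0.0, drift) for w in params[k]]
        counts = {}
        for _ in range(docs_per_epoch):
            candidates = sorted(set(prev_counts) | set(counts))
            weights = [prev_counts.get(k, 0) + counts.get(k, 0)
                       for k in candidates]
            weights.append(alpha)  # last slot: a brand-new topic
            choice = rng.choices(range(len(weights)), weights=weights)[0]
            if choice == len(candidates):
                # Birth: draw a fresh topic from a simple Gaussian base measure.
                k = next_id
                next_id += 1
                params[k] = [rng.gauss(0.0, 1.0) for _ in range(vocab_size)]
            else:
                k = candidates[choice]
            counts[k] = counts.get(k, 0) + 1
        # Death: a topic that no document used in this epoch is dropped.
        for k in list(params):
            if k not in counts:
                del params[k]
        history.append({k: (counts[k], softmax(params[k])) for k in counts})
        prev_counts = counts
    return history


if __name__ == "__main__":
    for t, epoch in enumerate(simulate_epochs()):
        summary = {k: n for k, (n, _) in epoch.items()}
        print(f"epoch {t}: documents per topic = {summary}")
```

Because the sketch carries popularity over only one epoch, a topic that goes unused for a single epoch never returns. A longer or decaying memory would be a natural variation, but that too is an assumption rather than something the abstract specifies.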