posted on 2005-12-01, 00:00 authored by Patrick Pakyan Choi, Andrew W Moore, Jeremy Kubica
The need for time-critical analysis and understanding of the underlying group structure
from transactional data has been growing in domains such as law enforcement
and customs. Kubica et al. (2003) proposed k-groups, an algorithm based on a probabilistic
generative model for discovering underlying groups in data. Even though
k-groups is reported to be significantly faster than its predecessor GDA (Kubica et al.,
2002), it is still too slow and memory-intensive for large datasets in practice. This paper
presents XGDA, a framework for scalable and robust group discovery. An evaluation
of XGDA and k-groups shows that XGDA can handle extremely
large datasets in reasonable time and yields more robust solutions than k-groups.