Posted on 1996-06-01, 00:00. Authored by Jingrui He, Jaime G. Carbonell
Rare category detection is an open challenge in machine
learning. It plays a central role in applications such
as detecting new financial fraud patterns, detecting new
network malware, and scientific discovery. In such cases rare
categories are hidden among huge volumes of normal data
and observations. In this paper, we propose a new method
for rare category detection named SEDER, which requires
no prior information about the data set. It implicitly
performs semiparametric density estimation using specially
designed exponential families, and then picks the examples
for labeling where the neighborhood density changes the
most. SEDER can work in cases where the data is not
separable. Its unique feature over all existing methods lies
in its prior-free nature, i.e., it does not require any prior
information about the data set (e.g., the number of classes,
the proportion of the different classes, etc.). Therefore, it is
more suitable for real applications. Experimental results on
both synthetic and real data sets demonstrate the superiority
of SEDER.
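
The selection idea described above (estimate a density, then query the example whose neighborhood density changes the most) can be illustrated with a minimal sketch. This is not the SEDER algorithm: a plain Gaussian kernel density estimate stands in for the semiparametric exponential-family estimator, and the function names, the `bandwidth` and `k` parameters, and the toy data are all assumptions made for illustration.

```python
import math


def kernel_density(points, bandwidth):
    """Gaussian kernel density at each point, up to a constant factor.

    Stand-in estimator (assumption): SEDER itself uses a semiparametric
    density estimate, which is not reproduced here.
    """
    n = len(points)
    return [
        sum(math.exp(-math.dist(p, q) ** 2 / (2.0 * bandwidth ** 2)) for q in points) / n
        for p in points
    ]


def density_change_scores(points, densities, k):
    """Largest absolute density gap between each point and its k nearest neighbours."""
    scores = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        nearest = sorted(others, key=lambda j: math.dist(p, points[j]))[:k]
        scores.append(max(abs(densities[i] - densities[j]) for j in nearest))
    return scores


def pick_query(points, bandwidth=0.3, k=5):
    """Index of the point whose neighbourhood density changes the most."""
    densities = kernel_density(points, bandwidth)
    scores = density_change_scores(points, densities, k)
    return max(range(len(points)), key=lambda i: scores[i])


# Hypothetical toy data: a sparse 10 x 10 grid as the majority class and a
# tight 9-point cluster hidden inside it as the rare category.
majority = [(float(x), float(y)) for x in range(10) for y in range(10)]
rare = [(4.5 + dx, 4.5 + dy) for dx in (-0.02, 0.0, 0.02) for dy in (-0.02, 0.0, 0.02)]
points = majority + rare

query_index = pick_query(points)
print(points[query_index])
```

On this toy data the selected point lies on the border of the dense cluster, which is where a labeling oracle would be asked for a class. Note that the sketch uses no class counts or class proportions, in the spirit of the prior-free setting, although the bandwidth and neighborhood size here are hand-picked.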