Posted on 2008-02-01, 00:00. Authored by Jingrui He, Jaime G. Carbonell.
Rare category detection is an open challenge for active learning, especially in
the de-novo case (no labeled examples), yet it is of significant practical importance
for data mining, e.g., detecting new financial transaction fraud patterns where normal
legitimate transactions dominate. This paper develops a new method for detecting
an instance of each minority class via an unsupervised local-density-differential
sampling strategy. Essentially, a variable-scale nearest neighbor process is used to
optimize the probability of sampling tightly-grouped minority classes, subject to
a local smoothness assumption on the majority class. On both synthetic and real
data sets, the method detects each minority class with only a fraction of the
actively sampled points required by random sampling and by Pelleg's Interleave
method, the prior best technique in the sparse literature on this topic.
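
To make the idea of local-density-differential sampling concrete, the following is a minimal sketch and not the algorithm from the paper. It assumes Euclidean distance, a fixed neighbor count `k`, and a neighborhood multiplier `scale`; the function name and both parameters are hypothetical and chosen only for illustration. Each point is scored by how much its local neighbor count exceeds that of a nearby point, so a tight cluster embedded in a smooth background scores highest and would be the first point sent to a labeling oracle.

```python
import numpy as np

def density_differential_scores(X, k, scale=1.0):
    """Score each row of X by the largest drop in local density nearby.

    Illustrative sketch only: `k` and `scale` are assumed parameters,
    not values taken from the paper.
    """
    # Pairwise Euclidean distances; the diagonal is exactly zero.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Radius: smallest distance from any point to its k-th nearest neighbor
    # (column 0 of each sorted row is the point itself).
    radius = np.sort(dist, axis=1)[:, k].min()

    # Local density: number of other points within the radius.
    counts = (dist <= radius).sum(axis=1) - 1

    # Compare each point's density with every point in a wider neighborhood.
    neighborhood = dist <= scale * radius
    scores = np.empty(len(X))
    for i in range(len(X)):
        scores[i] = (counts[i] - counts[neighborhood[i]]).max()
    return scores, counts
```

Under these assumptions, the point at `np.argmax(scores)` would be queried first. If its label turns out to belong to an already known class, one plausible continuation is to enlarge `scale` and rescore, which mirrors the variable-scale aspect described in the abstract.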