10.1184/R1/6624692.v1
Jingrui He
Jingrui
He
Jaime G. Carbonell
Jaime G.
Carbonell
Nearest-Neighbor-Based Active Learning for Rare Category Detection
Carnegie Mellon University
2007
Software Research
2007-01-01 00:00:00
Journal contribution
https://kilthub.cmu.edu/articles/journal_contribution/Nearest-Neighbor-Based_Active_Learning_for_Rare_Category_Detection/6624692
<p>Rare category detection is an open challenge for active learning, especially in the de novo case (no labeled examples), yet it is of significant practical importance for data mining, e.g., detecting new financial transaction fraud patterns where normal, legitimate transactions dominate. This paper develops a new method for detecting an instance of each minority class via an unsupervised local-density-differential sampling strategy. Essentially, a variable-scale nearest-neighbor process is used to optimize the probability of sampling tightly grouped minority classes, subject to a local smoothness assumption on the majority class. Results on both synthetic and real data sets are very positive: each minority class is detected with only a fraction of the actively sampled points required by random sampling and by Pelleg’s Interleave method, the prior best technique in the sparse literature on this topic.</p>
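<p>The idea of sampling where local density changes sharply can be illustrated with a minimal sketch. This is not the paper's exact algorithm, which the abstract does not spell out. The function name <code>density_differential_scores</code>, the parameters <code>k</code> and <code>scale</code>, and the toy data are all assumptions made for illustration. Each point is scored by how many more neighbors it has within a small radius than the points near it, and the highest-scoring point is the one a labeling oracle would be asked about first.</p>

```python
import numpy as np


def density_differential_scores(X, k, scale=1.0):
    """Score points by how much denser their neighborhood is than nearby points'.

    Illustrative assumption, not the published method: the radius is the
    smallest k-th-nearest-neighbor distance in the data, and `scale`
    widens the region over which densities are compared.
    """
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances (O(n^2) memory, fine for a small sketch).
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Distance to the k-th nearest neighbor (column 0 is the point itself).
    kth = np.sort(dists, axis=1)[:, k]
    radius = kth.min()
    # Local density: number of points within the radius, including the point.
    counts = (dists <= radius).sum(axis=1)
    # Compare each point's density with that of points inside the scaled radius.
    near = dists <= scale * radius
    diff = counts[:, None] - counts[None, :]
    return np.where(near, diff, -np.inf).max(axis=1)


# Toy data (hypothetical): a 10x10 unit grid as the smooth majority class,
# plus a tight five-point cluster standing in for a minority class.
grid = np.array([[i, j] for i in range(10) for j in range(10)], dtype=float)
center = np.array([4.5, 4.5])
offsets = np.array([[0, 0], [0.0625, 0], [-0.0625, 0], [0, 0.0625], [0, -0.0625]])
X = np.vstack([grid, center + offsets])

scores = density_differential_scores(X, k=4)
query_index = int(np.argmax(scores))  # first point to send to the oracle
```

<p>In this toy setup the first query falls inside the tight cluster rather than on the grid. A variable-scale variant might increase <code>scale</code> after each query that returns an already-known class, which is again an assumption about how the scale could be varied.</p>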