Nearest-Neighbor-Based Active Learning for Rare Category Detection

He, Jingrui; Carbonell, Jaime G.

doi:10.1184/R1/6624692.v1

File(s) stored elsewhere:

http://www.cs.cmu.edu/~jgc/publications.html

Please note: linked content is not stored by Carnegie Mellon University, which cannot guarantee its availability, quality or security, or accept any liability for it.


journal contribution

Posted on 2007-01-01; authored by Jingrui He and Jaime G. Carbonell.

Rare category detection is an open challenge for active learning, especially in the de novo case where no labeled examples are available, yet it is of significant practical importance for data mining; one example is detecting new patterns of financial transaction fraud, where normal, legitimate transactions dominate. This paper develops a new method for detecting an instance of each minority class via an unsupervised local-density-differential sampling strategy. In essence, a variable-scale nearest-neighbor process is used to maximize the probability of sampling tightly grouped minority classes, under the assumption that the majority class is locally smooth. Results on both synthetic and real data sets are strongly positive: the method detects each minority class with only a fraction of the actively sampled points required by random sampling and by Pelleg’s Interleave method, the previous best technique in the sparse literature on this topic.
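The abstract does not spell out the procedure, so what follows is only a minimal Python sketch of the general idea of local-density-differential sampling at a growing neighborhood scale. It is not the authors' algorithm. The function names `density_differential_search`, `base_radius` and `neighbourhood` are hypothetical, and so are three design choices: the `minority_prior` parameter used to pick the base radius, the scoring rule (the largest drop in neighbor count from a point to any of its neighbors), and widening the scale by one base radius per query. The sketch relies on the majority class being locally smooth, so that a tight minority cluster stands out as a sharp local change in density. It uses `math.dist`, which requires Python 3.8 or later.

```python
import math
from typing import Callable, Hashable, List, Sequence, Tuple

Point = Sequence[float]


def neighbourhood(points: Sequence[Point], i: int, radius: float) -> List[int]:
    """Indices of all points within `radius` of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= radius]


def base_radius(points: Sequence[Point], k: int) -> float:
    """Smallest k-th-nearest-neighbour distance in the data (assumed base scale)."""
    best = math.inf
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        best = min(best, dists[k - 1])
    return best


def density_differential_search(
    points: Sequence[Point],
    oracle: Callable[[int], Hashable],
    rare_label: Hashable,
    minority_prior: float,
    max_queries: int,
) -> List[Tuple[int, Hashable]]:
    """Hypothetical sketch: query points until the oracle returns `rare_label`
    or the budget runs out. Assumes at least two points."""
    n = len(points)
    # Assumption: expected minority size sets k, and k sets the base radius r.
    k = max(1, min(n - 1, round(n * minority_prior)))
    r = base_radius(points, k)
    # Local density of each point: how many points lie within r of it.
    density = [len(neighbourhood(points, i, r)) for i in range(n)]

    queried: List[Tuple[int, Hashable]] = []
    asked = set()
    for t in range(1, max_queries + 1):
        scale = t * r  # variable scale: widen the neighbourhood every round
        best_i, best_score = None, -math.inf
        for i in range(n):
            if i in asked:
                continue
            # Largest density drop from point i to any neighbour at this scale;
            # i is its own neighbour, so the score is never below 0.
            score = max(density[i] - density[j]
                        for j in neighbourhood(points, i, scale))
            if score > best_score:
                best_i, best_score = i, score
        if best_i is None:  # every point has already been labelled
            break
        label = oracle(best_i)
        asked.add(best_i)
        queried.append((best_i, label))
        if label == rare_label:
            break
    return queried


if __name__ == "__main__":
    majority = [(float(x),) for x in range(0, 200, 10)]  # evenly spread, smooth density
    minority = [(104.0,), (105.0,), (106.0,)]            # one tight rare cluster
    data = majority + minority
    labels = ["common"] * len(majority) + ["rare"] * len(minority)
    print(density_differential_search(data, lambda i: labels[i], "rare", 3 / len(data), 10))
    # prints [(20, 'rare')]
```

On this toy data every evenly spaced majority point scores zero. The edge of the tight cluster at 104 scores highest, so the first query already finds the rare class. Random sampling without replacement would need six queries on average to reach one of the 3 rare points among 23. The brute-force neighbor search costs O(n²) per round. A practical version would use a spatial index such as a k-d tree instead.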

History

Date: 2007-01-01


Keywords: Software Research

Licence: In Copyright
