Nearest-Neighbor-Based Active Learning for Rare Category Detection

He, Jingrui; Carbonell, Jaime G.

doi:10.1184/R1/6607640.v1

journal contribution

posted on 2008-02-01, 00:00 authored by Jingrui He, Jaime G. Carbonell

Rare category detection is an open challenge for active learning, especially in the de novo case (no labeled examples), yet it is of significant practical importance for data mining, e.g., detecting new financial transaction fraud patterns where normal, legitimate transactions dominate. This paper develops a new method for detecting an instance of each minority class via an unsupervised local-density-differential sampling strategy. Essentially, a variable-scale nearest-neighbor process is used to optimize the probability of sampling tightly grouped minority classes, subject to a local smoothness assumption on the majority class. Results on both synthetic and real data sets are very positive: each minority class is detected with only a fraction of the actively sampled points required by random sampling and by Pelleg's Interleave method, the prior best technique in the sparse literature on this topic.
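
To make the idea of local-density-differential sampling concrete, here is a minimal sketch in Python. It is an illustration built only from the description above, not the authors' published procedure: the function name `density_differential_scores`, the parameters `prior` (assumed minority-class proportion) and `t` (a scale multiplier on the neighborhood radius), and the one-dimensional toy data are all assumptions made for this example. The sketch counts how many points fall within a small radius of each point, then scores each point by the largest drop in that count relative to its neighbors, so a point on the edge of a tight cluster sitting in a smooth, sparser background scores highest.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def density_differential_scores(points, prior, t=1.0):
    """Score each point by its largest local-density drop versus its neighbors.

    Hypothetical helper for illustration; `prior` and `t` are assumed inputs.
    """
    n = len(points)
    # Assumed minority-class size, used as the neighbor rank k.
    k = min(n - 1, max(1, math.ceil(n * prior)))
    dist = [[euclidean(p, q) for q in points] for p in points]
    # Smallest k-th-nearest-neighbor distance over all points
    # (sorted index 0 is the point itself, at distance 0).
    r = min(sorted(row)[k] for row in dist)
    # Local density: number of points within radius r (the point itself included).
    counts = [sum(1 for d in row if d <= r) for row in dist]
    scores = []
    for i in range(n):
        # Largest density difference between point i and any neighbor within t * r.
        scores.append(max(counts[i] - counts[j]
                          for j in range(n) if dist[i][j] <= t * r))
    return scores

# Toy data (an assumption for this sketch): an evenly spaced majority class
# and a tight five-point minority cluster between two majority points.
majority = [(float(x),) for x in range(100)]
minority = [(50.25,), (50.375,), (50.5,), (50.625,), (50.75,)]
points = majority + minority

scores = density_differential_scores(points, prior=0.04)
first_query = max(range(len(points)), key=scores.__getitem__)
```

In an active-learning loop, `first_query` would be the index of the point sent to a labeling oracle first; on this toy data it falls inside the minority cluster. A fuller version would presumably exclude points near already-labeled examples and widen the radius through `t` on later rounds, but those details are not given in the abstract and are left out here.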

History

Date

2008-02-01

Keywords

computer sciences

Licence

In Copyright
