Electricity Based External Similarity of Categorical Attributes

Palmer, Christopher R.; Faloutsos, Christos

doi:10.1184/R1/6605147.v1

journal contribution

posted on 2013-06-01, 00:00 authored by Christopher R. Palmer, Christos Faloutsos

Similarity or distance measures are fundamental and critical properties for data mining tools. Categorical attributes abound in databases. The Car Make, Gender, Occupation, etc. fields in an automobile insurance database are very informative. Sadly, categorical data is not easily amenable to similarity computations. A domain expert might manually specify some or all of the similarity relationships, but this is error-prone and not feasible for attributes with large domains, nor is it useful for cross-attribute similarities, such as between Gender and Occupation. External similarity functions define a similarity between, say, Car Makes by looking at how they co-occur with the other categorical attributes. We exploit a rich duality between random walks on graphs and electrical circuits to develop REP, an external similarity function. REP is theoretically grounded, while the only prior work was ad hoc. The usefulness of REP is shown in two experiments. First, we cluster categorical attribute values, showing improved inferred relationships. Second, we use REP effectively as a nearest neighbour classifier.
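
The abstract does not define REP, so the following is only an illustrative sketch of the general idea it mentions, not the authors' method. It assumes that attribute values are nodes of a graph, that values co-occurring in a record are joined by a resistor whose conductance is the co-occurrence count, and that a lower effective resistance between two values means higher similarity. The function names `build_conductances` and `effective_resistance`, the record layout, and the sample data are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_conductances(records):
    """Count co-occurrences of (attribute, value) pairs within each record.

    Assumption: the count is used directly as an edge conductance.
    """
    cond = defaultdict(float)
    for rec in records:
        nodes = sorted(rec.items())  # canonical order for each pair
        for a, b in combinations(nodes, 2):
            cond[(a, b)] += 1.0
    return cond


def effective_resistance(cond, s, t):
    """Effective resistance between nodes s and t of the resistor network.

    Grounds t, injects one unit of current at s, and solves the reduced
    Laplacian system for the voltages; the voltage at s is the resistance.
    """
    if s == t:
        return 0.0
    nodes = sorted({n for pair in cond for n in pair})
    free = [n for n in nodes if n != t]  # t is grounded (voltage 0)
    idx = {n: i for i, n in enumerate(free)}
    n = len(free)
    # Augmented matrix [reduced Laplacian | injected current].
    A = [[0.0] * (n + 1) for _ in range(n)]
    for (a, b), c in cond.items():
        for u, v in ((a, b), (b, a)):
            if u in idx:
                A[idx[u]][idx[u]] += c
                if v in idx:
                    A[idx[u]][idx[v]] -= c
    A[idx[s]][n] = 1.0
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("graph is not connected")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col and A[r][col] != 0.0:
                f = A[r][col] / A[col][col]
                for k in range(col, n + 1):
                    A[r][k] -= f * A[col][k]
    return A[idx[s]][n] / A[idx[s]][idx[s]]


if __name__ == "__main__":
    # Hypothetical insurance records, made up for illustration only.
    records = [
        {"make": "Volvo", "gender": "F", "occupation": "teacher"},
        {"make": "Saab", "gender": "F", "occupation": "teacher"},
        {"make": "Ford", "gender": "M", "occupation": "farmer"},
        {"make": "Ford", "gender": "F", "occupation": "farmer"},
    ]
    cond = build_conductances(records)
    for other in ("Saab", "Ford"):
        r = effective_resistance(cond, ("make", "Volvo"), ("make", other))
        print("Volvo vs", other, "resistance:", round(r, 3))
```

Two Car Make values never appear in the same record, so any current between them must flow through the Gender and Occupation nodes. That is the sense in which such a measure is "external": it depends only on how the values co-occur with the other attributes.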

History

Publisher Statement

© ACM, 2013. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published at http://doi.acm.org/10.1145/2482540.2482570

Date

2013-06-01

Keywords

computer sciences Information and Computing Sciences not elsewhere classified

Licence

In Copyright
