Cross-Cloud Plots: Scalable Tools for Spatial and Multidimensional Data Mining

Traina, Agma; Traina, Caetano; Faloutsos, Christos; Papadimitriou, Spiros

doi:10.1184/R1/6604571.v1


Journal contribution, posted on 1978-01-01.

We focus on the problem of finding patterns across two large, multidimensional datasets. For example, given feature vectors of healthy and of non-healthy patients, we want to answer questions such as: “Are the two clouds of points separable?”, “What is the smallest/largest pairwise distance across the two datasets?”, and “Which of the two clouds does a new point (feature vector) come from?” We propose a new tool, the ‘Cross-Cloud plot’, which helps answer these questions and many others. We present an algorithm that computes the Cross-Cloud plot in a single pass over the datasets, so it scales to arbitrarily large databases. More importantly, its cost grows linearly with dimensionality, whereas the cost of most other spatial data mining algorithms grows exponentially. We show how to use our tool for classification in cases where traditional methods (nearest neighbor, classification trees) may fail. We also provide a set of rules for interpreting a Cross-Cloud plot, and we apply these rules to several synthetic and real datasets.
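The abstract does not spell out how the Cross-Cloud plot is constructed. The following is a minimal sketch, assuming a box-counting formulation (our assumption, not necessarily the authors' published algorithm). It overlays a grid with cell side r and counts how many points of each dataset fall in every cell. It then sums the products of the two counts. That sum roughly approximates the number of cross-pairs lying within about distance r of each other, and plotting its logarithm against log r over several radii gives a log-log curve. Each point is read once and costs O(d) per radius to locate its cell, which is consistent with the single-pass, linear-in-dimensionality behaviour the abstract claims. The function names `grid_counts`, `cross_pair_counts` and `cross_cloud_plot` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]


def grid_counts(points: Sequence[Point], radii: Sequence[float]) -> Dict[float, Counter]:
    """Single pass over `points`: for every grid side r, count points per cell."""
    counters: Dict[float, Counter] = {r: Counter() for r in radii}
    for p in points:
        for r in radii:
            # Cell index along each axis; O(d) work per point per radius.
            cell = tuple(math.floor(x / r) for x in p)
            counters[r][cell] += 1
    return counters


def cross_pair_counts(a: Sequence[Point], b: Sequence[Point],
                      radii: Sequence[float]) -> List[Tuple[float, int]]:
    """For each r, sum over cells of count_A(cell) * count_B(cell)."""
    ca = grid_counts(a, radii)
    cb = grid_counts(b, radii)
    result = []
    for r in radii:
        # Counter returns 0 (without inserting) for cells B never visited.
        s = sum(n * cb[r][cell] for cell, n in ca[r].items())
        result.append((r, s))
    return result


def cross_cloud_plot(a: Sequence[Point], b: Sequence[Point],
                     radii: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (log r, log count) pairs, skipping radii with no cross pairs."""
    return [(math.log(r), math.log(s))
            for r, s in cross_pair_counts(a, b, radii) if s > 0]


if __name__ == "__main__":
    healthy = [(0.1, 0.1), (0.2, 0.2)]
    sick = [(0.15, 0.15), (5.0, 5.0)]
    print(cross_pair_counts(healthy, sick, [1.0, 10.0]))  # prints [(1.0, 2), (10.0, 4)]
```

Under this assumed formulation, two well-separated clouds produce a cross-pair count of zero at small r, so the curve only begins at radii roughly on the order of the gap between them. This is one informal way such a plot could hint at separability. The grid is only an approximation: two points in the same cell may be up to r times the square root of d apart, and close points straddling a cell boundary are missed.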


Keywords: computer sciences

Licence: In Copyright
