Carnegie Mellon University

Very Fast Similarity Queries on Semi-Structured Data from the Web

journal contribution
posted on 2013-01-01, 00:00 authored by Bhavana Dalvi, William W. Cohen

In this paper, we propose a single low-dimensional representation for entities found in different datasets on the web. Our proposed PIC-D embeddings can represent large D-partite graphs using a small number of dimensions, enabling fast similarity queries. Our experiments show that this representation can be constructed in a small amount of time (linear in the number of dimensions). We demonstrate how it can be used for a variety of similarity queries, such as set expansion, automatic set instance acquisition, and column classification. Our approach achieves precision comparable to task-specific baselines and up to two orders of magnitude improvement in query response time.
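To illustrate why a low-dimensional representation makes similarity queries cheap, the following is a minimal sketch of set expansion over precomputed entity vectors. It does not reproduce the PIC-D construction, which the abstract does not detail; the toy vectors, the use of cosine similarity against the seed centroid, and the names `cosine`, `expand_set`, and `toy` are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def expand_set(embeddings, seeds, k):
    """Return the k non-seed entities closest to the centroid of the seeds.

    `embeddings` maps an entity name to its low-dimensional vector.
    Hypothetical helper: the ranking rule is an assumption, not the
    paper's method.
    """
    dims = len(next(iter(embeddings.values())))
    # Average the seed vectors, one dimension at a time.
    centroid = [
        sum(embeddings[s][d] for s in seeds) / len(seeds) for d in range(dims)
    ]
    # Score every entity that is not already a seed.
    scored = [
        (cosine(vec, centroid), name)
        for name, vec in embeddings.items()
        if name not in seeds
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]


# Made-up two-dimensional vectors, for illustration only.
toy = {
    "pittsburgh": [0.9, 0.1],
    "boston": [0.8, 0.2],
    "seattle": [0.85, 0.15],
    "apple": [0.1, 0.9],
    "banana": [0.2, 0.8],
}

print(expand_set(toy, ["pittsburgh", "boston"], 1))
```

In this sketch each query costs time proportional to the number of entities times the number of dimensions, so keeping the dimension count small directly bounds the work per query.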

Publisher Statement

Copyright © SIAM

Date

2013-01-01
