Carnegie Mellon University
Browse
file.pdf (214.05 kB)

Bit-Sliced Signature Files for Very Large Text Databases on a Parallel Machine Architecture

Download (214.05 kB)
journal contribution
posted on 2007-01-01, 00:00 authored by George Panagopoulos, Christos Faloutsos
Free text retrieval is an important problem which can significantly benefit from a parallel architecture. Signature methods have been proposed to answer text retrieval queries in parallel machines [Sta88, LF92], under the assumption that the main memory is sufficient to hold the entire signature file. We propose the use of a Parallel Bit-Sliced Signature File method on a SIMD machine architecture when the size of the signature file exceeds the available memory. We propose that we need not examine all the bit slices; instead we use a partial fetch slice swapping algorithm. This method achieves graceful performance degradation according to the database size. We provide formulae for the optimal number of signature slices to fetch and match with the query signature. Arithmetic examples show that our method can handle a 128GB database with a 2sec response time on a machine with the characteristics of the Connection Machine.

History

Date

2007-01-01

Usage metrics

    Exports

    RefWorks
    BibTeX
    Ref. manager
    Endnote
    DataCite
    NLM
    DC