10.1184/R1/6603899.v1
George Panagopoulos
Christos Faloutsos
Bit-Sliced Signature Files for Very Large Text Databases on a Parallel Machine Architecture
Carnegie Mellon University
2007
computer sciences
2007-01-01 00:00:00
Journal contribution
https://kilthub.cmu.edu/articles/journal_contribution/Bit-Sliced_Signature_Files_for_Very_Large_Text_Databases_on_a_Parallel_Machine_Architecture/6603899
Free text retrieval is an important problem that can benefit
significantly from a parallel architecture. Signature methods have been
proposed to answer text retrieval queries on parallel machines [Sta88,
LF92], under the assumption that main memory is large enough to hold
the entire signature file. We propose the use of a Parallel Bit-Sliced Signature
File method on a SIMD machine architecture when the size of the
signature file exceeds the available memory. Not all the bit slices need
to be examined; instead, we use a partial-fetch slice-swapping
algorithm. With this method, performance degrades gracefully as the
database size grows. We provide formulae for the optimal number
of signature slices to fetch and match against the query signature. Arithmetic
examples show that our method can handle a 128 GB database
with a 2-second response time on a machine with the characteristics of the
Connection Machine.
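
The idea of matching only some of the bit slices can be illustrated with a minimal sketch. It is not the paper's algorithm: the signature width `F`, the number of bits per word `M`, the SHA-256-based hashing in `word_bits`, and the function names are all assumptions made for illustration, and Python integers stand in for the per-processor bit columns of a SIMD machine. The abstract's formulae for the optimal number of slices are not reproduced; `max_slices` is simply a caller-supplied limit.

```python
import hashlib

F = 64  # signature width in bits (assumed value)
M = 3   # bits set per word (assumed value)

def word_bits(word, f=F, m=M):
    """Bit positions a word sets in an f-bit signature (hypothetical hashing scheme)."""
    positions = set()
    for i in range(m):
        digest = hashlib.sha256(f"{i}:{word}".encode()).digest()
        positions.add(int.from_bytes(digest[:4], "big") % f)
    return positions

def build_slices(docs, f=F):
    """Return f bit slices; bit d of slice j is 1 iff document d has signature bit j set."""
    slices = [0] * f
    for doc_id, text in enumerate(docs):
        for word in text.lower().split():
            for j in word_bits(word, f):
                slices[j] |= 1 << doc_id
    return slices

def candidates(slices, n_docs, query_words, max_slices=None):
    """AND together at most max_slices of the slices selected by the query signature."""
    query_bits = set()
    for w in query_words:
        query_bits |= word_bits(w.lower(), len(slices))
    positions = sorted(query_bits)
    if max_slices is not None:
        positions = positions[:max_slices]  # partial fetch: skip the remaining slices
    result = (1 << n_docs) - 1  # start with every document as a candidate
    for j in positions:
        result &= slices[j]
    return [d for d in range(n_docs) if (result >> d) & 1]
```

Examining fewer slices can only enlarge the candidate set: every document that truly contains the query words still qualifies, but more false drops must be filtered out afterwards. The trade-off between slices fetched and false drops is what the formulae mentioned in the abstract address.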