posted on 2007-01-01, 00:00authored byGeorge Panagopoulos, Christos Faloutsos
Free text retrieval is an important problem which can significantly
benefit from a parallel architecture. Signature methods have been
proposed to answer text retrieval queries in parallel machines [Sta88,
LF92], under the assumption that the main memory is sufficient to hold
the entire signature file. We propose the use of a Parallel Bit-Sliced Signature
File method on a SIMD machine architecture when the size of the
signature file exceeds the available memory. We propose that we need not
examine all the bit slices; instead we use a partial fetch slice swapping
algorithm. This method achieves graceful performance degradation according
to the database size. We provide formulae for the optimal number
of signature slices to fetch and match with the query signature. Arithmetic
examples show that our method can handle a 128GB database
with a 2sec response time on a machine with the characteristics of the
Connection Machine.