Fast Text Access Methods for Optical and Large Magnetic Disks: Designs and Performance Comparison
Journal contribution posted on 01.01.1996 by Christos Faloutsos, Raphael Chan
High-capacity disks, especially optical ones, are commercially available. These disks are ideal for archiving large text data bases. In this work, we examine efficient searching techniques for such applications. We propose a unifying framework, which reveals the similarities between signature files and an inverted file using a hash table. Then, we design methods that combine the ease of insertion of signature files with the fast retrieval of inverted files. We develop analytical models for their performance and verify them through experimentation on a 2.8 Mb data base. The agreement between theory and experimentation is very good. The results show that the proposed methods achieve fast retrieval and require a modest 10%-30% space overhead (as opposed to 50%-300% overhead for the inverted files). They also do not require re-writing; thus, they can handle insertions easily, they permit searches during an insertion, and they can be used with write-once optical disks. Using our verified model, the performance predictions for the proposed methods on large data bases (e.g., 250 Mb) are very promising.
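The similarity between signature files and a hashed inverted file can be illustrated with a small sketch. This is not the authors' design, only one possible reading of the framework: both structures hash each word to one of a fixed number of positions, and differ in whether the resulting document-by-position bits are stored per document (signature file) or per position (inverted file with a hash table). The function names, the table size of 64, and the use of MD5 as the hash are all assumptions made for the illustration.

```python
import hashlib


def bucket(word, size):
    """Deterministically hash a word into the range [0, size)."""
    digest = hashlib.md5(word.lower().encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % size


def build_signatures(docs, size=64):
    """Signature file: one bit vector (stored as an int) per document."""
    sigs = []
    for text in docs:
        sig = 0
        for word in text.split():
            sig |= 1 << bucket(word, size)
        sigs.append(sig)
    return sigs


def build_hashed_inverted(docs, size=64):
    """Hashed inverted file: one list of document ids per hash position."""
    table = [[] for _ in range(size)]
    for doc_id, text in enumerate(docs):
        for pos in sorted({bucket(w, size) for w in text.split()}):
            table[pos].append(doc_id)
    return table


def candidates_from_signatures(sigs, word, size=64):
    """Scan every signature and keep documents whose bit for the word is set."""
    mask = 1 << bucket(word, size)
    return [i for i, sig in enumerate(sigs) if sig & mask]


def candidates_from_inverted(table, word, size=64):
    """Read the single list stored at the word's hash position."""
    return list(table[bucket(word, size)])


def search(docs, candidates, word):
    """Discard false drops (hash collisions) by checking the actual text."""
    target = word.lower()
    return [i for i in candidates if target in docs[i].lower().split()]
```

In this toy version both structures return the same candidate documents for a single-word query. Appending a document to the signature file only adds one new signature at the end, whereas the inverted table must extend many existing lists, which hints at why the former suits write-once media and the latter gives faster lookups.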