posted on 1981-01-01, 00:00authored byChristos Faloutsos, H. V. Jagadish
Due to the skewed nature of the frequency distribution of term occurrence (e.g., Zipf's law) it is unlikely
that any single technique for indexing text can do well in all situations. In this paper we propose a hybrid
approach to indexing text, and show how it can outperform the traditional inverted B-tree index both in
storage overhead, in time to perform a retrieval, and, for dynamic databases, in time for an insertion, both
for single term and for multiple term queries. We demonstrate the benefits of our technique on a database
of stories from the Associated Press news wire, and we provide formulae and guidelines on how to make
optimal choices of the design parameters in real applications.