Efficient and Accurate Non-Metric k-NN Search with Applications to Text Matching
In this thesis we advance state-of-the-art of the non-metric k-NN search by carrying out an extensive empirical evaluation (both and intrinsic) of generic methods for k-NN search. This work contributes to establishing a collection of strong benchmarks for data sets with generic distances.
We start with intrinsic evaluations and demonstrate that non metric k-NN search is a practical and reasonably accurate tool for a wide variety of complex distances. However, somewhat surprisingly, achieving good performance does not require distance mapping/proxying via metric learning or distance symmetrization. Existing search methods can often work directly with non-metric and non-symmetric distances. They outperform the filter-and-refine approach relying on the distance symmetrization in the filtering step.
Intrinsic evaluations are complemented with extrinsic evaluations in a realistic text retrieval task. In doing so, we make a step towards replacing/complementing classic term-based retrieval with a generic k-NN search algorithm. To this end we use a similarity function that takes into account subtle term associations, which are learned from a parallel monolingual corpus. An exact brute-force k-NN search using this similarity function is quite slow. We show that an approximate search algorithm can be 100-300 times faster at the expense of only a small loss in accuracy (10%). On one data set, a retrieval pipeline using an approximate k-NN search is twice as efficient as the C++ baseline while being as accurate as the Lucene-based fusion pipeline. We note, however, that it is necessary to compare our methods against more recent ranking algorithms.
The thesis is concluded with a summary of learned lessons and open research questions (relevant to this work). We also discuss potential challenges facing a retrieval system designer.
History
Date
2018-05-17Degree Type
- Dissertation
Department
- Language Technologies Institute
Degree Name
- Doctor of Philosophy (PhD)