Towards a Simple and Efficient Web Search Framework

Xu, Di; Callan, Jamie

doi:10.1184/R1/6473774.v1

journal contribution

posted on 2014-11-01, 00:00 authored by Di Xu, Jamie Callan

The Web Track of the 2014 Text REtrieval Conference (TREC) addresses the most fundamental problem of Information Retrieval. Our aim was not to build a system that beats state-of-the-art search engines, but to design a lightweight, cost-effective system with comparable performance. We introduce a two-pass retrieval framework. The first pass is a simple and efficient retrieval model that focuses on recall. The second pass runs a set of feature extraction algorithms on the top-ranked documents, followed by Learning to Rank (LETOR) algorithms that produce different precision-oriented rankings, whose outputs are combined using data fusion. We have focused on statistical Language Models with novel and well-known smoothing techniques, different LETOR methods, and various data fusion techniques. We have also tried topic modelling with Hierarchical Dirichlet Allocation for query expansion, in the hope of improving the diversity of our results. However, the topic modelling approach turned out to be unsuccessful, and we were not able to identify the cause or benefit from it in this work. Finally, we present further analyses demonstrating that our approach is robust against overfitting, along with some general studies of overfitting in the context of LETOR.
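To make the two-pass idea concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract does not name a specific smoothing method or fusion technique, so Dirichlet-smoothed query likelihood and reciprocal rank fusion are used here purely as assumed stand-ins. All function names, the `mu` and `k` parameters, and the reranker signature are hypothetical, and the rerankers are placeholders for trained LETOR models.

```python
import math
from collections import Counter


def dirichlet_score(query_terms, doc_terms, coll_tf, coll_len, mu=2000.0):
    """Query log-likelihood with Dirichlet smoothing (assumed smoothing choice)."""
    tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for term in query_terms:
        p_coll = coll_tf.get(term, 0) / coll_len
        if p_coll == 0.0:
            continue  # term never occurs in the collection, so it is skipped
        score += math.log((tf[term] + mu * p_coll) / (doc_len + mu))
    return score


def first_pass(query_terms, docs, k=100, mu=2000.0):
    """Recall-oriented pass: score every document, keep the top k ids."""
    coll_tf = Counter(t for terms in docs.values() for t in terms)
    coll_len = sum(coll_tf.values())
    scored = [
        (doc_id, dirichlet_score(query_terms, terms, coll_tf, coll_len, mu))
        for doc_id, terms in docs.items()
    ]
    # Highest score first; ties are broken by document id for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return [doc_id for doc_id, _ in scored[:k]]


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several rankings; each document earns 1 / (k + rank) per ranking."""
    fused = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused, key=lambda d: (-fused[d], d))


def two_pass_search(query_terms, docs, rerankers, k=100):
    """First pass for recall, then fuse the outputs of several rerankers.

    Each reranker is a hypothetical callable (query_terms, candidates, docs)
    that returns a reordered list of candidate ids.
    """
    candidates = first_pass(query_terms, docs, k=k)
    rankings = [rerank(query_terms, candidates, docs) for rerank in rerankers]
    return reciprocal_rank_fusion(rankings)
```

In this sketch, `docs` maps a document id to its list of tokens. Only the `k` documents kept by the first pass are reranked, so any costly feature extraction in the second pass is limited to that small candidate set.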

History

Date

2014-11-01

Keywords

Information retrieval, search engine, language model, learning to rank, machine learning, data fusion

Licence

In Copyright