Posted on 1985-01-01, 00:00. Authored by ChengXiang Zhai, John D. Lafferty
This paper presents a probabilistic information retrieval framework in which the retrieval
problem is formally treated as a statistical decision problem. In this framework, queries
and documents are modeled using statistical language models, user preferences are modeled
through loss functions, and retrieval is cast as a risk minimization problem. We discuss
how this framework can unify existing retrieval models and accommodate systematic development
of new retrieval models. As an example of using the framework to model nontraditional
retrieval problems, we derive retrieval models for subtopic retrieval, which is
concerned with retrieving documents to cover many different subtopics of a general query
topic. These new models differ from traditional retrieval models in that they relax the traditional
assumption of independent relevance of documents.
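
To make the idea of retrieval as risk minimization concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the paper's derivation. It assumes unigram language models, Dirichlet smoothing with a hypothetical parameter `mu`, and a loss equal to the KL divergence between the query model and each document model. The function names (`unigram_model`, `kl_divergence`, `rank_by_risk`) are invented for this example. Each document is scored independently, so the sketch reflects the traditional independent-relevance setting rather than the subtopic models.

```python
import math
from collections import Counter


def unigram_model(tokens, background, mu=10.0):
    """Dirichlet-smoothed unigram model over the background vocabulary.

    `mu` is an assumed smoothing parameter chosen for illustration only.
    """
    counts = Counter(tokens)
    total = len(tokens)
    return {w: (counts[w] + mu * p_bg) / (total + mu)
            for w, p_bg in background.items()}


def kl_divergence(p, q):
    """KL(p || q), summed over the words that p gives nonzero probability."""
    return sum(pw * math.log(pw / q[w]) for w, pw in p.items() if pw > 0)


def rank_by_risk(query_tokens, docs, mu=10.0):
    """Return document ids ordered by increasing risk (KL divergence loss)."""
    # Background model estimated from all documents plus the query, so every
    # query word has nonzero probability under each smoothed document model.
    all_tokens = [t for toks in docs.values() for t in toks] + list(query_tokens)
    n = len(all_tokens)
    background = {w: c / n for w, c in Counter(all_tokens).items()}

    # Maximum-likelihood query model.
    q_counts = Counter(query_tokens)
    q_model = {w: c / len(query_tokens) for w, c in q_counts.items()}

    # Risk of returning each document = divergence of its model from the query model.
    risks = {doc_id: kl_divergence(q_model, unigram_model(toks, background, mu))
             for doc_id, toks in docs.items()}
    return sorted(risks, key=risks.get)


if __name__ == "__main__":
    docs = {
        "d1": ["language", "model", "retrieval"],
        "d2": ["cooking", "pasta", "recipe"],
    }
    print(rank_by_risk(["language", "model"], docs))
```

Swapping in a different loss function changes the ranking criterion without changing the rest of the pipeline, which is the sense in which the loss encodes user preferences in this sketch.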