Evaluating text representations for retrieval of the best group of documents

conference contribution
posted on 2024-10-31, 15:35; authored by Xiaoyong Liu and Bruce Croft
Cluster retrieval assumes that the probability of relevance of a document should depend on the relevance of other similar documents to the same query. The goal is to find the best group of documents. Many studies have examined the effectiveness of this approach, by employing different retrieval methods or clustering algorithms, but few have investigated text representations. This paper revisits the problem of retrieving the best group of documents, from the language-modeling perspective. We analyze the advantages and disadvantages of a range of representation techniques, derive features that characterize the good document groups, and experiment with a new probabilistic representation as a first step toward incorporating these features. Empirical evaluation demonstrates that the relationship between documents can be leveraged in retrieval when a good representation technique is available, and that retrieving the best group of documents can be more effective than retrieving individual documents.

Related Materials

  1. DOI - Is published in 10.1007/978-3-540-78646-7_43
  2. ISBN - Is published in 9783540786450 (urn:isbn:9783540786450)

Start page

454

End page

462

Total pages

9

Outlet

Proceedings of the 30th European Conference on Information Retrieval (ECIR 2008)

Editors

Craig Macdonald, Iadh Ounis, Vassilis Plachouras, Ian Ruthven and Ryen W. White

Name of conference

30th European Conference on Information Retrieval (ECIR 2008)

Publisher

Springer

Place published

Berlin, Germany

Start date

2008-03-30

End date

2008-04-03

Language

English

Copyright

© 2008 Springer-Verlag Berlin Heidelberg.

Former Identifier

2006024286

Esploro creation date

2020-06-22

Fedora creation date

2013-03-12
