RMIT University

Learning concept importance using a weighted dependence model

conference contribution
Posted on 2024-10-31, 15:33; authored by Michael Bendersky, Donald Metzler, Bruce Croft
Modeling query concepts through term dependencies has been shown to have a significant positive effect on retrieval performance, especially for tasks such as web search, where relevance at high ranks is particularly critical. Most previous work, however, treats all concepts as equally important, an assumption that often does not hold, especially for longer, more complex queries. In this paper, we show that one of the most effective existing term dependence models can be naturally extended by assigning weights to concepts. We demonstrate that the weighted dependence model can be trained using existing learning-to-rank techniques, even with a relatively small number of training queries. Our study compares the effectiveness of both endogenous (collection-based) and exogenous (based on external sources) features for determining concept importance. To test the weighted dependence model, we perform experiments on both publicly available TREC corpora and a proprietary web corpus. Our experimental results indicate that our model consistently and significantly outperforms both the standard bag-of-words model and the unweighted term dependence model, and that combining endogenous and exogenous features generally results in the best retrieval effectiveness.
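
The core idea above is to score a document by summing per-concept matches while letting each concept carry its own importance. The minimal Python sketch below illustrates it under stated assumptions. It assumes that a concept's weight is a linear combination of its importance features and that a document's score is the weight-scaled sum of per-concept match scores. The names Concept, concept_weight, weighted_score and unweighted_score, as well as all feature values and match scores, are hypothetical, and this is not the authors' exact formulation. In the setting described above the feature weights would be learned with a learning-to-rank method. Here they are fixed by hand.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple


@dataclass(frozen=True)
class Concept:
    # A query concept, e.g. a single term or a pair of adjacent query terms.
    text: str
    # Hypothetical importance features (endogenous or exogenous) for this concept.
    features: Tuple[float, ...]


def concept_weight(concept: Concept, feature_weights: Sequence[float]) -> float:
    # Assumed form: importance is a linear combination of the concept's features.
    if len(concept.features) != len(feature_weights):
        raise ValueError("feature vector and weight vector differ in length")
    return sum(w * g for w, g in zip(feature_weights, concept.features))


def weighted_score(concepts: Sequence[Concept],
                   doc_matches: Dict[str, float],
                   feature_weights: Sequence[float]) -> float:
    # Each concept's match score in the document is scaled by its importance;
    # concepts missing from doc_matches contribute 0.
    return sum(concept_weight(c, feature_weights) * doc_matches.get(c.text, 0.0)
               for c in concepts)


def unweighted_score(concepts: Sequence[Concept],
                     doc_matches: Dict[str, float]) -> float:
    # Baseline: every concept counts equally.
    return sum(doc_matches.get(c.text, 0.0) for c in concepts)


# Hypothetical query with one important concept and one less important one.
concepts = [Concept("civil war", (1.0, 0.5)), Concept("history", (0.1, 0.0))]
feature_weights = (0.8, 0.4)  # fixed by hand here; would be learned in practice

# Hypothetical per-document match scores for each concept.
doc_a = {"civil war": 3.0, "history": 1.0}
doc_b = {"civil war": 1.0, "history": 3.5}

for name, doc in (("doc_a", doc_a), ("doc_b", doc_b)):
    print(name,
          unweighted_score(concepts, doc),
          round(weighted_score(concepts, doc, feature_weights), 2))
```

With these made-up numbers the unweighted baseline ranks doc_b first, because doc_b matches the less important concept more strongly, while the weighted score ranks doc_a first. This mirrors the case described in the abstract, where treating all concepts as equally important does not hold.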

Related Materials

  1. DOI - Is published in 10.1145/1718487.1718492
  2. ISBN - Is published in 9781605588896 (urn:isbn:9781605588896)

Start page

31

End page

40

Total pages

10

Outlet

Proceedings of the 3rd ACM International Conference on Web Search and Data Mining (WSDM 2010)

Editors

Brian D. Davison and Torsten Suel

Name of conference

3rd ACM International Conference on Web Search and Data Mining (WSDM 2010)

Publisher

ACM

Place published

New York, USA

Start date

2010-02-03

End date

2010-02-06

Language

English

Copyright

Copyright 2010 ACM

Former Identifier

2006024356

Esploro creation date

2020-06-22

Fedora creation date

2013-03-04
