RMIT University

On divergence measures and static index pruning

conference contribution
posted on 2024-10-31, 18:44 authored by Ruey-Cheng Chen, Chia-Jung Lee, W Bruce Croft
We study the problem of static index pruning in a renowned divergence minimization framework, using a range of divergence measures, such as f-divergence and Rényi divergence, as the objective. We show that many well-known divergence measures are convex in the pruning decisions and can therefore be exactly minimized using an efficient algorithm. Our approach allows postings to be prioritized according to the amount of information they contribute to the index, and by specifying a different divergence measure the contribution is modeled on a different returns curve. In our experiments on GOV2 data, Rényi divergence of order infinity appears the most effective. This divergence measure significantly outperforms many standard methods and achieves retrieval effectiveness identical to that of the full index using only 50% of the postings. When top-k precision is the only concern, 10% of the data is sufficient to achieve the accuracy one would usually expect from a full index.
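The abstract describes prioritizing postings by the information they contribute to the index and keeping only a fraction of them. The sketch below is a hypothetical illustration of that general idea only: it scores each posting with a simple tf-idf-style weight (not the paper's divergence-minimization objective) and retains the top fraction. The function name `prune_index` and the scoring formula are assumptions for illustration.

```python
import math

def prune_index(index, keep_fraction):
    """Keep the highest-scoring fraction of postings.

    index: {term: {doc_id: tf}} — a toy inverted index.
    Scoring is a stand-in tf*idf weight, not the divergence
    objective studied in the paper.
    """
    # Count distinct documents across all posting lists.
    n_docs = len({d for postings in index.values() for d in postings})
    scored = []
    for term, postings in index.items():
        idf = math.log(n_docs / len(postings))
        for doc_id, tf in postings.items():
            scored.append((tf * idf, term, doc_id))
    # Rank postings globally and keep the top fraction.
    scored.sort(reverse=True)
    keep = scored[: max(1, int(len(scored) * keep_fraction))]
    pruned = {}
    for _, term, doc_id in keep:
        pruned.setdefault(term, {})[doc_id] = index[term][doc_id]
    return pruned
```

With `keep_fraction=0.5` this retains half the postings, mirroring the abstract's observation that 50% of the postings can suffice for full-index effectiveness under a well-chosen objective.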

History

Related Materials

  1. DOI - Is published in 10.1145/2808194.2809472
  2. ISBN - Is published in 9781450338332 (urn:isbn:9781450338332)

Start page

151

End page

160

Total pages

10

Outlet

Proceedings of the 5th ACM SIGIR International Conference on the Theory of Information Retrieval (ICTIR 2015)

Name of conference

ICTIR 2015

Publisher

Association for Computing Machinery

Place published

New York, United States

Start date

2015-09-27

End date

2015-09-30

Language

English

Copyright

Copyright © 2015 by the Association for Computing Machinery, Inc. (ACM).

Former Identifier

2006055609

Esploro creation date

2020-06-22

Fedora creation date

2015-10-28
