RMIT University

Pre-indexing Pruning Strategies

conference contribution
posted on 2024-11-03, 13:49 authored by S Altin, Ricardo Baeza-Yates, Berkant Cambazoglu
We explore different techniques for pruning an inverted index in advance, that is, without building the full index. These techniques provide interesting trade-offs between index size, answer quality and query coverage. We analyze them experimentally on a large public web collection with two different query logs. Measured with respect to the full index, the trade-offs we find range from an index of 4% of the full size with 35% precision@10 to an index of 46% of the full size with 90% precision@10. In both cases we cover almost 97% of the query volume. We also perform a relative relevance analysis on a smaller private web collection and query log, finding that some of our techniques reduce the index size by almost 40% while losing less than 2% in NDCG@10.
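The abstract does not describe the specific pruning techniques. The sketch below illustrates only the general idea of pruning before indexing and how two of the reported quantities (index size relative to the full index, and the fraction of query volume covered) might be measured. It assumes one hypothetical heuristic, keeping only the terms that are most frequent in a query log, which is not necessarily one of the paper's techniques. All function names and the toy data are illustrative assumptions, and answer quality (precision@10, NDCG@10) is not computed here.

```python
from collections import Counter

def top_query_terms(query_log, k):
    """Hypothetical heuristic: return the k terms with the highest query-log weight.
    A term's weight is the summed frequency of the queries containing it."""
    weights = Counter()
    for query, freq in query_log.items():
        for term in set(query.lower().split()):
            weights[term] += freq
    return {term for term, _ in weights.most_common(k)}

def build_index(docs, keep_terms=None):
    """Build an inverted index (term -> list of doc ids in ascending order).
    If keep_terms is given, all other terms are skipped while indexing,
    so the full index is never materialized."""
    index = {}
    for doc_id in sorted(docs):
        for term in set(docs[doc_id].lower().split()):
            if keep_terms is None or term in keep_terms:
                index.setdefault(term, []).append(doc_id)
    return index

def num_postings(index):
    """Index size measured as the total number of postings."""
    return sum(len(postings) for postings in index.values())

def query_coverage(query_log, index):
    """Fraction of the query volume (repeats counted) whose terms all appear in the index."""
    total = sum(query_log.values())
    covered = sum(freq for query, freq in query_log.items()
                  if all(term in index for term in query.lower().split()))
    return covered / total if total else 0.0

# Toy data, invented for illustration only.
docs = {1: "pruning inverted index", 2: "web search engine", 3: "inverted index compression"}
query_log = Counter({"inverted index": 5, "web search": 3, "string processing": 2})

full = build_index(docs)
pruned = build_index(docs, keep_terms=top_query_terms(query_log, k=4))
size_ratio = num_postings(pruned) / num_postings(full)
coverage = query_coverage(query_log, pruned)
print(f"index size: {size_ratio:.0%} of full, query coverage: {coverage:.0%}")
# prints "index size: 67% of full, query coverage: 80%"
```

In this toy run, the query "string processing" is lost because its terms fall outside the kept vocabulary, which is the kind of coverage loss the abstract quantifies at scale.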


Related Materials

  1. DOI - Is published in 10.1007/978-3-030-59212-7_13
  2. ISSN - Is published in 0302-9743

Volume

12303 LNCS

Start page

177

End page

193

Total pages

17

Outlet

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Editors

Christina Boucher, Sharma V. Thankachan

Name of conference

International Symposium on String Processing and Information Retrieval

Publisher

Springer Science and Business Media Deutschland GmbH

Place published

Germany

Start date

2020-10-13

End date

2020-10-15

Language

English

Copyright

© 2020, Springer Nature Switzerland AG.

Former Identifier

2006106362

Esploro creation date

2023-12-01
