RMIT University

Accelerated Query Processing Via Similarity Score Prediction

conference contribution
posted on 2024-10-31, 21:19 authored by Matthias Petri, Alistair Moffat, Joel Mackenzie, Shane Culpepper, Daniel Beck
Processing top-k bag-of-words queries is critical to many information retrieval applications, including web-scale search. In this work, we consider algorithmic properties associated with dynamic pruning mechanisms. Such algorithms maintain a score threshold (the k-th highest similarity score identified so far) so that low-scoring documents can be bypassed, allowing fast top-k retrieval with no loss in effectiveness. In standard pruning algorithms the score threshold is initialized to the lowest possible value. To accelerate processing, we make use of term- and query-dependent features to predict the final value of that threshold, and then employ the predicted value right from the commencement of processing. Because of the asymmetry associated with prediction errors (if the estimated threshold is too high, the query will need to be re-executed in order to assure the correct answer), the prediction process must be risk-sensitive. We explore techniques for balancing those factors, and provide detailed experimental results that show the practical usefulness of the new approach.
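The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a toy in-memory index (a dict of term to per-document score contributions), a simple document-at-a-time loop with per-term upper bounds, and a predicted threshold supplied by the caller. The function names `top_k` and `top_k_with_prediction` are hypothetical, and the predictor itself is not modeled.

```python
import heapq


def top_k(postings, k, initial_threshold=0.0):
    """Document-at-a-time top-k with a simple upper-bound pruning check.

    postings: dict mapping term -> {doc_id: score contribution} (toy index,
    an assumption of this sketch). Returns (results, documents_scored).
    """
    # Largest contribution each term can make to any document.
    max_score = {t: max(p.values()) for t, p in postings.items()}
    docs = sorted({d for p in postings.values() for d in p})
    heap = []  # min-heap of (score, doc_id); heap[0] is the weakest entry
    threshold = initial_threshold
    scored = 0
    for d in docs:
        terms = [t for t, p in postings.items() if d in p]
        # Upper bound on the score of d; bypass d if it cannot reach the threshold.
        if sum(max_score[t] for t in terms) < threshold:
            continue
        scored += 1
        score = sum(postings[t][d] for t in terms)
        if score < threshold:
            continue
        heapq.heappush(heap, (score, d))
        if len(heap) > k:
            heapq.heappop(heap)
        if len(heap) == k:
            # The threshold only ever rises: k-th highest score seen so far.
            threshold = max(threshold, heap[0][0])
    return sorted(heap, reverse=True), scored


def top_k_with_prediction(postings, k, predicted):
    """Start from a predicted threshold; re-execute if it was too high.

    Returns (results, documents_scored, re_executed).
    """
    results, scored = top_k(postings, k, predicted)
    if len(results) < k:
        # Fewer than k results survived, so the prediction may have been an
        # overestimate (or fewer than k documents match at all). Re-run from
        # the lowest threshold to guarantee the correct answer.
        results, rescored = top_k(postings, k, 0.0)
        return results, scored + rescored, True
    return results, scored, False
```

In this sketch, an underestimate still returns the exact top-k while scoring fewer documents than a run that starts from zero, whereas an overestimate pays for both the failed run and the re-execution. That asymmetry is why the abstract describes the prediction as needing to be risk-sensitive.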

Funding

Learning Deep Semantics for Automatic Translation between Human Languages

Australian Research Council

Related Materials

  1. DOI - Is published in 10.1145/3331184.3331207
  2. ISBN - Is published in 9781450361729 (urn:isbn:9781450361729)

Start page

485

End page

494

Total pages

10

Outlet

Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2019)

Name of conference

SIGIR 2019: Session 5B: Efficiency, Effectiveness and Performance

Publisher

Association for Computing Machinery

Place published

New York, United States

Start date

2019-07-21

End date

2019-07-25

Language

English

Copyright

Copyright © 2019 by the Association for Computing Machinery, Inc. (ACM).

Former Identifier

2006094472

Esploro creation date

2020-06-22

Fedora creation date

2019-10-23
