RMIT University

Load balancing for term-distributed parallel retrieval

conference contribution
posted on 2024-10-30, 16:44 authored by Alistair Moffat, William Webber, Justin Zobel
Large-scale web and text retrieval systems deal with amounts of data that greatly exceed the capacity of any single machine. To handle the necessary data volumes and query throughput rates, parallel systems are used, in which the document and index data are split across tightly-clustered distributed computing systems. The index data can be distributed either by document or by term. In this paper we examine methods for load balancing in term-distributed parallel architectures, and propose a suite of techniques for reducing net querying costs. In combination, the techniques we describe allow a 30% improvement in query throughput when tested on an eight-node parallel computer system.

Related Materials

  1. ISBN - Is published in 1595933697 (urn:isbn:1595933697)

Start page

348

End page

355

Total pages

8

Outlet

Proceedings of the 29th annual international ACM SIGIR conference on research and development in information retrieval

Editors

S. Dumais, E.N. Efthimiadis, D. Hawking, K. Järvelin

Name of conference

Conference on Research and Development in Information Retrieval

Publisher

Association for Computing Machinery (ACM)

Place published

USA

Language

English

Copyright

© ACM

Former Identifier

2006001979

Esploro creation date

2020-06-22

Fedora creation date

2011-06-09
