RMIT University

A pipelined architecture for distributed text query evaluation

journal contribution
posted on 2024-11-01, 03:11 authored by Alistair Moffat, William Webber, Justin Zobel, Ricardo Baeza-Yates
Two principal query-evaluation methodologies have been described for cluster-based implementation of distributed information retrieval systems: document partitioning and term partitioning. In a document-partitioned system, each of the processors hosts a subset of the documents in the collection, and executes every query against its local sub-collection. In a term-partitioned system, each of the processors hosts a subset of the inverted lists that make up the index of the collection, and serves them to a central machine as they are required for query evaluation. In this paper we introduce a pipelined query-evaluation methodology, based on a term-partitioned index, in which partially evaluated queries are passed amongst the set of processors that host the query terms. This arrangement retains the disk read benefits of term partitioning, but more effectively shares the computational load. We compare the three methodologies experimentally, and show that term distribution is inefficient and scales poorly. The new pipelined approach offers efficient memory utilization and efficient use of disk accesses, but suffers from problems with load balancing between nodes. Until these problems are resolved, document partitioning remains the preferred method.
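The contrast between document partitioning and the pipelined, term-partitioned approach can be illustrated with a toy, single-process simulation. This is a minimal sketch under stated assumptions, not the authors' implementation. The sample documents, the round-robin assignment of documents and terms to nodes, the raw term-frequency scoring, and every function name (`build_index`, `score_terms`, `top_k`, `document_partitioned`, `pipelined`) are hypothetical. The sketch also ignores disk access and network cost.

```python
from collections import defaultdict
import heapq

def build_index(docs):
    """Inverted index: term -> {doc_id: within-document frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in text.split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index

def score_terms(lists, query, acc):
    """Add the postings of every query term found in `lists` to `acc`.
    Raw tf summation is a stand-in for a real similarity measure."""
    for term in query:
        for doc_id, freq in lists.get(term, {}).items():
            acc[doc_id] += freq
    return acc

def top_k(acc, k):
    # Highest score first; ties broken in favour of the smaller doc id.
    return heapq.nlargest(k, acc.items(), key=lambda kv: (kv[1], -kv[0]))

def document_partitioned(docs, query, n_nodes, k):
    # Each node holds a disjoint subset of documents (round-robin by doc id)
    # and evaluates the whole query against its own local index.
    shards = [{} for _ in range(n_nodes)]
    for doc_id, text in docs.items():
        shards[doc_id % n_nodes][doc_id] = text
    merged = {}
    for shard in shards:
        local = score_terms(build_index(shard), query, defaultdict(int))
        merged.update(top_k(local, k))  # a central node merges local top-k lists
    return top_k(merged, k)

def pipelined(docs, query, n_nodes, k):
    # Term partitioning: each node holds complete inverted lists for a subset
    # of the vocabulary (assigned here alphabetically, round-robin).
    index = build_index(docs)
    host = {t: i % n_nodes for i, t in enumerate(sorted(index))}
    nodes = [{} for _ in range(n_nodes)]
    for term, postings in index.items():
        nodes[host[term]][term] = postings
    # The query visits only the nodes that host its terms; the partially
    # evaluated query (the accumulators) is handed from node to node.
    route = sorted({host[t] for t in query if t in host})
    acc = defaultdict(int)
    for node in route:
        acc = score_terms(nodes[node], query, acc)
    return top_k(acc, k)  # the last node on the route produces the answer

DOCS = {
    0: "pipelined query evaluation",
    1: "term partitioned index evaluation",
    2: "document partitioned index",
    3: "distributed query evaluation query",
}
QUERY = ["query", "evaluation"]

print(document_partitioned(DOCS, QUERY, n_nodes=2, k=2))  # [(3, 3), (0, 2)]
print(pipelined(DOCS, QUERY, n_nodes=3, k=2))             # [(3, 3), (0, 2)]
```

Both strategies return the same ranking. They differ in where the work happens. Under document partitioning, every node processes every query. In the pipeline, only the nodes that host the query's terms are involved, so a node that holds frequently queried terms does more of the work. This hints at the load-balancing difficulty the abstract reports.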

Journal: Information Retrieval
Volume: 10
Start page: 205
End page: 231
Total pages: 27
Publisher: Springer
Place published: Dordrecht
Language: English
Copyright: © Springer Science + Business Media, LLC 2007
Former Identifier: 2006005649
Esploro creation date: 2020-06-22
Fedora creation date: 2011-01-07
