RMIT University

Efficient online index maintenance for contiguous inverted lists

journal contribution
posted on 2024-11-01, 02:22 authored by Nicholas Lester, Justin Zobel, Hugh Williams
Search engines and other text retrieval systems use high-performance inverted indexes to provide efficient text query evaluation. Algorithms for fast query evaluation and index construction are well-known, but relatively little has been published concerning update. In this paper, we experimentally evaluate the two main alternative strategies for index maintenance in the presence of insertions, with the constraint that inverted lists remain contiguous on disk for fast query evaluation. The in-place and re-merge strategies are benchmarked against the baseline of a complete re-build. Our experiments with large volumes of web data show that re-merge is the fastest approach if large buffers are available, but that even a simple implementation of in-place update is suitable when the rate of insertion is low or memory buffer size is limited. We also show that with careful design of aspects of implementation such as free-space management, in-place update can be improved by around an order of magnitude over a naive implementation.
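The two maintenance strategies named in the abstract can be illustrated with a toy model. The sketch below is not the authors' implementation: the names `remerge`, `InPlaceIndex`, `growth`, and `relocations` are hypothetical, the "disk" is a Python list, and buffered document IDs are assumed to be larger than any already indexed, so appending keeps each list sorted. Free-space management is deliberately naive here: vacated slots are blanked out rather than reused.

```python
def remerge(index, buffer):
    """Re-merge: build a brand-new index by merging each old list with buffered postings."""
    merged = {}
    for term in sorted(set(index) | set(buffer)):
        # Assumes buffered doc IDs follow all on-disk doc IDs for the term.
        merged[term] = index.get(term, []) + buffer.get(term, [])
    return merged


class InPlaceIndex:
    """In-place: each list owns one contiguous slot; relocate it when it overflows."""

    def __init__(self, growth=2):
        self.disk = []         # simulated disk: flat array of doc IDs (None = free)
        self.slots = {}        # term -> (start, length, capacity)
        self.growth = growth   # hypothetical over-allocation factor
        self.relocations = 0   # how often an existing list had to be copied

    def append(self, term, postings):
        start, length, cap = self.slots.get(term, (len(self.disk), 0, 0))
        needed = length + len(postings)
        if needed > cap:
            # No room left: copy the list into a larger contiguous slot at the end.
            new_cap = max(needed, cap * self.growth)
            new_start = len(self.disk)
            old = self.disk[start:start + length]
            self.disk.extend(old + [None] * (new_cap - length))
            if cap:
                self.disk[start:start + cap] = [None] * cap  # blank the old slot
                self.relocations += 1
            start, cap = new_start, new_cap
        self.disk[start + length:start + needed] = postings
        self.slots[term] = (start, needed, cap)

    def postings(self, term):
        start, length, _ = self.slots[term]
        return self.disk[start:start + length]
```

In this model, re-merge rewrites every list on each flush, while in-place update touches only the lists that changed but pays a copy whenever a list outgrows its slot. Both keep every list contiguous, which is the constraint the paper imposes.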

Related Materials

  1. ISSN - Is published in 0306-4573

Journal

Information Processing and Management

Volume

42

Start page

916

End page

933

Total pages

18

Publisher

Pergamon

Place published

Oxford

Language

English

Copyright

Copyright © 2005 Elsevier Ltd. All rights reserved.

Former Identifier

2006000169

Esploro creation date

2020-06-22

Fedora creation date

2009-02-27
