RMIT University
Fast on-line index construction by geometric partitioning

conference contribution
posted on 2024-10-30, 14:45 authored by Nicholas Lester, Alistair Moffat, Justin Zobel
Inverted index structures are the mainstay of modern text retrieval systems. They can be constructed quickly using off-line merge-based methods, and provide efficient support for a variety of querying modes. In this paper we examine the task of on-line index construction, that is, how to build an inverted index when the underlying data must be continuously queryable, and the documents must be indexed and available for search as soon as they are inserted. When straightforward approaches are used, document insertions become increasingly expensive as the size of the database grows. This paper describes a mechanism based on controlled partitioning that can be adapted to suit different balances of insertion and querying operations, and is faster and scales better than previous methods. Using experiments on 100GB of web data, we demonstrate the efficiency of our methods in practice, showing that they dramatically reduce the cost of on-line index construction.
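The abstract does not spell out the partitioning rule, so the following is only a rough illustration of why controlled partitioning keeps insertion cost from growing as fast as re-merging a single index. This minimal Python sketch simulates partition sizes alone (no real postings) under a simplified radix-r policy. The capacity rule ((r - 1) * r**i * b postings at level i), the one-partition-per-level arrangement, and all names (`GeometricIndex`, `flush`, `partitions`, `remerge_cost`) are assumptions made for illustration, not necessarily the exact scheme described in the paper.

```python
from dataclasses import dataclass, field


@dataclass
class GeometricIndex:
    """Tracks on-disk partition sizes only; no real postings are stored (illustrative model)."""

    r: int  # radix; assumed to be at least 2
    b: int  # postings accumulated in memory before each flush
    levels: list[int] = field(default_factory=list)  # levels[i]: size of partition i, 0 if empty
    written: int = 0  # total postings written to disk by all merges so far

    def capacity(self, i: int) -> int:
        # Hypothetical rule: partition i may hold at most (r - 1) * r**i * b postings.
        return (self.r - 1) * self.r ** i * self.b

    def flush(self) -> None:
        """Merge a full buffer of b postings into the partition hierarchy."""
        carry = self.b
        i = 0
        while True:
            if i == len(self.levels):
                self.levels.append(0)
            merged = carry + self.levels[i]
            if merged <= self.capacity(i):
                # One multi-way merge of the buffer and any lower partitions
                # emptied on the way up; its output is written once, at level i.
                self.levels[i] = merged
                self.written += merged
                return
            # Level i would overflow: fold its contents into the carry.
            self.levels[i] = 0
            carry = merged
            i += 1

    def partitions(self) -> int:
        """Number of non-empty partitions a query would have to consult."""
        return sum(1 for size in self.levels if size > 0)


def remerge_cost(flushes: int, b: int) -> int:
    """Postings written if every flush is merged into one single on-disk index."""
    return sum(k * b for k in range(1, flushes + 1))


if __name__ == "__main__":
    idx = GeometricIndex(r=2, b=1)
    for _ in range(8):
        idx.flush()
    print(idx.levels, idx.partitions(), idx.written)  # [0, 0, 0, 8] 1 20
    print(remerge_cost(8, 1))  # 36
```

With r = 2 this simplified hierarchy behaves like a binary counter, so each posting is rewritten roughly once per level, whereas re-merging into a single partition rewrites the whole index on every flush and its cost grows quadratically with the number of flushes. In this model, r acts as the knob that trades merge work against the number of partitions a query must consult, which is the kind of insertion/querying balance the abstract refers to.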

Related Materials

  1. ISBN - Is published in 1595931406 (urn:isbn:1595931406)

Start page

776

End page

783

Total pages

8

Outlet

Proceedings of the ACM-CIKM International Conference on Information & Knowledge Management

Editors

A. Chowdhury et al.

Name of conference

International Conference on Information & Knowledge Management

Publisher

ACM

Place published

New York, USA

Start date

2005-10-31

End date

2005-11-05

Language

English

Copyright

© 2005 ACM

Former Identifier

2005001110

Esploro creation date

2020-06-22

Fedora creation date

2009-04-08
