RMIT University
The Potential of Learned Index Structures for Index Compression

conference contribution
posted on 2024-11-03, 12:38 authored by Harrie Oosterhuis, Shane Culpepper, Maarten de Rijke
Inverted indexes are vital in providing fast keyword-based search. For every term in the document collection, a list of identifiers of documents in which the term appears is stored, along with auxiliary information such as term frequency and position offsets. While very effective, inverted indexes have large memory requirements for web-sized collections. Recently, the concept of learned index structures was introduced, where machine-learned models replace common index structures such as B-tree indexes, hash indexes, and Bloom filters. These learned index structures require less memory and can be computationally much faster than their traditional counterparts. In this paper, we consider whether such models may be applied to conjunctive Boolean querying. First, we investigate how a learned model can replace the document postings of an inverted index, and then evaluate the compromises such an approach might entail. Second, we evaluate the potential gains that can be achieved in terms of memory requirements. Our work shows that learned models have great potential in inverted indexing, and this direction seems to be a promising area for future research.
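
For readers unfamiliar with the terminology, the traditional structure the abstract describes can be sketched in a few lines of Python. This is only an illustration of a plain inverted index with conjunctive (Boolean AND) querying, not the learned approach studied in the paper. The function names, the whitespace tokenization, and the toy documents are all assumptions made for this sketch.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to a sorted list of IDs of the documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in enumerate(docs):
        # Assumption: naive lowercase whitespace tokenization.
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def intersect(a, b):
    """Intersect two sorted posting lists with a linear merge."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def conjunctive_query(index, terms):
    """Return IDs of documents that contain every query term."""
    lists = [index.get(term.lower(), []) for term in terms]
    if not lists:
        return []
    lists.sort(key=len)  # start from the shortest posting list
    result = lists[0]
    for plist in lists[1:]:
        result = intersect(result, plist)
        if not result:
            break  # no document can match all terms
    return result


# Toy collection (hypothetical, for illustration only).
docs = [
    "learned index structures",
    "inverted index compression",
    "learned models for index compression",
]
index = build_inverted_index(docs)
print(conjunctive_query(index, ["index", "compression"]))  # [1, 2]
```

The posting lists stored in `index` are the component that, according to the abstract, a learned model might replace in order to reduce memory requirements.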

Funding

Trajectory data processing: Spatial computing meets information retrieval

Australian Research Council

History

Related Materials

  1. DOI - Is published in 10.1145/3291992.3291993
  2. ISBN - Is published in 9781450365499 (urn:isbn:9781450365499)

Start page

1

End page

4

Total pages

4

Outlet

Proceedings of the 23rd Annual Australasian Document Computing Symposium

Name of conference

23rd Australasian Document Computing Symposium (ADCS)

Publisher

ACM

Place published

New York

Start date

2018-12-11

End date

2018-12-12

Language

English

Copyright

© 2018 Copyright held by the owner/author(s).

Former Identifier

2006090032

Esploro creation date

2020-06-22

Fedora creation date

2019-03-26
