RMIT University

Colored range queries and document retrieval

conference contribution
posted on 2024-10-31, 10:29 authored by Travis Gagie, Gonzalo Navarro, Simon Puglisi
Colored range queries are a well-studied topic in computational geometry and database research that, in the past decade, have found exciting applications in information retrieval. In this paper we give improved time and space bounds for three important one-dimensional colored range queries - colored range listing, colored range top-k queries and colored range counting - and, thus, new bounds for various document retrieval problems on general collections of sequences. Specifically, we first describe a framework including almost all recent results on colored range listing and document listing, which suggests new combinations of data structures for these problems. For example, we give the fastest compressed data structures for colored range listing and document listing, and an efficient data structure for document listing whose size is bounded in terms of the high-order entropies of the library of documents. We then show how (approximate) colored top-k queries can be reduced to (approximate) range-mode queries on subsequences, yielding the first efficient data structure for this problem. Finally, we show how a modified wavelet tree can support colored range counting in logarithmic time and space that is succinct whenever the number of colors is superpolylogarithmic in the length of the sequence.
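To make the three queries named in the abstract concrete, here is a minimal brute-force sketch in Python of what each one returns on a sequence of colors. It is not the paper's data structures, which answer these queries in far less time and space; the function names, the inclusive 0-based range convention, and the tie-breaking rule for top-k are assumptions made for illustration. The last function shows the well-known predecessor-array observation often used for colored range listing: a position in the range holds the first occurrence of its color within the range exactly when the previous occurrence of that color lies before the start of the range.

```python
from collections import Counter


def colored_range_listing(colors, i, j):
    """Distinct colors in colors[i..j] (inclusive), in order of first appearance."""
    seen = []
    for c in colors[i:j + 1]:
        if c not in seen:
            seen.append(c)
    return seen


def colored_range_counting(colors, i, j):
    """Number of distinct colors in colors[i..j] (inclusive)."""
    return len(set(colors[i:j + 1]))


def colored_range_top_k(colors, i, j, k):
    """The k most frequent colors in colors[i..j] (inclusive).

    Assumption: ties are broken by first appearance in the range,
    which is how Counter.most_common orders equal counts.
    """
    counts = Counter(colors[i:j + 1])
    return [c for c, _ in counts.most_common(k)]


def listing_via_predecessors(colors, i, j):
    """Colored range listing using a predecessor array.

    prev[p] is the position of the previous occurrence of colors[p],
    or -1 if there is none. Position p in [i, j] is reported exactly
    when prev[p] < i, so each distinct color is reported once.
    """
    last = {}
    prev = []
    for p, c in enumerate(colors):
        prev.append(last.get(c, -1))
        last[c] = p
    return [colors[p] for p in range(i, j + 1) if prev[p] < i]


# Hypothetical example sequence; the range [1, 5] covers 2, 1, 3, 2, 1.
example = [1, 2, 1, 3, 2, 1, 4]
print(colored_range_listing(example, 1, 5))     # [2, 1, 3]
print(colored_range_counting(example, 1, 5))    # 3
print(colored_range_top_k(example, 1, 5, 2))    # [2, 1]
print(listing_via_predecessors(example, 1, 5))  # [2, 1, 3]
```

In the document retrieval setting the abstract refers to, the "colors" would be document identifiers, so listing the distinct colors in a range corresponds to listing the documents that contribute entries to that range.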

Related Materials

  1. ISBN - Is published in 9783642163203 (urn:isbn:9783642163203)

Start page

67

End page

81

Total pages

15

Outlet

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 6393 LNCS

Editors

E. Chavez and S. Lonardi

Name of conference

SPIRE 2010

Publisher

Springer

Place published

Heidelberg Berlin New York

Start date

2010-10-11

End date

2010-10-13

Language

English

Copyright

©2010 Springer-Verlag

Former Identifier

2006026036

Esploro creation date

2020-06-22

Fedora creation date

2011-10-28
