RMIT University

Document compaction for efficient query biased snippet generation

conference contribution
posted on 2024-10-31, 09:32 authored by Yohannes Tsegay, Simon Puglisi, Andrew Turpin, Justin Zobel
Current web search engines return query-biased snippets for each document they list in a result set. For efficiency, search engines operating on large collections need to cache snippets for common queries, and to cache documents to allow fast generation of snippets for uncached queries. To improve the hit rate on a document cache during snippet generation, we propose and evaluate several schemes for reducing document size, hence increasing the number of documents in the cache. In particular, we argue against further improvements to document compression, and argue for schemes that prune documents based on the a priori likelihood that a sentence will be used as part of a snippet for a given document. Our experiments show that if documents are reduced to less than half their original size, 80% of snippets generated are identical to those generated from the original documents. Moreover, as the pruned, compressed surrogates are smaller, 3-4 times as many documents can be cached.
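The pruning idea in the abstract can be illustrated with a minimal sketch. This is not the authors' method: the scoring function `prior_score`, the naive sentence splitter, the `keep_fraction` parameter, and the term-overlap snippet selector are all hypothetical stand-ins chosen for illustration. The sketch ranks sentences by a query-independent score, keeps the top-ranked ones until a size budget is exhausted, and then generates a snippet from either the full or the pruned text so the two can be compared.

```python
import re


def split_sentences(text):
    # Naive splitter (assumption): break on whitespace that follows ., ! or ?
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def prior_score(index, sentence):
    # Hypothetical query-independent score: longer and earlier sentences
    # are assumed more likely to be used in a snippet.
    return len(sentence.split()) / (1.0 + index)


def prune(text, keep_fraction=0.5):
    # Keep the highest-scoring sentences until the character budget is spent,
    # then emit the survivors in their original order.
    sentences = split_sentences(text)
    budget = keep_fraction * sum(len(s) for s in sentences)
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: prior_score(i, sentences[i]),
        reverse=True,
    )
    kept, used = [], 0
    for i in ranked:
        if used + len(sentences[i]) > budget:
            break  # stop at the first sentence that does not fit
        kept.append(i)
        used += len(sentences[i])
    return " ".join(sentences[i] for i in sorted(kept))


def snippet(text, query, max_sentences=1):
    # Toy query-biased snippet: pick the sentences sharing the most query terms.
    terms = set(query.lower().split())
    sentences = split_sentences(text)

    def overlap(s):
        return len(terms & set(re.findall(r"\w+", s.lower())))

    # sorted() is stable, so ties go to the earlier sentence.
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: overlap(sentences[i]),
        reverse=True,
    )
    return " ".join(sentences[i] for i in sorted(ranked[:max_sentences]))


doc = (
    "Search engines cache documents to build snippets quickly. "
    "Pruned documents keep the sentences most likely to appear in snippets. "
    "Thanks. Contact us."
)
surrogate = prune(doc, keep_fraction=0.5)
same = snippet(doc, "cache documents") == snippet(surrogate, "cache documents")
```

In this toy example the query "cache documents" yields the same snippet from the pruned surrogate as from the full document, while a query such as "likely sentences" would not, which mirrors the trade-off the abstract describes: a smaller surrogate reproduces most, but not all, snippets.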

Related Materials

  1. DOI - Is published in 10.1007/978-3-642-00958-7_45
  2. ISBN - Is published in 3642009573 (urn:isbn:3642009573)

Start page

509

End page

520

Total pages

12

Outlet

Proc. European Conference on Information Retrieval

Editors

Mohand Boughanem, Catherine Berrut, Josiane Mothe, Chantal Soule-Dupuy

Name of conference

European Conference on Information Retrieval

Publisher

Springer

Place published

Germany

Start date

2009-04-06

End date

2009-04-09

Language

English

Copyright

© Springer-Verlag Berlin Heidelberg 2009

Former Identifier

2006016507

Esploro creation date

2020-06-22

Fedora creation date

2011-12-08
