Conjunctive Boolean queries are a fundamental operation in web search engines. These queries can be reduced to the problem of intersecting ordered sets of integers, where each set represents the documents containing one of the query terms. However, there is a tension between the desire to store the lists effectively, in a compressed form, and the desire to carry out intersection operations efficiently, using non-sequential processing modes. In this paper we evaluate intersection algorithms on compressed sets, comparing them to the best non-sequential array-based intersection algorithms. By adding a simple, low-cost auxiliary index, we show that compressed storage need not hinder efficient, high-speed intersection operations.
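As a rough illustration of the reduction described above, a conjunctive query can be sketched as repeated intersection of ascending lists of document identifiers. The sketch below is not the method evaluated in the paper: it works on plain uncompressed Python lists, models neither compression nor the auxiliary index, and the names `intersect_sorted` and `conjunctive_query` are hypothetical. It only shows one non-sequential processing mode, in which the longer list is probed by binary search rather than scanned element by element.

```python
from bisect import bisect_left


def intersect_sorted(short, long):
    """Intersect two ascending lists of document ids.

    Each id in `short` is located in `long` by binary search, starting
    from the previous match position, so `long` is never scanned in full.
    """
    result = []
    lo = 0
    for doc_id in short:
        # Find the first position in long[lo:] whose value is >= doc_id.
        lo = bisect_left(long, doc_id, lo)
        if lo == len(long):
            break  # every remaining id in `short` is larger than all of `long`
        if long[lo] == doc_id:
            result.append(doc_id)
    return result


def conjunctive_query(postings):
    """Return the ids present in every list (assumed sorted and duplicate-free)."""
    if not postings:
        return []
    # Start from the shortest list so the candidate set stays small.
    ordered = sorted(postings, key=len)
    candidates = ordered[0]
    for other in ordered[1:]:
        candidates = intersect_sorted(candidates, other)
        if not candidates:
            break  # no document can match all terms
    return candidates
```

For example, `conjunctive_query([[1, 2, 3, 5, 8, 13], [2, 5, 9], [5, 7]])` keeps only the document identifier shared by all three term lists. Processing the shortest list first is a common heuristic, assumed here rather than taken from the source.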
Start page: 137
End page: 148
Total pages: 12
Outlet: Proceedings of the 14th International Symposium, String Processing and Information Retrieval (SPIRE 2007)
Editors: Nivio Ziviani; Ricardo Baeza-Yates
Name of conference: String Processing and Information Retrieval (SPIRE 2007) - 14th International Symposium