RMIT University

Top-k-size keyword search on tree structured data

journal contribution
posted on 2024-11-01, 21:41 authored by Aggeliki Dimitriou, Dimitri Theodoratos, Timoleon Sellis
Keyword search is the most popular technique for querying large tree-structured datasets, often of unknown structure, on the web. Recent keyword search approaches return lowest common ancestors (LCAs) of the keyword matches, ranked by their relevance to the keyword query. A major challenge for a ranking approach is the efficiency of its algorithms as the number of keywords and the size and complexity of the data increase. To meet this challenge, most known approaches restrict their ranking to a subset of the LCAs (e.g., SLCAs, ELCAs) and therefore miss relevant results. In this work, we design novel top-k-size stack-based algorithms on tree-structured data. Our algorithms implement ranking semantics for keyword queries that are based on the concept of LCA size. Similar to metric selection in information retrieval, LCA size reflects the proximity of keyword matches in the data tree. These semantics do not rank a predefined subset of LCAs and, through a layered presentation of results, demonstrate improved effectiveness compared to previous relevant approaches. To address performance challenges, our algorithms exploit a lattice of the partitions of the keyword set, which enables linear-time performance. This result is obtained without the support of auxiliary precomputed data structures. An extensive experimental study on a variety of large datasets confirms the theoretical analysis. The results show that, in contrast to other approaches, our algorithms scale smoothly as the size of the dataset and the number of keywords increase.
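To make the idea of ranking LCAs by the proximity of keyword matches concrete, here is a minimal brute-force sketch. It is not the stack-based, lattice-driven algorithm from the paper, and it does not achieve linear time. It rests on two assumptions that the abstract does not state: nodes are identified by Dewey labels (tuples of child positions), and "LCA size" is approximated as the number of edges in the minimal subtree connecting one match per keyword. The function names `lca`, `connecting_tree_size`, and `rank_by_size` are hypothetical.

```python
from itertools import product


def lca(labels):
    """Lowest common ancestor = longest common prefix of the Dewey labels."""
    first = labels[0]
    limit = min(len(label) for label in labels)
    i = 0
    while i < limit and all(label[i] == first[i] for label in labels):
        i += 1
    return first[:i]


def connecting_tree_size(labels):
    """Edges in the minimal subtree joining the labels (assumed size measure)."""
    root = lca(labels)
    nodes = {root}
    for label in labels:
        # Add every node on the path from the LCA down to this match.
        for depth in range(len(root) + 1, len(label) + 1):
            nodes.add(label[:depth])
    return len(nodes) - 1


def rank_by_size(matches):
    """Try one match per keyword; keep the smallest size seen for each LCA."""
    best = {}
    for combo in product(*matches.values()):
        root = lca(combo)
        size = connecting_tree_size(combo)
        if root not in best or size < best[root]:
            best[root] = size
    # Smaller size means the keyword matches sit closer together.
    return sorted(best.items(), key=lambda item: (item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical keyword matches, given as Dewey labels.
    matches = {
        "xml": [(0, 0, 1), (0, 2)],
        "search": [(0, 0, 2), (0, 1, 0)],
    }
    print(rank_by_size(matches))  # [((0, 0), 2), ((0,), 3)]
```

The exhaustive enumeration over match combinations grows exponentially with the number of keywords, which is the kind of cost the paper's algorithms are designed to avoid.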

History

Related Materials

  1. DOI - Is published in 10.1016/j.is.2014.07.002
  2. ISSN - Is published in 03064379

Journal

Information Systems

Volume

47

Start page

178

End page

193

Total pages

16

Publisher

Elsevier Ltd

Place published

United Kingdom

Language

English

Copyright

© 2014 Elsevier Ltd. All rights reserved

Former Identifier

2006051858

Esploro creation date

2020-06-22

Fedora creation date

2015-04-20
