RMIT University
Browse

Graph-based term weighting for information retrieval

journal contribution
posted on 2024-11-02, 04:46 authored by Roi Blanco Gonzalez, Christina Lioma
A standard approach to Information Retrieval (IR) is to model text as a bag of words. Alternatively, text can be modelled as a graph, whose vertices represent words, and whose edges represent relations between the words, defined on the basis of any meaningful statistical or linguistic relation. Given such a text graph, graph theoretic computations can be applied to measure various properties of the graph, and hence of the text. This work explores the usefulness of such graph-based text representations for IR. Specifically, we propose a principled graph-theoretic approach of (1) computing term weights and (2) integrating discourse aspects into retrieval. Given a text graph, whose vertices denote terms linked by co-occurrence and grammatical modification, we use graph ranking computations (e.g. PageRank Page et al. in The pagerank citation ranking: Bringing order to the Web. Technical report, Stanford Digital Library Technologies Project, 1998) to derive weights for each vertex, i.e. term weights, which we use to rank documents against queries. We reason that our graph-based term weights do not necessarily need to be normalised by document length (unlike existing term weights) because they are already scaled by their graph-ranking computation. This is a departure from existing IR ranking functions, and we experimentally show that it performs comparably to a tuned ranking baseline, such as BM25 (Robertson et al. in NIST Special Publication 500-236: TREC-4, 1995). In addition, we integrate into ranking graph properties, such as the average path length, or clustering coefficient, which represent different aspects of the topology of the graph, and by extension of the document represented as a graph. Integrating such properties into ranking allows us to consider issues such as discourse coherence, flow and density during retrieval. We experimentally show that this type of ranking performs comparably to BM25, and can even outperform it, across different TREC (Voorhees and H

History

Related Materials

  1. 1.
    DOI - Is published in 10.1007/s10791-011-9172-x
  2. 2.
    ISSN - Is published in 13864564

Journal

Information Retrieval

Volume

15

Issue

1

Start page

54

End page

92

Total pages

39

Publisher

Springer

Place published

Netherlands

Language

English

Copyright

© Springer Science+Business Media, LLC 2011

Former Identifier

2006077266

Esploro creation date

2020-06-22

Fedora creation date

2017-08-29

Usage metrics

    Scholarly Works

    Exports

    RefWorks
    BibTeX
    Ref. manager
    Endnote
    DataCite
    NLM
    DC