RMIT University

Efficient cascade ranking for information retrieval

thesis
posted on 2024-11-25, 17:57 authored by Luke GALLAGHER
Web search services play a central role in modern society, providing access to information and knowledge. Relevance ranking for web search is a highly imbalanced problem, where non-relevant documents far outnumber those that are relevant to a user's particular information need. Advanced ranking methods such as Learning to Rank provide users with accurate and relevant results, albeit with greater demand for computing resources. This increases hardware and electricity costs for search providers, impacting quality of service for end users. Furthermore, many Learning to Rank techniques regard the issue of reducing the computational load as a separate problem that is orthogonal to ranking result quality. This thesis investigates new techniques in cascade ranking for finding the right balance between efficiency and effectiveness in large-scale search systems. Cascade ranking employs a sequence of increasingly complex ranking models to progressively prune out less promising documents and refine the relevance ranking of those that remain. Combining cost-sensitive learning with document pruning over multiple ranking stages provides a greater degree of flexibility within the conjoint trade-off space of efficiency and effectiveness. It allows search providers to ask composite questions of the ranking models themselves, such as: what is the right balance of rank quality and query throughput for a relevance ranking model, given the operational context in which it will be situated? Several contributions toward cost-sensitive cascade ranking are presented. Our approach relies on the fact that earlier stages will see a full candidate list, but the majority of documents will not be relevant, or even be scored in later stages. Therefore, a trade-off has to be made between extracting cheap (or cost-efficient) features early on and maximizing effectiveness in the final stages of the retrieval process.
More specifically, this thesis (a) collates previous work on algorithms for cost-sensitive Learning to Rank; (b) critically analyzes current methods for efficient cascade ranking; (c) presents a framework for performing cost-sensitive feature selection in the cascade ranking setting; (d) investigates the current limitations with regard to feature extraction tooling and reproducible Learning to Rank dataset construction, along with a proposed solution; (e) derives a new method for jointly optimizing a cascade of Learning to Rank models via document instance weighting that maximizes training data use for cascade learning.
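
To make the general idea of cascade ranking described in the abstract concrete, the following is a minimal sketch and not the method developed in the thesis. It assumes a two-stage cascade in which a cheap scorer sees every candidate and an expensive scorer re-ranks only the survivors. The `Stage` class, the `run_cascade` function, the feature names (`bm25`, `semantic`), the scoring weights, and the per-document cost units are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Doc = dict  # hypothetical document record: an "id" plus named feature values


@dataclass
class Stage:
    name: str
    score: Callable[[Doc], float]  # ranking model for this stage
    cost_per_doc: float            # illustrative cost units per scored document
    keep: int                      # number of documents passed to the next stage


def run_cascade(docs: List[Doc], stages: List[Stage]) -> Tuple[List[Doc], float]:
    """Score, sort, and prune the candidate list at each stage in turn."""
    candidates = list(docs)
    total_cost = 0.0
    for stage in stages:
        # Every document still in the candidate list is scored at this stage.
        total_cost += stage.cost_per_doc * len(candidates)
        candidates.sort(key=stage.score, reverse=True)
        # Prune: only the top `keep` documents reach the next stage.
        candidates = candidates[: stage.keep]
    return candidates, total_cost


# Toy data (assumed values, not taken from the thesis).
docs = [
    {"id": 1, "bm25": 2.0, "semantic": 0.10},
    {"id": 2, "bm25": 1.5, "semantic": 0.90},
    {"id": 3, "bm25": 0.2, "semantic": 0.95},
    {"id": 4, "bm25": 1.0, "semantic": 0.50},
]

stages = [
    Stage("cheap", lambda d: d["bm25"], cost_per_doc=1.0, keep=3),
    Stage("expensive", lambda d: 0.3 * d["bm25"] + d["semantic"],
          cost_per_doc=10.0, keep=2),
]

ranked, cost = run_cascade(docs, stages)
print([d["id"] for d in ranked], cost)
```

In this toy run the cascade costs 34 units, whereas applying the expensive scorer to all four documents would cost 40 units for that stage alone. The price is that document 3, which the expensive scorer would have ranked second, is pruned by the cheap first stage. This is the kind of efficiency and effectiveness trade-off the abstract refers to.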

History

Degree Type

Doctorate by Research

Imprint Date

2021-01-01

School name

School of Science, RMIT University

Former Identifier

9922032623901341

Open access

  • Yes
