Modern multi-stage retrieval systems comprise a candidate
generation stage followed by one or more reranking stages.
In such an architecture, the quality of the final ranked list may not
be sensitive to the quality of the initial candidate pool, especially in
terms of early precision.
This insensitivity provides several opportunities to increase retrieval
efficiency without significantly sacrificing effectiveness.
In this paper, we explore a new approach to dynamically predicting
the size of an initial result set in the candidate generation stage,
which can directly affect the overall efficiency and effectiveness of
the entire system.
Previous work exploring this tradeoff has focused on global parameter
settings that apply to all queries, even though optimal settings vary
across queries.
In contrast, we propose a technique that predicts the parameter
setting on a per-query basis, using only static pre-retrieval features,
so as to maximize efficiency within an effectiveness envelope.
Experimental results show that substantial efficiency gains are
achievable.
In addition, our framework provides a versatile tool that can be used
to estimate the effectiveness-efficiency tradeoffs that are possible
before selecting and tuning algorithms to make machine-learned
predictions.
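
As a rough illustration of the tradeoff estimate described above, the following sketch picks, for each query, the smallest candidate pool size whose final effectiveness stays within a relative tolerance of the effectiveness at the largest pool size. It is a minimal sketch under stated assumptions and not the method used in this paper: the function names, the relative-tolerance definition of the envelope (`epsilon`), and the input layout (a mapping from pool size to a measured effectiveness score per query) are all hypothetical.

```python
from typing import Dict


def smallest_cutoff_within_envelope(
    effectiveness_by_k: Dict[int, float], epsilon: float
) -> int:
    """Return the smallest pool size k whose effectiveness is at least
    (1 - epsilon) times the effectiveness at the largest k.

    Assumption: the envelope is a relative tolerance around the score
    obtained with the largest candidate pool.
    """
    if not effectiveness_by_k:
        raise ValueError("need at least one (k, effectiveness) pair")
    k_max = max(effectiveness_by_k)
    threshold = (1.0 - epsilon) * effectiveness_by_k[k_max]
    # Scan pool sizes from smallest to largest and stop at the first
    # one that stays inside the envelope.
    for k in sorted(effectiveness_by_k):
        if effectiveness_by_k[k] >= threshold:
            return k
    return k_max


def mean_oracle_pool_size(
    per_query: Dict[str, Dict[int, float]], epsilon: float
) -> float:
    """Average per-query cutoff chosen with hindsight (an oracle)."""
    cutoffs = [
        smallest_cutoff_within_envelope(scores, epsilon)
        for scores in per_query.values()
    ]
    return sum(cutoffs) / len(cutoffs)


# Hypothetical scores for two queries at three candidate pool sizes.
example = {
    "q1": {100: 0.50, 1000: 0.52, 10000: 0.52},
    "q2": {100: 0.20, 1000: 0.39, 10000: 0.40},
}
oracle_mean = mean_oracle_pool_size(example, epsilon=0.05)
```

Comparing the oracle average with a single global pool size gives an upper bound on the savings a per-query predictor could achieve under this definition of the envelope.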
Funding
Beyond keyword search for ranked document retrieval