RMIT University

Generating natural language queries for more effective ranking

thesis
posted on 2024-11-24, 03:43 authored by Binsheng Liu
Queries are formal representations of information needs and play a central role in information retrieval. Recent pretrained Transformer models have greatly improved our ability to process natural language text, including natural language queries. The way we understand queries, and the techniques we use to generate them, need to be updated accordingly. This dissertation focuses on queries, and in particular on query generation.

Over the last 30 years, researchers have tried many ways to optimize queries. Relevance modeling is a classic technique that tries to surface more relevant information from pseudo-relevant documents. Our first attempt at query generation incorporates field information into this technique. However, relevance models often produce long, uninterpretable queries, and our experiments show that they tend to yield only small improvements on web collections. We then analyze the value of rewriting queries, using queries generated automatically from search logs and human-written queries gathered through crowdsourcing. Our results show that rewriting queries can further improve retrieval effectiveness by a large margin, but automatically generated queries are still not as good as human-written ones. This motivates us to study query generation techniques that leverage recent progress in neural networks.

Query generation is a fundamental and versatile technique with diverse applications: it can be used to produce query variations, reformulate queries, enrich documents, and so forth. Recent Transformer networks have facilitated language generation tasks. These generation models are usually trained with supervised learning, but we observe that such learning objectives do not necessarily lead to effective queries: the generated queries are highly readable yet often ineffective for retrieval. We propose a novel task, SNLQ (Strong Natural Language Queries), which combines readability and effectiveness objectives. To achieve both, we propose a two-step approach: supervised learning for readability followed by reinforcement learning for effectiveness.

Finally, while we explore natural language query generation, this new form of query poses new challenges to ranking models compared with traditional keyword queries. Like generation models, ranking models also benefit from Transformer networks. Inspired by traditional generative and discriminative approaches to ranking, we design a method that incorporates query generation (generative) into a ranking model (discriminative), so that a unified Transformer architecture can transfer knowledge between query generation and ranking, resulting in more generalized ranking models.

History

Degree Type

Doctorate by Research

Imprint Date

2021-01-01

School Name

School of Computing Technologies, RMIT University

Former Identifier

9922059124701341

Open Access

  • Yes