RMIT University

Repeatable and reliable search system evaluation using crowdsourcing

Conference contribution
Posted on 2024-10-31, 21:06, authored by Roi Blanco Gonzalez, Harry Halpin, Daniel Herzig, Peter Mika, Jeffrey Pound, Henry Thompson, Thanh Tran Duc
The primary problem confronting any new kind of search task is how to bootstrap a reliable and repeatable evaluation campaign, and a crowdsourcing approach provides many advantages. However, can these crowdsourced evaluations be repeated over long periods of time in a reliable manner? To investigate this question, we create an evaluation campaign for the semantic search task of keyword-based ad-hoc object retrieval. In contrast to traditional search, which retrieves web pages with textual descriptions, object search aims at retrieving information from factual assertions about real-world objects. Using the first large-scale evaluation campaign that specifically targets the task of ad-hoc Web object retrieval over a number of deployed systems, we demonstrate that crowdsourced evaluation campaigns can be repeated over time while maintaining reliable results. Furthermore, we show that these results are comparable to those of expert judges when ranking systems, and that the findings hold over different evaluation and relevance metrics. This work provides empirical support for scalable, reliable, and repeatable search system evaluation using crowdsourcing.
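
The kind of comparison the abstract describes, ranking systems by a relevance metric computed from one pool of judgments and correlating that ranking with the one induced by another pool, can be illustrated with a minimal Python sketch. The choice of DCG as the metric, the system names, and the toy judgment data below are assumptions for illustration only, not the paper's actual metrics, systems, or data; the paper reports results over several evaluation and relevance metrics.

from itertools import combinations
from math import log2


def dcg(relevances):
    """Discounted cumulative gain for a ranked list of graded judgments."""
    return sum(rel / log2(rank + 2) for rank, rel in enumerate(relevances))


def rank_systems(judgments):
    """Order systems best-first by DCG over their judged result lists."""
    scores = {system: dcg(labels) for system, labels in judgments.items()}
    return sorted(scores, key=scores.get, reverse=True)


def kendall_tau(ranking_a, ranking_b):
    """Kendall's tau rank correlation between two orderings of the same systems."""
    pos_a = {s: i for i, s in enumerate(ranking_a)}
    pos_b = {s: i for i, s in enumerate(ranking_b)}
    concordant = discordant = 0
    for x, y in combinations(ranking_a, 2):
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) > 0:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical graded judgments (0-2), one label per ranked result,
# from a crowd pool and an expert pool of judges.
crowd = {"sys_A": [2, 1, 0, 1], "sys_B": [1, 1, 1, 0], "sys_C": [0, 1, 0, 0]}
expert = {"sys_A": [2, 2, 0, 1], "sys_B": [1, 0, 1, 1], "sys_C": [1, 0, 0, 0]}

# A tau close to 1 means the two judge pools rank the systems the same way.
print(kendall_tau(rank_systems(crowd), rank_systems(expert)))

The same pattern applies when repeating a campaign over time: rerun the judging, recompute the per-system scores, and check that the induced system ranking is stable.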

History

Related Materials

  1. DOI - Is published in 10.1145/2009916.2010039
  2. ISBN - Is published in 9781450307574 (urn:isbn:9781450307574)

Start page

923

End page

932

Total pages

10

Outlet

Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2011

Name of conference

SIGIR '11

Publisher

ACM

Place published

United States

Start date

2011-07-24

End date

2011-07-28

Language

English

Copyright

© 2011 ACM

Former Identifier

2006077409

Esploro creation date

2020-06-22

Fedora creation date

2017-08-28
