
Quantifying test collection quality based on the consistency of relevance judgements

conference contribution
posted on 2024-10-31, 15:58 authored by Falk Scholer, Andrew Turpin, Mark Sanderson
Relevance assessments are a key component of test collection-based evaluation of information retrieval systems. This paper reports on a feature of such collections that can be used as a form of ground truth data, allowing analysis of human assessment error. A wide range of test collections is retrospectively examined to determine how accurately assessors judge the relevance of documents. Our results demonstrate a high level of inconsistency across the collections studied. The level of irregularity is shown to vary across topics, with some exhibiting a very high level of assessment error.

History

Start page

1063

End page

1072

Total pages

10

Outlet

Proceedings of the ACM SIGIR International Conference on Research and Development in Information Retrieval

Editors

Wei-Ying Ma, Jian-Yun Nie

Name of conference

ACM SIGIR International Conference on Research and Development in Information Retrieval

Publisher

ACM

Place published

New York, United States

Start date

2011-07-24

End date

2011-07-28

Language

English

Copyright

© 2011 ACM

Former Identifier

2006029398

Esploro creation date

2020-06-22

Fedora creation date

2011-12-21
