Relevance assessments are a key component of test collection-based evaluation of information retrieval systems. This paper reports on a feature of such collections that can be used as a form of ground truth data for analysing human assessment error. A wide range of test collections is retrospectively examined to determine how accurately assessors judge the relevance of documents. Our results demonstrate a high level of inconsistency across the collections studied. The level of irregularity is shown to vary across topics, with some topics exhibiting a very high level of assessment error.
Start page: 1063
End page: 1072
Total pages: 10
Outlet: Proceedings of the ACM SIGIR International Conference on Research and Development in Information Retrieval
Editors: Wei-Ying Ma, Jian-Yun Nie
Name of conference: ACM SIGIR International Conference on Research and Development in Information Retrieval