RMIT University

Examining additivity and weak baselines

journal contribution
posted on 2024-11-02, 01:20 authored by Sadegh Kharazmi, Falk Scholer, David Vallet, Mark Sanderson
We present a study of which baseline to use when testing a new retrieval technique. In contrast to past work, we show that measuring a statistically significant improvement over a weak baseline is not a good predictor of whether a similar improvement will be measured on a strong baseline. Sometimes strong baselines are made worse when a new technique is applied. We investigate whether conducting comparisons against a range of weaker baselines can increase confidence that an observed effect will also show improvements on a stronger baseline. Our results indicate that this is not the case. At best, testing against a range of baselines means that an experimenter can be more confident that the new technique is unlikely to significantly harm a strong baseline. Examining recent past work, we present evidence that the information retrieval (IR) community continues to test against weak baselines. This is unfortunate as, in light of our experiments, we conclude that the only way to be confident that a new technique is a contribution is to compare it against nothing less than the state of the art.

Related Materials

  1. DOI - Is published in 10.1145/2882782
  2. ISSN - Is published in 10468188

Journal

ACM Transactions on Information Systems

Volume

34

Number

23

Issue

4

Start page

1

End page

18

Total pages

18

Publisher

Association for Computing Machinery (ACM) Special Interest Group

Place published

United States

Language

English

Copyright

© 2016 ACM

Former Identifier

2006067447

Esploro creation date

2020-06-22

Fedora creation date

2016-12-20
