posted on 2024-11-24, 00:24authored byGaya Jayasinghe
Experimental uncertainty arises as a consequence of: (1) bias (systematic error), and (2) variance in measurements. Popular evaluation techniques only account for the variance due to sampling of experimental units, and assume the other sources of uncertainty can be ignored. For example, only the uncertainty due to sampling of topics (queries) and sampling of training:test datasets is considered in standard information retrieval (IR) and classifier system evaluation respectively. However, incomplete relevance judgements, assessor disagreement, non-deterministic systems, and the measurement bias can also cause uncertainty in these experiments. In this thesis, the impact of other sources of uncertainty on evaluating IR and classification experiments are investigated.
The uncertainty due to:(1) incomplete relevance judgements in IR test collections,(2) non-determinism in IR systems / classifiers, and (3) high variance of classifiers is analysed using case studies from distributed information retrieval and information security. The thesis illustrates the importance of reducing and accurately accounting for uncertainty when evaluating complex IR and classifier systems. Novel techniques to(1) reduce uncertainty due to test collection bias in IR evaluation and high classifier variance (overfitting) in detecting drive-by download attacks,(2) account for multidimensional variance due to sampling of IR systems instances from non-deterministic IR systems in addition to sampling of topics, and (3) account for repeated measurements due to non-deterministic classification algorithms are introduced.