False-positive metrics can capture an important side of recommendation quality, focusing on the impact of suggestions that users dislike, as a complement to common metrics that only measure the number of successful recommendations. In this paper we study the extent to which false-positive metrics agree or disagree with true-positive metrics in the offline evaluation of recommender systems. We discover a surprising degree of systematic disagreement, which previous authors occasionally noted in the literature but did not explain. We find an explanation for the discrepancy between the metrics in the effect of popularity biases, which impact false-positive and true-positive metrics in very different ways: instead of rewarding the recommendation of popular items, as true-positive metrics do, false-positive metrics penalize popular items. We identify the precise conditions and cases within these general trends, with a formal explanation for our findings, which we confirm and illustrate empirically in experiments with different datasets.
Funding
Multi-resolution situation recognition for urban-aware smart assistant