No consistent conclusions have been drawn from existing studies regarding the effectiveness of different approaches to learning from imbalanced data. In this paper we apply bias-variance analysis to study the utility of different strategies for imbalanced learning. We conduct experiments on 15 real-world imbalanced datasets, applying various re-sampling and induction bias adjustment strategies to the standard decision tree, naive Bayes and k-nearest neighbour (k-NN) learning algorithms. Our main findings include: imbalanced class distribution is primarily a high-bias problem, which partly explains why it impedes the performance of many standard learning algorithms; compared to the re-sampling strategies, adjusting induction bias can more significantly vary the bias and variance components of classification error; in particular, the inverse distance weighting strategy can significantly reduce the variance errors of k-NN. Based on these findings we offer practical advice on applying re-sampling and induction bias adjustment strategies to improve imbalanced learning.
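To make the inverse distance weighting strategy concrete, the following sketch (an illustrative assumption, not the implementation used in the paper) weights each of the k nearest neighbours' class votes by the reciprocal of its distance to the query point, so nearby points dominate the vote:

```python
import math
from collections import defaultdict

def weighted_knn_predict(train, query, k=3, eps=1e-9):
    """Predict a class label via inverse-distance-weighted k-NN.

    train: list of (features, label) pairs; query: a feature tuple.
    Closer neighbours cast larger votes (weight = 1 / distance),
    damping the pull of far-away majority-class points -- one way
    such weighting can reduce variance errors on imbalanced data.
    Note: this helper is a hypothetical sketch, not from the paper.
    """
    # Euclidean distance from the query to every training point
    dists = sorted((math.dist(x, query), y) for x, y in train)
    votes = defaultdict(float)
    for d, y in dists[:k]:
        votes[y] += 1.0 / (d + eps)  # eps guards against zero distance
    return max(votes, key=votes.get)

# Toy imbalanced set: three 'neg' points, two 'pos' points.
train = [((0, 0), 'neg'), ((0, 1), 'neg'), ((1, 0), 'neg'),
         ((5, 5), 'pos'), ((5, 6), 'pos')]
print(weighted_knn_predict(train, (4.5, 5.0), k=3))  # -> pos
```

With plain majority voting and a larger k the abundant 'neg' class would eventually outvote the two nearby 'pos' points; the 1/distance weights keep the prediction anchored to the local neighbourhood.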
ISBN
9781920682958 (urn:isbn:9781920682958)
Start page
85
End page
95
Total pages
11
Outlet
Proceedings of 22nd Australasian Database Conference (ADC 2011)