RMIT University

An empirical study of learning from imbalanced data

conference contribution
posted on 2024-10-31, 10:34, authored by Xiuzhen Zhang, Li Yuxuan
No consistent conclusions have been drawn from existing studies regarding the effectiveness of different approaches to learning from imbalanced data. In this paper we apply bias-variance analysis to study the utility of different strategies for imbalanced learning. We conduct experiments on 15 real-world imbalanced datasets, applying various re-sampling and induction bias adjustment strategies to the standard decision tree, naive Bayes and k-nearest neighbour (k-NN) learning algorithms. Our main findings include: Imbalanced class distribution is primarily a high bias problem, which partly explains why it impedes the performance of many standard learning algorithms. Compared to the re-sampling strategies, adjusting induction bias can more significantly vary the bias and variance components of classification errors. In particular, the inverse distance weighting strategy can significantly reduce the variance errors for k-NN. Based on these findings we offer practical advice on applying the re-sampling and induction bias adjustment strategies to improve imbalanced learning.
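To make the inverse distance weighting idea concrete, the following is a minimal sketch of a k-NN vote in which each neighbour contributes 1/distance instead of one vote. The abstract does not state the exact weighting formula used in the paper, so the 1/d weight, the function name `knn_predict`, the `eps` guard, and the toy data are all assumptions made for illustration only.

```python
import math
from collections import defaultdict


def knn_predict(train, query, k=3, weighted=True, eps=1e-9):
    """Classify `query` by a k-NN vote over `train`, a list of (features, label).

    Hypothetical helper: with weighted=True each neighbour votes with
    weight 1 / (distance + eps); with weighted=False every neighbour
    counts as one vote (plain majority voting).
    """
    # Keep the k training points closest to the query (Euclidean distance).
    neighbours = sorted(
        ((math.dist(features, query), label) for features, label in train),
        key=lambda pair: pair[0],
    )[:k]

    votes = defaultdict(float)
    for distance, label in neighbours:
        # Assumed weighting scheme: closer neighbours count more.
        votes[label] += 1.0 / (distance + eps) if weighted else 1.0
    return max(votes, key=votes.get)


# Toy imbalanced training set (made up): 2 minority "pos", 5 majority "neg".
train = [
    ((0.0, 0.0), "pos"),
    ((5.0, 5.0), "pos"),
    ((1.0, 0.0), "neg"),
    ((0.0, 1.0), "neg"),
    ((2.0, 2.0), "neg"),
    ((3.0, 1.0), "neg"),
    ((1.0, 3.0), "neg"),
]

query = (0.1, 0.0)
plain_label = knn_predict(train, query, k=3, weighted=False)
weighted_label = knn_predict(train, query, k=3, weighted=True)
```

On this toy data the query sits almost on top of a minority point, yet two of its three nearest neighbours are majority points. Plain voting therefore follows the majority class, while the distance-weighted vote lets the very close minority neighbour dominate. This illustrates the mechanism only and says nothing about the bias and variance results reported in the paper.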

Related Materials

  1. ISBN - Is published in 9781920682958 (urn:isbn:9781920682958)

Start page

85

End page

95

Total pages

11

Outlet

Proceedings of 22nd Australasian Database Conference (ADC 2011)

Editors

Heng Tao Shen and Yanchun Zhang

Name of conference

ADC 2011

Publisher

Australian Computer Society

Place published

Perth, Australia

Start date

2011-01-17

End date

2011-01-20

Language

English

Copyright

Copyright © 2011, Australian Computer Society

Former Identifier

2006026234

Esploro creation date

2020-06-22

Fedora creation date

2012-02-10
