RMIT University

Feature selection for multiclass binary data

conference contribution
posted on 2024-10-31, 22:00 authored by Kushani Perera, Jeffrey Chan, Shanika Karunasekera
Feature selection in binary datasets is an important task in many real-world machine learning applications such as document classification, genomic data analysis, and image recognition. Despite the many algorithms available, selecting features that distinguish all classes from one another in a multiclass binary dataset remains a challenge. Furthermore, many existing feature selection methods incur unnecessary computation costs on binary data because they are not designed specifically for it. We show that, by exploiting the symmetry and feature value imbalance of binary datasets, more efficient feature selection measures can be developed that better distinguish the classes in multiclass binary datasets. Using these measures, we propose a greedy feature selection algorithm, CovSkew, for multiclass binary data. We show that CovSkew achieves a high accuracy gain over baseline methods, up to ∼40%, especially when the selected feature subset is small. We also show that CovSkew has low computational costs compared with most of the baselines.

Related Materials

  1. DOI - Is published in 10.1007/978-3-319-93040-4_5
  2. ISBN - Is published in 9783319930404 (urn:isbn:9783319930404)

Start page

52

End page

63

Total pages

12

Outlet

Proceedings of the 22nd Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2018) LNAI 10939

Editors

Dinh Phung, Vincent S. Tseng, Geoffrey I. Webb, Bao Ho, Mohadeseh Ganji, Lida Rashidi

Name of conference

PAKDD 2018: Advances in Knowledge Discovery and Data Mining Part III

Publisher

Springer

Place published

Cham, Switzerland

Start date

2018-06-03

End date

2018-06-06

Language

English

Copyright

© Springer International Publishing AG, part of Springer Nature 2018

Former Identifier

2006087777

Esploro creation date

2020-06-22

Fedora creation date

2018-12-10
