RMIT University

A novel multivariate filter method for feature selection in text classification problems

journal contribution
posted on 2024-11-02, 06:53 authored by Mahdieh Labani, Parham Moradi, Fardin Ahmadizar, Mahdi Jalili
With the increasing number of documents in digital format, automatic text categorization has become a crucial task in pattern recognition. To ease the classification task, feature selection methods have been introduced to reduce the dimensionality of the feature space and thus improve classification performance. In this paper, a novel filter method for feature selection, called the Multivariate Relative Discrimination Criterion (MRDC), is proposed for text classification. The proposed method focuses on reducing redundant features using the minimal-redundancy and maximal-relevancy concepts. To this end, it takes into account the document frequencies of each term when estimating the term's usefulness. The method not only selects the features with maximum relevancy but also takes the redundancy between them into account using a correlation metric. MRDC does not employ any learning algorithm to evaluate the usefulness of the selected features, and thus it can be categorized as a filter method. To assess its effectiveness, several experiments are performed on three real-world datasets, and the obtained results are compared with those of state-of-the-art filter methods. The reported results show that in most cases MRDC yields better classification performance than the other methods.
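The abstract does not give the MRDC scoring formula, so the following is only a minimal sketch of the general max-relevance / min-redundancy filter idea it describes, not MRDC itself. Everything specific here is an assumption: binary class labels, relevance measured as the absolute gap between per-class document frequencies, redundancy measured as the mean absolute Pearson correlation between term columns, and the hypothetical function names `df_relevance`, `pearson` and `select_features`. As in any filter method, no classifier is trained during selection.

```python
import math
from typing import List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length columns (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def df_relevance(column: Sequence[int], labels: Sequence[int]) -> float:
    """Hypothetical relevance score: |DF_pos - DF_neg|, where DF_c is the fraction
    of class-c documents that contain the term. Assumes 0/1 labels and that
    both classes are present."""
    pos = [count for count, y in zip(column, labels) if y == 1]
    neg = [count for count, y in zip(column, labels) if y == 0]
    df_pos = sum(1 for count in pos if count > 0) / len(pos)
    df_neg = sum(1 for count in neg if count > 0) / len(neg)
    return abs(df_pos - df_neg)


def select_features(X: Sequence[Sequence[int]], labels: Sequence[int], k: int) -> List[int]:
    """Greedily pick up to k term indices. Each candidate is scored as its
    relevance minus its mean absolute correlation with the terms already chosen.
    X is a document-term count matrix (rows = documents, columns = terms)."""
    n_terms = len(X[0])
    columns = [[row[j] for row in X] for j in range(n_terms)]
    relevance = [df_relevance(col, labels) for col in columns]

    selected: List[int] = []
    remaining = list(range(n_terms))
    while remaining and len(selected) < k:
        def score(j: int) -> float:
            if not selected:
                return relevance[j]  # first pick: pure max-relevance
            redundancy = sum(abs(pearson(columns[j], columns[s])) for s in selected)
            return relevance[j] - redundancy / len(selected)

        best = max(remaining, key=score)  # ties go to the lowest index
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Toy data: terms 0 and 1 are exact duplicates; term 2 is weaker but not redundant.
    X = [
        [1, 1, 0], [1, 1, 0], [1, 1, 0], [0, 0, 1],
        [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0],
    ]
    labels = [1, 1, 1, 1, 0, 0, 0, 0]
    print(select_features(X, labels, 2))  # [0, 2]
```

On the toy matrix, term 1 has the same relevance as term 0, yet it is passed over in favour of the weaker term 2 because it is perfectly correlated with term 0; that is the redundancy penalty at work. Dividing relevance by redundancy instead of subtracting is another common mRMR variant; the abstract does not say which combination, if either, MRDC uses.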

Funding

Inference, control and protection of interdependent spatial networked structures

Australian Research Council



Related Materials

  1. DOI - Is published in 10.1016/j.engappai.2017.12.014
  2. ISSN - Is published in 0952-1976

Journal

Engineering Applications of Artificial Intelligence

Volume

70

Start page

25

End page

37

Total pages

13

Publisher

Pergamon Press

Place published

United Kingdom

Language

English

Copyright

© 2018 Elsevier Ltd. All rights reserved.

Former Identifier

2006082298

Esploro creation date

2020-06-22

Fedora creation date

2018-09-20
