RMIT University

Revisiting Probability Distribution Assumptions for Information Theoretic Feature Selection

conference contribution
posted on 2024-11-03, 12:51 authored by Yuan Sun, Wei Wang, Michael Kirley, Xiaodong Li, Jeffrey Chan
Feature selection has been shown to be beneficial for many data mining and machine learning tasks, especially for big data analytics. Mutual Information (MI) is a well-known information-theoretic measure used to evaluate the relevance of feature subsets to class labels. However, estimating high-dimensional MI poses significant challenges. Consequently, a great deal of research has focused on using low-order MI approximations or computing a lower bound on MI called Variational Information (VI). These methods often require certain assumptions about the probability distributions of features, so that the distributions are realistic yet tractable to compute. In this paper, we reveal two sets of distribution assumptions underlying many MI and VI based methods: Feature Independence Distribution and Geometric Mean Distribution. We systematically analyze their strengths and weaknesses and propose a logical extension called Arithmetic Mean Distribution, which leads to an unbiased and normalised estimation of probability densities. We conduct detailed empirical studies across a suite of 29 real-world classification problems and show that our methods improve prediction accuracy by identifying more informative features, thus providing support for our theoretical findings.
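
The normalisation point in the abstract can be illustrated with a small, generic sketch. It is not the paper's construction: the functions `arithmetic_mean` and `geometric_mean` and the two example distributions are hypothetical, and they only apply plain element-wise means to discrete distributions. Under those assumptions, the arithmetic mean of normalised distributions still sums to 1, whereas the geometric mean generally sums to less than 1 unless it is renormalised.

```python
import math

def arithmetic_mean(dists):
    """Element-wise arithmetic mean of several discrete distributions."""
    n = len(dists)
    return [sum(col) / n for col in zip(*dists)]

def geometric_mean(dists):
    """Element-wise geometric mean of several discrete distributions."""
    n = len(dists)
    return [math.prod(col) ** (1.0 / n) for col in zip(*dists)]

# Two hypothetical distributions over the same three outcomes (illustration only).
dists = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
]

am = arithmetic_mean(dists)
gm = geometric_mean(dists)

# The arithmetic mean stays normalised (up to floating-point rounding);
# the geometric mean does not, so it would need an extra normalising constant.
print("arithmetic mean total:", sum(am))
print("geometric mean total:", sum(gm))
```

The gap between the two totals follows from the AM-GM inequality, which holds for each outcome separately and is strict whenever the input distributions differ on that outcome.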

Related Materials

  1. ISBN - Is published in 9781577358350 (urn:isbn:9781577358350)

Start page

5908

End page

5915

Total pages

8

Outlet

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI 2020)

Name of conference

AAAI 2020

Publisher

Association for the Advancement of Artificial Intelligence

Place published

Palo Alto, California United States

Start date

2020-02-07

End date

2020-02-12

Language

English

Copyright

Copyright © 2020, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Former Identifier

2006101973

Esploro creation date

2020-10-22
