Posted on 2024-11-02, 01:46; authored by Yang Lei, James Bezdek, Jeffrey Chan, Vinh Nguyen, Simone Romano, James Bailey
Previously, eight popular information-theoretic cluster validity indices were generalized and tested on probabilistic partitions produced by the expectation-maximization (EM) algorithm for the Gaussian mixture model. However, that analysis was limited to probabilistic clusters, and it offered little explanation for the differences in performance among the indices. In this paper, we extend the tests to partitions found by fuzzy c-means (FCM) and provide further explanations of, and insights into, the performance of these indices. Of the eight generalized indices, we advocate a normalized version of the soft mutual information cluster validity index (NMIsM) as the best overall choice, as it outperforms the other seven indices for both FCM and EM in our tests on synthetic and real data. The advantage of NMIsM is most pronounced for datasets with overlapping and/or variably sized clusters. Finally, we provide a theoretical analysis that helps explain the superior performance of NMIsM compared with the three other normalizations of soft mutual information.
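The abstract does not define the indices, so the following is only a minimal sketch of how a soft mutual information index can be computed, under two stated assumptions. First, it assumes the soft contingency table is formed as N = U V^T, where U is the c-by-n membership matrix of a candidate partition (e.g. from FCM or EM, each column summing to 1) and V is a reference partition (crisp or soft). Second, it assumes that "NMIsM" normalizes soft mutual information by max(H(U), H(V)); this reading of the subscript is an assumption, not taken from the paper. The function names `soft_contingency` and `soft_nmi_max` are hypothetical.

```python
import math
from typing import List

# Rows are clusters, columns are data points; each column sums to 1.
Matrix = List[List[float]]


def soft_contingency(u: Matrix, v: Matrix) -> Matrix:
    """Soft contingency table N = U V^T (assumed construction):
    N[i][j] = sum_k u[i][k] * v[j][k]."""
    return [[sum(a * b for a, b in zip(ui, vj)) for vj in v] for ui in u]


def _entropy(counts: List[float], total: float) -> float:
    """Shannon entropy (natural log) of the distribution counts / total."""
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def soft_nmi_max(u: Matrix, v: Matrix) -> float:
    """Soft mutual information normalized by max(H(U), H(V)).
    The max normalization is an ASSUMED reading of 'NMIsM'."""
    table = soft_contingency(u, v)
    total = sum(sum(row) for row in table)  # equals n when columns sum to 1
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]

    # Mutual information of the joint distribution N / total.
    mi = 0.0
    for i, row in enumerate(table):
        for j, nij in enumerate(row):
            if nij > 0:
                mi += (nij / total) * math.log(
                    nij * total / (row_sums[i] * col_sums[j])
                )

    denom = max(_entropy(row_sums, total), _entropy(col_sums, total))
    if denom == 0:
        # Both partitions have a single cluster; the value is undefined here.
        raise ValueError("both partitions have zero entropy")
    return mi / denom
```

For example, comparing the fuzzy memberships `[[0.9, 0.8, 0.2, 0.1], [0.1, 0.2, 0.8, 0.9]]` against the crisp labels `[[1, 1, 0, 0], [0, 0, 1, 1]]` yields a value strictly between 0 and 1, reaching 1 only when the memberships become crisp and match the reference. Because the table is built from memberships rather than hard assignments, the same code applies unchanged to FCM and EM partitions.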