RMIT University

A fast malware feature selection approach using a hybrid of multi-linear and stepwise binary logistic regression

journal contribution
posted on 2024-11-02, 01:48 authored by Shamsul Huda, Jemal Abawajy, Mali Abdollahian, Rafiqul Islam, John Yearwood
Malware replicates itself and produces offspring with the same characteristics but different signatures by using code obfuscation techniques. Current-generation anti-virus engines employ a signature-template detection approach, in which malware can easily evade the existing signatures in the database. This reduces the capability of current anti-virus engines to detect malware. In this paper, we propose a stepwise binary logistic regression-based dimensionality reduction technique for malware detection using application program interface (API) call statistics. Finding the most significant malware features using traditional wrapper-based approaches has complexity exponential in the dimension (m) of the dataset with a brute-force search strategy, and complexity of order (m-1) with a backward elimination filter heuristic. The novelty of the proposed approach is that its worst-case computational complexity is less than order (m-1). The proposed approach uses multi-linear regression and the p-value of each individual API feature to select the most uncorrelated and significant features, in order to reduce the dimensionality of the large malware data and to ensure the absence of multi-collinearity. The stepwise logistic regression approach is then employed to test the significance of each individual malware feature based on its corresponding Wald statistic and to construct the binary decision model. When the selected most significant APIs are used in a decision rule generation system, this approach not only reduces the tree size but also improves classification performance. Exhaustive experiments on a large malware data set show that the proposed approach clearly outperforms the existing standard decision rule and support vector machine-based template approaches with complete data, and provides a better statistical fit.
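As a rough illustration of the Wald-statistic step mentioned in the abstract, the following minimal Python sketch computes the Wald statistic W = (coefficient / standard error)^2 for each feature and keeps those whose p-value falls below a threshold. It is not the authors' implementation: the function names, the feature names `api_1` to `api_3`, the coefficient values, and the 0.05 threshold are all assumptions made for illustration, and the sketch performs a single screening pass rather than the stepwise refitting described in the paper.

```python
import math


def wald_test(beta, std_err):
    """Return (W, p) for the null hypothesis beta = 0.

    W = (beta / std_err)**2 is approximately chi-square distributed with
    one degree of freedom under the null, so p = erfc(sqrt(W / 2)).
    """
    w = (beta / std_err) ** 2
    return w, math.erfc(math.sqrt(w / 2.0))


def select_significant(estimates, alpha=0.05):
    """Keep the features whose Wald p-value is below alpha.

    `estimates` maps a feature name to (coefficient, standard error),
    as would be reported by a fitted logistic regression model.
    """
    kept = []
    for name, (beta, std_err) in estimates.items():
        _, p_value = wald_test(beta, std_err)
        if p_value < alpha:
            kept.append(name)
    return kept


# Hypothetical fitted values, for illustration only (not from the paper).
toy_estimates = {
    "api_1": (1.20, 0.30),   # large |beta / se|, kept
    "api_2": (0.10, 0.25),   # small |beta / se|, dropped
    "api_3": (-0.90, 0.35),  # moderate |beta / se|, kept
}
selected = select_significant(toy_estimates)
print(selected)
```

In a stepwise procedure, a screening of this kind would typically be repeated after each refit of the model, since coefficients and standard errors change as features are added or removed.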

History

Related Materials

  1. DOI - Is published in 10.1002/cpe.3912
  2. ISSN - Is published in 15320634

Journal

Concurrency and Computation: Practice and Experience

Volume

29

Number

e3912

Issue

23

Start page

1

End page

18

Total pages

18

Publisher

John Wiley

Place published

United Kingdom

Language

English

Copyright

Copyright © 2016 John Wiley and Sons, Ltd.

Former Identifier

2006067045

Esploro creation date

2020-06-22

Fedora creation date

2018-09-21
