RMIT University
Browse

Robust Ensemble Machine Learning Model for Filtering Phishing URLs: Expandable Random Gradient Stacked Voting Classifier (ERG-SVC)

journal contribution
posted on 2024-11-02, 21:09 authored by Pubudu Indrasiri, Malka N HalgamugeMalka N Halgamuge, Azeem Mohammad
As cyber-attacks grow fast and complicated, the cybersecurity industry faces challenges to utilize state-of-the-art technology and strategies to battle the consistently present malicious threats. Phishing is a sort of social engineering attack produced technically and classified as identity theft and complicated attack vectors to steal information of internet users. In this perspective, our main objective of this study is to propose a unique, robust ensemble machine learning model architecture that provides the highest prediction accuracy with a low error rate while proposing few other robust machine learning models. Both supervised and unsupervised techniques were used for the detection process. For our experiments, seven classification algorithms, one clustering algorithm, two ensemble techniques, and two large standard legitimate datasets with 73,575 URLs and 100,000 URLs were used. Two test modes (percentage split, K-Fold cross-validation) were utilized for conducting experiments and final predictions. Mechanisms were developed to (I) identify the best N, which is the optimal heuristic-based threshold value for splitting words into subwords for each classifier, (II) tune hyperparameters for each classifier to specify the best parameter combination, (III) select prominent features using various feature selection techniques, (IV) propose a robust ensemble model (classifier) called the Expandable Random Gradient Stacked Voting Classifier (ERG-SVC) utilizing a voting classifier along with a model architecture, (V) analyze possible clusters of the dataset using k-means clustering, (VI) thoroughly analyze the gradient boost classifier (GB) with respect to utilizing the 'criterion' parameter with the Mean Absolute Error (MAE), Mean Squared Error (MSE), and Friendman_MSE, and(VII) propose a lightweight preprocessor to reduce computational cost and preprocessing time. Initial experiments were carried out with 46 features; the number of features was reduced to 22 after th

History

Journal

IEEE Access

Volume

9

Start page

150142

End page

150161

Total pages

20

Publisher

IEEE

Place published

United States

Language

English

Copyright

© 2021 IEEE This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ V

Former Identifier

2006117554

Esploro creation date

2022-10-02

Usage metrics

    Scholarly Works

    Licence

    Exports

    RefWorks
    BibTeX
    Ref. manager
    Endnote
    DataCite
    NLM
    DC