SemTra: A semi-supervised approach to traffic flow labeling with minimal human effort
journal contribution
posted on 2024-11-01, 22:27 authored by Adil Al-Harthi, ABDULMOHSEN AFAF M ALMALAWI, Zahir Tari, Kurayman Alharthi, Fawaz Al Qahtani, Mohamed Cheriet
Network traffic classification is an essential component of service differentiation, network design and management, and security systems. The limitations of traditional port-based and payload-based methods have been addressed by recent promising studies that rely on analyzing the statistics of traffic flows and on machine learning techniques. However, due to the high cost of manual labeling, it is hard to obtain sufficient, reliable and up-to-date labeled data for effective IP traffic classification. This paper proposes a novel semi-supervised approach, called SemTra, which automatically alleviates the shortage of labeled flows for machine learning by exploiting the advantages of both supervised and unsupervised models. In particular, SemTra involves the following: (i) generating multi-view representations of the original data, based on dimensionality reduction methods, that have strong discrimination ability, (ii) incorporating the generated representations into an ensemble clustering model to provide a combined clustering output with better quality and stability, (iii) adapting the concept of self-training to iteratively utilize the few labeled data along with the unlabeled data from local and global viewpoints, and (iv) obtaining the final class decision by combining the decisions of the cluster-mapping strategy, the local self-training approach and the global self-training approach. Extensive experiments were carried out to compare the effectiveness of SemTra against representative semi-supervised methods using sixteen network traffic datasets.
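As a rough illustration of step (iv) only, the sketch below maps clusters to classes using the few labeled flows and then merges three per-flow decisions. The abstract does not say how SemTra maps clusters or combines decisions, so the majority-vote rules here are assumptions, and the function names, class labels and toy inputs are hypothetical.

```python
from collections import Counter


def map_clusters_to_classes(cluster_ids, labels):
    """Give each cluster the majority class among its labeled flows.

    labels holds a class name per flow, or None for an unlabeled flow.
    Clusters without any labeled flow are left out of the mapping.
    (Majority voting is an assumption, not the paper's stated rule.)
    """
    votes = {}
    for cid, lab in zip(cluster_ids, labels):
        if lab is not None:
            votes.setdefault(cid, Counter())[lab] += 1
    return {cid: c.most_common(1)[0][0] for cid, c in votes.items()}


def combine_decisions(*decisions):
    """Majority vote per flow; None is an abstention.

    Ties go to the earliest-listed decision source (an assumed tie-break).
    """
    final = []
    for per_flow in zip(*decisions):
        counts = Counter(d for d in per_flow if d is not None)
        if not counts:
            final.append(None)
            continue
        top = max(counts.values())
        final.append(next(d for d in per_flow
                          if d is not None and counts[d] == top))
    return final


# Hypothetical toy data: six flows, three of them labeled by hand.
cluster_ids = [0, 0, 0, 1, 1, 2]
labels = ["web", None, "web", "dns", None, None]

mapping = map_clusters_to_classes(cluster_ids, labels)
cluster_decision = [mapping.get(c) for c in cluster_ids]

# Stand-ins for the outputs of the local and global self-training steps.
local_decision = ["web", "web", "dns", "dns", "dns", "p2p"]
global_decision = ["web", "dns", "dns", "dns", "web", "p2p"]

final = combine_decisions(cluster_decision, local_decision, global_decision)
print(final)
```

In this toy run, cluster 2 contains no labeled flow, so the cluster-mapping source abstains for the last flow and the two self-training stand-ins decide it.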
Funding
Australian Research Council : http://nla.gov.au/nla.party-783431