Obfuscated Malware Detection and Classification in Network Traffic Leveraging Hybrid Large Language Models and Synthetic Data

Author name: AMJAD FALEH JALAL ALSIRHANI
Publication Date: 2025-01-13
Journal Name: Sensors

Abstract

Android malware detection remains a critical issue for mobile security. Cybercriminals target Android because it is the most popular smartphone operating system (OS). Malware detection, analysis, and classification have become diverse research areas. This paper presents a smart sensing model based on large language models (LLMs) for detecting and classifying Android malware from network traffic. The network traffic that Android apps constantly exchange may carry harmful components that can damage these apps. However, a major challenge in developing smart sensing systems for malware analysis is the scarcity of traffic data due to privacy concerns. To overcome this, a two-step smart sensing model, Syn-detect, is proposed. The first step generates synthetic TCP malware traffic data with malicious content using GPT-2. These data are then preprocessed and used in the second step, which focuses on malware classification. This step leverages a fine-tuned LLM, Bidirectional Encoder Representations from Transformers (BERT), with classification layers. BERT is responsible for tokenization, generating word embeddings, and classifying malware. The Syn-detect model was tested on two Android malware datasets: CIC-AndMal2017 and CIC-AAGM2017. The model achieved an accuracy of 99.8% on CIC-AndMal2017 and 99.3% on CIC-AAGM2017. The Matthews correlation coefficient (MCC) values for the predictions were 0.99 for CIC-AndMal2017 and 0.98 for CIC-AAGM2017. These results demonstrate the strong performance of the Syn-detect smart sensing model. Compared with recent research on Android malware classification, the model outperformed other approaches.
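The abstract reports results as both accuracy and the Matthews correlation coefficient (MCC). MCC ranges from -1 to 1 and uses every cell of the confusion matrix, which is why it is often reported alongside accuracy for imbalanced malware datasets. The sketch below computes both metrics from label lists. It is a minimal illustration, not the paper's evaluation code: the function names and example labels are hypothetical, and because the abstract does not say whether its MCC is binary or multiclass, the code uses the multiclass generalization (Gorodkin's R_K), which reduces to the familiar binary formula when there are two classes.

```python
from collections import Counter
from math import sqrt
from typing import Hashable, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of predictions that exactly match the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def matthews_corrcoef(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Multiclass MCC (Gorodkin's R_K); equals the binary MCC for two classes."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    s = len(y_true)                                   # total number of samples
    c = sum(t == p for t, p in zip(y_true, y_pred))   # number of correct predictions
    t_k = Counter(y_true)                             # true occurrences per class
    p_k = Counter(y_pred)                             # predicted occurrences per class
    classes = set(t_k) | set(p_k)                     # Counter returns 0 for absent keys

    cov_tp = c * s - sum(p_k[k] * t_k[k] for k in classes)
    cov_pp = s * s - sum(p_k[k] ** 2 for k in classes)
    cov_tt = s * s - sum(t_k[k] ** 2 for k in classes)
    if cov_pp == 0 or cov_tt == 0:
        return 0.0  # common convention when MCC is undefined (a single class only)
    return cov_tp / sqrt(cov_pp * cov_tt)


if __name__ == "__main__":
    # Hypothetical labels for illustration only; not data from the paper.
    y_true = ["benign", "benign", "benign", "malware", "malware", "malware"]
    y_pred = ["benign", "benign", "malware", "malware", "malware", "malware"]
    print(f"accuracy = {accuracy(y_true, y_pred):.3f}")           # accuracy = 0.833
    print(f"MCC      = {matthews_corrcoef(y_true, y_pred):.3f}")  # MCC      = 0.707
```

In this toy example one benign sample is misclassified as malware, so accuracy is 5/6 while MCC comes out lower (about 0.707), because MCC weighs the error against the distributions of both the true and predicted classes.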

Keywords

smart sensing; cybersecurity; large language models; malware classification; generative AI; transfer learning

Publication Link

https://doi.org/10.3390/s25010202

Suggestions to read

Oral cancer stem cells: A comprehensive review of key drivers of treatment resistance and tumor recurrence
DR KALADHAR REDDY AILENI
Modeling the Social Factors Affecting Students Satisfaction with Online Learning: A Structural Equation Modeling Approach
ABDULHAMEED RAKAN ALENEZI
Photocurrent and electrical properties of SiGe Nanocrystals grown on insulator via Solid-state dewetting of Ge/SOI for Photodetection and Solar cells Applications
MOHAMMED OMAR MOHAMMEDAHMED IBRAHIM
Comparative analysis of high-performance UF membranes with sulfonated polyaniline: Improving hydrophilicity and antifouling capabilities for water purification
EBTSAM KHALEFAH H ALENEZY