Obfuscated Malware Detection and Classification in Network Traffic Leveraging Hybrid Large Language Models and Synthetic Data

Author name: AMJAD FALEH JALAL ALSIRHANI

Publication Date: 2025-01-13

Journal Name: Sensors

Abstract

Android malware detection remains a critical issue for mobile security. Cybercriminals target Android because it is the most popular smartphone operating system (OS), and malware detection, analysis, and classification have become diverse research areas. This paper presents a smart sensing model based on large language models (LLMs) for detecting and classifying network traffic-based Android malware. The network traffic that Android apps continuously exchange may carry harmful components that can damage these apps. However, one of the main challenges in developing smart sensing systems for malware analysis is the scarcity of traffic data due to privacy concerns. To overcome this, a two-step smart sensing model, Syn-detect, is proposed. The first step generates synthetic TCP malware traffic data with malicious content using GPT-2. These data are then preprocessed and used in the second step, which focuses on malware classification. This step uses a fine-tuned LLM, Bidirectional Encoder Representations from Transformers (BERT), with added classification layers. BERT is responsible for tokenization, generating word embeddings, and classifying malware. The Syn-detect model was tested on two Android malware datasets: CIC-AndMal2017 and CIC-AAGM2017. The model achieved an accuracy of 99.8% on CIC-AndMal2017 and 99.3% on CIC-AAGM2017. The Matthews correlation coefficient (MCC) values for the predictions were 99% for CIC-AndMal2017 and 98% for CIC-AAGM2017. These results demonstrate the strong performance of the Syn-detect smart sensing model. Compared with recent research on Android malware classification, the model outperformed other approaches.
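The abstract reports both accuracy and MCC. As a reading aid, the sketch below shows how the two metrics are computed from a binary confusion matrix. It is not the authors' evaluation code: the function name and the confusion-matrix counts are hypothetical, chosen only for illustration, and the binary (benign vs. malware) setting is an assumption. MCC ranges from -1 to 1, so a value reported as 99% presumably corresponds to 0.99.

```python
import math


def matthews_corrcoef_binary(tp, tn, fp, fn):
    """Return the MCC of a binary confusion matrix (a value in [-1, 1])."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Conventional value when a whole row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for illustration only; these are NOT results from the paper.
tp, tn, fp, fn = 495, 498, 2, 5

accuracy = (tp + tn) / (tp + tn + fp + fn)
mcc = matthews_corrcoef_binary(tp, tn, fp, fn)
print(f"accuracy={accuracy:.3f}  mcc={mcc:.3f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when benign and malicious samples are imbalanced, which is one plausible reason to report it alongside accuracy.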

Keywords

smart sensing; cybersecurity; large language models; malware classification; generative AI; transfer learning

Publication Link

https://doi.org/10.3390/s25010202

