Hybrid Feature and Optimized Deep Learning Model Fusion for Detecting Hateful Arabic Content

Author name: ALAMEEN ELTOUM MOHAMED ABDALRAHMAN

Publication Date: 2025-07-22

Journal Name: IEEE Access

Abstract

Detecting hate speech in Arabic social media content is critical for ensuring safe, inclusive, and respectful online communication. However, this task remains challenging due to Arabic’s morphological richness, dialectal variation (for example, Levantine), and the scarcity of high-quality annotated data. This study proposes a comprehensive, language-aware approach to Arabic hate speech detection that integrates advanced preprocessing, targeted data augmentation, hybrid feature extraction, and deep ensemble learning. Our experiments are conducted on a Levantine Arabic tweet dataset labeled as hateful or non-hateful. To address the lexical variability and noise common in user-generated content, we apply a dedicated preprocessing pipeline that includes normalization, diacritic removal, and emoji filtering. To further improve generalization and mitigate class imbalance, we employ two augmentation strategies: synonym replacement using a curated Arabic lexicon and semantics-preserving back-translation via English. We investigate lexical and contextual approaches to feature extraction, including TF-IDF vectors, contextualized AraBERT embeddings, and a hybrid combination of both. These features are fed into multiple deep learning classifiers, including CNN-BiGRU, BiLSTM, and DNN architectures. To maximize predictive performance, we develop an ensemble framework that integrates these models: the final prediction is a weighted fusion of the individual model outputs, with the weights selected by the Grey Wolf Optimizer (GWO) to maximize classification accuracy. Experimental results show that the proposed hybrid, ensemble-based architecture achieves the best results, with an accuracy of 83.33% and a ROC-AUC of 89.5%, outperforming the individual models and conventional baselines. These findings highlight the effectiveness of hybrid feature representations and nature-inspired optimization for Arabic hate speech detection. Our approach offers a scalable, linguistically informed solution for robust content moderation in Arabic digital spaces.
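The weighted fusion step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each classifier (for example CNN-BiGRU, BiLSTM and DNN) outputs a probability that a tweet is hateful, that fusion is a normalized weighted average of those probabilities thresholded at 0.5, and that GWO uses the standard alpha/beta/delta position update with a coefficient `a` decreasing linearly from 2 to 0. The function names (`fuse`, `accuracy`, `gwo_weights`), the pack size, the iteration count and the toy scores are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def normalize(weights: Sequence[float]) -> List[float]:
    """Scale non-negative weights to sum to 1 (uniform if all are zero)."""
    total = sum(weights)
    if total <= 0:
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


def fuse(weights: Sequence[float], model_probs: Sequence[Sequence[float]]) -> List[float]:
    """Weighted soft voting: convex combination of each model's P(hateful).

    Assumption: model_probs[k][i] is model k's probability that sample i is hateful.
    """
    w = normalize(weights)
    n_samples = len(model_probs[0])
    return [sum(wk * probs[i] for wk, probs in zip(w, model_probs))
            for i in range(n_samples)]


def accuracy(weights, model_probs, labels, threshold=0.5) -> float:
    """Fitness: accuracy of the fused prediction at an assumed 0.5 threshold."""
    fused = fuse(weights, model_probs)
    correct = sum((p >= threshold) == bool(y) for p, y in zip(fused, labels))
    return correct / len(labels)


def gwo_weights(model_probs, labels, n_wolves=12, n_iters=40,
                seed=0) -> Tuple[List[float], float]:
    """Grey Wolf Optimizer over one fusion weight per model, maximizing accuracy.

    Hypothetical settings: n_wolves, n_iters and the [0, 1] search box are
    illustrative choices, not values reported in the paper. Needs n_wolves >= 3.
    """
    rng = random.Random(seed)
    dim = len(model_probs)

    def fitness(w):
        return accuracy(w, model_probs, labels)

    wolves = [[rng.random() for _ in range(dim)] for _ in range(n_wolves)]
    best_w, best_fit = list(wolves[0]), fitness(wolves[0])
    for t in range(n_iters + 1):
        # Rank the pack; the three fittest wolves act as alpha, beta and delta.
        ranked = sorted(wolves, key=fitness, reverse=True)
        top_fit = fitness(ranked[0])
        if top_fit > best_fit:
            best_w, best_fit = list(ranked[0]), top_fit
        if t == n_iters:
            break  # last pass only evaluates the final pack
        leaders = ranked[:3]
        a = 2.0 - 2.0 * t / n_iters  # exploration coefficient, 2 -> 0
        new_wolves = []
        for x in wolves:
            new_x = []
            for d in range(dim):
                estimates = []
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    D = abs(C * leader[d] - x[d])
                    estimates.append(leader[d] - A * D)
                # Average the three leader-guided positions; keep weights in [0, 1].
                new_x.append(min(1.0, max(0.0, sum(estimates) / 3.0)))
            new_wolves.append(new_x)
        wolves = new_wolves
    return normalize(best_w), best_fit


if __name__ == "__main__":
    labels = [1, 0, 1, 0, 1, 0]
    probs = [
        [0.9, 0.1, 0.8, 0.2, 0.9, 0.1],  # hypothetical model A scores
        [0.1, 0.9, 0.2, 0.8, 0.1, 0.9],  # hypothetical model B (anti-correlated)
        [0.6, 0.4, 0.6, 0.4, 0.6, 0.4],  # hypothetical model C (weak signal)
    ]
    weights, acc = gwo_weights(probs, labels)
    print("fusion weights:", [round(w, 3) for w in weights], "accuracy:", acc)
```

Because accuracy is the fitness function, the weights should be searched on a held-out validation split rather than on the test set; otherwise the reported accuracy would be optimistically biased. Normalizing the weights keeps the fused score a convex combination of probabilities, so it remains a valid probability and can also be used to compute ROC-AUC.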

Keywords

Hate speech, Social networking (online), Feature extraction, Natural language processing, Sentiment analysis, Ensemble learning, Deep learning, Accuracy, Speech recognition, Syntactics

Publication Link

https://doi.org/10.1109/ACCESS.2025.3591673
