Efficient Arabic Essay Scoring with Hybrid Models: Feature Selection, Data Optimization, and Performance Trade-offs

Author name: HISHAM KHALAF ZAYED ALLAHEM

Publication Date: 2025-07-21

Journal Name: CMC

Abstract

Automated essay scoring (AES) systems have gained significant importance in educational settings, offering a scalable, efficient, and objective method for evaluating student essays. However, developing AES systems for Arabic poses distinct challenges due to the language’s complex morphology, diglossia, and the scarcity of annotated datasets. This paper presents a hybrid approach to Arabic AES that combines text-based, vector-based, and embedding-based similarity measures to improve scoring accuracy while minimizing the amount of training data required. Using a large Arabic essay dataset categorized into thematic groups, the study conducted four experiments to evaluate how feature selection and training-data size affect model performance. Experiment 1 established a baseline using a non-machine-learning approach that selected the top-N correlated features to predict essay scores. The subsequent experiments employed 5-fold cross-validation. Experiment 2 showed that combining embedding-based, text-based, and vector-based features in a Random Forest (RF) model achieved an R² of 88.92% and an accuracy of 83.3% within a 0.5-point tolerance. Experiment 3 further refined the feature selection process, demonstrating that 19 correlated features yielded optimal results and improved R² to 88.95%. Experiment 4 introduced a data-efficient training approach in which the training portion was increased from 5% to 50% of the data. Using just 10% of the data achieved near-peak performance, with an R² of 85.49%, indicating an effective trade-off between performance and computational cost. These findings highlight the potential of the hybrid approach for developing scalable Arabic AES systems, especially in low-resource environments, addressing linguistic challenges while ensuring efficient data usage.
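The abstract relies on three quantities that are easy to misread: top-N correlated feature selection, R², and accuracy within a 0.5-point tolerance. The following is a minimal sketch of how each could be computed, not the authors' implementation. It assumes Pearson correlation for ranking features and an inclusive tolerance (an absolute error of exactly 0.5 counts as correct); the abstract states neither. The feature names and all numbers are hypothetical toy values.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant column carries no ranking signal
    return cov / (sd_x * sd_y)


def top_n_features(features, scores, n):
    """Rank feature names by |correlation with the score| and keep the top n.

    `features` maps a feature name to its per-essay values.
    """
    ranked = sorted(
        features,
        key=lambda name: abs(pearson(features[name], scores)),
        reverse=True,
    )
    return ranked[:n]


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def tolerance_accuracy(y_true, y_pred, tol=0.5):
    """Share of predictions whose absolute error is at most `tol` points."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= tol)
    return hits / len(y_true)


# Hypothetical toy data: five essays, three similarity features.
scores = [1.0, 2.0, 3.0, 4.0, 5.0]
features = {
    "cosine_sim": [0.1, 0.3, 0.5, 0.7, 0.9],    # hypothetical vector-based feature
    "length_ratio": [0.5, 0.4, 0.6, 0.5, 0.5],  # hypothetical text-based feature
    "jaccard": [0.2, 0.1, 0.4, 0.3, 0.6],       # hypothetical text-based feature
}
predicted = [1.2, 2.6, 3.0, 3.5, 5.4]

selected = top_n_features(features, scores, n=2)
r2 = r_squared(scores, predicted)
accuracy = tolerance_accuracy(scores, predicted, tol=0.5)
```

On this toy data the two retained features are `cosine_sim` and `jaccard`, and four of the five predictions fall within the tolerance. A figure such as the 83.3% reported for Experiment 2 would be this kind of share computed over held-out essays in each cross-validation fold.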

Keywords

Automated essay scoring; text-based features; vector-based features; embedding-based features; feature selection; optimal data efficiency

Publication Link

https://doi.org/10.32604/cmc.2025.063189

