Impact of Data Balancing and Feature Engineering on Accident Severity Models

Author name: FAYEZ KHALAF RAHIL ALANAZI
Publication Date: 2025-06-05
Journal Name: Promet - Traffic&Transportation

Abstract

This study investigates the impact of feature engineering techniques, including Clustering, Target Encoding and Anomaly Detection, in conjunction with data balancing methods, on the efficacy of machine learning models for predicting road accident severity. Automated Machine Learning (AutoML), Distributed Random Forest (DRF), Boosted Regression Trees (BRT) and Deep Learning models were evaluated on datasets balanced using the SMOTE (Synthetic Minority Over-Sampling Technique) and ADASYN (Adaptive Synthetic Sampling) techniques. The evaluation metrics were Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Log Loss, Area under the Curve (AUC) and Area under the Precision-Recall Curve (AUCPR). Results reveal that AutoML consistently outperforms the other models, achieving 85% accuracy in predicting fatal accidents and 94% accuracy in predicting injuries. Deep Learning excels in injury accident prediction, with 95% accuracy, but struggles with fatalities, achieving 60% accuracy. The study underscores the critical role of feature engineering techniques and data balancing methods in enhancing predictive accuracy for accident severity classification. Specifically, incorporating Clustering, Target Encoding and Anomaly Detection alongside the SMOTE and ADASYN balancing methods significantly improves model performance. Further refinement and validation are crucial for optimising model performance in real-world traffic safety management applications.
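
To illustrate the data balancing idea named in the abstract, the following is a minimal sketch of SMOTE-style oversampling: each synthetic minority sample is placed at a random position on the line segment between a minority sample and one of its nearest minority neighbours. It is not the study's implementation. The function name `smote_like_oversample`, the parameters `n_new`, `k` and `seed`, and the toy `fatal` data are all assumptions made for illustration. ADASYN follows the same interpolation idea but generates more synthetic samples around minority points that are harder to learn; that weighting is not shown here.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical two-feature minority class (e.g. fatal accidents); values invented.
fatal = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote_like_oversample(fatal, n_new=6)
```

Because every synthetic point lies between two existing minority samples, the new points stay inside the region already occupied by the minority class rather than duplicating existing rows.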

Keywords

accident severity; traffic safety; machine learning; feature engineering; data balancing; AutoML; ADASYN

Publication Link

https://doi.org/10.7307/ptt.v37i3.856
