A Tuning Ensemble Machine Learning Technique for Cerebral Stroke Prediction
Abstract
In this paper, Random Forest (RF), Extreme Gradient Boosting (XGBoost), and Light Gradient-Boosting Machine (LightGBM) were used as machine learning (ML) algorithms for predicting the likelihood of a cerebral stroke, using an open-access stroke prediction dataset. The dataset was pre-processed by imputing missing values with the k-nearest neighbors (KNN) imputer technique, eliminating outliers, applying one-hot encoding to the categorical features, and normalizing features with differing value ranges. After data splitting, the synthetic minority oversampling technique (SMOTE) was applied to balance the stroke and no-stroke classes. Furthermore, to fine-tune the hyper-parameters of the ML algorithms, we employed a random search technique to identify the best parameter values. After the tuning process, we stacked the tuned classifiers into a tuned ensemble (RXLM), which was analyzed and compared with traditional classifiers. After hyper-parameter tuning, all ML algorithms achieved promising results across the performance metrics.
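
As a concrete illustration of the described workflow, the sketch below shows one possible implementation using pandas, scikit-learn, imbalanced-learn, xgboost, and lightgbm. It is not the authors' code: the dataset file name and column names, the search spaces, the scoring metric, and the logistic-regression meta-learner used to stack the RXLM-style ensemble are illustrative assumptions, and the outlier-removal step is omitted for brevity.

# Minimal sketch of the pipeline described above (illustrative assumptions throughout).
import pandas as pd
from sklearn.model_selection import train_test_split, RandomizedSearchCV
from sklearn.impute import KNNImputer
from sklearn.preprocessing import MinMaxScaler
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from imblearn.over_sampling import SMOTE
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier

# Load the open-access stroke prediction data (file and column names assumed).
df = pd.read_csv("healthcare-dataset-stroke-data.csv")
X = pd.get_dummies(df.drop(columns=["id", "stroke"]))  # one-hot encode categoricals
y = df["stroke"]

# Impute missing values (e.g., BMI) with a KNN imputer, then scale features to [0, 1].
X = KNNImputer(n_neighbors=5).fit_transform(X)
X = MinMaxScaler().fit_transform(X)

# Split first, then oversample only the training set with SMOTE.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
X_train, y_train = SMOTE(random_state=42).fit_resample(X_train, y_train)

# Random-search tuning for each base learner (search spaces are assumptions).
searches = {
    "rf": (RandomForestClassifier(random_state=42),
           {"n_estimators": [100, 300, 500], "max_depth": [None, 10, 20]}),
    "xgb": (XGBClassifier(eval_metric="logloss", random_state=42),
            {"n_estimators": [100, 300, 500], "learning_rate": [0.01, 0.1, 0.3]}),
    "lgbm": (LGBMClassifier(random_state=42),
             {"n_estimators": [100, 300, 500], "num_leaves": [31, 63, 127]}),
}
tuned = {}
for name, (model, space) in searches.items():
    search = RandomizedSearchCV(model, space, n_iter=5, cv=3,
                                scoring="f1", random_state=42)
    search.fit(X_train, y_train)
    tuned[name] = search.best_estimator_

# Stack the tuned base learners into an RXLM-style ensemble and evaluate it.
rxlm = StackingClassifier(estimators=list(tuned.items()),
                          final_estimator=LogisticRegression(max_iter=1000))
rxlm.fit(X_train, y_train)
print("Test accuracy:", rxlm.score(X_test, y_test))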