An Efficient, Ensemble-Based Classification Framework for Big Medical Data
Abstract
Extracting useful information from big medical datasets is a difficult task in the era of big data. Data mining commonly relies on classification algorithms to analyze such datasets, but existing classification algorithms are not sufficient on their own to handle big medical data. To address this problem, this work proposes an efficient, ensemble-based classification framework for big medical data. The framework first applies preprocessing to remove noise, missing values, and unwanted features. It then selects a subset of classifiers from a pool of candidates and combines the selected classifiers into a hybrid system for efficient classification. The methodology also supports incremental learning from data samples and explanation of the predicted outputs while achieving high classification performance. The simulation is implemented in Java, and the Cleveland Heart Disease and Diabetes big datasets are used for classification. The experimental results show that the proposed ensemble algorithm classifies more efficiently than existing algorithms in terms of accuracy, precision, recall, F-measure, and execution time.
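The selection-and-combination step described in the abstract can be illustrated with a minimal Java sketch. The abstract does not state how classifiers are selected or how their outputs are combined, so this sketch assumes two common stand-ins: keeping the k classifiers with the highest validation accuracy, and combining them by majority vote. The class name EnsembleSketch, the Classifier interface, and the toy threshold classifiers are all hypothetical and are not taken from the proposed framework.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class EnsembleSketch {

    /** Hypothetical minimal classifier: maps a feature vector to a class label. */
    public interface Classifier {
        int predict(double[] features);
    }

    /** Fraction of validation samples that the classifier labels correctly. */
    public static double accuracy(Classifier c, double[][] x, int[] y) {
        if (x.length == 0) {
            return 0.0;
        }
        int correct = 0;
        for (int i = 0; i < x.length; i++) {
            if (c.predict(x[i]) == y[i]) {
                correct++;
            }
        }
        return (double) correct / x.length;
    }

    /** Assumed selection rule: keep the k classifiers with the highest validation accuracy. */
    public static List<Classifier> selectTopK(List<Classifier> pool, double[][] x, int[] y, int k) {
        double[] scores = new double[pool.size()];
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < pool.size(); i++) {
            scores[i] = accuracy(pool.get(i), x, y);
            order.add(i);
        }
        // Sort indices by descending score; the sort is stable, so ties keep pool order.
        order.sort((a, b) -> Double.compare(scores[b], scores[a]));
        List<Classifier> selected = new ArrayList<>();
        for (int i = 0; i < Math.min(k, order.size()); i++) {
            selected.add(pool.get(order.get(i)));
        }
        return selected;
    }

    /** Assumed combination rule: majority vote; ties go to the label that reaches the top count first. */
    public static int majorityVote(List<Classifier> ensemble, double[] features) {
        if (ensemble.isEmpty()) {
            throw new IllegalArgumentException("ensemble must not be empty");
        }
        Map<Integer, Integer> votes = new HashMap<>();
        int bestLabel = -1;
        int bestCount = 0;
        for (Classifier c : ensemble) {
            int label = c.predict(features);
            int count = votes.merge(label, 1, Integer::sum);
            if (count > bestCount) {
                bestCount = count;
                bestLabel = label;
            }
        }
        return bestLabel;
    }

    public static void main(String[] args) {
        // Toy one-feature validation set; real medical data would be preprocessed first.
        double[][] x = {{1.0}, {2.0}, {3.0}, {4.0}};
        int[] y = {0, 0, 1, 1};
        // Toy pool of threshold classifiers standing in for trained models.
        List<Classifier> pool = Arrays.asList(
                f -> f[0] > 2.5 ? 1 : 0,
                f -> f[0] > 1.5 ? 1 : 0,
                f -> f[0] > 3.5 ? 1 : 0,
                f -> 1);
        List<Classifier> ensemble = selectTopK(pool, x, y, 3);
        System.out.println("Predicted label for 3.0: " + majorityVote(ensemble, new double[] {3.0}));
    }
}
```

An odd ensemble size such as k = 3 avoids ties in the two-class case. Incremental learning and explanation of predictions, which the abstract also mentions, are outside the scope of this sketch.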