Improved Detection of Phishing Websites using Machine Learning
Abstract
Phishing attacks pose a significant cybersecurity threat, compromising the security of millions of users by exploiting their trust in seemingly legitimate websites. These attacks deceive users into divulging sensitive information and endanger both individuals and organizations. Increasingly sophisticated tactics, such as spear phishing and whaling, call for detection methods that go beyond traditional rule-based systems. This paper addresses the problem by applying machine learning techniques to identify and classify phishing websites. We implemented four machine learning models, namely Decision Tree, Support Vector Machine (SVM), Artificial Neural Network (ANN), and Random Forest (RF), and rigorously tested and evaluated their efficacy in detecting phishing attacks. The dataset was sourced from PhishTank.org, which provides a real-world context for our models. Preprocessing included artifact removal, normalization, and the handling of data inconsistencies, so that the models processed the most relevant and accurate information and were better able to differentiate between legitimate and malicious websites. The results are promising: the Decision Tree model achieved the highest accuracy at 96.7%, followed by the Random Forest model at 95.75%, confirming that these models can detect phishing sites effectively. The ANN model, although affected by overfitting, highlighted the potential of deep learning in this area and suggests that further fine-tuning and regularization could yield more powerful detection capabilities. The SVM model's accuracy of 83.8% was comparatively low and not sufficient for reliable detection; it nevertheless provided important insight into which types of phishing strategies require different or more precise detection methods. This finding is critical for developing more targeted models in future work.
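To make the evaluation setup concrete, the following is a minimal sketch of how four such classifiers could be compared on a single train/test split using scikit-learn. It is not the implementation used in this paper. The synthetic data from `make_classification`, the 80/20 split, the min-max scaling, and every hyperparameter (including the `alpha` penalty and early stopping used to limit ANN overfitting) are assumptions chosen for illustration, and the helper name `compare_models` is hypothetical. The accuracies it prints are therefore unrelated to the figures reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def compare_models(X, y, seed=0):
    """Fit four classifiers on one stratified split; return test accuracies."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    models = {
        # Tree-based models do not depend on feature scale.
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        # SVM and ANN are scale-sensitive, so normalization is part of the pipeline
        # and is fitted on the training split only.
        "SVM": make_pipeline(MinMaxScaler(), SVC(kernel="rbf")),
        "ANN": make_pipeline(
            MinMaxScaler(),
            MLPClassifier(
                hidden_layer_sizes=(32,),
                alpha=1e-3,           # L2 penalty (assumed value)
                early_stopping=True,  # stop when the validation score stalls
                max_iter=500,
                random_state=seed,
            ),
        ),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


if __name__ == "__main__":
    # Synthetic stand-in for extracted website features (assumption):
    # label 1 = phishing, label 0 = legitimate.
    X, y = make_classification(
        n_samples=1000, n_features=20, n_informative=8, random_state=0
    )
    ranked = sorted(compare_models(X, y).items(), key=lambda kv: kv[1], reverse=True)
    for name, acc in ranked:
        print(f"{name}: {acc:.3f}")
```

Placing the scaler inside each pipeline keeps the normalization statistics from leaking test-set information into training, which matters when accuracies that differ by only a few percentage points are being compared.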