Mohammad Andri Budiman
Jonson Manurung

Abstract

Phishing remains one of the most critical and rapidly evolving cyber threats, and the growing number of incidents challenges conventional detection mechanisms such as blacklist-based approaches. Although machine learning models have improved phishing detection accuracy, many studies focus on performance optimization without adequately addressing model interpretability and transparent decision-making. This study aims to develop an optimized and explainable phishing detection framework by integrating XGBoost with Particle Swarm Optimization (PSO) for hyperparameter tuning and SHAP for interpretability analysis. The proposed approach was evaluated on the UCI Phishing Websites dataset, which consists of 11,055 samples and 30 features, using accuracy, precision, recall, F1-score, and ROC-AUC as performance metrics. Experimental results show that XGBoost optimized with PSO achieved the best performance, with an accuracy of 0.911, precision of 0.906, recall of 0.902, F1-score of 0.904, and ROC-AUC of 0.935, outperforming Random Forest (accuracy 0.896; ROC-AUC 0.921), SVM (accuracy 0.872; ROC-AUC 0.903), and XGBoost with default hyperparameters (accuracy 0.842; ROC-AUC 0.875). Furthermore, SHAP analysis identified Have_IP and URL_Length among the most influential features, providing transparent insight into model decisions. These findings demonstrate that combining metaheuristic optimization with explainable AI substantially improves both predictive performance and interpretability, contributing to the development of reliable and trustworthy phishing detection systems in dynamic cybersecurity environments.
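To make the PSO tuning step more concrete, the following is a minimal sketch of global-best particle swarm optimization over a small hyperparameter box. It is not the authors' implementation. The search ranges, the coefficients (inertia w = 0.7, cognitive and social weights c1 = c2 = 1.5), and the `surrogate_loss` function are assumptions chosen for illustration. In a real pipeline, `surrogate_loss` would be replaced by a function that trains an XGBoost classifier with the candidate hyperparameters and returns its cross-validated error (for example 1 - accuracy or 1 - ROC-AUC).

```python
import random

# Hypothetical search space (name, lower, upper); illustrative ranges only.
BOUNDS = [
    ("learning_rate", 0.01, 0.3),
    ("max_depth", 2.0, 10.0),
    ("subsample", 0.5, 1.0),
]


def surrogate_loss(params):
    """Hypothetical stand-in for the validation error of an XGBoost model.

    A real objective would fit a classifier with `params` and return, e.g.,
    1 - cross-validated accuracy. This smooth bowl has a known minimum so the
    sketch runs without external libraries.
    """
    optimum = {"learning_rate": 0.1, "max_depth": 6.0, "subsample": 0.8}
    return sum(((params[name] - optimum[name]) / (hi - lo)) ** 2
               for name, lo, hi in BOUNDS)


def pso(objective, bounds, n_particles=20, n_iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO that minimizes `objective` over a box of hyperparameters."""
    rng = random.Random(seed)
    names = [name for name, _, _ in bounds]
    lows = [lo for _, lo, _ in bounds]
    highs = [hi for _, _, hi in bounds]
    dim = len(bounds)

    def evaluate(x):
        return objective(dict(zip(names, x)))

    # Random initial positions inside the box, zero initial velocities.
    pos = [[rng.uniform(lows[d], highs[d]) for d in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [evaluate(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward swarm best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move the particle and clip it back into the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lows[d]), highs[d])
            val = evaluate(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                # The swarm best can only improve when a personal best does.
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val

    return dict(zip(names, gbest)), gbest_val


best, loss = pso(surrogate_loss, BOUNDS)
best["max_depth"] = int(round(best["max_depth"]))  # trees need an integer depth
print(best, round(loss, 6))
```

Each particle remembers its own best position and is pulled toward it and toward the swarm's best position, while clipping keeps candidates inside the search box. Here the integer hyperparameter `max_depth` is searched as a continuous value and rounded afterwards, which is one common choice among several; rounding inside the objective is another.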

How to Cite
Budiman, M. A., & Manurung, J. (2026). Enhancing XGBoost performance for classification tasks using particle swarm optimization and SHAP-based model interpretability. International Journal of Basic and Applied Science, 14(4), 162–174. https://doi.org/10.35335/ijobas.v14i4.771