Optimizing Email Spam Classification Using Naïve Bayes and Principal Component Analysis
DOI:
https://doi.org/10.59934/jaiea.v4i2.803
Keywords:
spam classification, knowledge discovery in database (KDD), naïve bayes, principal component analysis (PCA), dimensionality reduction
Abstract
In the ever-evolving digital era, email spam filtering remains an important challenge for maintaining the security and usability of email services. The Naïve Bayes algorithm is widely used for spam email classification because it scales to large datasets, but its accuracy, precision, and recall are still limited. This research aims to improve spam email classification performance by combining Naïve Bayes with Principal Component Analysis (PCA) and by exploring the optimal parameters for dimensionality reduction. The methodology follows the Knowledge Discovery in Databases (KDD) stages: selection, preprocessing, transformation using PCA, development of a classification model using Naïve Bayes, and evaluation of model performance. The dataset consists of emails labeled as spam or non-spam. The experimental results show that the combination of Naïve Bayes and PCA achieves its highest accuracy, 99.24%, with 7 principal components. Fixing the number of components performs better than choosing it by preserved variance, which underlines the importance of selecting appropriate PCA parameters for classification effectiveness. The results indicate that PCA not only reduces the complexity of the dataset but also increases the efficiency of the classification algorithm.
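The abstract contrasts two ways of setting the PCA parameter: fixing the number of components, or keeping enough components to preserve a target share of variance. The difference can be illustrated with a minimal sketch, assuming the eigenvalues of the feature covariance matrix are already available. The function names, the eigenvalues, and the thresholds below are hypothetical and are not taken from the paper.

```python
from itertools import accumulate


def retained_variance(eigenvalues, k):
    """Fraction of total variance kept by the k largest eigenvalues."""
    ordered = sorted(eigenvalues, reverse=True)
    return sum(ordered[:k]) / sum(ordered)


def components_for_variance(eigenvalues, threshold):
    """Smallest k whose cumulative explained variance reaches threshold."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    for k, running in enumerate(accumulate(ordered), start=1):
        if running / total >= threshold:
            return k
    return len(ordered)


# Hypothetical eigenvalues, for illustration only (not from the paper).
eigenvalues = [4.0, 3.0, 2.0, 1.0]

# Fixed-count approach: choose k directly and see how much variance it keeps.
k_fixed = 2
share_fixed = retained_variance(eigenvalues, k_fixed)

# Preserved-variance approach: choose a threshold and derive k from it.
k_variance = components_for_variance(eigenvalues, 0.85)

print(k_fixed, share_fixed, k_variance)
```

In scikit-learn, the same two modes correspond to passing an integer or a float between 0 and 1 as `n_components` to `PCA`. Whether the paper used that library is not stated in the abstract.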
License
Copyright (c) 2025 Journal of Artificial Intelligence and Engineering Applications (JAIEA)

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.