Synthetic Minority Oversampling Technique (SMOTE) for Boosting the Accuracy of C4.5 Algorithm Model
DOI:
https://doi.org/10.59934/jaiea.v3i3.469
Keywords:
Accuracy; Boosting; C4.5 Algorithm; Confusion Matrix; SMOTE
Abstract
Low accuracy in a classification model may be caused by class imbalance in the dataset, and a low-accuracy model is not acceptable in practice. The purpose of this research is to address class imbalance in an employee performance dataset classified with the C4.5 algorithm, using SMOTE as the balancing method. Before balancing, classification accuracy was only 17%. SMOTE is used to generate additional synthetic samples for the underrepresented (minority) class. The C4.5 algorithm then classifies the SMOTE-balanced dataset, which has 11 attributes, under three different splits between training and testing data. With a 60:40 split, the classification model reached 69% accuracy. Accuracy rose to 76% with a 70:30 split and to 86% with the final 80:20 split. These results agree with the evaluation obtained from the confusion matrix. The findings indicate that SMOTE may improve classification model accuracy by increasing the amount of data in underrepresented classes.
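The abstract does not describe how SMOTE was implemented, so the following is only a minimal sketch of the core idea: each synthetic minority sample is placed at a random point on the line segment between an existing minority sample and one of its nearest minority neighbours. The function name `smote`, the parameters `k` and `seed`, and the toy data are assumptions made for illustration and are not taken from the study. The sketch handles numeric features only; the study's 11 attributes may need additional encoding.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority neighbours.

    Illustrative sketch only: numeric features, Euclidean distance.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point (excluding itself)
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # New point lies at a random position on the segment base -> neighbour
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical toy data: 8 majority rows, 3 minority rows, 2 numeric features
majority = [(float(x), float(y)) for x in range(4) for y in range(2)]
minority = [(10.0, 10.0), (11.0, 10.5), (10.5, 11.5)]

# Generate just enough synthetic rows to match the majority class size
new_rows = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_rows
print(len(majority), len(balanced_minority))  # prints: 8 8
```

In an experiment like the one described, oversampling of this kind would typically be followed by training a decision tree on each training/testing split (60:40, 70:30, 80:20) and reading accuracy off the confusion matrix. Whether the study applied SMOTE before or after splitting is not stated in the abstract.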
Copyright (c) 2024 Journal of Artificial Intelligence and Engineering Applications (JAIEA)
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.