Enhancing Heart Disease Prediction through SMOTE-ENN Balancing and RFECV Feature Selection
DOI:
https://doi.org/10.59934/jaiea.v4i3.1057
Keywords:
Random Forest, Balancing Data, Feature Selection, Heart Disease, Disease Prediction
Abstract
Heart disease is the leading cause of mortality worldwide and imposes a substantial burden on national economies and productivity. Early identification of heart disease is essential for preventing more severe conditions, because it allows risks and symptoms to be detected at an early stage. Disease prediction models based on machine learning have been researched extensively; however, the field still faces challenges, including imbalanced class distributions and large, complex datasets. This study addresses these issues by optimizing the Random Forest algorithm through the combination of the Synthetic Minority Over-sampling Technique with Edited Nearest Neighbor (SMOTE-ENN) and Recursive Feature Elimination with Cross-Validation (RFECV). SMOTE-ENN is used to correct the class imbalance, and RFECV is used to remove irrelevant features, with the aim of improving the performance of the prediction model. The combination of SMOTE-ENN and RFECV consistently yields higher recall, up to 0.984, and a best F1 score of 0.938. These results suggest that combining SMOTE-ENN data balancing with RFECV feature selection improves the performance of Random Forest, making it a promising approach for enhancing prediction models.
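To make the balancing step concrete, the following is a minimal standard-library sketch of the SMOTE-ENN idea on a toy two-class dataset. It is an illustration under stated assumptions, not the authors' implementation: the function names (`nearest`, `smote`, `edited_nearest_neighbours`), the synthetic data, the choice of k = 3, and the majority-vote editing rule are all hypothetical choices made here, and library implementations may use different defaults. The RFECV feature-selection and Random Forest stages are not covered.

```python
import math
import random
from collections import Counter


def nearest(point, pool, k):
    """Return the indices of the k points in `pool` closest to `point` (Euclidean)."""
    order = sorted(range(len(pool)), key=lambda i: math.dist(point, pool[i]))
    return order[:k]


def smote(X, y, minority, k=3, rng=None):
    """Oversample `minority` by interpolating between minority-class neighbours.

    Assumes a binary problem with at least two minority samples.
    """
    rng = rng or random.Random(0)
    mino = [x for x, label in zip(X, y) if label == minority]
    deficit = (len(y) - len(mino)) - len(mino)  # majority count minus minority count
    X_new, y_new = list(X), list(y)
    for _ in range(max(deficit, 0)):
        base = rng.choice(mino)
        # Ask for k + 1 neighbours and skip the first, which is normally `base` itself.
        neighbours = [mino[i] for i in nearest(base, mino, k + 1)[1:]]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        X_new.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
        y_new.append(minority)
    return X_new, y_new


def edited_nearest_neighbours(X, y, k=3):
    """Keep only samples whose label matches the majority vote of their k neighbours."""
    keep = []
    for i, (x, label) in enumerate(zip(X, y)):
        idx = [j for j in nearest(x, X, k + 1) if j != i][:k]
        vote = Counter(y[j] for j in idx).most_common(1)[0][0]
        if vote == label:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data (an assumption for illustration): 20 majority and 5 minority points in 2-D.
rng = random.Random(42)
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(20)]
X += [(rng.gauss(2, 1), rng.gauss(2, 1)) for _ in range(5)]
y = [0] * 20 + [1] * 5

X_bal, y_bal = smote(X, y, minority=1, k=3, rng=rng)          # oversample first
X_clean, y_clean = edited_nearest_neighbours(X_bal, y_bal)    # then remove noisy points

print("original:", Counter(y))
print("after SMOTE:", Counter(y_bal))
print("after ENN:", Counter(y_clean))
```

In this sketch the oversampling step equalizes the class counts, and the editing step then discards points from either class that sit among neighbours of the other class. In a real evaluation, resampling would normally be applied only to the training folds so that synthetic samples do not leak into the test data.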
License
Copyright (c) 2025 Journal of Artificial Intelligence and Engineering Applications (JAIEA)

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.