Performance Analysis of SMOTE and SMOTEN Techniques for Daily Rainfall Classification using XGBoost

Authors

  • Najwa Laila Anggraini, Universitas Pembangunan Nasional Veteran Jawa Timur
  • Basuki Rahmat, Universitas Pembangunan Nasional Veteran Jawa Timur
  • Achmad Junaidi, Universitas Pembangunan Nasional Veteran Jawa Timur

DOI:

https://doi.org/10.59934/jaiea.v4i3.1066

Keywords:

Daily Rainfall, XGBoost, SMOTE, SMOTEN, Classification

Abstract

Rainfall patterns play a vital role in agriculture, water management, and disaster mitigation, so an accurate rainfall classification system is needed to support early warning and decision-making. Developing such a system is often hindered by several obstacles, with data imbalance being a prominent one. This study analyzes two data resampling techniques, SMOTE and SMOTEN, with the aim of improving the performance of an XGBoost classification model. The dataset is obtained from the BMKG website and is classified into five categories. After preprocessing, the data is split under two schemes: 70:30 and 80:20. The sensitivity of each dataset is assessed by varying the number of cross-validation folds and the learning rate. The experimental results indicate that the SMOTEN configuration, with an 80:20 split, 10 folds, and a learning rate of 0.15, attains the highest accuracy of 92.92%. This is a substantial improvement over the 75.36% accuracy obtained on the original dataset and surpasses the 90.58% accuracy of the SMOTE experiments. SMOTEN was therefore found to be more effective at handling imbalance in datasets with numerical and categorical features, thereby enhancing the performance of the XGBoost model in daily rainfall classification.
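
To illustrate how the two resampling techniques compared in the abstract differ, the following is a minimal standard-library sketch and not the authors' implementation. SMOTE creates a synthetic numeric row by interpolating between a minority row and one of its nearest neighbours, while SMOTEN targets nominal features and assigns each feature the most frequent value among neighbouring rows. Several things here are assumptions made for illustration: the function names `smote_sample` and `smoten_like_sample`, the tiny example rows (temperature and humidity, wind direction and season), and the use of Hamming distance in place of the Value Difference Metric normally associated with SMOTEN.

```python
import random
from collections import Counter


def smote_sample(minority, k, rng):
    """One synthetic numeric row: interpolate from a minority row toward a near neighbour."""
    base = rng.choice(minority)
    others = [row for row in minority if row is not base]
    # Rank the remaining minority rows by squared Euclidean distance to the base row.
    others.sort(key=lambda row: sum((a - b) ** 2 for a, b in zip(base, row)))
    neighbour = rng.choice(others[:k])
    gap = rng.random()  # position along the segment base -> neighbour, in [0, 1)
    return [a + gap * (b - a) for a, b in zip(base, neighbour)]


def smoten_like_sample(minority, k, rng):
    """One synthetic nominal row: per-feature majority vote over a row and its neighbours.

    Simplification (assumption): neighbours are ranked by Hamming distance,
    not by the Value Difference Metric.
    """
    base = rng.choice(minority)
    others = [row for row in minority if row is not base]
    # Hamming distance: number of features whose category differs from the base row.
    others.sort(key=lambda row: sum(a != b for a, b in zip(base, row)))
    group = [base] + others[:k]
    # For every feature column, keep the most frequent category in the group.
    return [Counter(column).most_common(1)[0][0] for column in zip(*group)]


rng = random.Random(42)

# Hypothetical minority-class rows, invented for this sketch only.
numeric_minority = [[26.1, 88.0], [25.4, 91.0], [27.0, 85.5], [24.8, 93.2]]
nominal_minority = [["W", "wet"], ["W", "wet"], ["NW", "wet"], ["W", "transition"]]

synthetic_numeric = smote_sample(numeric_minority, k=2, rng=rng)
synthetic_nominal = smoten_like_sample(nominal_minority, k=2, rng=rng)

print(synthetic_numeric)
print(synthetic_nominal)
```

Interpolation can produce values that never occurred in the data, which is reasonable for continuous measurements but meaningless for category codes. The voting step only ever emits categories that already exist in the minority class, which is why a nominal-aware variant may suit categorical features better.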

Published

2025-06-15

How to Cite

Najwa Laila Anggraini, Basuki Rahmat, & Achmad Junaidi. (2025). Performance Analysis of SMOTE and SMOTEN Techniques for Daily Rainfall Classification using XGBoost. Journal of Artificial Intelligence and Engineering Applications (JAIEA), 4(3), 1987–1993. https://doi.org/10.59934/jaiea.v4i3.1066

Section

Articles