Implementation of TF-IDF and XGBoost Algorithms in Scientific Paper Classification
DOI:
https://doi.org/10.59934/jaiea.v5i1.1647
Keywords:
Text Classification, TF-IDF, XGBoost, Scientific Paper, Hyperparameter Tuning
Abstract
The rapid growth of scientific publications in informatics calls for an accurate and efficient automated classification system. This study implements TF-IDF as the feature extraction method and XGBoost as the classification model to categorize scientific papers based on their titles. The dataset consists of 1,000 scientific paper titles in the field of informatics, which were collected and cleaned with text preprocessing techniques. The XGBoost model was trained on TF-IDF vector representations of the titles, and hyperparameter tuning was applied to improve model performance, focusing on parameters such as learning_rate, max_depth, and n_estimators. Evaluation shows that the resulting system achieved an accuracy of 81% and performed well at distinguishing between the “Computer Science” and “Non-Computer Science” categories. The study indicates that combining TF-IDF with XGBoost is effective for classifying short texts such as scientific titles, and that the approach has potential for extension to multi-class classification and more complex datasets.
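As a rough illustration of the feature extraction step named in the abstract, the sketch below computes TF-IDF vectors for a few titles using only the Python standard library. The example titles, the tokenizer, and the smoothed IDF formula ln((1 + N) / (1 + df)) + 1 with L2 normalization are all assumptions made for illustration; the abstract does not state which TF-IDF variant or preprocessing steps the authors used.

```python
import math
import re
from collections import Counter


def tokenize(title):
    """Lowercase a title and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", title.lower())


def tfidf_vectors(titles):
    """Return (vocabulary, vectors) with L2-normalized TF-IDF weights.

    Assumption: raw term counts and the smoothed IDF
    ln((1 + N) / (1 + df)) + 1, one common variant among several.
    """
    docs = [tokenize(t) for t in titles]
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of titles that contain each term.
    df = Counter(tok for doc in docs for tok in set(doc))
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}

    vectors = []
    for doc in docs:
        counts = Counter(doc)
        row = [counts[t] * idf[t] for t in vocab]
        # Scale each row to unit length; guard against empty titles.
        norm = math.sqrt(sum(w * w for w in row)) or 1.0
        vectors.append([w / norm for w in row])
    return vocab, vectors


# Hypothetical example titles, not taken from the paper's dataset.
titles = [
    "Deep learning for image classification",
    "Text classification using gradient boosting",
    "Economic impact of rural tourism",
]
vocab, vectors = tfidf_vectors(titles)
```

In this toy corpus, "classification" appears in two of the three titles, so it receives a lower weight than terms unique to one title. In a pipeline like the one the abstract describes, vectors of this kind would be the input to the XGBoost classifier, with learning_rate, max_depth, and n_estimators tuned on held-out data.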
License
Copyright (c) 2025 Journal of Artificial Intelligence and Engineering Applications (JAIEA)

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.







