Application of Machine Learning in Predicting FIFA World Cup Matches
DOI:
https://doi.org/10.59934/jaiea.v5i2.1918
Keywords:
CRISP-DM, HistGradientBoosting, Machine Learning, Match Prediction, Streamlit
Abstract
Football is one of the world’s most widely followed sports, which makes it an appealing subject for predictive analytics. This study builds a predictive model for international football match outcomes, using the CRISP-DM methodology as the analytical framework. The dataset, international_matches.csv, covers the period 1993–2022 and underwent a series of preprocessing steps: data cleaning, feature engineering, encoding, imputation, and scaling. Three machine learning algorithms were evaluated: Logistic Regression, Random Forest, and HistGradientBoostingClassifier (HistGBM). The best results came from the optimized HistGBM, which was strongest at identifying home-team victories, achieving a Recall of 78% for that class. This high sensitivity suggests that comparative features, such as rank difference and squad strength disparity across goalkeeper, defense, midfield, and attack attributes, play a crucial role in predicting dominant match outcomes. The trained model was then deployed in an interactive Streamlit-based web application that lets users enter match-related information and obtain real-time predictions. Overall, the study shows that machine learning methods can effectively support data-driven analysis of international football match outcomes.
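The two quantitative ideas in the abstract, comparative features and class-wise Recall, can be illustrated with a minimal Python sketch. It is not the authors' code: the `Team` fields, the feature names, and the toy labels are assumptions made for illustration, and the actual column names in international_matches.csv may differ. The sketch only shows how home-minus-away differences might be built and how Recall for the home-win class is computed as TP / (TP + FN).

```python
from dataclasses import dataclass


@dataclass
class Team:
    # Hypothetical fields; the dataset's real column names may differ.
    fifa_rank: int
    goalkeeper: float
    defense: float
    midfield: float
    attack: float


def comparative_features(home: Team, away: Team) -> dict:
    """Return home-minus-away differences for rank and squad-strength scores."""
    return {
        "rank_diff": home.fifa_rank - away.fifa_rank,
        "goalkeeper_diff": home.goalkeeper - away.goalkeeper,
        "defense_diff": home.defense - away.defense,
        "midfield_diff": home.midfield - away.midfield,
        "attack_diff": home.attack - away.attack,
    }


def recall(y_true, y_pred, positive):
    """Recall = TP / (TP + FN) for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy example with made-up values (not taken from the study).
home = Team(fifa_rank=5, goalkeeper=85.0, defense=84.0, midfield=86.0, attack=88.0)
away = Team(fifa_rank=20, goalkeeper=80.0, defense=79.0, midfield=81.0, attack=78.0)
features = comparative_features(home, away)

y_true = ["home", "home", "home", "draw", "away", "home"]
y_pred = ["home", "away", "home", "home", "away", "home"]
home_recall = recall(y_true, y_pred, positive="home")

print(features)
print(f"Recall (home win): {home_recall:.2f}")
```

In this toy run, three of the four true home wins are predicted correctly, so Recall for the home-win class is 0.75. A negative `rank_diff` means the home side holds the better (numerically lower) FIFA rank.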
License
Copyright (c) 2026 Journal of Artificial Intelligence and Engineering Applications (JAIEA)

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.







