Application of Machine Learning in Predicting FIFA World Cup Matches

Authors

  • Zulfikar Ismaya Ramadhani, Universitas Bina Sarana Informatika
  • Syaifudin, Universitas Bina Sarana Informatika
  • Beldi Sahfitda, Universitas Bina Sarana Informatika
  • Seprianata Kusuma, Universitas Bina Sarana Informatika
  • Ardiyansyah, Universitas Bina Sarana Informatika

DOI:

https://doi.org/10.59934/jaiea.v5i2.1918

Keywords:

CRISP-DM, HistGradientBoosting, Machine Learning, Match Prediction, Streamlit

Abstract

Football is one of the world’s most widely followed sports, which makes it an appealing subject for predictive analytics. This study builds a predictive model for international football match outcomes, using the CRISP-DM methodology as its analytical framework. The dataset, international_matches.csv, covers matches from 1993 to 2022 and was prepared through data cleaning, feature engineering, encoding, imputation, and scaling. Three machine learning algorithms were compared: Logistic Regression, Random Forest, and HistGradientBoostingClassifier (HistGBM). The optimized HistGBM was the best model and performed best at identifying home-team victories, reaching a Recall of 78% on that class. This high sensitivity suggests that comparative features, such as the rank difference and the squad-strength gaps in goalkeeper, defense, midfield, and attack ratings, are important for predicting matches in which one side is clearly dominant. The trained model was then deployed in an interactive Streamlit web application in which users enter match information and receive predictions in real time. Overall, the study shows that machine learning can effectively support data-driven analysis of international football match outcomes.
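The comparative features and the Recall metric mentioned in the abstract can be illustrated with a short, dependency-free Python sketch. It is not the authors' code and does not train HistGBM; it only shows, under stated assumptions, how home-minus-away difference features might be derived from one match row and how recall for the home-win class is computed. The column names (for example home_team_fifa_rank and home_team_mean_defense_score), the "Win"/"Draw"/"Lose" labels, and the helper names comparative_features and class_recall are hypothetical and should be checked against the actual file.

```python
"""Hypothetical sketch: comparative features and per-class recall (not the authors' code)."""
from typing import Dict, Mapping, Sequence, Tuple

# ASSUMPTION: column names follow a home_team_*/away_team_* layout; verify them
# against the real international_matches.csv before use.
STRENGTH_PAIRS: Dict[str, Tuple[str, str]] = {
    "goalkeeper_diff": ("home_team_goalkeeper_score", "away_team_goalkeeper_score"),
    "defense_diff": ("home_team_mean_defense_score", "away_team_mean_defense_score"),
    "midfield_diff": ("home_team_mean_midfield_score", "away_team_mean_midfield_score"),
    "attack_diff": ("home_team_mean_offense_score", "away_team_mean_offense_score"),
}


def comparative_features(row: Mapping[str, float]) -> Dict[str, float]:
    """Return home-minus-away difference features for a single match row."""
    # Sign convention (our choice): a lower FIFA rank number is better, so a
    # positive rank_diff means the home side is ranked better.
    features = {"rank_diff": row["away_team_fifa_rank"] - row["home_team_fifa_rank"]}
    # A positive squad-strength difference means the home side is rated higher.
    for name, (home_col, away_col) in STRENGTH_PAIRS.items():
        features[name] = row[home_col] - row[away_col]
    return features


def class_recall(y_true: Sequence[str], y_pred: Sequence[str], positive: str = "Win") -> float:
    """Recall = TP / (TP + FN) for one class; "Win" is an assumed home-win label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    # Guard against a class that never occurs in y_true.
    return tp / (tp + fn) if tp + fn else 0.0


if __name__ == "__main__":
    match = {
        "home_team_fifa_rank": 5, "away_team_fifa_rank": 20,
        "home_team_goalkeeper_score": 85.0, "away_team_goalkeeper_score": 80.0,
        "home_team_mean_defense_score": 82.0, "away_team_mean_defense_score": 78.5,
        "home_team_mean_midfield_score": 84.0, "away_team_mean_midfield_score": 79.0,
        "home_team_mean_offense_score": 83.0, "away_team_mean_offense_score": 80.0,
    }
    print(comparative_features(match))
    # Three of the four actual home wins are predicted as wins: 3 / (3 + 1) = 0.75.
    # The Draw predicted as Win is a false positive and does not affect recall.
    truth = ["Win", "Win", "Win", "Win", "Draw", "Lose"]
    preds = ["Win", "Win", "Win", "Lose", "Win", "Lose"]
    print(class_recall(truth, preds))
```

Read against this definition, a Recall of 78% means that roughly 78 of every 100 actual home wins in the evaluation data were predicted as home wins; it does not say how often a predicted home win was correct, which is measured by precision. In a scikit-learn workflow, difference features like these would typically be imputed and scaled before being passed to a classifier such as HistGradientBoostingClassifier.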

Published

2026-02-15

How to Cite

Zulfikar Ismaya Ramadhani, Syaifudin, Beldi Sahfitda, Seprianata Kusuma, & Ardiyansyah. (2026). Application of Machine Learning in Predicting FIFA World Cup Matches. Journal of Artificial Intelligence and Engineering Applications (JAIEA), 5(2), 2521–2530. https://doi.org/10.59934/jaiea.v5i2.1918