Application of K-Means Clustering for Urban Transportation Pattern Analysis Using Big Data Trip Dataset

Authors

  • Tegas Ramadhan, Universitas Negeri Medan
  • Hafizh Ariiq, Universitas Negeri Medan
  • Muhammad Dzaki Arjun, Universitas Negeri Medan
  • Muhammad Ridho Ananda Aditya, Universitas Negeri Medan

DOI:

https://doi.org/10.59934/jaiea.v5i3.2237

Keywords:

Big Data; Clustering; K-Means; Data Mining; Transportation Analysis

Abstract

The rapid growth of urban transportation systems generates massive volumes of data, commonly referred to as big data. This study analyzes transportation patterns in large-scale data obtained from the NYC Taxi Trip Records, a dataset that exhibits the key big data characteristics of volume, velocity, and variety. The K-Means clustering algorithm is applied to group taxi trips based on features such as trip distance, fare amount, and trip duration. Several preprocessing techniques are performed beforehand, including data cleaning, feature engineering, sampling, and normalization. The optimal number of clusters is determined using the Elbow Method and the Silhouette Score. The results show that the trips can be grouped into three clusters representing distinct transportation patterns. These findings demonstrate that clustering techniques can extract meaningful insights from large-scale datasets and highlight their potential application in urban transportation planning.
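
The workflow summarized in the abstract (normalization, K-Means, then an elbow and silhouette check over candidate values of k) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the trips are synthetic and drawn from three invented groups, so the preference for k = 3 here is built in by construction and does not reproduce the paper's result. The group centers, the farthest-first seeding, and all function names are hypothetical choices made for this example.

```python
import math
import random

def standardize(rows):
    """Z-score every column so distance, fare, and duration share one scale."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centroids):
    """Index of the nearest centroid for each point."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm with farthest-first seeding (an illustrative choice)."""
    rng = random.Random(seed)
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    for _ in range(iters):
        labels = assign(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # An emptied cluster keeps its previous centroid.
            updated.append([sum(c) / len(members) for c in zip(*members)]
                           if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    labels = assign(points, centroids)
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, inertia

def silhouette(points, labels, k):
    """Mean silhouette coefficient; O(n^2), so intended for a sample."""
    scores = []
    for i, p in enumerate(points):
        dist_sum, counts = [0.0] * k, [0] * k
        for j, q in enumerate(points):
            if i != j:
                dist_sum[labels[j]] += math.sqrt(sq_dist(p, q))
                counts[labels[j]] += 1
        own = labels[i]
        others = [dist_sum[c] / counts[c] for c in range(k) if c != own and counts[c]]
        if counts[own] == 0 or not others:
            scores.append(0.0)  # singleton cluster, or no other cluster to compare
            continue
        a, b = dist_sum[own] / counts[own], min(others)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

# Synthetic stand-in for a sampled trip table: (distance, fare, duration).
# Centers and spreads are invented for illustration only.
rng = random.Random(42)
groups = [((2.0, 9.0, 9.0), 100), ((6.0, 23.0, 22.0), 100), ((10.0, 37.0, 35.0), 100)]
spread = (0.5, 2.0, 2.0)
trips = [[rng.gauss(m, s) for m, s in zip(center, spread)]
         for center, n in groups for _ in range(n)]
X = standardize(trips)

results = {}
for k in range(2, 7):
    labels, _, inertia = kmeans(X, k)
    results[k] = (inertia, silhouette(X, labels, k))
    print(f"k={k}  inertia={inertia:8.2f}  silhouette={results[k][1]:.3f}")

best_k = max(results, key=lambda k: results[k][1])
print("k with the highest silhouette:", best_k)
```

Reading the printed table, the elbow is the value of k after which inertia stops dropping sharply, and the silhouette score gives a second, independent vote. On real trip records the two criteria may disagree, and the quadratic silhouette computation is one practical reason to cluster a sample rather than the full dataset.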

Published

2026-06-02

How to Cite

Ramadhan, T., Hafizh Ariiq, Muhammad Dzaki Arjun, & Muhammad Ridho Ananda Aditya. (2026). Application of K-Means Clustering for Urban Transportation Pattern Analysis Using Big Data Trip Dataset. Journal of Artificial Intelligence and Engineering Applications (JAIEA), 5(3), 3721–3729. https://doi.org/10.59934/jaiea.v5i3.2237
