Application of K-Means Clustering for Urban Transportation Pattern Analysis Using Big Data Trip Dataset
DOI:
https://doi.org/10.59934/jaiea.v5i3.2237
Keywords:
Big Data; Clustering; K-Means; Data Mining; Transportation Analysis
Abstract
The rapid growth of urban transportation systems generates massive amounts of data, commonly referred to as big data. This study analyzes transportation patterns in large-scale data from the NYC Taxi Trip Records, a dataset that exhibits the key big data characteristics of volume, velocity, and variety. The K-Means clustering algorithm is applied to group taxi trips by trip distance, fare amount, and trip duration. Preprocessing consists of data cleaning, feature engineering, sampling, and normalization. The number of clusters is selected using the Elbow Method and the Silhouette Score. The results show that the trips can be grouped into three clusters, each representing a distinct transportation pattern. These findings demonstrate that clustering can extract meaningful insights from large-scale datasets and highlight its potential application in urban transportation planning.
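The normalization, clustering, and cluster-count selection steps named in the abstract can be sketched as follows. This is a minimal, standard-library-only illustration and not the authors' implementation: the trip values are synthetic (the three group centers are made up), all function names are hypothetical, and a deterministic farthest-point seeding is swapped in for whatever initialization the study used. Data cleaning, feature engineering, and sampling are omitted. Inertia is printed for each k so an elbow can be read off, and the k with the highest mean silhouette is reported.

```python
import math
import random


def synthetic_trips(per_group=60, seed=42):
    """Generate made-up (distance_mi, fare_usd, duration_min) rows in three groups."""
    rng = random.Random(seed)
    # Hypothetical group centers; these are NOT values from the NYC dataset.
    centers = [(2.0, 10.0, 10.0), (10.0, 35.0, 30.0), (18.0, 60.0, 50.0)]
    trips = []
    for dist, fare, mins in centers:
        for _ in range(per_group):
            trips.append((rng.gauss(dist, 0.5), rng.gauss(fare, 2.0), rng.gauss(mins, 2.0)))
    return trips


def min_max_normalize(rows):
    """Rescale every feature column to the [0, 1] range."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]


def farthest_point_init(points, k):
    """Deterministic seeding (an assumption): start at the first point, then
    repeatedly add the point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (labels, centroids, inertia)."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    inertia = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, centroids, inertia


def silhouette_score(points, labels):
    """Mean silhouette over all points; singleton clusters contribute 0."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        total += (b - a) / max(a, b)
    return total / len(points)


trips = synthetic_trips()
points = min_max_normalize(trips)
scores = {}
for k in range(2, 7):
    labels, _, inertia = kmeans(points, k)
    scores[k] = silhouette_score(points, labels)
    print(f"k={k}  inertia={inertia:.3f}  silhouette={scores[k]:.3f}")
best_k = max(scores, key=scores.get)
print("best k by silhouette:", best_k)
```

On real trip records the groups are unlikely to be this cleanly separated, so the inertia curve and the silhouette values are best read together rather than relying on either alone. The silhouette computation here is quadratic in the number of points, which is one reason a sampling step is typically needed before evaluating it on a large dataset.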
License
Copyright (c) 2026 Journal of Artificial Intelligence and Engineering Applications (JAIEA)

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.








