A Content-Based Filtering Approach Using TF-IDF and Cosine Similarity for Hotel Recommendation Based on Traveloka Accommodation Data: A Case Study of Jakarta

Authors

  • Abdul Latif, Universitas Bina Sarana Informatika
  • Siti Khotimatul Wildah, Universitas Bina Sarana Informatika
  • Sarifah Agustiani, Universitas Bina Sarana Informatika
  • Ego Oktafanda, Universitas Rokania

DOI:

https://doi.org/10.59934/jaiea.v5i3.2381

Keywords:

Content-Based Filtering, Cosine Similarity, Hotel Recommendation System, Popularity Score, TF-IDF

Abstract

The growth of Online Travel Agents (OTAs) has produced a large and diverse volume of accommodation data, which makes it difficult for users to select hotels that match their preferences for location, facilities, price, and service reputation. This study continues previous work on acquiring Traveloka accommodation data through web scraping based on the data-testid attribute, and uses the scraped data to develop a machine learning-based hotel recommendation system with a content-based filtering approach. The dataset consists of 1,809 hotel records in the Jakarta area, with the attributes hotel name, property type, star rating, rating score, location, price, facilities, and image URL. Preprocessing includes cleaning the price field, separating the rating score from the number of reviews, and combining location, facilities, and property type into a single textual content representation. The recommendation model uses Term Frequency–Inverse Document Frequency (TF-IDF) to construct text feature vectors and Cosine Similarity to measure the similarity between hotels. In addition, the study introduces a weighted popularity score that combines the rating value and the number of reviews, so that recommendations reflect not only content similarity but also the credibility of each hotel's popularity. The experiments produce a TF-IDF matrix of size 1,364 × 371 and a similarity matrix of size 1,364 × 1,364. Functional testing shows that the system generates ten relevant hotel recommendations based on similarity in location, facilities, and property type, which are then ranked by popularity score.
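
The pipeline described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. The hotel records, field names, and function names are invented for illustration. The TF-IDF weighting follows the smoothed formula idf = ln((1 + n) / (1 + df)) + 1 with L2 normalisation, which matches the documented defaults of scikit-learn's TfidfVectorizer. The abstract does not give the formula of the weighted popularity score, so the sketch assumes a common prior-weighted rating in which the hypothetical parameter min_reviews controls how strongly sparsely reviewed hotels are pulled toward the mean rating.

```python
import math
from collections import Counter

# Toy records: names, fields, and values are invented for illustration only.
hotels = [
    {"name": "Hotel A", "location": "Menteng Central Jakarta",
     "facilities": "wifi pool restaurant", "type": "hotel",
     "rating": 8.6, "reviews": 1200},
    {"name": "Hotel B", "location": "Menteng Central Jakarta",
     "facilities": "wifi pool gym", "type": "hotel",
     "rating": 9.1, "reviews": 15},
    {"name": "Hotel C", "location": "Kemang South Jakarta",
     "facilities": "wifi parking", "type": "apartment",
     "rating": 8.0, "reviews": 300},
    {"name": "Hotel D", "location": "Menteng Central Jakarta",
     "facilities": "wifi restaurant", "type": "hotel",
     "rating": 8.8, "reviews": 640},
]

def content(hotel):
    """Combine location, facilities, and property type into one token list."""
    text = f"{hotel['location']} {hotel['facilities']} {hotel['type']}"
    return text.lower().split()

def tfidf_vectors(docs):
    """Return one L2-normalised sparse TF-IDF vector (a dict) per document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1)
               for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity; the inputs are unit vectors, so a dot product suffices."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())

def popularity(hotel, mean_rating, min_reviews=100):
    """Assumed prior-weighted rating: few reviews pull the score to the mean."""
    v = hotel["reviews"]
    return (v * hotel["rating"] + min_reviews * mean_rating) / (v + min_reviews)

def recommend(query_name, top_n=10):
    """Pick the top_n most similar hotels, then rank them by popularity."""
    vectors = tfidf_vectors([content(h) for h in hotels])
    idx = next(i for i, h in enumerate(hotels) if h["name"] == query_name)
    mean_rating = sum(h["rating"] for h in hotels) / len(hotels)
    sims = [(cosine(vectors[idx], vectors[j]), j)
            for j in range(len(hotels)) if j != idx]
    candidates = sorted(sims, reverse=True)[:top_n]
    ranked = sorted(candidates,
                    key=lambda sj: popularity(hotels[sj[1]], mean_rating),
                    reverse=True)
    return [(hotels[j]["name"], round(s, 3),
             round(popularity(hotels[j], mean_rating), 3))
            for s, j in ranked]

for name, similarity, score in recommend("Hotel A", top_n=2):
    print(name, similarity, score)
```

On this toy data, Hotel D is ranked ahead of Hotel B even though Hotel B has the higher raw rating, because Hotel B's rating rests on only 15 reviews. This is the kind of effect a review-weighted popularity score is meant to produce, under the assumed formula.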

Published

2026-06-15

How to Cite

Latif, A., Khotimatul Wildah, S., Agustiani, S., & Oktafanda, E. (2026). A Content-Based Filtering Approach Using TF-IDF and Cosine Similarity for Hotel Recommendation Based on Traveloka Accommodation Data: A Case Study of Jakarta. Journal of Artificial Intelligence and Engineering Applications (JAIEA), 5(3), 4223–4229. https://doi.org/10.59934/jaiea.v5i3.2381

Section

Articles