Implementation of a Data-Testid Attribute-Based Web Scraping Method for Accommodation Data Extraction from a Dynamic E-Commerce Website (Case Study: Traveloka)

Authors

  • Abdul Latif, Universitas Bina Sarana Informatika
  • Siti Khotimatul Wildah, Universitas Bina Sarana Informatika
  • Sarifah Agustiani, Universitas Bina Sarana Informatika
  • Eka Herdit Juningsih, Universitas Bina Sarana Informatika

DOI:

https://doi.org/10.59934/jaiea.v5i1.1256

Keywords:

Accommodation Data Extraction, data-testid Attribute, Dynamic Website, Selenium, Web Scraping

Abstract

With the growing dominance of Online Travel Agent (OTA) platforms in the digital tourism industry, the data they present has become a crucial asset for market analysis, competitive intelligence, and academic research. However, automated data extraction (web scraping) from these platforms is difficult because they are built on JavaScript frameworks that generate dynamic HTML with non-semantic, frequently changing CSS class names. This study proposes and validates a web scraping methodology that addresses these issues. Using Traveloka's website as a case study, it relies on the data-testid attribute, a stable marker that developers deliberately add for automated testing, as the primary selector for data extraction. The implementation uses Python, with Selenium for browser automation and Beautiful Soup for HTML parsing. The results show that the method consistently extracts accommodation data, including hotel names, locations, rating scores, number of reviews, and pricing structures. The study concludes that the data-testid attribute is a more stable alternative to traditional scraping techniques that rely on CSS selectors or DOM structure, and that the approach offers an effective blueprint for data acquisition from contemporary dynamic web applications.
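
The core idea of the abstract, selecting elements by data-testid rather than by generated class names, can be illustrated with a minimal sketch. Several things here are assumptions and not taken from the study: the data-testid values (`hotel-card`, `hotel-name`, and so on), the sample markup, and the hotel record are all hypothetical, and Python's standard-library `html.parser` stands in for Beautiful Soup so that the example runs without third-party packages. In a Selenium and Beautiful Soup pipeline, the HTML string would typically come from `driver.page_source`, and a lookup such as `soup.select_one('[data-testid="hotel-name"]')` would play the role of the parser class below.

```python
from html.parser import HTMLParser

# Hypothetical data-testid values mapped to output field names.
# The abstract does not list the real attribute values used on Traveloka.
FIELDS = {
    "hotel-name": "name",
    "hotel-location": "location",
    "hotel-rating": "rating",
    "hotel-review-count": "reviews",
    "hotel-price": "price",
}
CARD_TESTID = "hotel-card"  # hypothetical marker for one accommodation card
VOID_TAGS = {"br", "img", "input", "hr", "meta", "link"}  # tags with no end tag


class DataTestIdExtractor(HTMLParser):
    """Collect the text of elements whose data-testid appears in FIELDS.

    Assumes reasonably well-formed markup: every non-void start tag has a
    matching end tag, so a simple stack can track the open elements.
    """

    def __init__(self):
        super().__init__()
        self.records = []  # one dict per accommodation card
        self._open = []    # data-testid (or None) of each open element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        testid = dict(attrs).get("data-testid")
        if testid == CARD_TESTID:
            self.records.append({})
        self._open.append(testid)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._open:
            self._open.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text or CARD_TESTID not in self._open:
            return
        # Attribute the text to the innermost enclosing field element.
        for testid in reversed(self._open):
            if testid in FIELDS:
                key = FIELDS[testid]
                record = self.records[-1]
                record[key] = (record.get(key, "") + " " + text).strip()
                break


def extract(html):
    """Return a list of records parsed from an HTML string."""
    parser = DataTestIdExtractor()
    parser.feed(html)
    parser.close()
    return parser.records


# Hypothetical markup: class names look machine-generated, data-testid does not.
PAGE_V1 = """
<div class="css-1a2b3c" data-testid="hotel-card">
  <h3 class="css-9x8y7z" data-testid="hotel-name">Hotel Contoh</h3>
  <span class="css-q1w2e3" data-testid="hotel-location">Bandung</span>
  <span data-testid="hotel-rating">8.7</span>
  <span data-testid="hotel-review-count">1204</span>
  <div data-testid="hotel-price"><span>Rp</span> <span>550.000</span></div>
</div>
"""

# Simulate a redeploy that regenerates every class name.
PAGE_V2 = (
    PAGE_V1.replace("css-1a2b3c", "css-zz91kq")
    .replace("css-9x8y7z", "css-m4n5b6")
    .replace("css-q1w2e3", "css-p0o9i8")
)

if __name__ == "__main__":
    print(extract(PAGE_V1))
    print(extract(PAGE_V2) == extract(PAGE_V1))
```

Because the extractor never reads the `class` attribute, the two page versions yield identical records, which is the stability property the abstract attributes to data-testid selectors. A selector keyed on `css-9x8y7z` would stop matching after the simulated redeploy.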

Published

2025-10-15

How to Cite

Latif, A., Khotimatul Wildah, S., Agustiani, S., & Herdit Juningsih, E. (2025). Implementation of a Data-Testid Attribute-Based Web Scraping Method for Accommodation Data Extraction from a Dynamic E-Commerce Website (Case Study: Traveloka). Journal of Artificial Intelligence and Engineering Applications (JAIEA), 5(1), 170–176. https://doi.org/10.59934/jaiea.v5i1.1256