Improving Sentiment Analysis Performance of Tokopedia Reviews Using Principal Component Analysis and Naïve Bayes Algorithm

Authors

  • Anjar Ayuning Lestari, STMIK IKMI Cirebon
  • Ahmad Faqih, STMIK IKMI Cirebon
  • Gifthera Dwilestari, STMIK IKMI Cirebon

DOI:

https://doi.org/10.59934/jaiea.v4i2.743

Keywords:

Principal Component Analysis (PCA), Naïve Bayes, sentiment analysis, Tokopedia, TF-IDF, accuracy, e-commerce

Abstract

Tokopedia, one of Indonesia's largest e-commerce platforms, offers a wide range of products and attracts diverse customer reviews. These reviews reflect consumer opinions and provide valuable insights for service improvement and marketing strategy. Sentiment analysis is crucial for understanding customer perceptions, but processing large-scale, high-dimensional text data remains a challenge that affects both model efficiency and accuracy. This research uses Principal Component Analysis (PCA) to reduce the dimensionality of the data while retaining the information that matters for sentiment classification. The study begins by collecting Tokopedia product reviews and preprocessing the text through data cleaning, tokenization, stopword removal, and stemming. The reviews are then converted into numerical vectors using the Term Frequency-Inverse Document Frequency (TF-IDF) method. A Gaussian Naïve Bayes model classifies sentiment into three categories: positive, neutral, and negative. The results show that applying PCA raises model accuracy from 63.13% to 70.47%, a gain of 7.34 percentage points, with precision, recall, and F1-score reaching 71.85%, 70.47%, and 71.06%, respectively. This research contributes to sentiment analysis techniques that use PCA for Tokopedia reviews and offers an approach that can be applied to other e-commerce platforms.
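The pipeline described in the abstract (TF-IDF vectors, PCA for dimensionality reduction, then Gaussian Naïve Bayes) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the use of scikit-learn, the toy reviews, the labels, the helper names `to_dense` and `build_model`, and the choice of two principal components are all assumptions made for the example. The preprocessing steps (cleaning, stopword removal, stemming) are omitted, so the reviews are assumed to be already cleaned.

```python
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer


def to_dense(matrix):
    """Convert the sparse TF-IDF matrix to a dense array for PCA and GaussianNB."""
    return matrix.toarray()


def build_model(n_components=2):
    """Chain TF-IDF -> PCA -> Gaussian Naive Bayes into a single estimator.

    n_components is an illustrative value; the abstract does not report
    how many components the study retained.
    """
    return make_pipeline(
        TfidfVectorizer(lowercase=True),
        FunctionTransformer(to_dense, accept_sparse=True),
        PCA(n_components=n_components, random_state=0),
        GaussianNB(),
    )


# Hypothetical, already-cleaned toy reviews (not the study's data).
reviews = [
    "barang bagus pengiriman cepat",
    "produk bagus sesuai deskripsi",
    "barang rusak penjual lambat",
    "produk jelek pengiriman lambat",
    "barang biasa saja",
    "produk standar biasa saja",
]
labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

model = build_model(n_components=2)
model.fit(reviews, labels)

# Predict on the training reviews only to show the call pattern;
# a real evaluation would use a held-out test split.
predictions = model.predict(reviews)
print(list(zip(reviews, predictions)))
```

Wrapping the steps in one pipeline means the TF-IDF vocabulary and the PCA projection are fitted only on the data passed to `fit`, which helps avoid leaking test-set information when the model is evaluated on a separate split.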

Published

2025-02-15

How to Cite

Lestari, A. A., Faqih, A., & Dwilestari, G. (2025). Improving Sentiment Analysis Performance of Tokopedia Reviews Using Principal Component Analysis and Naïve Bayes Algorithm. Journal of Artificial Intelligence and Engineering Applications (JAIEA), 4(2), 758–763. https://doi.org/10.59934/jaiea.v4i2.743