Automatic Criminal News Summarization System with Extractive Method Based on Latent

Authors

  • Christ Chandra, Student

DOI:

https://doi.org/10.59934/jaiea.v5i1.1395

Keywords:

Summarization system, TF-IDF, LSA, ROUGE

Abstract

The rapid growth of digital information calls for automatic systems that help users efficiently extract the core of the information, especially for criminal news, which often attracts significant public attention. This study aims to design and develop an automatic summarization system for criminal news using an extractive method based on Latent Semantic Analysis (LSA). Textual features are first extracted with Term Frequency-Inverse Document Frequency (TF-IDF), which weighs the importance of each word in the document. The resulting TF-IDF matrix is then used as input to LSA, which models the semantic relationships between sentences and identifies those most representative of the document content. The dataset consists of Indonesian-language criminal news articles collected from various online news portals. The system is evaluated by comparing the automatically generated summaries with human-written summaries using the ROUGE metric. The experimental results show that combining TF-IDF and LSA produces informative and relevant summaries, achieving a ROUGE-1 score of 0.72. The system is expected to help users understand news content quickly and efficiently.
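The TF-IDF-then-LSA pipeline described in the abstract can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the author's implementation. Each sentence is treated as a "document" when computing IDF. Weights are raw term count times log(N/df). There is no stemming or stopword removal. Sentences are ranked by the length of their weighted vector in the reduced LSA space, which is only one of several LSA sentence-selection strategies. To keep the sketch dependency-free, the SVD is replaced by power iteration on the Gram matrix AᵀA, whose eigenvectors are the right singular vectors of A and whose eigenvalues are the squared singular values. All function names (`split_sentences`, `tfidf_matrix`, `top_eigenpairs`, `lsa_summarize`) are hypothetical.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Assumption: naive split after ., ! or ? followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]


def tokenize(sentence):
    # Assumption: lowercase word tokens, no stemming or stopword removal.
    return re.findall(r"\w+", sentence.lower())


def tfidf_matrix(sentences):
    """Term-by-sentence matrix A with A[t][j] = count(t, j) * log(N / df(t)),
    treating each sentence as a document (an assumed weighting scheme)."""
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    vocab = sorted({t for toks in tokens for t in toks})
    df = Counter(t for toks in tokens for t in set(toks))
    counts = [Counter(toks) for toks in tokens]
    return [[counts[j][t] * math.log(n / df[t]) for j in range(n)] for t in vocab]


def top_eigenpairs(sym, k, iters=300):
    """Up to k (eigenvalue, unit eigenvector) pairs of a symmetric PSD matrix,
    largest first, via power iteration with deflation."""
    n = len(sym)
    m = [row[:] for row in sym]
    pairs = []
    for _ in range(k):
        v = [1.0 / math.sqrt(n)] * n
        for _ in range(iters):
            w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * m[i][j] * v[j] for i in range(n) for j in range(n))
        if lam < 1e-12:
            break  # remaining spectrum is numerically zero
        pairs.append((lam, v))
        # Deflate: remove this component so the next pass finds the next one.
        for i in range(n):
            for j in range(n):
                m[i][j] -= lam * v[i] * v[j]
    return pairs


def lsa_summarize(text, num_sentences=2, num_topics=2):
    sentences = split_sentences(text)
    if len(sentences) <= num_sentences:
        return sentences
    a = tfidf_matrix(sentences)
    n = len(sentences)
    # Gram matrix A^T A: eigenvectors = right singular vectors V of A,
    # eigenvalues = squared singular values sigma^2.
    gram = [[sum(row[i] * row[j] for row in a) for j in range(n)] for i in range(n)]
    pairs = top_eigenpairs(gram, min(num_topics, n))
    # Score sentence j by its vector length in the k-topic space:
    # sqrt(sum_k sigma_k^2 * V[j][k]^2).
    scores = [math.sqrt(sum(lam * v[j] ** 2 for lam, v in pairs)) for j in range(n)]
    ranked = sorted(range(n), key=lambda j: -scores[j])
    # Keep the top-scoring sentences but restore their original order.
    return [sentences[j] for j in sorted(ranked[:num_sentences])]
```

Calling `lsa_summarize(article_text, num_sentences=3)` returns the selected sentences in their original order. A sentence whose words all appear in every sentence gets zero TF-IDF weight and therefore a zero score. In practice, a library SVD such as `numpy.linalg.svd` would replace the power-iteration helper.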
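The abstract reports a ROUGE-1 score of 0.72 but does not say whether this is recall, precision, or F1. The sketch below therefore returns all three. It assumes lowercased word tokens with no stemming or stopword removal, which the reference ROUGE toolkit can optionally apply. The function name `rouge_1` is hypothetical. Unigram overlap is clipped, so a word counts only as many times as it appears in both texts.

```python
import re
from collections import Counter


def rouge_1(candidate, reference):
    """Unigram ROUGE with clipped overlap counts.

    Assumption: lowercased word tokens, no stemming or stopword removal.
    Returns precision (overlap / candidate length), recall
    (overlap / reference length) and their harmonic mean (F1).
    """
    cand = Counter(re.findall(r"\w+", candidate.lower()))
    ref = Counter(re.findall(r"\w+", reference.lower()))
    # Counter & Counter keeps the minimum count of each shared word (clipping).
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

Consider the candidate "polisi menangkap pencuri motor" and the reference "polisi berhasil menangkap pencuri". They share three of four unigrams on each side, so precision, recall, and F1 all equal 0.75.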

Published

2025-10-15

How to Cite

Chandra, C. (2025). Automatic Criminal News Summarization System with Extractive Method Based on Latent. Journal of Artificial Intelligence and Engineering Applications (JAIEA), 5(1), 533–538. https://doi.org/10.59934/jaiea.v5i1.1395