Whisper Model and Lexrank Algorithm as an Efficient Solution for YouTube Video Transcription and Summarization

Authors

  • Labid Muwaffaq Wicaksana Universitas Muhammadiyah Lamongan
  • Mufti Ari Bianto Universitas Muhammadiyah Lamongan
  • Ian Pasha Universitas Muhammadiyah Lamongan
  • Adam Azzamul Husni Huda Universitas Muhammadiyah Lamongan

DOI:

https://doi.org/10.59934/jaiea.v5i1.1259

Keywords:

YouTube, Whisper, Transcription, automation

Abstract

This study developed an automated system that transcribes and summarizes YouTube videos, helping users efficiently extract important information from long videos. The system needs only a video link as input. It then automatically extracts the audio, transcribes it with the Whisper Small model, and summarizes the transcript with the LexRank algorithm, which selects key sentences by their centrality in a sentence-similarity graph. Transcription quality was evaluated with the Word Error Rate (WER), which averaged 0.3703 (approximately 37%; lower is better), indicating a fairly good level of accuracy. Summarization was evaluated against manual reference summaries using ROUGE, yielding average F1-scores of 33% for ROUGE-1, 10% for ROUGE-2, and 19% for ROUGE-L. Transcription took about 0.17 seconds per word on average, while summarization took less than 1 second. All results, including transcriptions, summaries, and evaluation metrics, are saved automatically in CSV format. The system performed stably and shows strong potential for video-based knowledge extraction in fields such as education, journalism, research, and digital documentation.
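
LexRank scores each sentence by its centrality in a graph whose edges connect sufficiently similar sentences, then keeps the highest-scoring ones. The Python sketch below illustrates that idea. It is not the authors' implementation. The tokenizer, the sentence-level IDF weights, the similarity threshold of 0.1, the damping factor of 0.85, and the helper names (`tokenize`, `idf_cosine`, `lexrank`, `summarize`) are all illustrative assumptions, because the abstract does not specify them.

```python
import math
import re
from collections import Counter


def tokenize(sentence: str) -> list[str]:
    # Assumption: lowercase word tokens, punctuation dropped.
    return re.findall(r"\w+", sentence.lower())


def idf_weights(token_lists: list[list[str]]) -> dict[str, float]:
    # Assumption: each sentence is treated as a "document" when computing IDF;
    # the +1 keeps words that occur in every sentence from getting zero weight.
    n = len(token_lists)
    df = Counter(word for tokens in token_lists for word in set(tokens))
    return {word: math.log(n / count) + 1.0 for word, count in df.items()}


def idf_cosine(a: list[str], b: list[str], idf: dict[str, float]) -> float:
    # Cosine similarity between the TF-IDF vectors of two sentences.
    tf_a, tf_b = Counter(a), Counter(b)
    dot = sum(tf_a[w] * tf_b[w] * idf[w] ** 2 for w in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum((tf_a[w] * idf[w]) ** 2 for w in tf_a))
    norm_b = math.sqrt(sum((tf_b[w] * idf[w]) ** 2 for w in tf_b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lexrank(sentences: list[str], threshold: float = 0.1, damping: float = 0.85,
            max_iter: int = 100, tol: float = 1e-8) -> list[float]:
    # Returns one centrality score per sentence; threshold and damping are assumed values.
    n = len(sentences)
    if n == 0:
        return []
    tokens = [tokenize(s) for s in sentences]
    idf = idf_weights(tokens)
    # Unweighted graph: an edge (including a self-loop) wherever similarity exceeds the threshold.
    adj = [[1.0 if idf_cosine(tokens[i], tokens[j], idf) > threshold else 0.0
            for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in adj]
    scores = [1.0 / n] * n
    # Power iteration: each sentence passes its score evenly to its neighbours.
    for _ in range(max_iter):
        new = [(1 - damping) / n
               + damping * sum(adj[j][i] / degree[j] * scores[j]
                               for j in range(n) if degree[j])
               for i in range(n)]
        converged = sum(abs(x - y) for x, y in zip(new, scores)) < tol
        scores = new
        if converged:
            break
    return scores


def summarize(text: str, num_sentences: int = 3) -> str:
    # Naive sentence splitting on ., ! or ? followed by whitespace (assumption).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    scores = lexrank(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:num_sentences]
    # Keep the selected sentences in their original order.
    return " ".join(sentences[i] for i in sorted(top))
```

For the three-sentence text "Whisper transcribes audio and LexRank summarizes text. Whisper transcribes audio quickly. LexRank summarizes text well.", the first sentence is the only one similar to both others, so `summarize(text, 1)` returns it. Restoring the original order of the selected sentences keeps an extractive summary readable even when it contains several sentences.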
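
WER counts the word-level substitutions (S), deletions (D), and insertions (I) needed to turn a transcript into its reference, divided by the number of reference words N, so WER = (S + D + I) / N. An average WER of 0.3703 therefore corresponds to roughly 37 word errors per 100 reference words. The minimal sketch below computes WER with a word-level edit distance. It assumes the texts are lowercased and stripped of punctuation before comparison; the normalization and function names are hypothetical, and the abstract does not state which tool or preprocessing the authors used.

```python
import re


def normalize(text: str) -> list[str]:
    # Assumption: compare lowercase words with punctuation removed.
    return re.findall(r"\w+", text.lower())


def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] is the edit distance between the first i-1 reference words and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,          # deletion
                          curr[j - 1] + 1,      # insertion
                          prev[j - 1] + cost)   # substitution or match
        prev = curr
    # Total edits (S + D + I) divided by the number of reference words N.
    return prev[-1] / len(ref)
```

For example, the reference "the cat sat on the mat" and the hypothesis "the cat sit on mat" differ by one substitution and one deletion, giving 2 / 6 ≈ 0.333. Because insertions are counted, WER is not capped at 1: a short reference paired with a long hypothesis can exceed 100%.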
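
ROUGE-1 and ROUGE-2 measure unigram and bigram overlap between a generated summary and a reference summary, while ROUGE-L uses the length of their longest common subsequence (LCS). Each is reported as an F1-score, the harmonic mean of precision (overlap divided by candidate length) and recall (overlap divided by reference length). The sketch below assumes plain lowercase word tokenization with no stemming or stopword removal, and its function names are hypothetical. Established ROUGE packages may tokenize differently, so scores from this sketch are not guaranteed to match the values reported above.

```python
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    # Assumption: lowercase words, punctuation dropped, no stemming.
    return re.findall(r"\w+", text.lower())


def f1(overlap: int, candidate_len: int, reference_len: int) -> float:
    # Harmonic mean of precision and recall; 0 when nothing overlaps.
    if overlap == 0:
        return 0.0
    precision = overlap / candidate_len
    recall = overlap / reference_len
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate: str, reference: str, n: int) -> float:
    def ngrams(words: list[str]) -> Counter:
        return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    cand, ref = ngrams(tokens(candidate)), ngrams(tokens(reference))
    overlap = sum((cand & ref).values())  # clipped n-gram matches
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a: list[str], b: list[str]) -> int:
    # Classic dynamic programming over two token sequences, one row at a time.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate: str, reference: str) -> float:
    cand, ref = tokens(candidate), tokens(reference)
    return f1(lcs_length(cand, ref), len(cand), len(ref))
```

For the candidate "the cat lay on the mat" against the reference "the cat sat on the mat", this sketch gives ROUGE-1 = ROUGE-L ≈ 0.833 and ROUGE-2 = 0.6, showing why bigram scores are typically the lowest of the three.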

Published

2025-10-15

How to Cite

Labid Muwaffaq Wicaksana, Mufti Ari Bianto, Ian Pasha, & Adam Azzamul Husni Huda. (2025). Whisper Model and Lexrank Algorithm as an Efficient Solution for YouTube Video Transcription and Summarization. Journal of Artificial Intelligence and Engineering Applications (JAIEA), 5(1), 128–134. https://doi.org/10.59934/jaiea.v5i1.1259