Implementation of Isolation Forest-Based Machine Learning in Batch Anomaly Detection on Zeek Log Data (Case Study: Langkat Regency Communication and Information Agency)
DOI:
https://doi.org/10.59934/jaiea.v5i1.1547
Keywords:
Anomaly Detection, Cybersecurity, Isolation Forest, Network Logs, Zeek
Abstract
The rapid escalation of cyber threats against government institutions calls for adaptive, intelligent digital security systems. The Langkat Regency Communication and Information Agency faces the challenge of analyzing large volumes of network log data in order to detect suspicious activity effectively. This study implements the Isolation Forest machine learning algorithm for batch anomaly detection on Zeek log data and classifies the detected anomalies into threat levels to facilitate security audits. Following the CRISP-DM framework, the study analyzed 12.1 million lines of Zeek conn.log data from December 2024 through the stages of data preparation, unsupervised modeling with Isolation Forest, and manual threshold determination for classification. Model effectiveness was evaluated with Precision, Recall, and F1-Score against proxy labels, and the results were enriched with rule-based labeling to determine threat levels. The model identified 15.34% of connections as anomalies; the dominant pattern, categorized as a “High” threat, was detected in DNS and unknown services, indicating potentially malicious activity. Quantitative evaluation against the proxy labels yielded a precision of 0.41 and a recall of 0.08, which the study interprets as the model detecting more subtle anomalies beyond simple rules. The implementation of Isolation Forest thus proved effective in identifying diverse network anomaly patterns, and its combination with rule-based labeling provides functional threat context for cybersecurity teams.
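The scoring, thresholding, and evaluation steps described in the abstract can be illustrated with a minimal sketch. It is not the study's implementation: it is a compact pure-Python Isolation Forest following the scoring formula of Liu et al., run on synthetic rows. The three features borrow the Zeek conn.log field names `duration`, `orig_bytes`, and `resp_bytes`; the data, the threshold of 0.6, the forest parameters, and the proxy labels (here simply the injected outliers) are all assumptions made for illustration.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c_factor(n):
    """Average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(rows, depth, max_depth, rng):
    """Recursively split on a random feature at a random value."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    feat = rng.randrange(len(rows[0]))
    lo = min(r[feat] for r in rows)
    hi = max(r[feat] for r in rows)
    if lo == hi:  # simplification: stop if the chosen feature is constant
        return {"size": len(rows)}
    split = rng.uniform(lo, hi)
    left = [r for r in rows if r[feat] < split]
    right = [r for r in rows if r[feat] >= split]
    return {"feat": feat, "split": split,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}


def path_length(row, node):
    """Depth reached by row, plus the expected depth of the unbuilt subtree."""
    depth = 0
    while "feat" in node:
        side = "left" if row[node["feat"]] < node["split"] else "right"
        node = node[side]
        depth += 1
    return depth + c_factor(node["size"])


def fit_forest(rows, n_trees=50, sample_size=128, seed=7):
    """Build n_trees trees, each on a random subsample (needs >= 2 rows)."""
    rng = random.Random(seed)
    psi = min(sample_size, len(rows))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(rows, psi), 0, max_depth, rng)
             for _ in range(n_trees)]
    return trees, psi


def anomaly_scores(rows, trees, psi):
    """Score in (0, 1]; values closer to 1 mean easier to isolate."""
    norm = c_factor(psi)
    return [2 ** (-(sum(path_length(r, t) for t in trees) / len(trees)) / norm)
            for r in rows]


# Synthetic stand-in for conn.log rows: (duration, orig_bytes, resp_bytes).
data_rng = random.Random(42)
normal = [(data_rng.gauss(1.0, 0.2), data_rng.gauss(500, 50),
           data_rng.gauss(2000, 200)) for _ in range(300)]
odd = [(data_rng.uniform(300, 3000), data_rng.uniform(1e6, 1e8),
        data_rng.uniform(0, 50)) for _ in range(10)]
rows = normal + odd
proxy = [0] * len(normal) + [1] * len(odd)  # assumed proxy labels

trees, psi = fit_forest(rows)
scores = anomaly_scores(rows, trees, psi)

THRESHOLD = 0.6  # hypothetical manually chosen cut-off
flagged = [1 if s > THRESHOLD else 0 for s in scores]

# Precision, recall, and F1 of the flags against the proxy labels.
tp = sum(1 for f, p in zip(flagged, proxy) if f and p)
fp = sum(1 for f, p in zip(flagged, proxy) if f and not p)
fn = sum(1 for f, p in zip(flagged, proxy) if not f and p)
precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0
f1 = (2 * precision * recall / (precision + recall)
      if precision + recall else 0.0)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Because the labels are only a proxy, a connection flagged by the model but absent from the labels counts as a false positive even if it is genuinely unusual, so precision and recall computed this way measure agreement with the labeling rules rather than ground truth.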
License
Copyright (c) 2025 Journal of Artificial Intelligence and Engineering Applications (JAIEA)

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.







