Regression-Based Prediction of Benzene Concentration Using PT08.S1 and PT08.S2 Gas Sensors
DOI:
https://doi.org/10.53842/juki.v8i1.2414
Keywords:
Benzene Prediction, Gas Sensor, Linear Regression, Random Forest, Air Quality
Abstract
Air pollution, particularly benzene (C6H6), is a serious urban environmental problem with significant public health impacts. Benzene is a carcinogenic compound that originates from motor vehicle emissions and industrial processes. This study develops a prediction model for benzene concentration using data from the PT08.S1 (CO) and PT08.S2 (NMHC) gas sensors together with meteorological factors (temperature, relative humidity, and absolute humidity). The data, obtained from the UCI Machine Learning Repository, comprise 9,357 samples collected from five metal oxide sensors in an urban area. Preprocessing removed records containing the value -200, which marks missing data, leaving 8,779 valid samples. Two methods were applied: Multiple Linear Regression and a Random Forest Regressor. Random Forest performs better, with an MAE of 0.0155, an RMSE of 0.1311, and an R² of 0.9997, whereas Linear Regression yields an MAE of 0.9966, an RMSE of 1.3864, and an R² of 0.9666. Feature importance analysis identifies absolute humidity (AH) as the dominant predictor, with a weight of 0.9049, followed by PT08.S2 (NMHC) with 0.0276. The study demonstrates that gas sensor data can be used reliably to estimate benzene and that Random Forest is more accurate than linear regression, which the authors attribute to its ability to capture non-linear relationships among variables.
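The preprocessing rule and the three evaluation metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' code. The column names follow the UCI Air Quality dataset as an assumption, the rows are invented toy values, and the helper names (`drop_missing`, `mae`, `rmse`, `r2`) are hypothetical.

```python
import math

MISSING = -200  # sentinel that, per the abstract, marks a missing reading


def drop_missing(rows, sentinel=MISSING):
    """Keep only the rows in which no field equals the sentinel."""
    return [row for row in rows if all(v != sentinel for v in row.values())]


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Toy rows with invented values; column names are assumed, not taken from the paper.
rows = [
    {"PT08.S1(CO)": 1360, "PT08.S2(NMHC)": 1046, "T": 13.6, "RH": 48.9, "AH": 0.76, "C6H6(GT)": 11.9},
    {"PT08.S1(CO)": -200, "PT08.S2(NMHC)": 955, "T": 13.3, "RH": 47.7, "AH": 0.73, "C6H6(GT)": 9.4},
    {"PT08.S1(CO)": 1402, "PT08.S2(NMHC)": 939, "T": 11.9, "RH": 54.0, "AH": 0.75, "C6H6(GT)": 9.0},
    {"PT08.S1(CO)": 1376, "PT08.S2(NMHC)": 948, "T": 11.0, "RH": 60.0, "AH": -200, "C6H6(GT)": 9.2},
]

clean = drop_missing(rows)  # the two rows containing -200 are discarded

# Hypothetical true and predicted benzene values, only to exercise the metrics.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 6.0]
scores = {"MAE": mae(y_true, y_pred), "RMSE": rmse(y_true, y_pred), "R2": r2(y_true, y_pred)}
```

In this sketch a row is discarded if any of its fields is -200. Whether the study filtered on every column or only on the selected features is not stated in the abstract, so that choice is an assumption as well.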
Copyright (c) 2026 Setyo Hartono, Ida Ernawati, Lawrence Supriyono

This work is licensed under a Creative Commons Attribution 4.0 International License.






