Trends, Challenges, And Future Directions of Semantic Segmentation Based on Deep Learning
DOI:
https://doi.org/10.59934/jaiea.v5i3.2485
Keywords:
Computer Vision, Deep Learning, PRISMA, Semantic Segmentation, Systematic Literature Review
Abstract
Semantic segmentation is a fundamental computer vision task that assigns each pixel in an image to a category. Advances in deep learning have significantly improved segmentation performance in applications including medical imaging, remote sensing, autonomous driving, and industrial inspection. This study analyzes the development of methods and architectures, the open challenges, and the future research directions in deep learning-based semantic segmentation. A Systematic Literature Review (SLR) was conducted following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) framework. Literature was collected from the Scopus database using keywords related to deep learning-based semantic segmentation. A total of 5,867 publications were identified, and 30 studies were selected after applying predefined inclusion and exclusion criteria. The review found that Convolutional Neural Networks (CNNs), Vision Transformers, and hybrid architectures are the dominant approaches, and that attention mechanisms and multi-scale feature extraction are effective techniques for improving segmentation performance. Despite these advances, class imbalance, small-object segmentation, and the need for large annotated datasets remain unresolved challenges. The findings provide an overview of current trends and highlight potential directions for future research in semantic segmentation.
License
Copyright (c) 2026 Journal of Artificial Intelligence and Engineering Applications (JAIEA)

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.








