Semantic Segmentation: A Zoology of Deep Architectures
Aitor Artola
⚠ This is a preprint. It may change before it is accepted for publication.

Abstract

In this paper we review the evolution of deep architectures for semantic segmentation. The first successful model was the fully convolutional network (FCN), published at CVPR in 2015. Since then, the subject has become very popular and many methods have been published, most of them proposing improvements over FCN. In addition to FCN, we describe in detail the Pyramid Scene Parsing Network (PSPNet) and DeepLabV3, which provide a multi-scale description of the scene and increase the resolution of the segmentation. In recent years, convolutional architectures have reached a bottleneck and have been surpassed by transformers originating in natural language processing (NLP), even though these models are generally larger and slower. We discuss the Segmentation Transformer (SETR), a first architecture with a transformer backbone, as well as SegFormer, which adds a multi-scale interpretation and several tricks to reduce the size and inference time of the network. The networks presented in the demo come from the MMSegmentation library, an open-source semantic segmentation toolbox based on PyTorch. We compare these methods qualitatively on individual images, rather than with global metrics on databases as is usually the case, and we evaluate the architectures on images outside of their training sets. We also invite readers to make their own comparisons and draw their own conclusions.