UNetFormer - Research Portal | Lancaster University


Associated organisational units

Lancaster Environment Centre

Electronic data

UNetFormer_accepted
Rights statement: This is the author’s version of a work that was accepted for publication in ISPRS Journal of Photogrammetry and Remote Sensing. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in ISPRS Journal of Photogrammetry and Remote Sensing, 190, 2022 DOI: 10.1016/j.isprsjprs.2022.06.008
Accepted author manuscript, 8.64 MB, PDF document
Available under license: CC BY-NC-ND: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License

Text available via DOI:

https://doi.org/10.1016/j.isprsjprs.2022.06.008
Final published version

Keywords

Semantic Segmentation, Remote Sensing, Vision Transformer, Fully Transformer Network, Global-local Context, Urban Scene


UNetFormer: A UNet-like transformer for efficient semantic segmentation of remote sensing urban scene imagery

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Libo Wang
Rui Li
Ce Zhang
Shenghui Fang
Chenxi Duan
Xiaoliang Meng
Peter Atkinson


Journal publication date	31/08/2022
Journal	ISPRS Journal of Photogrammetry and Remote Sensing
Volume	190
Number of pages	19
Pages (from-to)	196-214
Publication Status	Published
Early online date	24/06/2022
Original language	English

Abstract

Semantic segmentation of remotely sensed urban scene images is required in a wide range of practical applications, such as land cover mapping, urban change detection, environmental protection, and economic assessment. Driven by rapid developments in deep learning technologies, the convolutional neural network (CNN) has dominated semantic segmentation for many years. The CNN adopts hierarchical feature representation, demonstrating strong capabilities for information extraction. However, the local property of the convolution layer limits the network's ability to capture the global context. Recently, as a hot topic in the domain of computer vision, the Transformer has demonstrated its great potential in global information modelling, boosting many vision-related tasks such as image classification, object detection, and particularly semantic segmentation. In this paper, we propose a Transformer-based decoder and construct a UNet-like Transformer (UNetFormer) for real-time urban scene segmentation. For efficient segmentation, the UNetFormer selects the lightweight ResNet18 as the encoder and develops an efficient global–local attention mechanism to model both global and local information in the decoder. Extensive experiments reveal that our method not only runs faster but also produces higher accuracy compared with state-of-the-art lightweight models. Specifically, the proposed UNetFormer achieved 67.8% and 52.4% mIoU on the UAVid and LoveDA datasets, respectively, while the inference speed can achieve up to 322.4 FPS with a 512 × 512 input on a single NVIDIA RTX 3090 GPU. In further exploration, the proposed Transformer-based decoder combined with a Swin Transformer encoder also achieves the state-of-the-art result (91.3% F1 and 84.1% mIoU) on the Vaihingen dataset. The source code will be freely available at https://github.com/WangLibo1995/GeoSeg.
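
The abstract reports accuracy as mIoU and F1. As a reading aid, the following is a minimal sketch of how these two metrics are commonly computed from a confusion matrix over pixel labels. It is not taken from the authors' GeoSeg code: the function names, the toy labels, and the choice to skip classes that never appear are all assumptions made for illustration, and the paper's evaluation protocol (for example, which classes are ignored) may differ.

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Count pixels: rows are ground-truth classes, columns are predictions."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def _tp_fp_fn(cm: List[List[int]], c: int):
    """True positives, false positives, false negatives for class c."""
    tp = cm[c][c]
    fn = sum(cm[c]) - tp
    fp = sum(row[c] for row in cm) - tp
    return tp, fp, fn


def mean_iou(cm: List[List[int]]) -> float:
    """Mean over classes of TP / (TP + FP + FN)."""
    scores = []
    for c in range(len(cm)):
        tp, fp, fn = _tp_fp_fn(cm, c)
        denom = tp + fp + fn
        if denom:  # assumption: skip classes absent from both labels and predictions
            scores.append(tp / denom)
    return sum(scores) / len(scores)


def mean_f1(cm: List[List[int]]) -> float:
    """Mean over classes of 2*TP / (2*TP + FP + FN)."""
    scores = []
    for c in range(len(cm)):
        tp, fp, fn = _tp_fp_fn(cm, c)
        denom = 2 * tp + fp + fn
        if denom:  # same skipping assumption as in mean_iou
            scores.append(2 * tp / denom)
    return sum(scores) / len(scores)


# Toy example: six "pixels", three classes (hypothetical data).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
print(f"mIoU = {mean_iou(cm):.3f}, mean F1 = {mean_f1(cm):.3f}")
```

In practice the confusion matrix is usually accumulated over every image in the test set before the per-class scores are averaged, so that small images do not weigh as much as large ones.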

