Class-guided Swin Transformer for Semantic Segmentation of Remote Sensing Imagery


Associated organisational units

Lancaster Environment Centre

Electronic data

GRSL-01132-2022
Rights statement: ©2022 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
Accepted author manuscript, 2.91 MB, PDF document
Available under license: CC BY: Creative Commons Attribution 4.0 International License

Text available via DOI:

https://doi.org/10.1109/LGRS.2022.3215200
Final published version

Keywords

Fully Transformer network, class-guided mechanism, semantic segmentation, remote sensing


Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Xiaoliang Meng
Yuechi Yang
Libo Wang
Teng Wang
Rui Li
Ce Zhang


Article number	6517505
Journal publication date	17/10/2022
Journal	IEEE Geoscience and Remote Sensing Letters
Issue number	10
Volume	19
Number of pages	5
Publication Status	Published
Original language	English

Abstract

Semantic segmentation of remote sensing images plays a crucial role in a wide variety of practical applications, including land cover mapping, environmental protection, and economic assessment. Over the last decade, convolutional neural networks (CNNs) have been the mainstream deep learning-based approach to semantic segmentation. Compared with conventional methods, CNN-based methods learn semantic features automatically, thereby achieving strong representation capability. However, the local receptive field of the convolution operation limits the ability of CNN-based methods to capture long-range dependencies. In contrast, the Vision Transformer (ViT) has demonstrated great potential in modeling long-range dependencies and obtains superior results in semantic segmentation. Inspired by this, in this letter we propose a class-guided Swin Transformer (CG-Swin) for semantic segmentation of remote sensing images. Specifically, we adopt a Transformer-based encoder-decoder structure, which uses the Swin Transformer backbone as the encoder and builds the decoder from a newly designed class-guided Transformer block. Experimental results on the ISPRS Vaihingen and Potsdam datasets show that the proposed method significantly outperforms ten benchmark methods, including both advanced CNN-based and recent Transformer-based approaches.

