
Electronic data

  • GRSL-01132-2022

    Rights statement: ©2022 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

    Accepted author manuscript, 2.91 MB, PDF document

    Available under license: CC BY: Creative Commons Attribution 4.0 International License

Links

Text available via DOI: https://doi.org/10.1109/LGRS.2022.3215200

Class-guided Swin Transformer for Semantic Segmentation of Remote Sensing Imagery

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Standard

Class-guided Swin Transformer for Semantic Segmentation of Remote Sensing Imagery. / Meng, Xiaoliang; Yang, Yuechi; Wang, Libo et al.
In: IEEE Geoscience and Remote Sensing Letters, Vol. 19, No. 10, 6517505, 17.10.2022.

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Harvard

Meng, X, Yang, Y, Wang, L, Wang, T, Li, R & Zhang, C 2022, 'Class-guided Swin Transformer for Semantic Segmentation of Remote Sensing Imagery', IEEE Geoscience and Remote Sensing Letters, vol. 19, no. 10, 6517505. https://doi.org/10.1109/LGRS.2022.3215200

APA

Meng, X., Yang, Y., Wang, L., Wang, T., Li, R., & Zhang, C. (2022). Class-guided Swin Transformer for Semantic Segmentation of Remote Sensing Imagery. IEEE Geoscience and Remote Sensing Letters, 19(10), Article 6517505. https://doi.org/10.1109/LGRS.2022.3215200

Vancouver

Meng X, Yang Y, Wang L, Wang T, Li R, Zhang C. Class-guided Swin Transformer for Semantic Segmentation of Remote Sensing Imagery. IEEE Geoscience and Remote Sensing Letters. 2022 Oct 17;19(10):6517505. doi: 10.1109/LGRS.2022.3215200

Author

Meng, Xiaoliang ; Yang, Yuechi ; Wang, Libo et al. / Class-guided Swin Transformer for Semantic Segmentation of Remote Sensing Imagery. In: IEEE Geoscience and Remote Sensing Letters. 2022 ; Vol. 19, No. 10.

Bibtex

@article{04b5817d6eb14432b919c4c3438e1abb,
title = "Class-guided Swin Transformer for Semantic Segmentation of Remote Sensing Imagery",
abstract = "Semantic segmentation of remote sensing images plays a crucial role in a wide variety of practical applications, including land cover mapping, environmental protection, and economic assessment. In the last decade, the convolutional neural network (CNN) has been the mainstream deep learning-based method for semantic segmentation. Compared with conventional methods, CNN-based methods learn semantic features automatically and thereby achieve strong representation capability. However, the local receptive field of the convolution operation limits CNN-based methods from capturing long-range dependencies. In contrast, the Vision Transformer (ViT) demonstrates great potential in modeling long-range dependencies and obtains superior results in semantic segmentation. Inspired by this, in this letter we propose a class-guided Swin Transformer (CG-Swin) for semantic segmentation of remote sensing images. Specifically, we adopt a Transformer-based encoder-decoder structure, introducing the Swin Transformer backbone as the encoder and designing a class-guided Transformer block to construct the decoder. Experimental results on the ISPRS Vaihingen and Potsdam datasets demonstrate that the proposed method achieves significant improvements over ten benchmarks, outperforming both advanced CNN-based and recent Transformer-based approaches.",
keywords = "Fully Transformer network, class-guided mechanism, semantic segmentation, remote sensing",
author = "Xiaoliang Meng and Yuechi Yang and Libo Wang and Teng Wang and Rui Li and Ce Zhang",
note = "{\textcopyright}2022 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.",
year = "2022",
month = oct,
day = "17",
doi = "10.1109/LGRS.2022.3215200",
language = "English",
volume = "19",
journal = "IEEE Geoscience and Remote Sensing Letters",
issn = "1545-598X",
publisher = "IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC",
number = "10",

}

RIS

TY - JOUR

T1 - Class-guided Swin Transformer for Semantic Segmentation of Remote Sensing Imagery

AU - Meng, Xiaoliang

AU - Yang, Yuechi

AU - Wang, Libo

AU - Wang, Teng

AU - Li, Rui

AU - Zhang, Ce

N1 - ©2022 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

PY - 2022/10/17

Y1 - 2022/10/17

N2 - Semantic segmentation of remote sensing images plays a crucial role in a wide variety of practical applications, including land cover mapping, environmental protection, and economic assessment. In the last decade, the convolutional neural network (CNN) has been the mainstream deep learning-based method for semantic segmentation. Compared with conventional methods, CNN-based methods learn semantic features automatically and thereby achieve strong representation capability. However, the local receptive field of the convolution operation limits CNN-based methods from capturing long-range dependencies. In contrast, the Vision Transformer (ViT) demonstrates great potential in modeling long-range dependencies and obtains superior results in semantic segmentation. Inspired by this, in this letter we propose a class-guided Swin Transformer (CG-Swin) for semantic segmentation of remote sensing images. Specifically, we adopt a Transformer-based encoder-decoder structure, introducing the Swin Transformer backbone as the encoder and designing a class-guided Transformer block to construct the decoder. Experimental results on the ISPRS Vaihingen and Potsdam datasets demonstrate that the proposed method achieves significant improvements over ten benchmarks, outperforming both advanced CNN-based and recent Transformer-based approaches.

AB - Semantic segmentation of remote sensing images plays a crucial role in a wide variety of practical applications, including land cover mapping, environmental protection, and economic assessment. In the last decade, the convolutional neural network (CNN) has been the mainstream deep learning-based method for semantic segmentation. Compared with conventional methods, CNN-based methods learn semantic features automatically and thereby achieve strong representation capability. However, the local receptive field of the convolution operation limits CNN-based methods from capturing long-range dependencies. In contrast, the Vision Transformer (ViT) demonstrates great potential in modeling long-range dependencies and obtains superior results in semantic segmentation. Inspired by this, in this letter we propose a class-guided Swin Transformer (CG-Swin) for semantic segmentation of remote sensing images. Specifically, we adopt a Transformer-based encoder-decoder structure, introducing the Swin Transformer backbone as the encoder and designing a class-guided Transformer block to construct the decoder. Experimental results on the ISPRS Vaihingen and Potsdam datasets demonstrate that the proposed method achieves significant improvements over ten benchmarks, outperforming both advanced CNN-based and recent Transformer-based approaches.

KW - Fully Transformer network

KW - class-guided mechanism

KW - semantic segmentation

KW - remote sensing

U2 - 10.1109/LGRS.2022.3215200

DO - 10.1109/LGRS.2022.3215200

M3 - Journal article

VL - 19

JO - IEEE Geoscience and Remote Sensing Letters

JF - IEEE Geoscience and Remote Sensing Letters

SN - 1545-598X

IS - 10

M1 - 6517505

ER -
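
Illustration

The abstract describes CG-Swin only at a high level: a Swin Transformer encoder feeding a decoder built from class-guided Transformer blocks. Purely as an illustration of what class guidance in a decoder block could look like (this record does not include the authors' implementation, so the module name, dimensions, and attention wiring below are all assumptions), here is a minimal PyTorch sketch in which spatial feature tokens cross-attend to learnable per-class embeddings:

    # Hypothetical sketch of a "class-guided" Transformer decoder block.
    # This is NOT the authors' code: the exact CG-Swin design is not given
    # in this record, so every name, dimension, and wiring choice is assumed.
    import torch
    import torch.nn as nn

    class ClassGuidedBlock(nn.Module):
        """Cross-attends decoder features to learnable class embeddings,
        so each spatial token can aggregate class-level context."""
        def __init__(self, dim: int, num_classes: int, num_heads: int = 8):
            super().__init__()
            # One learnable embedding per semantic class (assumed design).
            self.class_embed = nn.Parameter(torch.randn(num_classes, dim))
            self.norm1 = nn.LayerNorm(dim)
            self.cross_attn = nn.MultiheadAttention(dim, num_heads,
                                                    batch_first=True)
            self.norm2 = nn.LayerNorm(dim)
            self.mlp = nn.Sequential(
                nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, tokens, dim) -- a flattened H*W feature map
            # produced by the Swin encoder.
            b = x.size(0)
            cls = self.class_embed.unsqueeze(0).expand(b, -1, -1)
            # Queries are spatial tokens; keys/values are class embeddings.
            attn_out, _ = self.cross_attn(self.norm1(x), cls, cls)
            x = x + attn_out
            x = x + self.mlp(self.norm2(x))
            return x

    # Usage: refine deep encoder features (e.g. dim=768 at the last Swin
    # stage) for a 6-class ISPRS labeling task before a segmentation head.
    block = ClassGuidedBlock(dim=768, num_classes=6)
    feats = torch.randn(2, 64, 768)   # 2 images, 8x8 tokens, 768 channels
    print(block(feats).shape)         # torch.Size([2, 64, 768])

In this sketch the class embeddings act as a compact, learnable summary of each semantic category. Whether CG-Swin injects class guidance in exactly this way, at every decoder stage, or with a different query/key arrangement cannot be determined from this record; consult the paper via the DOI above for the actual design.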