Rights statement: ©2021 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
Accepted author manuscript, 3.79 MB, PDF document
Available under license: CC BY-NC: Creative Commons Attribution-NonCommercial 4.0 International License
Research output: Contribution to Journal/Magazine › Journal article › peer-review
TY - JOUR
T1 - Multiattention Network for Semantic Segmentation of Fine-Resolution Remote Sensing Images
AU - Li, Rui
AU - Zheng, Shunyi
AU - Zhang, Ce
AU - Duan, Chenxi
AU - Su, Jianlin
AU - Atkinson, Peter
N1 - ©2021 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
PY - 2022/1/31
Y1 - 2022/1/31
AB - Semantic segmentation of remote sensing images plays an important role in land resource management, yield estimation, and economic assessment. Although deep convolutional neural networks have significantly increased the accuracy of semantic segmentation of remote sensing images, standard models still have several limitations. First, in encoder-decoder architectures such as U-Net, the use of multi-scale features leads to overuse of information, with similar low-level features exploited repeatedly at multiple scales. Second, long-range dependencies of feature maps are not sufficiently explored, so the feature representations associated with each semantic class are not optimized. Third, although the dot-product attention mechanism has been introduced to semantic segmentation to model long-range dependencies, its high time and space complexity impedes its use in application scenarios with large-scale input. This paper proposes a Multi-Attention-Network (MANet) to address these issues by extracting contextual dependencies through multiple efficient attention modules. A novel kernel attention mechanism with linear complexity is proposed to alleviate the large computational demand of attention. We integrate local feature maps extracted by ResNeXt-101 with their corresponding global dependencies and adaptively reweight interdependent channel maps using kernel attention and channel attention. Numerical experiments on three large-scale fine-resolution remote sensing images captured by different satellites demonstrate that the proposed MANet outperforms DeepLab V3+, PSPNet, FastFCN, and other benchmark approaches.
KW - fine-resolution remote sensing images
KW - attention mechanism
KW - semantic segmentation
U2 - 10.1109/TGRS.2021.3093977
DO - 10.1109/TGRS.2021.3093977
M3 - Journal article
VL - 60
JO - IEEE Transactions on Geoscience and Remote Sensing
JF - IEEE Transactions on Geoscience and Remote Sensing
SN - 0196-2892
M1 - 5607713
ER -