Licence: CC BY: Creative Commons Attribution 4.0 International License
Research output: Contribution to Journal/Magazine › Journal article › peer-review
}
TY - JOUR
T1 - DE-Unet
T2 - Dual-encoder U-Net for Ultra-high Resolution Remote Sensing Image Segmentation
AU - Liu, Y.
AU - Song, S.
AU - Wang, M.
AU - Gao, H.
AU - Liu, J.
PY - 2025/4/30
Y1 - 2025/4/30
N2 - In recent years, there has been a growing demand for remote sensing image semantic segmentation in various applications. The key to semantic segmentation lies in the ability to globally comprehend the input image. While recent transformer-based methods can effectively capture global contextual information, they suffer from high computational complexity. For ultra-high resolution (UHR) remote sensing images in particular, it is even more challenging for these methods to achieve a satisfactory balance between accuracy and computation speed. To address these issues, we propose in this article a CNN-based dual-encoder U-Net for effective and efficient UHR image segmentation. Our method incorporates dual encoders into the symmetrical framework of U-Net. The dual encoders endow the network with strong global and local perception capabilities simultaneously, while the U-Net's symmetrical structure guarantees the network's robust decoding ability. Additionally, multipath skip connections ensure ample information exchange between the dual encoders, as well as between the encoders and decoders. Furthermore, we propose a context-aware modulation fusion module that guides the encoder–encoder and encoder–decoder data fusion through global receptive fields. Experiments conducted on public UHR remote sensing datasets such as Inria Aerial and DeepGlobe have demonstrated the effectiveness of the proposed method. Specifically, on the Inria Aerial dataset, our method achieves a 77.42% mIoU, which outperforms the baseline (Guo et al., 2022) by 3.14% while maintaining comparable inference speed, as shown in Fig. 1.
AB - In recent years, there has been a growing demand for remote sensing image semantic segmentation in various applications. The key to semantic segmentation lies in the ability to globally comprehend the input image. While recent transformer-based methods can effectively capture global contextual information, they suffer from high computational complexity. For ultra-high resolution (UHR) remote sensing images in particular, it is even more challenging for these methods to achieve a satisfactory balance between accuracy and computation speed. To address these issues, we propose in this article a CNN-based dual-encoder U-Net for effective and efficient UHR image segmentation. Our method incorporates dual encoders into the symmetrical framework of U-Net. The dual encoders endow the network with strong global and local perception capabilities simultaneously, while the U-Net's symmetrical structure guarantees the network's robust decoding ability. Additionally, multipath skip connections ensure ample information exchange between the dual encoders, as well as between the encoders and decoders. Furthermore, we propose a context-aware modulation fusion module that guides the encoder–encoder and encoder–decoder data fusion through global receptive fields. Experiments conducted on public UHR remote sensing datasets such as Inria Aerial and DeepGlobe have demonstrated the effectiveness of the proposed method. Specifically, on the Inria Aerial dataset, our method achieves a 77.42% mIoU, which outperforms the baseline (Guo et al., 2022) by 3.14% while maintaining comparable inference speed, as shown in Fig. 1.
U2 - 10.1109/JSTARS.2025.3565753
DO - 10.1109/JSTARS.2025.3565753
M3 - Journal article
VL - 18
SP - 12290
EP - 12302
JO - IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
JF - IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
SN - 1939-1404
ER -