Electronic data

  • paper: Accepted author manuscript, 3.54 MB, PDF document

    Available under license: CC BY 4.0 (Creative Commons Attribution 4.0 International)

Links

Text available via DOI: https://doi.org/10.1109/tcsvt.2025.3536086

CPAL: Cross-prompting Adapter with LoRAs for RGB+X Semantic Segmentation

Research output: Contribution to Journal/Magazine › Journal article › peer-review

E-pub ahead of print

Standard

CPAL: Cross-prompting Adapter with LoRAs for RGB+X Semantic Segmentation. / Liu, Ye; Wu, Pengfei; Wang, Miaohui et al.
In: IEEE Transactions on Circuits and Systems for Video Technology, 29.01.2025.

Harvard

Liu, Y, Wu, P, Wang, M & Liu, J 2025, 'CPAL: Cross-prompting Adapter with LoRAs for RGB+X Semantic Segmentation', IEEE Transactions on Circuits and Systems for Video Technology. https://doi.org/10.1109/tcsvt.2025.3536086

APA

Liu, Y., Wu, P., Wang, M., & Liu, J. (2025). CPAL: Cross-prompting Adapter with LoRAs for RGB+X Semantic Segmentation. IEEE Transactions on Circuits and Systems for Video Technology. Advance online publication. https://doi.org/10.1109/tcsvt.2025.3536086

Vancouver

Liu Y, Wu P, Wang M, Liu J. CPAL: Cross-prompting Adapter with LoRAs for RGB+X Semantic Segmentation. IEEE Transactions on Circuits and Systems for Video Technology. 2025 Jan 29. Epub 2025 Jan 29. doi: 10.1109/tcsvt.2025.3536086

Author

Liu, Ye; Wu, Pengfei; Wang, Miaohui et al. / CPAL: Cross-prompting Adapter with LoRAs for RGB+X Semantic Segmentation. In: IEEE Transactions on Circuits and Systems for Video Technology. 2025.

Bibtex

@article{3a12dc26d4044703b1b569e0f8870853,
title = "CPAL: Cross-prompting Adapter with LoRAs for RGB+X Semantic Segmentation",
abstract = "As sensor technology evolves, RGB+X systems combine traditional RGB cameras with another type of auxiliary sensor, which enhances perception capabilities and provides richer information for important tasks such as semantic segmentation. However, acquiring massive RGB+X data is difficult due to the need for specific acquisition equipment. Therefore, traditional RGB+X segmentation methods often perform pretraining on relatively abundant RGB data. However, these methods lack corresponding mechanisms to fully exploit the pretrained model, and the scope of the pretraining RGB dataset remains limited. Recent works have employed prompt learning to tap into the potential of pretrained foundation models, but these methods adopt a unidirectional prompting approach i.e., using X or RGB+X modality to prompt pretrained foundation models in RGB modality, neglecting the potential in non-RGB modalities. In this paper, we are dedicated to developing the potential of pretrained foundation models in both RGB and non-RGB modalities simultaneously, which is non-trivial due to the semantic gap between modalities. Specifically, we present the CPAL (Cross-prompting Adapter with LoRAs), a framework that features a novel bi-directional adapter to simultaneously fully exploit the complementarity and bridging the semantic gap between modalities. Additionally, CPAL introduces low-rank adaption (LoRA) to fine-tune the foundation model of each modal. With the support of these elements, we have successfully unleashed the potential of RGB foundation models in both RGB and non-RGB modalities simultaneously. Our method achieves state-of-the-art (SOTA) performance on five multi-modal benchmarks, including RGB+Depth, RGB+Thermal, RGB+Event, and a multi-modal video object segmentation benchmark, as well as four multi-modal salient object detection benchmarks. The code and results are available at: https://github.com/abelny56/CPAL.",
author = "Ye Liu and Pengfei Wu and Miaohui Wang and Jun Liu",
year = "2025",
month = jan,
day = "29",
doi = "10.1109/tcsvt.2025.3536086",
language = "English",
journal = "IEEE Transactions on Circuits and Systems for Video Technology",
issn = "1051-8215",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}
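
The abstract above names low-rank adaptation (LoRA) as the mechanism for fine-tuning the foundation model of each modality. The following is a minimal PyTorch sketch of the general LoRA technique, not the authors' implementation (see the linked repository for that); the class, parameter, and variable names are illustrative assumptions.

# Minimal sketch of LoRA on a linear layer. All names are illustrative,
# not taken from the CPAL codebase.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    # Frozen pretrained linear layer plus a trainable low-rank update:
    # y = W x + (alpha / r) * B (A x), with A of shape (r, in) and B of (out, r).
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)   # freeze pretrained weight
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Pretrained path plus low-rank residual; only lora_a/lora_b train.
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)

# Example: adapt a 768-dim projection inside a frozen transformer block.
layer = LoRALinear(nn.Linear(768, 768), rank=4)
out = layer(torch.randn(2, 196, 768))   # (batch, tokens, channels)

Only lora_a and lora_b receive gradients, so each modality's backbone can be adapted with a small fraction of its parameter count, which matches the abstract's framing of LoRA as a lightweight per-modality fine-tuning step.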

RIS

TY - JOUR

T1 - CPAL

T2 - Cross-prompting Adapter with LoRAs for RGB+X Semantic Segmentation

AU - Liu, Ye

AU - Wu, Pengfei

AU - Wang, Miaohui

AU - Liu, Jun

PY - 2025/1/29

Y1 - 2025/1/29

N2 - As sensor technology evolves, RGB+X systems combine traditional RGB cameras with an auxiliary sensor of another type, enhancing perception capabilities and providing richer information for important tasks such as semantic segmentation. However, acquiring massive RGB+X data is difficult because it requires specialized acquisition equipment, so traditional RGB+X segmentation methods often pretrain on relatively abundant RGB data. These methods, however, lack mechanisms to fully exploit the pretrained model, and the scope of the pretraining RGB dataset remains limited. Recent works have employed prompt learning to tap into the potential of pretrained foundation models, but they adopt a unidirectional prompting approach, i.e., using the X or RGB+X modality to prompt pretrained foundation models in the RGB modality, neglecting the potential of non-RGB modalities. In this paper, we develop the potential of pretrained foundation models in both RGB and non-RGB modalities simultaneously, which is non-trivial due to the semantic gap between modalities. Specifically, we present CPAL (Cross-prompting Adapter with LoRAs), a framework featuring a novel bi-directional adapter that simultaneously exploits the complementarity between modalities and bridges the semantic gap between them. Additionally, CPAL introduces low-rank adaptation (LoRA) to fine-tune the foundation model of each modality. With these elements, we unleash the potential of RGB foundation models in both RGB and non-RGB modalities simultaneously. Our method achieves state-of-the-art (SOTA) performance on five multi-modal benchmarks, including RGB+Depth, RGB+Thermal, RGB+Event, and a multi-modal video object segmentation benchmark, as well as on four multi-modal salient object detection benchmarks. The code and results are available at: https://github.com/abelny56/CPAL.

AB - As sensor technology evolves, RGB+X systems combine traditional RGB cameras with an auxiliary sensor of another type, enhancing perception capabilities and providing richer information for important tasks such as semantic segmentation. However, acquiring massive RGB+X data is difficult because it requires specialized acquisition equipment, so traditional RGB+X segmentation methods often pretrain on relatively abundant RGB data. These methods, however, lack mechanisms to fully exploit the pretrained model, and the scope of the pretraining RGB dataset remains limited. Recent works have employed prompt learning to tap into the potential of pretrained foundation models, but they adopt a unidirectional prompting approach, i.e., using the X or RGB+X modality to prompt pretrained foundation models in the RGB modality, neglecting the potential of non-RGB modalities. In this paper, we develop the potential of pretrained foundation models in both RGB and non-RGB modalities simultaneously, which is non-trivial due to the semantic gap between modalities. Specifically, we present CPAL (Cross-prompting Adapter with LoRAs), a framework featuring a novel bi-directional adapter that simultaneously exploits the complementarity between modalities and bridges the semantic gap between them. Additionally, CPAL introduces low-rank adaptation (LoRA) to fine-tune the foundation model of each modality. With these elements, we unleash the potential of RGB foundation models in both RGB and non-RGB modalities simultaneously. Our method achieves state-of-the-art (SOTA) performance on five multi-modal benchmarks, including RGB+Depth, RGB+Thermal, RGB+Event, and a multi-modal video object segmentation benchmark, as well as on four multi-modal salient object detection benchmarks. The code and results are available at: https://github.com/abelny56/CPAL.

U2 - 10.1109/tcsvt.2025.3536086

DO - 10.1109/tcsvt.2025.3536086

M3 - Journal article

JO - IEEE Transactions on Circuits and Systems for Video Technology

JF - IEEE Transactions on Circuits and Systems for Video Technology

SN - 1051-8215

ER -
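
The abstract also describes a bi-directional adapter through which the RGB and auxiliary (X) branches prompt each other. The sketch below illustrates one plausible shape for such a module, assuming token features of equal width from both branches; the structure and all names are assumptions for illustration only, and the authors' actual design is available in the repository at https://github.com/abelny56/CPAL.

# Hedged sketch of a bi-directional cross-prompting adapter between an RGB
# branch and an auxiliary ("X") branch. Illustrative, not the CPAL design.
import torch
import torch.nn as nn

class CrossPromptAdapter(nn.Module):
    # Exchanges bottlenecked prompts between two modality token streams,
    # so each branch is prompted by features from the other (bi-directional).
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down_rgb = nn.Linear(dim, bottleneck)
        self.down_x = nn.Linear(dim, bottleneck)
        self.up_to_rgb = nn.Linear(bottleneck, dim)
        self.up_to_x = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()

    def forward(self, f_rgb: torch.Tensor, f_x: torch.Tensor):
        p_rgb = self.act(self.down_rgb(f_rgb))   # prompt derived from RGB
        p_x = self.act(self.down_x(f_x))         # prompt derived from X
        # Residual injection in both directions across the semantic gap.
        return f_rgb + self.up_to_rgb(p_x), f_x + self.up_to_x(p_rgb)

# Example: insert between matching stages of the two frozen backbones.
adapter = CrossPromptAdapter(dim=768)
f_rgb, f_x = adapter(torch.randn(2, 196, 768), torch.randn(2, 196, 768))

A module like this would sit between matching encoder stages of the two frozen backbones, with LoRA (sketched earlier) handling the per-modality fine-tuning.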