Research output: Contribution to Journal/Magazine › Journal article › peer-review
| Journal publication date | 26/05/2025 |
|---|---|
| Journal | IEEE Transactions on Pattern Analysis and Machine Intelligence |
| Number of pages | 18 |
| Pages (from-to) | 1-18 |
| Publication status | E-pub ahead of print |
| Early online date | 26/05/2025 |
| Original language | English |
Despite the rapid advancements in few-shot segmentation (FSS), most existing methods in this domain are hampered by their reliance on the limited and biased information from only a small number of labeled samples. This limitation inherently restricts their capability to achieve sufficiently high levels of performance. To address this issue, this paper proposes a pioneering framework named LLaFS++, which, for the first time, applies large language models (LLMs) to FSS and achieves notable success. LLaFS++ leverages the extensive prior knowledge embedded in LLMs to guide the segmentation process, effectively compensating for the limited information contained in the few-shot labeled samples and thereby achieving superior results. To enhance the effectiveness of text-based LLMs in FSS scenarios, we present several innovative and task-specific designs within the LLaFS++ framework. Specifically, we introduce an input instruction that allows the LLM to directly produce segmentation results represented as polygons, and propose a region-attribute corresponding table to simulate the human visual system and provide multi-modal guidance. We also synthesize pseudo samples and use curriculum learning for pretraining to augment data and achieve better optimization, and propose a novel inference method to mitigate potential oversegmentation hallucinations caused by the regional guidance information. Incorporating these designs, LLaFS++ constitutes an effective framework that achieves state-of-the-art results on multiple datasets including PASCAL-5ⁱ, COCO-20ⁱ, and FSS-1000. Our superior performance showcases the remarkable potential of applying LLMs to process few-shot vision tasks.
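The abstract's key mechanism is having the LLM emit a segmentation mask as a polygon, i.e., a short text sequence of vertices rather than a dense pixel grid. The paper's exact instruction format and vertex encoding are not specified here, so the following is a minimal illustrative sketch, assuming the model outputs vertices as `(x,y)` pairs in plain text; the parsing regex, the example output string, and both helper functions are hypothetical and not taken from LLaFS++.

```python
# Hypothetical sketch: turning a polygon emitted as text by an LLM into a
# binary segmentation mask. The instruction format below is an assumption,
# not the actual LLaFS++ protocol.
import re
import numpy as np
from PIL import Image, ImageDraw


def parse_polygon(text: str) -> list[tuple[int, int]]:
    """Extract (x, y) vertex pairs from the model's text output."""
    return [(int(x), int(y)) for x, y in re.findall(r"\((\d+),\s*(\d+)\)", text)]


def polygon_to_mask(vertices: list[tuple[int, int]], height: int, width: int) -> np.ndarray:
    """Rasterize the polygon into a binary mask of shape (height, width)."""
    canvas = Image.new("L", (width, height), 0)
    ImageDraw.Draw(canvas).polygon(vertices, outline=1, fill=1)
    return np.array(canvas, dtype=np.uint8)


# Illustrative model output only; not a real LLaFS++ response.
llm_output = "segment: (10,12) (48,15) (52,40) (22,44)"
mask = polygon_to_mask(parse_polygon(llm_output), height=64, width=64)
print(mask.sum(), "foreground pixels")
```

One appeal of this representation is length: a polygon with a handful of vertices fits comfortably in an LLM's output budget, whereas a per-pixel mask would not, which is presumably why the framework opts for a polygon-style instruction in the first place.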