Accepted author manuscript, 2.65 MB, PDF document
Available under license: CC BY: Creative Commons Attribution 4.0 International License
Research output: Contribution to Journal/Magazine › Journal article › peer-review
TY - JOUR
T1 - Skeleton-prompt
T2 - A cross-dataset transfer learning approach for skeleton action recognition
AU - Lu, M.
AU - Lu, X.
AU - Liu, J.
PY - 2025/6/9
Y1 - 2025/6/9
N2 - This paper presents Skeleton-Prompt, a novel tuning method designed to tackle cross-dataset transfer issues in skeleton action recognition models. Given the scarcity of large-scale 3D skeleton datasets and the variability in keypoint structures across datasets, existing methods often rely on training models from scratch, necessitating extensive labeled data and exhibiting high sensitivity to occlusion. Our approach aims to fine-tune pre-trained models to adapt to limited real-world skeleton data. We use 2D skeletons as inputs and leverage a large human motion dataset for 2D to 3D pose estimation to learn generalizable motion features. A lightweight prompt generator produces instance-level prompts, and we employ dynamic queries with cross-attention to refine the semantic information of the input data. Additionally, we introduce a joint-enhanced multi-stream fusion mechanism based on self-attention to improve robustness against incomplete skeletons. Skeleton-Prompt represents a significant advancement in efficient fine-tuning for skeleton action recognition, effectively addressing cross-dataset generalization challenges in a data-efficient and parameter-efficient manner.
AB - This paper presents Skeleton-Prompt, a novel tuning method designed to tackle cross-dataset transfer issues in skeleton action recognition models. Given the scarcity of large-scale 3D skeleton datasets and the variability in keypoint structures across datasets, existing methods often rely on training models from scratch, necessitating extensive labeled data and exhibiting high sensitivity to occlusion. Our approach aims to fine-tune pre-trained models to adapt to limited real-world skeleton data. We use 2D skeletons as inputs and leverage a large human motion dataset for 2D to 3D pose estimation to learn generalizable motion features. A lightweight prompt generator produces instance-level prompts, and we employ dynamic queries with cross-attention to refine the semantic information of the input data. Additionally, we introduce a joint-enhanced multi-stream fusion mechanism based on self-attention to improve robustness against incomplete skeletons. Skeleton-Prompt represents a significant advancement in efficient fine-tuning for skeleton action recognition, effectively addressing cross-dataset generalization challenges in a data-efficient and parameter-efficient manner.
U2 - 10.1016/j.patcog.2025.111885
DO - 10.1016/j.patcog.2025.111885
M3 - Journal article
VL - 169
JO - Pattern Recognition
JF - Pattern Recognition
SN - 0031-3203
M1 - 111885
ER -