Accepted author manuscript, 2.08 MB, PDF document
Available under license: CC BY: Creative Commons Attribution 4.0 International License
Research output: Contribution to Journal/Magazine › Journal article › peer-review
TY - JOUR
T1 - Momentum Contrastive Teacher for Semi-Supervised Skeleton Action Recognition
AU - Lu, Mingqi
AU - Lu, Xiaobo
AU - Liu, Jun
PY - 2025/1/1
Y1 - 2025/1/1
AB - In the field of semi-supervised skeleton action recognition, existing work primarily follows the paradigm of self-supervised training followed by supervised fine-tuning. However, self-supervised learning focuses on exploring data representation rather than label classification. Inspired by Mean Teacher, we explore a novel pseudo-label-based model called SkeleMoCLR. Specifically, we use MoCo v2 as the foundation and extend it into a teacher-student network through a momentum encoder. The generation of high-confidence pseudo-labels requires a well-pretrained model as a prerequisite. In cases where large-scale skeleton data is lacking, we propose leveraging contrastive learning to transfer discriminative action features from large vision-text models to the skeleton encoder. Following the contrastive pre-training, the key encoder branch from MoCo v2 serves as the teacher to generate pseudo-labels for training the query encoder branch. Furthermore, we introduce pseudo-labels into the memory queues, sampling negative samples from different pseudo-label classes to maximize the representation differentiation between different categories. We jointly optimize the classification loss for both labeled and pseudo-labeled data and the contrastive loss for unlabeled data to update model parameters, fully harnessing the potential of pseudo-label semi-supervised learning and self-supervised learning. Extensive experiments conducted on the NTU-60, NTU-120, PKU-MMD, and NW-UCLA datasets demonstrate that our SkeleMoCLR outperforms existing competitive methods in the semi-supervised skeleton action recognition task.
U2 - 10.1109/tip.2024.3522818
DO - 10.1109/tip.2024.3522818
M3 - Journal article
VL - 34
SP - 295
EP - 305
JO - IEEE Transactions on Image Processing
JF - IEEE Transactions on Image Processing
SN - 1057-7149
ER -
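
The abstract names three mechanisms: a momentum-updated teacher, high-confidence pseudo-labels, and negatives drawn only from other pseudo-label classes. The record gives no equations or hyperparameters, so the following is only a minimal NumPy sketch of how such pieces are commonly written, not the authors' implementation. The function names, the momentum `m`, the confidence `threshold`, and the temperature `tau` are all illustrative assumptions.

```python
import numpy as np


def ema_update(teacher, student, m=0.999):
    """Momentum (EMA) update: teacher <- m * teacher + (1 - m) * student.

    `teacher` and `student` are dicts of arrays with identical keys.
    The value of m is an assumed default, not taken from the paper.
    """
    return {name: m * teacher[name] + (1.0 - m) * student[name] for name in teacher}


def softmax(logits):
    """Row-wise softmax with max-subtraction for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def pseudo_labels(teacher_logits, threshold=0.9):
    """Return (labels, keep_mask) from the teacher's class logits.

    A sample is kept only if its top class probability reaches the
    threshold (an assumed confidence rule).
    """
    probs = softmax(teacher_logits)
    return probs.argmax(axis=1), probs.max(axis=1) >= threshold


def masked_info_nce(q, k, queue, q_labels, queue_labels, tau=0.2):
    """InfoNCE loss that ignores queue entries sharing the query's pseudo-label.

    q, k:   (B, D) L2-normalised query / key embeddings (positive pairs).
    queue:  (K, D) L2-normalised embeddings stored in the memory queue.
    Labels are integer pseudo-labels for the queries and queue entries.
    """
    pos = np.sum(q * k, axis=1, keepdims=True) / tau       # (B, 1)
    neg = (q @ queue.T) / tau                               # (B, K)
    same_class = q_labels[:, None] == queue_labels[None, :]
    neg = np.where(same_class, -np.inf, neg)                # drop same-class negatives
    logits = np.concatenate([pos, neg], axis=1)             # positive is column 0
    top = logits.max(axis=1, keepdims=True)                 # finite: pos is finite
    log_z = top[:, 0] + np.log(np.exp(logits - top).sum(axis=1))
    return float(np.mean(log_z - pos[:, 0]))
```

In a training loop built on this sketch, the student (query encoder) would be updated by gradient descent on the classification and contrastive losses, after which `ema_update` would refresh the teacher (key encoder). How the losses are weighted is not stated in the abstract and is left open here.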