Momentum Contrastive Teacher for Semi-Supervised Skeleton Action Recognition

Computing and Communications

Associated organisational unit

Insight

Electronic data

paper
Accepted author manuscript, 2.08 MB, PDF document
Available under license: CC BY: Creative Commons Attribution 4.0 International License

Text available via DOI:

https://doi.org/10.1109/tip.2024.3522818
Final published version

Research output: Contribution to Journal/Magazine › Journal article › peer-review

E-pub ahead of print

Standard

Momentum Contrastive Teacher for Semi-Supervised Skeleton Action Recognition. / Lu, Mingqi; Lu, Xiaobo; Liu, Jun.
In: IEEE Transactions on Image Processing, Vol. 34, 31.12.2025, p. 295-305.

Harvard

Lu, M, Lu, X & Liu, J 2025, 'Momentum Contrastive Teacher for Semi-Supervised Skeleton Action Recognition', IEEE Transactions on Image Processing, vol. 34, pp. 295-305. https://doi.org/10.1109/tip.2024.3522818

APA

Lu, M., Lu, X., & Liu, J. (2025). Momentum Contrastive Teacher for Semi-Supervised Skeleton Action Recognition. IEEE Transactions on Image Processing, 34, 295-305. Advance online publication. https://doi.org/10.1109/tip.2024.3522818

Vancouver

Lu M, Lu X, Liu J. Momentum Contrastive Teacher for Semi-Supervised Skeleton Action Recognition. IEEE Transactions on Image Processing. 2025 Dec 31;34:295-305. Epub 2025 Jan 1. doi: 10.1109/tip.2024.3522818

Author

Lu, Mingqi ; Lu, Xiaobo ; Liu, Jun. / Momentum Contrastive Teacher for Semi-Supervised Skeleton Action Recognition. In: IEEE Transactions on Image Processing. 2025 ; Vol. 34. pp. 295-305.

Bibtex

@article{b7c19c2f77b64eb5a148f29ba06baacd,

title = "Momentum Contrastive Teacher for Semi-Supervised Skeleton Action Recognition",

abstract = "In the field of semi-supervised skeleton action recognition, existing work primarily follows the paradigm of self-supervised training followed by supervised fine-tuning. However, self-supervised learning focuses on exploring data representation rather than label classification. Inspired by Mean Teacher, we explore a novel pseudo-label-based model called SkeleMoCLR. Specifically, we use MoCo v2 as the foundation and extend it into a teacher-student network through a momentum encoder. The generation of high-confidence pseudo-labels requires a well-pretrained model as a prerequisite. In cases where large-scale skeleton data is lacking, we propose leveraging contrastive learning to transfer discriminative action features from large vision-text models to the skeleton encoder. Following the contrastive pre-training, the key encoder branch from MoCo v2 serves as the teacher to generate pseudo-labels for training the query encoder branch. Furthermore, we introduce pseudo-labels into the memory queues, sampling negative samples from different pseudo-label classes to maximize the representation differentiation between different categories. We jointly optimize the classification loss for both labeled and pseudo-labeled data and the contrastive loss for unlabeled data to update model parameters, fully harnessing the potential of pseudo-label semi-supervised learning and self-supervised learning. Extensive experiments conducted on the NTU-60, NTU-120, PKU-MMD, and NW-UCLA datasets demonstrate that our SkeleMoCLR outperforms existing competitive methods in the semi-supervised skeleton action recognition task.",

author = "Mingqi Lu and Xiaobo Lu and Jun Liu",

year = "2025",

month = jan,

day = "1",

doi = "10.1109/tip.2024.3522818",

language = "English",

volume = "34",

pages = "295--305",

journal = "IEEE Transactions on Image Processing",

issn = "1057-7149",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

RIS

TY - JOUR

T1 - Momentum Contrastive Teacher for Semi-Supervised Skeleton Action Recognition

AU - Lu, Mingqi

AU - Lu, Xiaobo

AU - Liu, Jun

PY - 2025/1/1

Y1 - 2025/1/1

N2 - In the field of semi-supervised skeleton action recognition, existing work primarily follows the paradigm of self-supervised training followed by supervised fine-tuning. However, self-supervised learning focuses on exploring data representation rather than label classification. Inspired by Mean Teacher, we explore a novel pseudo-label-based model called SkeleMoCLR. Specifically, we use MoCo v2 as the foundation and extend it into a teacher-student network through a momentum encoder. The generation of high-confidence pseudo-labels requires a well-pretrained model as a prerequisite. In cases where large-scale skeleton data is lacking, we propose leveraging contrastive learning to transfer discriminative action features from large vision-text models to the skeleton encoder. Following the contrastive pre-training, the key encoder branch from MoCo v2 serves as the teacher to generate pseudo-labels for training the query encoder branch. Furthermore, we introduce pseudo-labels into the memory queues, sampling negative samples from different pseudo-label classes to maximize the representation differentiation between different categories. We jointly optimize the classification loss for both labeled and pseudo-labeled data and the contrastive loss for unlabeled data to update model parameters, fully harnessing the potential of pseudo-label semi-supervised learning and self-supervised learning. Extensive experiments conducted on the NTU-60, NTU-120, PKU-MMD, and NW-UCLA datasets demonstrate that our SkeleMoCLR outperforms existing competitive methods in the semi-supervised skeleton action recognition task.

U2 - 10.1109/tip.2024.3522818

DO - 10.1109/tip.2024.3522818

M3 - Journal article

VL - 34

SP - 295

EP - 305

JO - IEEE Transactions on Image Processing

JF - IEEE Transactions on Image Processing

SN - 1057-7149

ER -
