SMAM: Self and Mutual Adaptive Matching for Skeleton-Based Few-Shot Action Recognition

Computing and Communications

Text available via DOI:

https://doi.org/10.1109/TIP.2022.3226410
Final published version

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Standard

SMAM: Self and Mutual Adaptive Matching for Skeleton-Based Few-Shot Action Recognition. / Li, Zhiheng; Gong, Xuyuan; Song, Ran et al.
In: IEEE Transactions on Image Processing, Vol. 32, 31.12.2023, p. 392-402.

Harvard

Li, Z, Gong, X, Song, R, Duan, P, Liu, J & Zhang, W 2023, 'SMAM: Self and Mutual Adaptive Matching for Skeleton-Based Few-Shot Action Recognition', IEEE Transactions on Image Processing, vol. 32, pp. 392-402. https://doi.org/10.1109/TIP.2022.3226410

APA

Li, Z., Gong, X., Song, R., Duan, P., Liu, J., & Zhang, W. (2023). SMAM: Self and Mutual Adaptive Matching for Skeleton-Based Few-Shot Action Recognition. IEEE Transactions on Image Processing, 32, 392-402. https://doi.org/10.1109/TIP.2022.3226410

Vancouver

Li Z, Gong X, Song R, Duan P, Liu J, Zhang W. SMAM: Self and Mutual Adaptive Matching for Skeleton-Based Few-Shot Action Recognition. IEEE Transactions on Image Processing. 2023 Dec 31;32:392-402. Epub 2022 Dec 7. doi: 10.1109/TIP.2022.3226410

Author

Li, Zhiheng ; Gong, Xuyuan ; Song, Ran et al. / SMAM: Self and Mutual Adaptive Matching for Skeleton-Based Few-Shot Action Recognition. In: IEEE Transactions on Image Processing. 2023 ; Vol. 32. pp. 392-402.

Bibtex

@article{db0897aec2074a7ba8128ea0993541fd,
  title = "SMAM: Self and Mutual Adaptive Matching for Skeleton-Based Few-Shot Action Recognition",
  abstract = "This paper focuses on skeleton-based few-shot action recognition. Since skeleton is essentially a sparse representation of human action, the feature maps extracted from it, through a standard encoder network in the few-shot condition, may not be sufficiently discriminative for some action sequences that look partially similar to each other. To address this issue, we propose a self and mutual adaptive matching (SMAM) module to convert such feature maps into more discriminative feature vectors. Our method, named as SMAM-Net, first leverages both the temporal information associated with each individual skeleton joint and the spatial relationship among them for feature extraction. Then, the SMAM module adaptively measures the similarity between labeled and query samples and further carries out feature matching within the query set to distinguish similar skeletons of various action categories. Experimental results show that the SMAM-Net outperforms other baselines on the large-scale NTU RGB + D 120 dataset in the tasks of one-shot and five-shot action recognition. We also report our results on smaller datasets including NTU RGB + D 60, SYSU and PKU-MMD to demonstrate that our method is reliable and generalises well on different datasets. Codes and the pretrained SMAM-Net will be made publicly available.",
  author = "Zhiheng Li and Xuyuan Gong and Ran Song and Peng Duan and Jun Liu and Wei Zhang",
  year = "2023",
  month = dec,
  day = "31",
  doi = "10.1109/TIP.2022.3226410",
  language = "English",
  volume = "32",
  pages = "392--402",
  journal = "IEEE Transactions on Image Processing",
  issn = "1057-7149",
  publisher = "Institute of Electrical and Electronics Engineers Inc.",
}

RIS

TY - JOUR

T1 - SMAM: Self and Mutual Adaptive Matching for Skeleton-Based Few-Shot Action Recognition

AU - Li, Zhiheng

AU - Gong, Xuyuan

AU - Song, Ran

AU - Duan, Peng

AU - Liu, Jun

AU - Zhang, Wei

PY - 2023/12/31

Y1 - 2023/12/31

N2 - This paper focuses on skeleton-based few-shot action recognition. Since skeleton is essentially a sparse representation of human action, the feature maps extracted from it, through a standard encoder network in the few-shot condition, may not be sufficiently discriminative for some action sequences that look partially similar to each other. To address this issue, we propose a self and mutual adaptive matching (SMAM) module to convert such feature maps into more discriminative feature vectors. Our method, named as SMAM-Net, first leverages both the temporal information associated with each individual skeleton joint and the spatial relationship among them for feature extraction. Then, the SMAM module adaptively measures the similarity between labeled and query samples and further carries out feature matching within the query set to distinguish similar skeletons of various action categories. Experimental results show that the SMAM-Net outperforms other baselines on the large-scale NTU RGB + D 120 dataset in the tasks of one-shot and five-shot action recognition. We also report our results on smaller datasets including NTU RGB + D 60, SYSU and PKU-MMD to demonstrate that our method is reliable and generalises well on different datasets. Codes and the pretrained SMAM-Net will be made publicly available.

AB - This paper focuses on skeleton-based few-shot action recognition. Since skeleton is essentially a sparse representation of human action, the feature maps extracted from it, through a standard encoder network in the few-shot condition, may not be sufficiently discriminative for some action sequences that look partially similar to each other. To address this issue, we propose a self and mutual adaptive matching (SMAM) module to convert such feature maps into more discriminative feature vectors. Our method, named as SMAM-Net, first leverages both the temporal information associated with each individual skeleton joint and the spatial relationship among them for feature extraction. Then, the SMAM module adaptively measures the similarity between labeled and query samples and further carries out feature matching within the query set to distinguish similar skeletons of various action categories. Experimental results show that the SMAM-Net outperforms other baselines on the large-scale NTU RGB + D 120 dataset in the tasks of one-shot and five-shot action recognition. We also report our results on smaller datasets including NTU RGB + D 60, SYSU and PKU-MMD to demonstrate that our method is reliable and generalises well on different datasets. Codes and the pretrained SMAM-Net will be made publicly available.

U2 - 10.1109/TIP.2022.3226410

DO - 10.1109/TIP.2022.3226410

M3 - Journal article

VL - 32

SP - 392

EP - 402

JO - IEEE Transactions on Image Processing

JF - IEEE Transactions on Image Processing

SN - 1057-7149

ER -
