A spatial attentive and temporal dilated (SATD) GCN for skeleton-based action recognition

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Standard

A spatial attentive and temporal dilated (SATD) GCN for skeleton-based action recognition. / Zhang, J.; Ye, G.; Tu, Z. et al.
In: CAAI Transactions on Intelligence Technology, Vol. 7, No. 1, 31.03.2022, p. 46-55.

Harvard

Zhang, J, Ye, G, Tu, Z, Qin, Y, Qin, Q, Zhang, J & Liu, J 2022, 'A spatial attentive and temporal dilated (SATD) GCN for skeleton-based action recognition', CAAI Transactions on Intelligence Technology, vol. 7, no. 1, pp. 46-55. https://doi.org/10.1049/cit2.12012

APA

Zhang, J., Ye, G., Tu, Z., Qin, Y., Qin, Q., Zhang, J., & Liu, J. (2022). A spatial attentive and temporal dilated (SATD) GCN for skeleton-based action recognition. CAAI Transactions on Intelligence Technology, 7(1), 46-55. https://doi.org/10.1049/cit2.12012

Vancouver

Zhang J, Ye G, Tu Z, Qin Y, Qin Q, Zhang J et al. A spatial attentive and temporal dilated (SATD) GCN for skeleton-based action recognition. CAAI Transactions on Intelligence Technology. 2022 Mar 31;7(1):46-55. Epub 2021 Mar 17. doi: 10.1049/cit2.12012

Author

Zhang, J. ; Ye, G. ; Tu, Z. et al. / A spatial attentive and temporal dilated (SATD) GCN for skeleton-based action recognition. In: CAAI Transactions on Intelligence Technology. 2022 ; Vol. 7, No. 1. pp. 46-55.

BibTeX

@article{8f9c8d3d80814a5dbb4d659b924e4738,
title = "A spatial attentive and temporal dilated (SATD) GCN for skeleton-based action recognition",
abstract = "Current studies have shown that the spatial-temporal graph convolutional network (ST-GCN) is effective for skeleton-based action recognition. However, for the existing ST-GCN-based methods, their temporal kernel size is usually fixed over all layers, which makes them cannot fully exploit the temporal dependency between discontinuous frames and different sequence lengths. Besides, most of these methods use average pooling to obtain global graph feature from vertex features, resulting in losing much fine-grained information for action classification. To address these issues, in this work, the authors propose a novel spatial attentive and temporal dilated graph convolutional network (SATD-GCN). It contains two important components, that is, a spatial attention pooling module (SAP) and a temporal dilated graph convolution module (TDGC). Specifically, the SAP module can select the human body joints which are beneficial for action recognition by a self-attention mechanism and alleviates the influence of data redundancy and noise. The TDGC module can effectively extract the temporal features at different time scales, which is useful to improve the temporal perception field and enhance the robustness of the model to different motion speed and sequence length. Importantly, both the SAP module and the TDGC module can be easily integrated into the ST-GCN-based models, and significantly improve their performance. Extensive experiments on two large-scale benchmark datasets, that is, NTU-RGB + D and Kinetics-Skeleton, demonstrate that the authors{\textquoteright} method achieves the state-of-the-art performance for skeleton-based action recognition.",
author = "J. Zhang and G. Ye and Z. Tu and Y. Qin and Q. Qin and J. Zhang and Jun Liu",
year = "2022",
month = mar,
day = "31",
doi = "10.1049/cit2.12012",
language = "English",
volume = "7",
pages = "46--55",
journal = "CAAI Transactions on Intelligence Technology",
issn = "2468-6557",
publisher = "John Wiley & Sons Inc.",
number = "1",

}

RIS

TY - JOUR

T1 - A spatial attentive and temporal dilated (SATD) GCN for skeleton-based action recognition

AU - Zhang, J.

AU - Ye, G.

AU - Tu, Z.

AU - Qin, Y.

AU - Qin, Q.

AU - Zhang, J.

AU - Liu, Jun

PY - 2022/3/31

Y1 - 2022/3/31

N2 - Current studies have shown that the spatial-temporal graph convolutional network (ST-GCN) is effective for skeleton-based action recognition. However, existing ST-GCN-based methods usually fix the temporal kernel size across all layers, which prevents them from fully exploiting the temporal dependencies between discontinuous frames and across different sequence lengths. Besides, most of these methods use average pooling to obtain the global graph feature from vertex features, which loses much fine-grained information that is valuable for action classification. To address these issues, the authors propose a novel spatial attentive and temporal dilated graph convolutional network (SATD-GCN). It contains two important components: a spatial attention pooling (SAP) module and a temporal dilated graph convolution (TDGC) module. Specifically, the SAP module uses a self-attention mechanism to select the human body joints that are beneficial for action recognition, alleviating the influence of data redundancy and noise. The TDGC module effectively extracts temporal features at different time scales, which enlarges the temporal receptive field and enhances the robustness of the model to different motion speeds and sequence lengths. Importantly, both the SAP module and the TDGC module can be easily integrated into ST-GCN-based models and significantly improve their performance. Extensive experiments on two large-scale benchmark datasets, NTU RGB+D and Kinetics-Skeleton, demonstrate that the authors’ method achieves state-of-the-art performance in skeleton-based action recognition.

AB - Current studies have shown that the spatial-temporal graph convolutional network (ST-GCN) is effective for skeleton-based action recognition. However, existing ST-GCN-based methods usually fix the temporal kernel size across all layers, which prevents them from fully exploiting the temporal dependencies between discontinuous frames and across different sequence lengths. Besides, most of these methods use average pooling to obtain the global graph feature from vertex features, which loses much fine-grained information that is valuable for action classification. To address these issues, the authors propose a novel spatial attentive and temporal dilated graph convolutional network (SATD-GCN). It contains two important components: a spatial attention pooling (SAP) module and a temporal dilated graph convolution (TDGC) module. Specifically, the SAP module uses a self-attention mechanism to select the human body joints that are beneficial for action recognition, alleviating the influence of data redundancy and noise. The TDGC module effectively extracts temporal features at different time scales, which enlarges the temporal receptive field and enhances the robustness of the model to different motion speeds and sequence lengths. Importantly, both the SAP module and the TDGC module can be easily integrated into ST-GCN-based models and significantly improve their performance. Extensive experiments on two large-scale benchmark datasets, NTU RGB+D and Kinetics-Skeleton, demonstrate that the authors’ method achieves state-of-the-art performance in skeleton-based action recognition.

U2 - 10.1049/cit2.12012

DO - 10.1049/cit2.12012

M3 - Journal article

VL - 7

SP - 46

EP - 55

JO - CAAI Transactions on Intelligence Technology

JF - CAAI Transactions on Intelligence Technology

SN - 2468-6557

IS - 1

ER -
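
Illustrative code sketch

For orientation, the following is a minimal PyTorch sketch of the two ideas described in the abstract: a temporal convolution with a configurable dilation rate (the TDGC idea) and attention-weighted pooling over joints in place of plain average pooling (the SAP idea). It is not the authors' implementation; the module names, the (N, C, T, V) tensor layout, and all hyper-parameters are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporalDilatedConv(nn.Module):
    """Temporal convolution with dilation, so a layer can cover a longer
    time span (a larger temporal receptive field) at the same kernel size."""
    def __init__(self, channels, kernel_size=9, dilation=2):
        super().__init__()
        pad = (kernel_size - 1) * dilation // 2  # keep the frame count unchanged
        self.conv = nn.Conv2d(channels, channels,
                              kernel_size=(kernel_size, 1),
                              padding=(pad, 0),
                              dilation=(dilation, 1))

    def forward(self, x):  # x: (N, C, T, V) = batch, channels, frames, joints
        return self.conv(x)

class SpatialAttentionPool(nn.Module):
    """Attention-weighted pooling over joints: a learned score decides how
    much each joint contributes to the global graph feature, instead of
    averaging all joints equally."""
    def __init__(self, channels):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)  # per-joint score

    def forward(self, x):  # x: (N, C, T, V)
        attn = F.softmax(self.score(x), dim=-1)  # normalise over the V joints
        return (x * attn).sum(dim=-1)            # (N, C, T) pooled feature

# Dummy batch: 2 clips, 64 channels, 100 frames, 25 joints (NTU-style skeleton).
x = torch.randn(2, 64, 100, 25)
x = TemporalDilatedConv(64, dilation=2)(x)
pooled = SpatialAttentionPool(64)(x)
print(pooled.shape)  # torch.Size([2, 64, 100])

Stacking such temporal layers with different dilation rates is one common way to cover several time scales at once, which matches the multi-scale behaviour the abstract attributes to the TDGC module.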