Rights statement: Yinghui Kong, Li Li, Ke Zhang, Qiang Ni, and Jungong Han, "Attention module-based spatial–temporal graph convolutional networks for skeleton-based action recognition," Journal of Electronic Imaging 28(4), 043032 (30 August 2019). https://doi.org/10.1117/1.JEI.28.4.043032 Copyright 2019 Society of Photo-Optical Instrumentation Engineers. One print or electronic copy may be made for personal use only. Systematic reproduction and distribution, duplication of any material in this paper for a fee or for commercial purposes, or modification of the content of the paper are prohibited.
Accepted author manuscript, 2.47 MB, PDF document
Available under license: CC BY-NC: Creative Commons Attribution-NonCommercial 4.0 International License
Research output: Contribution to Journal/Magazine › Journal article › peer-review
TY - JOUR
T1 - Attention module-based spatial-temporal graph convolutional networks for skeleton-based action recognition
AU - Kong, Y.
AU - Li, L.
AU - Zhang, K.
AU - Ni, Q.
AU - Han, J.
N1 - Yinghui Kong, Li Li, Ke Zhang, Qiang Ni, and Jungong Han, "Attention module-based spatial–temporal graph convolutional networks for skeleton-based action recognition," Journal of Electronic Imaging 28(4), 043032 (30 August 2019). https://doi.org/10.1117/1.JEI.28.4.043032 Copyright 2019 Society of Photo-Optical Instrumentation Engineers. One print or electronic copy may be made for personal use only. Systematic reproduction and distribution, duplication of any material in this paper for a fee or for commercial purposes, or modification of the content of the paper are prohibited.
PY - 2019/8/30
Y1 - 2019/8/30
AB - Skeleton-based action recognition is an important direction in human action recognition because the skeleton carries information essential for recognizing actions. Spatial-temporal graph convolutional networks (ST-GCN) automatically learn both temporal and spatial features from skeleton data and achieve remarkable performance for skeleton-based action recognition. However, ST-GCN learns only local information within a certain neighborhood and does not capture the correlations among all joints (i.e., global information). Therefore, global information needs to be introduced into ST-GCN. We propose a model of dynamic skeletons, called attention module-based ST-GCN, which addresses this problem by adding an attention module. The attention module captures some global information, which brings stronger expressive power and generalization capability. Experimental results on two large-scale datasets, Kinetics and NTU-RGB+D, demonstrate that our model achieves significant improvements over previous representative methods. © 2019 SPIE and IS&T.
KW - action recognition
KW - attention module
KW - nonlocal neural network
KW - spatial-temporal graph convolution network
KW - Convolution
KW - Large dataset
KW - Action recognition
KW - Convolutional networks
KW - Generalization capability
KW - Human-action recognition
KW - Large-scale datasets
KW - Nonlocal
KW - Spatial temporals
KW - Musculoskeletal system
U2 - 10.1117/1.JEI.28.4.043032
DO - 10.1117/1.JEI.28.4.043032
M3 - Journal article
VL - 28
JO - Journal of Electronic Imaging
JF - Journal of Electronic Imaging
SN - 1017-9909
IS - 4
M1 - 043032
ER -