Accepted author manuscript, 5.48 MB, PDF document
Available under license: CC BY-NC: Creative Commons Attribution-NonCommercial 4.0 International License
Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review
}
TY - GEN
T1 - REMOTE
T2 - 36th AAAI Conference on Artificial Intelligence
AU - Ma, Xianzheng
AU - Rahmani, Hossein
AU - Fan, Zhipeng
AU - Yang, Bin
AU - Cheng, Jun
AU - Liu, Jun
PY - 2022/6/28
Y1 - 2022/6/28
N2 - Existing approaches for 2D pose estimation in videos often require a large number of dense annotations, which are costly and labor intensive to acquire. In this paper, we propose a semi-supervised REinforced MOtion Transformation nEtwork (REMOTE) to leverage a few labeled frames and temporal pose variations in videos, which enables effective learning of 2D pose estimation in sparsely annotated videos. Specifically, we introduce a Motion Transformer (MT) module to perform cross frame reconstruction, aiming to learn motion dynamic knowledge in videos. Besides, a novel reinforcement learning-based Frame Selection Agent (FSA) is designed within our framework, which is able to harness informative frame pairs on the fly to enhance the pose estimator under our cross reconstruction mechanism. We conduct extensive experiments that show the efficacy of our proposed REMOTE framework.
UR - https://www.aaai.org/AAAI22Papers/AAAI-5513.XianzhengM.pdf
M3 - Conference contribution/Paper
SP - 1944
EP - 1952
BT - Proceedings of the 36th AAAI Conference on Artificial Intelligence
PB - AAAI Press
CY - Palo Alto, Calif.
Y2 - 22 February 2022 through 1 March 2022
ER -