Learning Human Pose Models from Synthesized Data for Robust RGB-D Action Recognition

Computing and Communications

Associated organisational unit

Artificial Intelligence

Electronic data

Learning_Human_Pose_Models_from_Synthesized_Data_for_Robust_RGB_D_Action_Recognition__IJCV_Revision_2_
Rights statement: The final publication is available at Springer via http://dx.doi.org/10.1007/s11263-019-01192-2
Accepted author manuscript, 7.04 MB, PDF document
Available under license: CC BY-NC: Creative Commons Attribution-NonCommercial 4.0 International License

Text available via DOI:

https://doi.org/10.1007/s11263-019-01192-2
Final published version


Research output: Contribution to Journal/Magazine › Journal article › peer-review

E-pub ahead of print

Standard

Learning Human Pose Models from Synthesized Data for Robust RGB-D Action Recognition. / Liu, Jian; Rahmani, Hossein; Akhtar, Naveed et al.
In: International Journal of Computer Vision, 06.08.2019.


Harvard

Liu, J, Rahmani, H, Akhtar, N & Mian, A 2019, 'Learning Human Pose Models from Synthesized Data for Robust RGB-D Action Recognition', International Journal of Computer Vision. https://doi.org/10.1007/s11263-019-01192-2

APA

Liu, J., Rahmani, H., Akhtar, N., & Mian, A. (2019). Learning Human Pose Models from Synthesized Data for Robust RGB-D Action Recognition. International Journal of Computer Vision. Advance online publication. https://doi.org/10.1007/s11263-019-01192-2

Vancouver

Liu J, Rahmani H, Akhtar N, Mian A. Learning Human Pose Models from Synthesized Data for Robust RGB-D Action Recognition. International Journal of Computer Vision. 2019 Aug 6. Epub 2019 Aug 6. doi: 10.1007/s11263-019-01192-2

Author

Liu, Jian ; Rahmani, Hossein ; Akhtar, Naveed et al. / Learning Human Pose Models from Synthesized Data for Robust RGB-D Action Recognition. In: International Journal of Computer Vision. 2019.

Bibtex

@article{33f7fffcd0b143aea12b66ae1412142c,

title = "Learning Human Pose Models from Synthesized Data for Robust RGB-D Action Recognition",

abstract = "We propose Human Pose Models that represent RGB and depth images of human poses independent of clothing textures, backgrounds, lighting conditions, body shapes and camera viewpoints. Learning such universal models requires training images where all factors are varied for every human pose. Capturing such data is prohibitively expensive. Therefore, we develop a framework for synthesizing the training data. First, we learn representative human poses from a large corpus of real motion captured human skeleton data. Next, we fit synthetic 3D humans with different body shapes to each pose and render each from 180 camera viewpoints while randomly varying the clothing textures, background and lighting. Generative Adversarial Networks are employed to minimize the gap between synthetic and real image distributions. CNN models are then learned that transfer human poses to a shared high-level invariant space. The learned CNN models are then used as invariant feature extractors from real RGB and depth frames of human action videos and the temporal variations are modelled by Fourier Temporal Pyramid. Finally, linear SVM is used for classification. Experiments on three benchmark human action datasets show that our algorithm outperforms existing methods by significant margins for RGB only and RGB-D action recognition.",

author = "Jian Liu and Hossein Rahmani and Naveed Akhtar and Ajmal Mian",

note = "The final publication is available at Springer via http://dx.doi.org/10.1007/s11263-019-01192-2",

year = "2019",

month = aug,

day = "6",

doi = "10.1007/s11263-019-01192-2",

language = "English",

journal = "International Journal of Computer Vision",

issn = "0920-5691",

publisher = "Springer Netherlands",

}

RIS

TY - JOUR

T1 - Learning Human Pose Models from Synthesized Data for Robust RGB-D Action Recognition

AU - Liu, Jian

AU - Rahmani, Hossein

AU - Akhtar, Naveed

AU - Mian, Ajmal

N1 - The final publication is available at Springer via http://dx.doi.org/10.1007/s11263-019-01192-2

PY - 2019/8/6

Y1 - 2019/8/6

N2 - We propose Human Pose Models that represent RGB and depth images of human poses independent of clothing textures, backgrounds, lighting conditions, body shapes and camera viewpoints. Learning such universal models requires training images where all factors are varied for every human pose. Capturing such data is prohibitively expensive. Therefore, we develop a framework for synthesizing the training data. First, we learn representative human poses from a large corpus of real motion captured human skeleton data. Next, we fit synthetic 3D humans with different body shapes to each pose and render each from 180 camera viewpoints while randomly varying the clothing textures, background and lighting. Generative Adversarial Networks are employed to minimize the gap between synthetic and real image distributions. CNN models are then learned that transfer human poses to a shared high-level invariant space. The learned CNN models are then used as invariant feature extractors from real RGB and depth frames of human action videos and the temporal variations are modelled by Fourier Temporal Pyramid. Finally, linear SVM is used for classification. Experiments on three benchmark human action datasets show that our algorithm outperforms existing methods by significant margins for RGB only and RGB-D action recognition.

AB - We propose Human Pose Models that represent RGB and depth images of human poses independent of clothing textures, backgrounds, lighting conditions, body shapes and camera viewpoints. Learning such universal models requires training images where all factors are varied for every human pose. Capturing such data is prohibitively expensive. Therefore, we develop a framework for synthesizing the training data. First, we learn representative human poses from a large corpus of real motion captured human skeleton data. Next, we fit synthetic 3D humans with different body shapes to each pose and render each from 180 camera viewpoints while randomly varying the clothing textures, background and lighting. Generative Adversarial Networks are employed to minimize the gap between synthetic and real image distributions. CNN models are then learned that transfer human poses to a shared high-level invariant space. The learned CNN models are then used as invariant feature extractors from real RGB and depth frames of human action videos and the temporal variations are modelled by Fourier Temporal Pyramid. Finally, linear SVM is used for classification. Experiments on three benchmark human action datasets show that our algorithm outperforms existing methods by significant margins for RGB only and RGB-D action recognition.

U2 - 10.1007/s11263-019-01192-2

DO - 10.1007/s11263-019-01192-2

M3 - Journal article

JO - International Journal of Computer Vision

JF - International Journal of Computer Vision

SN - 0920-5691

ER -
