A Unified 3D Human Motion Synthesis Model via Conditional Variational Auto-Encoder

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Published

Standard

A Unified 3D Human Motion Synthesis Model via Conditional Variational Auto-Encoder. / Cai, Yujun; Wang, Yiwei; Zhu, Yiheng et al.
Proceedings - 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021. Institute of Electrical and Electronics Engineers Inc., 2022. p. 11625-11635 (Proceedings of the IEEE International Conference on Computer Vision).

Harvard

Harvard

Cai, Y, Wang, Y, Zhu, Y, Cham, TJ, Cai, J, Yuan, J, Liu, J, Zheng, C, Yan, S, Ding, H, Shen, X, Liu, D & Thalmann, NM 2022, A Unified 3D Human Motion Synthesis Model via Conditional Variational Auto-Encoder. in Proceedings - 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021. Proceedings of the IEEE International Conference on Computer Vision, Institute of Electrical and Electronics Engineers Inc., pp. 11625-11635, 18th IEEE/CVF International Conference on Computer Vision, ICCV 2021, Virtual, Online, Canada, 11/10/21. https://doi.org/10.1109/ICCV48922.2021.01144

APA

Cai, Y., Wang, Y., Zhu, Y., Cham, T. J., Cai, J., Yuan, J., Liu, J., Zheng, C., Yan, S., Ding, H., Shen, X., Liu, D., & Thalmann, N. M. (2022). A Unified 3D Human Motion Synthesis Model via Conditional Variational Auto-Encoder. In Proceedings - 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021 (pp. 11625-11635). (Proceedings of the IEEE International Conference on Computer Vision). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICCV48922.2021.01144

Vancouver

Cai Y, Wang Y, Zhu Y, Cham TJ, Cai J, Yuan J et al. A Unified 3D Human Motion Synthesis Model via Conditional Variational Auto-Encoder. In Proceedings - 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021. Institute of Electrical and Electronics Engineers Inc. 2022. p. 11625-11635. (Proceedings of the IEEE International Conference on Computer Vision). Epub 2021 Oct 10. doi: 10.1109/ICCV48922.2021.01144

Author

Cai, Yujun ; Wang, Yiwei ; Zhu, Yiheng et al. / A Unified 3D Human Motion Synthesis Model via Conditional Variational Auto-Encoder. Proceedings - 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021. Institute of Electrical and Electronics Engineers Inc., 2022. pp. 11625-11635 (Proceedings of the IEEE International Conference on Computer Vision).

Bibtex

@inproceedings{f43042b0ed7940b8af95b2aef02543ea,
title = "A Unified 3D Human Motion Synthesis Model via Conditional Variational Auto-Encoder",
abstract = "We present a unified and flexible framework to address the generalized problem of 3D motion synthesis that covers the tasks of motion prediction, completion, interpolation, and spatial-temporal recovery. Since these tasks have different input constraints and various fidelity and diversity requirements, most existing approaches only cater to a specific task or use different architectures to address various tasks. Here we propose a unified framework based on Conditional Variational Auto-Encoder (CVAE), where we treat any arbitrary input as a masked motion series. Notably, by considering this problem as a conditional generation process, we estimate a parametric distribution of the missing regions based on the input conditions, from which to sample and synthesize the full motion series. To further allow the flexibility of manipulating the motion style of the generated series, we design an Action-Adaptive Modulation (AAM) to propagate the given semantic guidance through the whole sequence. We also introduce a cross-attention mechanism to exploit distant relations among decoder and encoder features for better realism and global consistency. We conducted extensive experiments on Human 3.6M and CMU-Mocap. The results show that our method produces coherent and realistic results for various motion synthesis tasks, with the synthesized motions distinctly adapted by the given action labels.",
author = "Yujun Cai and Yiwei Wang and Yiheng Zhu and Cham, {Tat Jen} and Jianfei Cai and Junsong Yuan and Jun Liu and Chuanxia Zheng and Sijie Yan and Henghui Ding and Xiaohui Shen and Ding Liu and Thalmann, {Nadia Magnenat}",
note = "Publisher Copyright: {\textcopyright} 2021 IEEE; 18th IEEE/CVF International Conference on Computer Vision, ICCV 2021 ; Conference date: 11-10-2021 Through 17-10-2021",
year = "2022",
month = feb,
day = "28",
doi = "10.1109/ICCV48922.2021.01144",
language = "English",
series = "Proceedings of the IEEE International Conference on Computer Vision",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "11625--11635",
booktitle = "Proceedings - 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021",
}

RIS

TY - GEN

T1 - A Unified 3D Human Motion Synthesis Model via Conditional Variational Auto-Encoder

AU - Cai, Yujun

AU - Wang, Yiwei

AU - Zhu, Yiheng

AU - Cham, Tat Jen

AU - Cai, Jianfei

AU - Yuan, Junsong

AU - Liu, Jun

AU - Zheng, Chuanxia

AU - Yan, Sijie

AU - Ding, Henghui

AU - Shen, Xiaohui

AU - Liu, Ding

AU - Thalmann, Nadia Magnenat

N1 - Publisher Copyright: © 2021 IEEE

PY - 2022/2/28

Y1 - 2022/2/28

N2 - We present a unified and flexible framework to address the generalized problem of 3D motion synthesis that covers the tasks of motion prediction, completion, interpolation, and spatial-temporal recovery. Since these tasks have different input constraints and various fidelity and diversity requirements, most existing approaches only cater to a specific task or use different architectures to address various tasks. Here we propose a unified framework based on Conditional Variational Auto-Encoder (CVAE), where we treat any arbitrary input as a masked motion series. Notably, by considering this problem as a conditional generation process, we estimate a parametric distribution of the missing regions based on the input conditions, from which to sample and synthesize the full motion series. To further allow the flexibility of manipulating the motion style of the generated series, we design an Action-Adaptive Modulation (AAM) to propagate the given semantic guidance through the whole sequence. We also introduce a cross-attention mechanism to exploit distant relations among decoder and encoder features for better realism and global consistency. We conducted extensive experiments on Human 3.6M and CMU-Mocap. The results show that our method produces coherent and realistic results for various motion synthesis tasks, with the synthesized motions distinctly adapted by the given action labels.

AB - We present a unified and flexible framework to address the generalized problem of 3D motion synthesis that covers the tasks of motion prediction, completion, interpolation, and spatial-temporal recovery. Since these tasks have different input constraints and various fidelity and diversity requirements, most existing approaches only cater to a specific task or use different architectures to address various tasks. Here we propose a unified framework based on Conditional Variational Auto-Encoder (CVAE), where we treat any arbitrary input as a masked motion series. Notably, by considering this problem as a conditional generation process, we estimate a parametric distribution of the missing regions based on the input conditions, from which to sample and synthesize the full motion series. To further allow the flexibility of manipulating the motion style of the generated series, we design an Action-Adaptive Modulation (AAM) to propagate the given semantic guidance through the whole sequence. We also introduce a cross-attention mechanism to exploit distant relations among decoder and encoder features for better realism and global consistency. We conducted extensive experiments on Human 3.6M and CMU-Mocap. The results show that our method produces coherent and realistic results for various motion synthesis tasks, with the synthesized motions distinctly adapted by the given action labels.

U2 - 10.1109/ICCV48922.2021.01144

DO - 10.1109/ICCV48922.2021.01144

M3 - Conference contribution/Paper

AN - SCOPUS:85113641917

T3 - Proceedings of the IEEE International Conference on Computer Vision

SP - 11625

EP - 11635

BT - Proceedings - 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 18th IEEE/CVF International Conference on Computer Vision, ICCV 2021

Y2 - 11 October 2021 through 17 October 2021

ER -
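The abstract frames every synthesis task (prediction, completion, interpolation, spatio-temporal recovery) as filling in a masked motion series with a CVAE: a distribution over the missing regions is estimated from the observed frames, then sampled to produce the full sequence. Below is a minimal PyTorch-style sketch of that unifying idea only; the class name, layer choices, and sizes are illustrative assumptions and are not taken from the paper (which additionally uses Action-Adaptive Modulation and cross-attention).

```python
import torch
import torch.nn as nn

class MaskedMotionCVAE(nn.Module):
    """Hypothetical sketch: any synthesis task is a masked sequence,
    and a CVAE samples the missing frames conditioned on observed ones.
    All architectural details here are illustrative assumptions."""

    def __init__(self, n_joints=17, dim=3, latent=32, hidden=128):
        super().__init__()
        self.frame_dim = n_joints * dim
        # Encoder sees the masked sequence plus its visibility mask.
        self.encoder = nn.GRU(self.frame_dim + 1, hidden, batch_first=True)
        self.to_mu = nn.Linear(hidden, latent)
        self.to_logvar = nn.Linear(hidden, latent)
        # Decoder conditions every frame on the latent sample and the mask.
        self.decoder = nn.GRU(self.frame_dim + 1 + latent, hidden, batch_first=True)
        self.out = nn.Linear(hidden, self.frame_dim)

    def forward(self, motion, mask):
        # motion: (B, T, frame_dim); mask: (B, T, 1), 1 = observed frame
        masked = motion * mask  # hide the frames to be synthesized
        h, _ = self.encoder(torch.cat([masked, mask], dim=-1))
        mu = self.to_mu(h[:, -1])
        logvar = self.to_logvar(h[:, -1])
        # Reparameterization trick: sample z ~ N(mu, sigma^2).
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        z_seq = z.unsqueeze(1).expand(-1, motion.size(1), -1)
        d, _ = self.decoder(torch.cat([masked, mask, z_seq], dim=-1))
        recon = self.out(d)
        # Keep observed frames exactly; synthesize only the masked regions.
        return mask * motion + (1 - mask) * recon, mu, logvar

model = MaskedMotionCVAE()
motion = torch.randn(2, 25, 17 * 3)  # batch of 25-frame pose sequences
mask = torch.ones(2, 25, 1)
mask[:, 10:, :] = 0                  # motion prediction: future frames masked
full, mu, logvar = model(motion, mask)
```

Setting different mask patterns recovers the different tasks: masking the future gives prediction, masking an interior span gives interpolation, and masking scattered joints/frames gives spatio-temporal recovery, all with the same network.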