A Unified 3D Human Motion Synthesis Model via Conditional Variational Auto-Encoder

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Published
  • Yujun Cai
  • Yiwei Wang
  • Yiheng Zhu
  • Tat Jen Cham
  • Jianfei Cai
  • Junsong Yuan
  • Jun Liu
  • Chuanxia Zheng
  • Sijie Yan
  • Henghui Ding
  • Xiaohui Shen
  • Ding Liu
  • Nadia Magnenat Thalmann
Publication date: 28/02/2022
Host publication: Proceedings - 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 11625-11635
Number of pages: 11
ISBN (electronic): 9781665428125
Original language: English
Event: 18th IEEE/CVF International Conference on Computer Vision, ICCV 2021 - Virtual, Online, Canada
Duration: 11/10/2021 - 17/10/2021

Conference

Conference: 18th IEEE/CVF International Conference on Computer Vision, ICCV 2021
Country/Territory: Canada
City: Virtual, Online
Period: 11/10/21 - 17/10/21

Publication series

Name: Proceedings of the IEEE International Conference on Computer Vision
ISSN (Print): 1550-5499

Abstract

We present a unified and flexible framework to address the generalized problem of 3D motion synthesis that covers the tasks of motion prediction, completion, interpolation, and spatial-temporal recovery. Since these tasks have different input constraints and various fidelity and diversity requirements, most existing approaches only cater to a specific task or use different architectures to address various tasks. Here we propose a unified framework based on the Conditional Variational Auto-Encoder (CVAE), where we treat any arbitrary input as a masked motion series. Notably, by considering this problem as a conditional generation process, we estimate a parametric distribution of the missing regions based on the input conditions, from which we sample and synthesize the full motion series. To further allow flexible manipulation of the motion style of the generated series, we design an Action-Adaptive Modulation (AAM) to propagate the given semantic guidance through the whole sequence. We also introduce a cross-attention mechanism to exploit distant relations between decoder and encoder features for better realism and global consistency. We conducted extensive experiments on Human 3.6M and CMU-Mocap. The results show that our method produces coherent and realistic results for various motion synthesis tasks, with the synthesized motions distinctly modulated by the given action labels.
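The following is a minimal, illustrative sketch of the conditional-generation idea described above: any synthesis task (prediction, completion, interpolation) is cast as filling in a masked motion series with a CVAE. It is based only on this abstract, not on the authors' code; the class name, GRU backbone, dimensions, and heads are assumptions for illustration, and the paper's Action-Adaptive Modulation and cross-attention modules are not reproduced here.

import torch
import torch.nn as nn

class MaskedMotionCVAE(nn.Module):
    """Illustrative CVAE that treats every synthesis task as completing a
    masked motion series: the posterior sees the full sequence (training),
    while the prior and decoder see only the observed frames plus the mask."""

    def __init__(self, joint_dim=66, hidden=256, latent=64):
        super().__init__()
        # condition encoder: observed frames (masked frames zeroed) + binary mask
        self.cond_enc = nn.GRU(joint_dim + 1, hidden, batch_first=True)
        # posterior q(z | full motion, condition)
        self.post_enc = nn.GRU(joint_dim, hidden, batch_first=True)
        self.post_head = nn.Linear(2 * hidden, 2 * latent)
        # conditional prior p(z | condition)
        self.prior_head = nn.Linear(hidden, 2 * latent)
        # decoder p(x | z, condition)
        self.dec = nn.GRU(joint_dim + 1 + latent, hidden, batch_first=True)
        self.out = nn.Linear(hidden, joint_dim)

    def _condition(self, motion, mask):
        # mask: (B, T, 1) with 1 = observed frame, 0 = missing frame
        observed = torch.cat([motion * mask, mask], dim=-1)
        h_cond, _ = self.cond_enc(observed)
        return observed, h_cond

    def forward(self, motion, mask):
        observed, h_cond = self._condition(motion, mask)
        # posterior statistics from the full (ground-truth) motion
        h_post, _ = self.post_enc(motion)
        stats = self.post_head(torch.cat([h_post[:, -1], h_cond[:, -1]], dim=-1))
        mu_q, logvar_q = stats.chunk(2, dim=-1)
        # prior statistics from the condition only
        mu_p, logvar_p = self.prior_head(h_cond[:, -1]).chunk(2, dim=-1)
        # reparameterized latent sample
        z = mu_q + torch.randn_like(mu_q) * (0.5 * logvar_q).exp()
        z_seq = z.unsqueeze(1).expand(-1, motion.size(1), -1)
        recon_h, _ = self.dec(torch.cat([observed, z_seq], dim=-1))
        return self.out(recon_h), (mu_q, logvar_q), (mu_p, logvar_p)

    @torch.no_grad()
    def sample(self, motion, mask):
        # test time: sample z from the conditional prior and decode the full series
        observed, h_cond = self._condition(motion, mask)
        mu_p, logvar_p = self.prior_head(h_cond[:, -1]).chunk(2, dim=-1)
        z = mu_p + torch.randn_like(mu_p) * (0.5 * logvar_p).exp()
        z_seq = z.unsqueeze(1).expand(-1, motion.size(1), -1)
        recon_h, _ = self.dec(torch.cat([observed, z_seq], dim=-1))
        return self.out(recon_h)

Under this formulation, motion prediction, completion, and interpolation differ only in the mask pattern supplied at test time (e.g. masking the last frames, random frames, or in-between frames of a (B, T, 66) motion tensor), which is what allows a single model to cover all of these tasks.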

Bibliographic note

Publisher Copyright: © 2021 IEEE