
Self-Calibration Flow Guided Denoising Diffusion Model for Human Pose Transfer

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Standard

Self-Calibration Flow Guided Denoising Diffusion Model for Human Pose Transfer. / Xue, Yu; Po, Lai-Man; Yu, Wing-Yin et al.
In: IEEE Transactions on Circuits and Systems for Video Technology, Vol. 34, No. 9, 30.09.2024, p. 7896-7911.

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Harvard

Xue, Y, Po, L-M, Yu, W-Y, Wu, H, Xu, X, Li, K & Liu, Y 2024, 'Self-Calibration Flow Guided Denoising Diffusion Model for Human Pose Transfer', IEEE Transactions on Circuits and Systems for Video Technology, vol. 34, no. 9, pp. 7896-7911. https://doi.org/10.1109/TCSVT.2024.3382948

APA

Xue, Y., Po, L.-M., Yu, W.-Y., Wu, H., Xu, X., Li, K., & Liu, Y. (2024). Self-Calibration Flow Guided Denoising Diffusion Model for Human Pose Transfer. IEEE Transactions on Circuits and Systems for Video Technology, 34(9), 7896-7911. https://doi.org/10.1109/TCSVT.2024.3382948

Vancouver

Xue Y, Po LM, Yu WY, Wu H, Xu X, Li K et al. Self-Calibration Flow Guided Denoising Diffusion Model for Human Pose Transfer. IEEE Transactions on Circuits and Systems for Video Technology. 2024 Sep 30;34(9):7896-7911. Epub 2024 Mar 28. doi: 10.1109/TCSVT.2024.3382948

Author

Xue, Yu ; Po, Lai-Man ; Yu, Wing-Yin et al. / Self-Calibration Flow Guided Denoising Diffusion Model for Human Pose Transfer. In: IEEE Transactions on Circuits and Systems for Video Technology. 2024 ; Vol. 34, No. 9. pp. 7896-7911.

Bibtex

@article{9a4ff1a039014ee0841b36de3b659df5,
title = "Self-Calibration Flow Guided Denoising Diffusion Model for Human Pose Transfer",
abstract = "The human pose transfer task aims to generate synthetic person images that preserve the style of reference images while accurately aligning them with the desired target pose. However, existing methods based on generative adversarial networks (GANs) struggle to produce realistic details and often face spatial misalignment issues. On the other hand, methods relying on denoising diffusion models require a large number of model parameters, resulting in slower convergence rates. To address these challenges, we propose a self-calibration flow-guided module (SCFM) to establish precise spatial correspondence between reference images and target poses. This module facilitates the denoising diffusion model in predicting the noise at each denoising step more effectively. Additionally, we introduce a multi-scale feature fusing module (MSFF) that enhances the denoising U-Net architecture through a cross-attention mechanism, achieving better performance with a reduced parameter count. Our proposed model outperforms state-of-the-art methods on the DeepFashion and Market-1501 datasets in terms of both the quantity and quality of the synthesized images. Our code is publicly available at https://github.com/zylwithxy/SCFM-guided-DDPM.",
author = "Yu Xue and Lai-Man Po and Wing-Yin Yu and Haoxuan Wu and Xuyuan Xu and Kun Li and Yuyang Liu",
year = "2024",
month = sep,
day = "30",
doi = "10.1109/TCSVT.2024.3382948",
language = "English",
volume = "34",
pages = "7896--7911",
journal = "IEEE Transactions on Circuits and Systems for Video Technology",
issn = "1051-8215",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "9",
}
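
The abstract above names two components: a self-calibration flow-guided module (SCFM) that establishes spatial correspondence between the reference image and the target pose, and a multi-scale feature fusing module (MSFF) that enhances the denoising U-Net with cross-attention. The authors' implementation is in the linked repository (https://github.com/zylwithxy/SCFM-guided-DDPM); the sketch below is only a minimal, hypothetical illustration of the general flow-guided warping idea such a module builds on, not the paper's architecture. FlowEstimator and warp_features are invented names, and the flow is assumed to be predicted directly in normalized image coordinates.

# Minimal sketch of flow-guided feature warping (hypothetical, not the
# authors' implementation): predict a 2D flow from reference features and
# a target pose map, then warp the reference features along that flow.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FlowEstimator(nn.Module):
    """Stand-in for flow estimation: maps (reference features, pose map)
    to a per-pixel (dx, dy) offset field."""
    def __init__(self, feat_channels: int, pose_channels: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(feat_channels + pose_channels, 64, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 2, 3, padding=1),  # 2 channels: (dx, dy)
        )

    def forward(self, ref_feat, pose_map):
        return self.net(torch.cat([ref_feat, pose_map], dim=1))

def warp_features(ref_feat, flow):
    """Warp reference features along the predicted flow with grid_sample."""
    b, _, h, w = ref_feat.shape
    # Identity sampling grid in normalized [-1, 1] coordinates.
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h, device=ref_feat.device),
        torch.linspace(-1, 1, w, device=ref_feat.device),
        indexing="ij",
    )
    grid = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(b, -1, -1, -1)
    offset = flow.permute(0, 2, 3, 1)  # (b, h, w, 2) to match the grid
    return F.grid_sample(ref_feat, grid + offset, align_corners=True)

# Usage: align 64-channel reference features to an 18-keypoint pose map.
ref = torch.randn(1, 64, 32, 32)
pose = torch.randn(1, 18, 32, 32)
flow = FlowEstimator(64, 18)(ref, pose)
aligned = warp_features(ref, flow)  # (1, 64, 32, 32), pose-aligned

Such pose-aligned features are what a flow-guided module can hand to the diffusion model so that noise prediction at each denoising step conditions on spatially corresponding reference content.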

RIS

TY - JOUR

T1 - Self-Calibration Flow Guided Denoising Diffusion Model for Human Pose Transfer

AU - Xue, Yu

AU - Po, Lai-Man

AU - Yu, Wing-Yin

AU - Wu, Haoxuan

AU - Xu, Xuyuan

AU - Li, Kun

AU - Liu, Yuyang

PY - 2024/9/30

Y1 - 2024/9/30

N2 - The human pose transfer task aims to generate synthetic person images that preserve the style of reference images while accurately aligning them with the desired target pose. However, existing methods based on generative adversarial networks (GANs) struggle to produce realistic details and often face spatial misalignment issues. On the other hand, methods relying on denoising diffusion models require a large number of model parameters, resulting in slower convergence rates. To address these challenges, we propose a self-calibration flow-guided module (SCFM) to establish precise spatial correspondence between reference images and target poses. This module facilitates the denoising diffusion model in predicting the noise at each denoising step more effectively. Additionally, we introduce a multi-scale feature fusing module (MSFF) that enhances the denoising U-Net architecture through a cross-attention mechanism, achieving better performance with a reduced parameter count. Our proposed model outperforms state-of-the-art methods on the DeepFashion and Market-1501 datasets in terms of both the quantity and quality of the synthesized images. Our code is publicly available at https://github.com/zylwithxy/SCFM-guided-DDPM.

AB - The human pose transfer task aims to generate synthetic person images that preserve the style of reference images while accurately aligning them with the desired target pose. However, existing methods based on generative adversarial networks (GANs) struggle to produce realistic details and often face spatial misalignment issues. On the other hand, methods relying on denoising diffusion models require a large number of model parameters, resulting in slower convergence rates. To address these challenges, we propose a self-calibration flow-guided module (SCFM) to establish precise spatial correspondence between reference images and target poses. This module facilitates the denoising diffusion model in predicting the noise at each denoising step more effectively. Additionally, we introduce a multi-scale feature fusing module (MSFF) that enhances the denoising U-Net architecture through a cross-attention mechanism, achieving better performance with a reduced parameter count. Our proposed model outperforms state-of-the-art methods on the DeepFashion and Market-1501 datasets in terms of both the quantity and quality of the synthesized images. Our code is publicly available at https://github.com/zylwithxy/SCFM-guided-DDPM.

U2 - 10.1109/TCSVT.2024.3382948

DO - 10.1109/TCSVT.2024.3382948

M3 - Journal article

VL - 34

SP - 7896

EP - 7911

JO - IEEE Transactions on Circuits and Systems for Video Technology

JF - IEEE Transactions on Circuits and Systems for Video Technology

SN - 1051-8215

IS - 9

ER -
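
The abstract also states that the MSFF module enhances the denoising U-Net through a cross-attention mechanism. Again as an illustration only: the sketch below shows one generic way such a fusion can be realized, with U-Net features as queries attending over pose-aligned reference features as keys and values. CrossAttentionFusion and its layout are assumptions, not the authors' design.

# Minimal sketch of cross-attention feature fusion at one U-Net scale
# (hypothetical, not the paper's MSFF code).
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Denoising-branch features query pose-aligned reference features,
    so reference style is injected while the residual connection
    preserves the denoising signal."""
    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, unet_feat, ref_feat):
        b, c, h, w = unet_feat.shape
        q = unet_feat.flatten(2).transpose(1, 2)    # (b, h*w, c) queries
        kv = ref_feat.flatten(2).transpose(1, 2)    # (b, h*w, c) keys/values
        attended, _ = self.attn(self.norm(q), kv, kv)
        out = (q + attended).transpose(1, 2).reshape(b, c, h, w)
        return out

# Usage: fuse U-Net features with pose-aligned reference features.
fusion = CrossAttentionFusion(64)
x = torch.randn(1, 64, 32, 32)    # denoising U-Net features
ref = torch.randn(1, 64, 32, 32)  # e.g. output of the warping sketch above
out = fusion(x, ref)              # (1, 64, 32, 32)

Applying a block like this at several resolutions of the U-Net is one plausible reading of "multi-scale" fusion; the parameter savings the abstract claims would come from the specific design choices in the paper itself.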