
Electronic data

  • Elsevier_s_CAS_Revision_Final

    Rights statement: This is the author’s version of a work that was accepted for publication in Neurocomputing. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Neurocomputing, 520, 2022 DOI: 10.1016/j.neucom.2022.11.048

    Accepted author manuscript, 2.57 MB, PDF document

    Available under license: CC BY-NC-ND: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License

Links

Text available via DOI: 10.1016/j.neucom.2022.11.048

Robust monocular 3D face reconstruction under challenging viewing conditions

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Standard

Robust monocular 3D face reconstruction under challenging viewing conditions. / Mohaghegh, Hoda; Rahmani, Hossein; Bennamoun, Mohammed.
In: Neurocomputing, Vol. 520, 01.02.2023, p. 82-93.

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Vancouver

Mohaghegh H, Rahmani H, Bennamoun M. Robust monocular 3D face reconstruction under challenging viewing conditions. Neurocomputing. 2023 Feb 1;520:82-93. Epub 2022 Nov 29. doi: 10.1016/j.neucom.2022.11.048

Author

Mohaghegh, Hoda; Rahmani, Hossein; Bennamoun, Mohammed. / Robust monocular 3D face reconstruction under challenging viewing conditions. In: Neurocomputing. 2023; Vol. 520. pp. 82-93.

Bibtex

@article{8b7ba34b0c5440ecb289618705e6bcb8,
title = "Robust monocular 3D face reconstruction under challenging viewing conditions",
abstract = "Despite extensive research, 3D face reconstruction from a single image remains an open research problem due to the high degree of variability in pose, occlusions and complex lighting conditions. While deep learning-based methods have achieved great success, they are usually limited to near frontal images and images that are free of occlusions. Also, the lack of diverse training data with 3D annotations considerably limits the performance of such methods. As such, existing methods fail to recover, with high fidelity, the facial details especially when dealing with images captured under extreme conditions. To address this issue, we propose an unsupervised coarse-to-fine framework for the reconstruction of 3D faces with detailed textures. Our core idea is that multiple images of the same person but captured under different viewing conditions should provide the same 3D face. We thus propose to leverage a self-augmentation learning technique to train a model that is robust to diverse variations. In addition, instead of directly employing image pixels, we use a set of discriminative features describing the identity and attributes of the face as input to the refinement module, making the model invariant to viewing conditions. This combination of self-augmentation learning with rich face-related features allows the reconstruction of plausible facial details even under challenging viewing conditions. We train the model end-to-end and in a self-supervised manner, without any 3D annotations, landmarks or identity labels, using a combination of an image-level photometric loss and a perception-level loss that is identity and attribute-aware. We evaluate the proposed approach on CelebA and AFLW2000 datasets, and demonstrate its robustness to appearance variations despite learning from unlabeled images. The qualitative comparisons indicate that our method produces detailed 3D faces even under extreme occlusions, out of plane rotations and noise perturbations where existing state-of-the-art methods often fail. We also quantitatively show that our method outperforms SOTA with more than 30.14%, 9.87% and 11.3% in terms of PSNR, SSIM and IDentity similarity, respectively.",
keywords = "Robust 3D face reconstruction, Face texture refinement, Self-augmentation, Discriminative face features, Attribute- aware loss, Graph Convolutional Networks (GCN)",
author = "Hoda Mohaghegh and Hossein Rahmani and Mohammed Bennamoun",
note = "This is the author{\textquoteright}s version of a work that was accepted for publication in Neurocomputing. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Neurocomputing, 520, 2022 DOI: 10.1016/j.neucom.2022.11.048",
year = "2023",
month = feb,
day = "1",
doi = "10.1016/j.neucom.2022.11.048",
language = "English",
volume = "520",
pages = "82--93",
journal = "Neurocomputing",
issn = "0925-2312",
publisher = "Elsevier Science B.V.",

}
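
The abstract describes a self-supervised objective that combines an image-level photometric loss, a perception-level identity- and attribute-aware loss, and a self-augmentation consistency idea (augmented views of the same person should yield the same 3D face). The sketch below illustrates how such a combined objective could look; PyTorch, the feature-extractor callables id_net and attr_net, and all loss weights are illustrative assumptions, not the authors' released implementation.

import torch
import torch.nn.functional as F

def photometric_loss(rendered, target, mask):
    # Image-level L1 loss restricted to the visible (unoccluded) face region.
    return (mask * (rendered - target).abs()).sum() / mask.sum().clamp(min=1.0)

def perceptual_loss(rendered, target, id_net, attr_net):
    # Identity- and attribute-aware loss: cosine distance between embeddings of
    # the rendered face and the input face (id_net / attr_net stand in for
    # pretrained identity and attribute feature extractors).
    id_term = 1.0 - F.cosine_similarity(id_net(rendered), id_net(target), dim=-1).mean()
    attr_term = 1.0 - F.cosine_similarity(attr_net(rendered), attr_net(target), dim=-1).mean()
    return id_term + attr_term

def consistency_loss(params_original, params_augmented):
    # Self-augmentation consistency: 3D face parameters recovered from an
    # augmented view (occlusion, pose change, noise) should match those
    # recovered from the original image of the same person.
    return F.mse_loss(params_augmented, params_original.detach())

def total_loss(rendered, target, mask, params_orig, params_aug, id_net, attr_net,
               w_photo=1.0, w_percep=0.2, w_consist=1.0):
    # Hypothetical weights; the paper's actual weighting is not reproduced here.
    return (w_photo * photometric_loss(rendered, target, mask)
            + w_percep * perceptual_loss(rendered, target, id_net, attr_net)
            + w_consist * consistency_loss(params_orig, params_aug))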

RIS

TY - JOUR

T1 - Robust monocular 3D face reconstruction under challenging viewing conditions

AU - Mohaghegh, Hoda

AU - Rahmani, Hossein

AU - Bennamoun, Mohammed

N1 - This is the author’s version of a work that was accepted for publication in Neurocomputing. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Neurocomputing, 520, 2022 DOI: 10.1016/j.neucom.2022.11.048

PY - 2023/2/1

Y1 - 2023/2/1

N2 - Despite extensive research, 3D face reconstruction from a single image remains an open research problem due to the high degree of variability in pose, occlusions and complex lighting conditions. While deep learning-based methods have achieved great success, they are usually limited to near-frontal images and images that are free of occlusions. Moreover, the lack of diverse training data with 3D annotations considerably limits the performance of such methods. As a result, existing methods fail to recover facial details with high fidelity, especially when dealing with images captured under extreme conditions. To address this issue, we propose an unsupervised coarse-to-fine framework for the reconstruction of 3D faces with detailed textures. Our core idea is that multiple images of the same person captured under different viewing conditions should yield the same 3D face. We therefore leverage a self-augmentation learning technique to train a model that is robust to diverse variations. In addition, instead of directly employing image pixels, we use a set of discriminative features describing the identity and attributes of the face as input to the refinement module, making the model invariant to viewing conditions. This combination of self-augmentation learning with rich face-related features allows the reconstruction of plausible facial details even under challenging viewing conditions. We train the model end-to-end and in a self-supervised manner, without any 3D annotations, landmarks or identity labels, using a combination of an image-level photometric loss and a perception-level loss that is identity- and attribute-aware. We evaluate the proposed approach on the CelebA and AFLW2000 datasets, and demonstrate its robustness to appearance variations despite learning from unlabeled images. Qualitative comparisons indicate that our method produces detailed 3D faces even under extreme occlusions, out-of-plane rotations and noise perturbations, where existing state-of-the-art methods often fail. We also quantitatively show that our method outperforms the state of the art by more than 30.14%, 9.87% and 11.3% in terms of PSNR, SSIM and identity similarity, respectively.

AB - Despite extensive research, 3D face reconstruction from a single image remains an open research problem due to the high degree of variability in pose, occlusions and complex lighting conditions. While deep learning-based methods have achieved great success, they are usually limited to near-frontal images and images that are free of occlusions. Moreover, the lack of diverse training data with 3D annotations considerably limits the performance of such methods. As a result, existing methods fail to recover facial details with high fidelity, especially when dealing with images captured under extreme conditions. To address this issue, we propose an unsupervised coarse-to-fine framework for the reconstruction of 3D faces with detailed textures. Our core idea is that multiple images of the same person captured under different viewing conditions should yield the same 3D face. We therefore leverage a self-augmentation learning technique to train a model that is robust to diverse variations. In addition, instead of directly employing image pixels, we use a set of discriminative features describing the identity and attributes of the face as input to the refinement module, making the model invariant to viewing conditions. This combination of self-augmentation learning with rich face-related features allows the reconstruction of plausible facial details even under challenging viewing conditions. We train the model end-to-end and in a self-supervised manner, without any 3D annotations, landmarks or identity labels, using a combination of an image-level photometric loss and a perception-level loss that is identity- and attribute-aware. We evaluate the proposed approach on the CelebA and AFLW2000 datasets, and demonstrate its robustness to appearance variations despite learning from unlabeled images. Qualitative comparisons indicate that our method produces detailed 3D faces even under extreme occlusions, out-of-plane rotations and noise perturbations, where existing state-of-the-art methods often fail. We also quantitatively show that our method outperforms the state of the art by more than 30.14%, 9.87% and 11.3% in terms of PSNR, SSIM and identity similarity, respectively.

KW - Robust 3D face reconstruction

KW - Face texture refinement

KW - Self-augmentation

KW - Discriminative face features

KW - Attribute-aware loss

KW - Graph Convolutional Networks (GCN)

U2 - 10.1016/j.neucom.2022.11.048

DO - 10.1016/j.neucom.2022.11.048

M3 - Journal article

VL - 520

SP - 82

EP - 93

JO - Neurocomputing

JF - Neurocomputing

SN - 0925-2312

ER -
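
For context on the quantitative comparison mentioned in the abstract, the sketch below shows one plausible way to compute the three reported metrics (PSNR, SSIM and identity similarity) between a rendered face and the corresponding input image. scikit-image and the embed_face callable (a stand-in for any pretrained face-embedding network) are assumptions for illustration; this is not the paper's evaluation code.

import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_reconstruction(rendered, target, embed_face):
    # rendered, target: HxWx3 float arrays in [0, 1].
    # embed_face: callable returning a 1-D identity embedding for a face image.
    psnr = peak_signal_noise_ratio(target, rendered, data_range=1.0)
    ssim = structural_similarity(target, rendered, data_range=1.0, channel_axis=-1)
    emb_r, emb_t = embed_face(rendered), embed_face(target)
    # Identity similarity as the cosine similarity between the two embeddings.
    id_sim = float(np.dot(emb_r, emb_t) / (np.linalg.norm(emb_r) * np.linalg.norm(emb_t)))
    return {"PSNR": psnr, "SSIM": ssim, "identity_similarity": id_sim}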