Electronic data

  • Elsevier_s_CAS_Revision_Final

    Rights statement: This is the author’s version of a work that was accepted for publication in Neurocomputing. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Neurocomputing, 520, 2022 DOI: 10.1016/j.neucom.2022.11.048

    Accepted author manuscript, 2.57 MB, PDF document

    Available under license: CC BY-NC-ND: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License

Robust monocular 3D face reconstruction under challenging viewing conditions

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published
Journal publication date: 1/02/2023
Journal: Neurocomputing
Volume: 520
Number of pages: 12
Pages (from-to): 82-93
Publication status: Published
Early online date: 29/11/22
Original language: English

Abstract

Despite extensive research, 3D face reconstruction from a single image remains an open research problem due to the high degree of variability in pose, occlusions and complex lighting conditions. While deep learning-based methods have achieved great success, they are usually limited to near-frontal images and images that are free of occlusions. Also, the lack of diverse training data with 3D annotations considerably limits the performance of such methods. As a result, existing methods fail to recover facial details with high fidelity, especially when dealing with images captured under extreme conditions. To address this issue, we propose an unsupervised coarse-to-fine framework for the reconstruction of 3D faces with detailed textures. Our core idea is that multiple images of the same person, captured under different viewing conditions, should yield the same 3D face. We thus propose to leverage a self-augmentation learning technique to train a model that is robust to diverse variations. In addition, instead of directly employing image pixels, we use a set of discriminative features describing the identity and attributes of the face as input to the refinement module, making the model invariant to viewing conditions. This combination of self-augmentation learning with rich face-related features allows the reconstruction of plausible facial details even under challenging viewing conditions. We train the model end-to-end and in a self-supervised manner, without any 3D annotations, landmarks or identity labels, using a combination of an image-level photometric loss and a perception-level loss that is identity- and attribute-aware. We evaluate the proposed approach on the CelebA and AFLW2000 datasets, and demonstrate its robustness to appearance variations despite learning from unlabeled images.
The qualitative comparisons indicate that our method produces detailed 3D faces even under extreme occlusions, out-of-plane rotations and noise perturbations, where existing state-of-the-art methods often fail. We also show quantitatively that our method outperforms state-of-the-art (SOTA) methods by more than 30.14%, 9.87% and 11.3% in terms of PSNR, SSIM and identity similarity, respectively.
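
The training signal described in the abstract (an image-level photometric loss, a perception-level loss on face features, and agreement between predictions for differently augmented views of the same image) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the use of an L1 photometric term, cosine distance for the perceptual term, a mean-squared consistency term, and the loss weights are all assumptions made for illustration.

```python
import numpy as np

EPS = 1e-8  # guards against division by zero


def photometric_loss(rendered, target, mask):
    """Mean absolute pixel error inside the face mask.

    rendered, target: (H, W, C) float arrays; mask: (H, W) array of 0/1.
    The L1 form is an assumption, not taken from the paper.
    """
    diff = np.abs(rendered - target) * mask[..., None]
    return float(diff.sum() / (mask.sum() * rendered.shape[-1] + EPS))


def cosine_distance(feat_a, feat_b):
    """1 - cosine similarity between two feature vectors.

    Stands in for a perception-level loss on identity/attribute features;
    the feature extractor itself is outside this sketch.
    """
    a = feat_a / (np.linalg.norm(feat_a) + EPS)
    b = feat_b / (np.linalg.norm(feat_b) + EPS)
    return 1.0 - float(np.dot(a, b))


def consistency_loss(params_orig, params_aug):
    """Mean squared difference between the 3D face parameters predicted
    from an image and from an augmented copy of the same image."""
    return float(np.mean((params_orig - params_aug) ** 2))


def total_loss(rendered, target, mask, feat_rendered, feat_target,
               params_orig, params_aug,
               w_photo=1.0, w_percep=0.2, w_cons=0.1):
    """Weighted sum of the three terms; the weights are hypothetical."""
    return (w_photo * photometric_loss(rendered, target, mask)
            + w_percep * cosine_distance(feat_rendered, feat_target)
            + w_cons * consistency_loss(params_orig, params_aug))
```

In this sketch, a model that predicts identical face parameters for an image and its augmented copy, and whose rendering matches the input in both pixels and features, incurs a loss close to zero.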
