
DR-FER: Discriminative and Robust Representation Learning for Facial Expression Recognition

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Standard

DR-FER: Discriminative and Robust Representation Learning for Facial Expression Recognition. / Li, Ming; Fu, Huazhu; He, Shengfeng et al.
In: IEEE Transactions on Multimedia, Vol. 26, 31.12.2024, p. 6297-6309.

Harvard

Li, M, Fu, H, He, S, Fan, H, Liu, J, Keppo, J & Shou, MZ 2024, 'DR-FER: Discriminative and Robust Representation Learning for Facial Expression Recognition', IEEE Transactions on Multimedia, vol. 26, pp. 6297-6309. https://doi.org/10.1109/TMM.2023.3347849

APA

Li, M., Fu, H., He, S., Fan, H., Liu, J., Keppo, J., & Shou, M. Z. (2024). DR-FER: Discriminative and Robust Representation Learning for Facial Expression Recognition. IEEE Transactions on Multimedia, 26, 6297-6309. https://doi.org/10.1109/TMM.2023.3347849

Vancouver

Li M, Fu H, He S, Fan H, Liu J, Keppo J et al. DR-FER: Discriminative and Robust Representation Learning for Facial Expression Recognition. IEEE Transactions on Multimedia. 2024 Dec 31;26:6297-6309. Epub 2023 Dec 28. doi: 10.1109/TMM.2023.3347849

Author

Li, Ming ; Fu, Huazhu ; He, Shengfeng et al. / DR-FER : Discriminative and Robust Representation Learning for Facial Expression Recognition. In: IEEE Transactions on Multimedia. 2024 ; Vol. 26. pp. 6297-6309.

Bibtex

@article{5bb9cae448a34f388a6be8e66e419ca5,
title = "DR-FER: Discriminative and Robust Representation Learning for Facial Expression Recognition",
abstract = "Learning discriminative and robust representations is important for facial expression recognition (FER) due to subtly different emotional faces and their subjective annotations. Previous works usually address one representation solely because these two goals seem to be contradictory for optimization. Their performances inevitably suffer from challenges from the other representation. In this article, by considering this problem from two novel perspectives, we demonstrate that discriminative and robust representations can be learned in a unified approach, i.e., DR-FER, and mutually benefit each other. Moreover, we make it with the supervision from only original annotations. Specifically, to learn discriminative representations, we propose performing masked image modeling (MIM) as an auxiliary task to force our network to discover expression-related facial areas. This is the first attempt to employ MIM to explore discriminative patterns in a self-supervised manner. To extract robust representations, we present a category-aware self-paced learning schedule to mine high-quality annotated (easy) expressions and incorrectly annotated (hard) counterparts. We further introduce a retrieval similarity-based relabeling strategy to correct hard expression annotations, exploiting them more effectively. By enhancing the discrimination ability of the FER classifier as a bridge, these two learning goals significantly strengthen each other. Extensive experiments on several popular benchmarks demonstrate the superior performance of our DR-FER. Moreover, thorough visualizations and extra experiments on manually annotation-corrupted datasets show that our approach successfully accomplishes learning both discriminative and robust representations simultaneously.",
author = "Ming Li and Huazhu Fu and Shengfeng He and Hehe Fan and Jun Liu and Jussi Keppo and Shou, {Mike Zheng}",
year = "2024",
month = dec,
day = "31",
doi = "10.1109/TMM.2023.3347849",
language = "English",
volume = "26",
pages = "6297--6309",
journal = "IEEE Transactions on Multimedia",
issn = "1520-9210",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
}
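
The abstract describes masked image modeling (MIM) as an auxiliary task that forces the network to discover expression-related facial areas. As a point of reference only, here is a minimal NumPy sketch of the generic MIM step: random patch masking plus a reconstruction loss computed on the masked patches. The function names, patch size, and mask ratio are illustrative assumptions, not DR-FER's actual implementation.

```python
import numpy as np

def mask_patches(image, patch=4, mask_ratio=0.5, rng=None):
    """Zero out a random subset of square patches (generic MIM masking).

    Returns the masked image and a 2-D boolean grid (True = patch masked).
    Illustrative sketch only -- not DR-FER's actual masking strategy.
    """
    rng = np.random.default_rng(rng)
    h, w = image.shape[:2]
    gh, gw = h // patch, w // patch
    flat = np.zeros(gh * gw, dtype=bool)
    flat[rng.permutation(gh * gw)[:int(round(gh * gw * mask_ratio))]] = True
    mask2d = flat.reshape(gh, gw)
    masked = image.copy()
    for i in range(gh):
        for j in range(gw):
            if mask2d[i, j]:
                masked[i * patch:(i + 1) * patch, j * patch:(j + 1) * patch] = 0.0
    return masked, mask2d

def mim_loss(pred, target, mask2d, patch=4):
    """Mean-squared reconstruction error over the masked patches only,
    so the auxiliary objective focuses on the hidden regions."""
    m = np.repeat(np.repeat(mask2d, patch, axis=0), patch, axis=1)
    return float(np.mean((pred[m] - target[m]) ** 2))
```

In a real pipeline this loss would be added, with some weight, to the expression-classification loss; the paper's actual reconstruction target and masking schedule may differ.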

RIS

TY - JOUR

T1 - DR-FER

T2 - Discriminative and Robust Representation Learning for Facial Expression Recognition

AU - Li, Ming

AU - Fu, Huazhu

AU - He, Shengfeng

AU - Fan, Hehe

AU - Liu, Jun

AU - Keppo, Jussi

AU - Shou, Mike Zheng

PY - 2024/12/31

Y1 - 2024/12/31

N2 - Learning discriminative and robust representations is important for facial expression recognition (FER) due to subtly different emotional faces and their subjective annotations. Previous works usually address one representation solely because these two goals seem to be contradictory for optimization. Their performances inevitably suffer from challenges from the other representation. In this article, by considering this problem from two novel perspectives, we demonstrate that discriminative and robust representations can be learned in a unified approach, i.e., DR-FER, and mutually benefit each other. Moreover, we make it with the supervision from only original annotations. Specifically, to learn discriminative representations, we propose performing masked image modeling (MIM) as an auxiliary task to force our network to discover expression-related facial areas. This is the first attempt to employ MIM to explore discriminative patterns in a self-supervised manner. To extract robust representations, we present a category-aware self-paced learning schedule to mine high-quality annotated (easy) expressions and incorrectly annotated (hard) counterparts. We further introduce a retrieval similarity-based relabeling strategy to correct hard expression annotations, exploiting them more effectively. By enhancing the discrimination ability of the FER classifier as a bridge, these two learning goals significantly strengthen each other. Extensive experiments on several popular benchmarks demonstrate the superior performance of our DR-FER. Moreover, thorough visualizations and extra experiments on manually annotation-corrupted datasets show that our approach successfully accomplishes learning both discriminative and robust representations simultaneously.

AB - Learning discriminative and robust representations is important for facial expression recognition (FER) due to subtly different emotional faces and their subjective annotations. Previous works usually address one representation solely because these two goals seem to be contradictory for optimization. Their performances inevitably suffer from challenges from the other representation. In this article, by considering this problem from two novel perspectives, we demonstrate that discriminative and robust representations can be learned in a unified approach, i.e., DR-FER, and mutually benefit each other. Moreover, we make it with the supervision from only original annotations. Specifically, to learn discriminative representations, we propose performing masked image modeling (MIM) as an auxiliary task to force our network to discover expression-related facial areas. This is the first attempt to employ MIM to explore discriminative patterns in a self-supervised manner. To extract robust representations, we present a category-aware self-paced learning schedule to mine high-quality annotated (easy) expressions and incorrectly annotated (hard) counterparts. We further introduce a retrieval similarity-based relabeling strategy to correct hard expression annotations, exploiting them more effectively. By enhancing the discrimination ability of the FER classifier as a bridge, these two learning goals significantly strengthen each other. Extensive experiments on several popular benchmarks demonstrate the superior performance of our DR-FER. Moreover, thorough visualizations and extra experiments on manually annotation-corrupted datasets show that our approach successfully accomplishes learning both discriminative and robust representations simultaneously.

U2 - 10.1109/TMM.2023.3347849

DO - 10.1109/TMM.2023.3347849

M3 - Journal article

VL - 26

SP - 6297

EP - 6309

JO - IEEE Transactions on Multimedia

JF - IEEE Transactions on Multimedia

SN - 1520-9210

ER -
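
The abstract's category-aware self-paced mining and retrieval similarity-based relabeling can be illustrated, very loosely, as follows: per class, keep the lowest-loss samples as "easy" and treat the rest as "hard" (possibly mislabeled), then relabel each hard sample by a majority vote over its most similar easy samples. This is a hedged NumPy sketch under assumed names (`split_easy_hard`, `relabel_by_retrieval`), an assumed per-class low-loss criterion, and cosine similarity; the paper's actual schedule and similarity measure may differ.

```python
import numpy as np

def split_easy_hard(losses, labels, keep_frac=0.7):
    """Category-aware self-paced split: within each class, mark the
    keep_frac lowest-loss samples as easy; the rest are hard."""
    easy = np.zeros(len(losses), dtype=bool)
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        k = max(1, int(round(len(idx) * keep_frac)))
        easy[idx[np.argsort(losses[idx])[:k]]] = True
    return easy

def relabel_by_retrieval(features, labels, easy, k=3):
    """Relabel each hard sample with the majority label of its k most
    cosine-similar easy samples (retrieval-based correction)."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    new_labels = labels.copy()
    easy_idx = np.where(easy)[0]
    for i in np.where(~easy)[0]:
        sims = f[easy_idx] @ f[i]                 # similarity to easy set
        nn = np.argsort(sims)[::-1][:k]           # k nearest easy samples
        new_labels[i] = np.bincount(labels[easy_idx][nn]).argmax()
    return new_labels
```

Selecting per class rather than globally keeps minority expression categories represented in the easy set, which is the point of making the schedule category-aware.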