Precise facial landmark detection by Dynamic Semantic Aggregation Transformer

Computing and Communications

Text available via DOI:

https://doi.org/10.1016/j.patcog.2024.110827
Final published version

Keywords

Dynamic network, Facial landmark detection, Heatmap regression, Heavy occlusions, Multi-scale feature

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Standard

Precise facial landmark detection by Dynamic Semantic Aggregation Transformer. / Wan, Jun; Liu, He; Wu, Yujia et al.
In: Pattern Recognition, Vol. 156, 110827, 31.12.2024.

Harvard

Wan, J, Liu, H, Wu, Y, Lai, Z, Min, W & Liu, J 2024, 'Precise facial landmark detection by Dynamic Semantic Aggregation Transformer', Pattern Recognition, vol. 156, 110827. https://doi.org/10.1016/j.patcog.2024.110827

APA

Wan, J., Liu, H., Wu, Y., Lai, Z., Min, W., & Liu, J. (2024). Precise facial landmark detection by Dynamic Semantic Aggregation Transformer. Pattern Recognition, 156, Article 110827. https://doi.org/10.1016/j.patcog.2024.110827

Vancouver

Wan J, Liu H, Wu Y, Lai Z, Min W, Liu J. Precise facial landmark detection by Dynamic Semantic Aggregation Transformer. Pattern Recognition. 2024 Dec 31;156:110827. Epub 2024 Jul 30. doi: 10.1016/j.patcog.2024.110827

Author

Wan, Jun ; Liu, He ; Wu, Yujia et al. / Precise facial landmark detection by Dynamic Semantic Aggregation Transformer. In: Pattern Recognition. 2024 ; Vol. 156.

Bibtex

@article{52515b5618524ecc8ca5d95725035149,

title = "Precise facial landmark detection by Dynamic Semantic Aggregation Transformer",

abstract = "At present, deep neural network methods have played a dominant role in face alignment field. However, they generally use predefined network structures to predict landmarks, which tends to learn general features and leads to mediocre performance, e.g., they perform well on neutral samples but struggle with faces exhibiting large poses or occlusions. Moreover, they cannot effectively deal with semantic gaps and ambiguities among features at different scales, which may hinder them from learning efficient features. To address the above issues, in this paper, we propose a Dynamic Semantic-Aggregation Transformer (DSAT) for more discriminative and representative feature (i.e., specialized feature) learning. Specifically, a Dynamic Semantic-Aware (DSA) model is first proposed to partition samples into subsets and activate the specific pathways for them by estimating the semantic correlations of feature channels, making it possible to learn specialized features from each subset. Then, a novel Dynamic Semantic Specialization (DSS) model is designed to mine the homogeneous information from features at different scales for eliminating the semantic gap and ambiguities and enhancing the representation ability. Finally, by integrating the DSA model and DSS model into our proposed DSAT in both dynamic architecture and dynamic parameter manners, more specialized features can be learned for achieving more precise face alignment. It is interesting to show that harder samples can be handled by activating more feature channels. Extensive experiments on popular face alignment datasets demonstrate that our proposed DSAT outperforms state-of-the-art models in the literature. Our code is available at https://github.com/GERMINO-LiuHe/DSAT.",

keywords = "Dynamic network, Facial landmark detection, Heatmap regression, Heavy occlusions, Multi-scale feature",

author = "Jun Wan and He Liu and Yujia Wu and Zhihui Lai and Wenwen Min and Jun Liu",

year = "2024",

month = dec,

day = "31",

doi = "10.1016/j.patcog.2024.110827",

language = "English",

volume = "156",

journal = "Pattern Recognition",

issn = "0031-3203",

publisher = "Elsevier Ltd",

}

RIS

TY - JOUR

T1 - Precise facial landmark detection by Dynamic Semantic Aggregation Transformer

AU - Wan, Jun

AU - Liu, He

AU - Wu, Yujia

AU - Lai, Zhihui

AU - Min, Wenwen

AU - Liu, Jun

PY - 2024/12/31

Y1 - 2024/12/31

N2 - At present, deep neural network methods have played a dominant role in face alignment field. However, they generally use predefined network structures to predict landmarks, which tends to learn general features and leads to mediocre performance, e.g., they perform well on neutral samples but struggle with faces exhibiting large poses or occlusions. Moreover, they cannot effectively deal with semantic gaps and ambiguities among features at different scales, which may hinder them from learning efficient features. To address the above issues, in this paper, we propose a Dynamic Semantic-Aggregation Transformer (DSAT) for more discriminative and representative feature (i.e., specialized feature) learning. Specifically, a Dynamic Semantic-Aware (DSA) model is first proposed to partition samples into subsets and activate the specific pathways for them by estimating the semantic correlations of feature channels, making it possible to learn specialized features from each subset. Then, a novel Dynamic Semantic Specialization (DSS) model is designed to mine the homogeneous information from features at different scales for eliminating the semantic gap and ambiguities and enhancing the representation ability. Finally, by integrating the DSA model and DSS model into our proposed DSAT in both dynamic architecture and dynamic parameter manners, more specialized features can be learned for achieving more precise face alignment. It is interesting to show that harder samples can be handled by activating more feature channels. Extensive experiments on popular face alignment datasets demonstrate that our proposed DSAT outperforms state-of-the-art models in the literature. Our code is available at https://github.com/GERMINO-LiuHe/DSAT.

AB - At present, deep neural network methods have played a dominant role in face alignment field. However, they generally use predefined network structures to predict landmarks, which tends to learn general features and leads to mediocre performance, e.g., they perform well on neutral samples but struggle with faces exhibiting large poses or occlusions. Moreover, they cannot effectively deal with semantic gaps and ambiguities among features at different scales, which may hinder them from learning efficient features. To address the above issues, in this paper, we propose a Dynamic Semantic-Aggregation Transformer (DSAT) for more discriminative and representative feature (i.e., specialized feature) learning. Specifically, a Dynamic Semantic-Aware (DSA) model is first proposed to partition samples into subsets and activate the specific pathways for them by estimating the semantic correlations of feature channels, making it possible to learn specialized features from each subset. Then, a novel Dynamic Semantic Specialization (DSS) model is designed to mine the homogeneous information from features at different scales for eliminating the semantic gap and ambiguities and enhancing the representation ability. Finally, by integrating the DSA model and DSS model into our proposed DSAT in both dynamic architecture and dynamic parameter manners, more specialized features can be learned for achieving more precise face alignment. It is interesting to show that harder samples can be handled by activating more feature channels. Extensive experiments on popular face alignment datasets demonstrate that our proposed DSAT outperforms state-of-the-art models in the literature. Our code is available at https://github.com/GERMINO-LiuHe/DSAT.

KW - Dynamic network

KW - Facial landmark detection

KW - Heatmap regression

KW - Heavy occlusions

KW - Multi-scale feature

U2 - 10.1016/j.patcog.2024.110827

DO - 10.1016/j.patcog.2024.110827

M3 - Journal article

AN - SCOPUS:85199989820

VL - 156

JO - Pattern Recognition

JF - Pattern Recognition

SN - 0031-3203

M1 - 110827

ER -
