
Electronic data

  • paper

    Accepted author manuscript, 846 KB, PDF document

    Available under license: CC BY: Creative Commons Attribution 4.0 International License

Links

Text available via DOI: https://doi.org/10.1109/lsp.2024.3525398


Frequency Decoupled Masked Auto-Encoder for Self-Supervised Skeleton-Based Action Recognition

Research output: Contribution to Journal/Magazine › Journal article › Peer-review

E-pub ahead of print

Standard

Frequency Decoupled Masked Auto-Encoder for Self-Supervised Skeleton-Based Action Recognition. / Liu, Ye; Shi, Tianhao; Zhai, Mingliang et al.
In: IEEE Signal Processing Letters, 31.12.2025, p. 546-550.



APA

Liu, Y., Shi, T., Zhai, M., & Liu, J. (2025). Frequency Decoupled Masked Auto-Encoder for Self-Supervised Skeleton-Based Action Recognition. IEEE Signal Processing Letters, 546-550. Advance online publication. https://doi.org/10.1109/lsp.2024.3525398

Vancouver

Liu Y, Shi T, Zhai M, Liu J. Frequency Decoupled Masked Auto-Encoder for Self-Supervised Skeleton-Based Action Recognition. IEEE Signal Processing Letters. 2025 Dec 31;546-550. Epub 2025 Jan 3. doi: 10.1109/lsp.2024.3525398

Author

Liu, Ye ; Shi, Tianhao ; Zhai, Mingliang et al. / Frequency Decoupled Masked Auto-Encoder for Self-Supervised Skeleton-Based Action Recognition. In: IEEE Signal Processing Letters. 2025 ; pp. 546-550.

Bibtex

@article{94d30566eff94a38ad47c94d428f0035,
title = "Frequency Decoupled Masked Auto-Encoder for Self-Supervised Skeleton-Based Action Recognition",
abstract = "In 3D skeleton-based action recognition, the limited availability of supervised data has driven interest in self-supervised learning methods. The reconstruction paradigm using masked auto-encoder (MAE) is an effective and mainstream self-supervised learning approach. However, recent studies indicate that MAE models tend to focus on features within a certain frequency range, which may result in the loss of important information. To address this issue, we propose a frequency decoupled MAE. Specifically, by incorporating a scale-specific frequency feature reconstruction module, we delve into leveraging frequency information as a direct and explicit target for reconstruction, which augments the MAE's capability to discern and accurately reproduce diverse frequency attributes within the data. Moreover, in order to address the issue of unstable gradient updates caused by more complex optimization objectives with frequency reconstruction, we introduce a dual-path network combined with an exponential moving average (EMA) parameter updating strategy to guide the model in stabilizing the training process. We have conducted extensive experiments which have demonstrated the effectiveness of the proposed method.",
author = "Ye Liu and Tianhao Shi and Mingliang Zhai and Jun Liu",
year = "2025",
month = jan,
day = "3",
doi = "10.1109/lsp.2024.3525398",
language = "English",
pages = "546--550",
journal = "IEEE Signal Processing Letters",
issn = "1070-9908",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

RIS

TY  - JOUR
T1  - Frequency Decoupled Masked Auto-Encoder for Self-Supervised Skeleton-Based Action Recognition
AU  - Liu, Ye
AU  - Shi, Tianhao
AU  - Zhai, Mingliang
AU  - Liu, Jun
PY  - 2025/1/3
Y1  - 2025/1/3
N2  - In 3D skeleton-based action recognition, the limited availability of supervised data has driven interest in self-supervised learning methods. The reconstruction paradigm using masked auto-encoder (MAE) is an effective and mainstream self-supervised learning approach. However, recent studies indicate that MAE models tend to focus on features within a certain frequency range, which may result in the loss of important information. To address this issue, we propose a frequency decoupled MAE. Specifically, by incorporating a scale-specific frequency feature reconstruction module, we delve into leveraging frequency information as a direct and explicit target for reconstruction, which augments the MAE's capability to discern and accurately reproduce diverse frequency attributes within the data. Moreover, in order to address the issue of unstable gradient updates caused by more complex optimization objectives with frequency reconstruction, we introduce a dual-path network combined with an exponential moving average (EMA) parameter updating strategy to guide the model in stabilizing the training process. We have conducted extensive experiments which have demonstrated the effectiveness of the proposed method.
AB  - In 3D skeleton-based action recognition, the limited availability of supervised data has driven interest in self-supervised learning methods. The reconstruction paradigm using masked auto-encoder (MAE) is an effective and mainstream self-supervised learning approach. However, recent studies indicate that MAE models tend to focus on features within a certain frequency range, which may result in the loss of important information. To address this issue, we propose a frequency decoupled MAE. Specifically, by incorporating a scale-specific frequency feature reconstruction module, we delve into leveraging frequency information as a direct and explicit target for reconstruction, which augments the MAE's capability to discern and accurately reproduce diverse frequency attributes within the data. Moreover, in order to address the issue of unstable gradient updates caused by more complex optimization objectives with frequency reconstruction, we introduce a dual-path network combined with an exponential moving average (EMA) parameter updating strategy to guide the model in stabilizing the training process. We have conducted extensive experiments which have demonstrated the effectiveness of the proposed method.
U2  - 10.1109/lsp.2024.3525398
DO  - 10.1109/lsp.2024.3525398
M3  - Journal article
SP  - 546
EP  - 550
JO  - IEEE Signal Processing Letters
JF  - IEEE Signal Processing Letters
SN  - 1070-9908
ER  - 