Dear-Net: Learning Diversities for Skeleton-Based Early Action Recognition

Research output: Contribution to Journal/Magazine, Journal article, peer-review

Published

Standard

Dear-Net: Learning Diversities for Skeleton-Based Early Action Recognition. / Wang, Rui; Liu, Jun; Ke, Qiuhong et al.
In: IEEE Transactions on Multimedia, Vol. 25, 31.12.2023, p. 1175-1189.

Harvard

Wang, R, Liu, J, Ke, Q, Peng, D & Lei, Y 2023, 'Dear-Net: Learning Diversities for Skeleton-Based Early Action Recognition', IEEE Transactions on Multimedia, vol. 25, pp. 1175-1189. https://doi.org/10.1109/TMM.2021.3139768

APA

Wang, R., Liu, J., Ke, Q., Peng, D., & Lei, Y. (2023). Dear-Net: Learning Diversities for Skeleton-Based Early Action Recognition. IEEE Transactions on Multimedia, 25, 1175-1189. https://doi.org/10.1109/TMM.2021.3139768

Vancouver

Wang R, Liu J, Ke Q, Peng D, Lei Y. Dear-Net: Learning Diversities for Skeleton-Based Early Action Recognition. IEEE Transactions on Multimedia. 2023 Dec 31;25:1175-1189. Epub 2021 Dec 31. doi: 10.1109/TMM.2021.3139768

Author

Wang, Rui ; Liu, Jun ; Ke, Qiuhong et al. / Dear-Net : Learning Diversities for Skeleton-Based Early Action Recognition. In: IEEE Transactions on Multimedia. 2023 ; Vol. 25. pp. 1175-1189.

Bibtex

@article{3abf9ae929d248f9935737f8044e2560,
title = "Dear-Net: Learning Diversities for Skeleton-Based Early Action Recognition",
abstract = "Early action recognition, i.e., recognizing an action before it is fully performed, is a challenging and important task. Existing works mainly focus on deterministic early action recognition outputting only a single class, and ignore the uncertainty and diversity that essentially exist in this task. Intuitively, when only the early portion of the action is observed, there could be multiple possibilities of the full action, as diversified actions can share almost identical early segments in many scenarios. Thus taking uncertainties and diversities into account, and outputting multiple plausible predictions, instead of a single one, can be important for the sake of authenticity and requirement of many practical applications. To this end, we propose a novel Diversified Early Action Recognition Network (Dear-Net) that is capable of outputting multiple reasonable action classes for each partial sequence by utilizing mode conversion. Specifically, we introduce an effective action diversity learning strategy to drive our network towards predicting diverse and reasonable results, in which each learnable action class is matched with the most suitable mode. Meanwhile, the collapsed modes which fail to receive any action class, are also considered in this strategy in order to ensure diversity. Moreover, we design a sequence decoder within our network to capture latent global information for better early action recognition. It provides a feasible scheme for weakly-supervised setting in which the Dear-Net leverages unlabelled data to improve performance. Experimental results on three challenging datasets clearly show the effectiveness of our approach.",
author = "Rui Wang and Jun Liu and Qiuhong Ke and Duo Peng and Yinjie Lei",
year = "2023",
month = dec,
day = "31",
doi = "10.1109/TMM.2021.3139768",
language = "English",
volume = "25",
pages = "1175--1189",
journal = "IEEE Transactions on Multimedia",
issn = "1520-9210",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

RIS

TY - JOUR

T1 - Dear-Net: Learning Diversities for Skeleton-Based Early Action Recognition

AU - Wang, Rui

AU - Liu, Jun

AU - Ke, Qiuhong

AU - Peng, Duo

AU - Lei, Yinjie

PY - 2023/12/31

Y1 - 2023/12/31

N2 - Early action recognition, i.e., recognizing an action before it is fully performed, is a challenging and important task. Existing works mainly focus on deterministic early action recognition outputting only a single class, and ignore the uncertainty and diversity that essentially exist in this task. Intuitively, when only the early portion of the action is observed, there could be multiple possibilities of the full action, as diversified actions can share almost identical early segments in many scenarios. Thus taking uncertainties and diversities into account, and outputting multiple plausible predictions, instead of a single one, can be important for the sake of authenticity and requirement of many practical applications. To this end, we propose a novel Diversified Early Action Recognition Network (Dear-Net) that is capable of outputting multiple reasonable action classes for each partial sequence by utilizing mode conversion. Specifically, we introduce an effective action diversity learning strategy to drive our network towards predicting diverse and reasonable results, in which each learnable action class is matched with the most suitable mode. Meanwhile, the collapsed modes which fail to receive any action class, are also considered in this strategy in order to ensure diversity. Moreover, we design a sequence decoder within our network to capture latent global information for better early action recognition. It provides a feasible scheme for weakly-supervised setting in which the Dear-Net leverages unlabelled data to improve performance. Experimental results on three challenging datasets clearly show the effectiveness of our approach.

AB - Early action recognition, i.e., recognizing an action before it is fully performed, is a challenging and important task. Existing works mainly focus on deterministic early action recognition outputting only a single class, and ignore the uncertainty and diversity that essentially exist in this task. Intuitively, when only the early portion of the action is observed, there could be multiple possibilities of the full action, as diversified actions can share almost identical early segments in many scenarios. Thus taking uncertainties and diversities into account, and outputting multiple plausible predictions, instead of a single one, can be important for the sake of authenticity and requirement of many practical applications. To this end, we propose a novel Diversified Early Action Recognition Network (Dear-Net) that is capable of outputting multiple reasonable action classes for each partial sequence by utilizing mode conversion. Specifically, we introduce an effective action diversity learning strategy to drive our network towards predicting diverse and reasonable results, in which each learnable action class is matched with the most suitable mode. Meanwhile, the collapsed modes which fail to receive any action class, are also considered in this strategy in order to ensure diversity. Moreover, we design a sequence decoder within our network to capture latent global information for better early action recognition. It provides a feasible scheme for weakly-supervised setting in which the Dear-Net leverages unlabelled data to improve performance. Experimental results on three challenging datasets clearly show the effectiveness of our approach.

U2 - 10.1109/TMM.2021.3139768

DO - 10.1109/TMM.2021.3139768

M3 - Journal article

VL - 25

SP - 1175

EP - 1189

JO - IEEE Transactions on Multimedia

JF - IEEE Transactions on Multimedia

SN - 1520-9210

ER -