Dynamic Spatio-Temporal Specialization Learning for Fine-Grained Action Recognition

Computing and Communications

Associated organisational units

Electronic data

ECCV2022_Lingeng_LiTianjiao_rgb_finegrained
Accepted author manuscript, 795 KB, PDF document


Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

E-pub ahead of print

Hossein Rahmani


Publication date	27/10/2022
Host publication	Dynamic Spatio-Temporal Specialization Learning for Fine-Grained Action Recognition
Publisher	Springer
Original language	English

Publication series

Name	European Conference on Computer Vision (ECCV)

Abstract

The goal of fine-grained action recognition is to successfully
discriminate between action categories with subtle differences. To tackle
this, we derive inspiration from the human visual system, which contains
specialized regions in the brain that are dedicated to handling specific
tasks. We design a novel Dynamic Spatio-Temporal Specialization
(DSTS) module, which consists of specialized neurons that are only activated
for a subset of samples that are highly similar. During training,
the loss forces the specialized neurons to learn discriminative fine-grained
differences to distinguish between these similar samples, improving fine-grained
recognition. Moreover, a spatio-temporal specialization method
further optimizes the architectures of the specialized neurons to capture
either more spatial or temporal fine-grained information, to better
tackle the large range of spatio-temporal variations in the videos. Lastly,
we design an Upstream-Downstream Learning algorithm to optimize our
model’s dynamic decisions during training, improving the performance
of our DSTS module. We obtain state-of-the-art performance on two
widely used fine-grained action recognition datasets.
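
The idea of specialized neurons that are only activated for a subset of similar samples can be illustrated with a minimal sketch. This is not the authors' DSTS implementation: it is a generic top-k routing example, and every name in it (`make_neurons`, `route`, `forward`, the `key` and `weight` vectors, the `top_k` parameter) is a hypothetical choice made for illustration. It assumes each neuron holds a key vector, that a sample's relevance to a neuron is a dot product with that key, and that only the best-matching neurons produce an output.

```python
import random


def make_neurons(num_neurons, dim, seed=0):
    """Create hypothetical specialized neurons with random parameters."""
    rng = random.Random(seed)
    return [
        {
            # "key" decides which samples the neuron responds to (assumed design).
            "key": [rng.gauss(0, 1) for _ in range(dim)],
            # "weight" is the neuron's own transformation of the feature.
            "weight": [rng.gauss(0, 1) for _ in range(dim)],
        }
        for _ in range(num_neurons)
    ]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def route(feature, neurons, top_k):
    """Return the indices of the top_k neurons whose keys best match the feature."""
    scores = [dot(feature, n["key"]) for n in neurons]
    ranked = sorted(range(len(neurons)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:top_k])


def forward(feature, neurons, top_k):
    """Run only the selected neurons; all other neurons output zero."""
    active = route(feature, neurons, top_k)
    outputs = [0.0] * len(neurons)
    for i in active:
        # ReLU of a linear response, standing in for a real neuron's computation.
        outputs[i] = max(0.0, dot(feature, neurons[i]["weight"]))
    return active, outputs


neurons = make_neurons(num_neurons=8, dim=4)
sample_a = [1.0, 0.2, -0.5, 0.3]
active_a, out_a = forward(sample_a, neurons, top_k=2)
print("active neurons:", active_a)
```

Because routing depends only on the feature, samples with similar features tend to select the same neurons, so during training those neurons would receive gradients only from that group of similar samples. The hard top-k selection used here is not differentiable; how the paper trains its dynamic decisions (the Upstream-Downstream Learning algorithm) is not reproduced in this sketch.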
