
Electronic data

  • ECCV2022_Lingeng_LiTianjiao_rgb_finegrained

    Accepted author manuscript, 795 KB, PDF document


Dynamic Spatio-Temporal Specialization Learning for Fine-Grained Action Recognition

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN > Conference contribution/Paper > peer-review

E-pub ahead of print
Publication date: 27/10/2022
Host publication: Dynamic Spatio-Temporal Specialization Learning for Fine-Grained Action Recognition
Original language: English

Publication series

Name: European Conference on Computer Vision (ECCV)


The goal of fine-grained action recognition is to successfully
discriminate between action categories with subtle differences. To tackle
this, we derive inspiration from the human visual system, which contains
specialized regions in the brain that are dedicated to handling specific
tasks. We design a novel Dynamic Spatio-Temporal Specialization
(DSTS) module, which consists of specialized neurons that are only activated
for a subset of samples that are highly similar. During training,
the loss forces the specialized neurons to learn discriminative fine-grained
differences to distinguish between these similar samples, improving fine-grained
recognition. Moreover, a spatio-temporal specialization method
further optimizes the architectures of the specialized neurons to capture
either more spatial or temporal fine-grained information, to better
tackle the large range of spatio-temporal variations in the videos. Lastly,
we design an Upstream-Downstream Learning algorithm to optimize our
model’s dynamic decisions during training, improving the performance
of our DSTS module. We obtain state-of-the-art performance on two
widely-used fine-grained action recognition datasets.
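
The idea of specialized neurons that are activated only for a subset of similar samples can be illustrated with a toy example. The sketch below is not the paper's implementation: the class `SpecializedNeuron`, the function `dsts_like_layer`, the cosine-similarity gate, and the fixed threshold are all hypothetical stand-ins, whereas the abstract describes dynamic decisions that are learned during training. It only shows how a gate can route each sample to the neurons that specialize in samples like it.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SpecializedNeuron:
    """Hypothetical neuron that fires only for samples close to its key."""

    def __init__(self, key, weights, threshold=0.8):
        self.key = key              # feature direction this neuron specializes in
        self.weights = weights      # per-feature weights applied when active
        self.threshold = threshold  # assumed fixed gate; the paper learns its decisions

    def is_active(self, features):
        return cosine(features, self.key) >= self.threshold

    def forward(self, features):
        return [w * x for w, x in zip(self.weights, features)]


def dsts_like_layer(features, neurons):
    """Add the outputs of the active specialized neurons to the input features.

    Returns the combined output and the indices of the neurons that fired.
    """
    out = list(features)
    active = []
    for idx, neuron in enumerate(neurons):
        if neuron.is_active(features):
            active.append(idx)
            out = [o + s for o, s in zip(out, neuron.forward(features))]
    return out, active


# Two toy neurons, each specializing in one feature direction.
neurons = [
    SpecializedNeuron(key=[1.0, 0.0], weights=[0.5, 0.0]),
    SpecializedNeuron(key=[0.0, 1.0], weights=[0.0, 0.5]),
]
sample_a = [2.0, 0.1]  # close to the first neuron's key
sample_b = [0.1, 2.0]  # close to the second neuron's key

print(dsts_like_layer(sample_a, neurons))
print(dsts_like_layer(sample_b, neurons))
```

In this toy setting each sample activates only the neuron whose key it resembles, so each neuron's weights would be updated only by a group of similar samples. That is the property the abstract relies on to push specialized neurons toward fine-grained differences.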