Accepted author manuscript, 795 KB, PDF document
Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review
TY - GEN
T1 - Dynamic Spatio-Temporal Specialization Learning for Fine-Grained Action Recognition
AU - Rahmani, Hossein
PY - 2022/10/27
Y1 - 2022/10/27
N2 - The goal of fine-grained action recognition is to successfully discriminate between action categories with subtle differences. To tackle this, we derive inspiration from the human visual system which contains specialized regions in the brain that are dedicated towards handling specific tasks. We design a novel Dynamic Spatio-Temporal Specialization (DSTS) module, which consists of specialized neurons that are only activated for a subset of samples that are highly similar. During training, the loss forces the specialized neurons to learn discriminative fine-grained differences to distinguish between these similar samples, improving fine-grained recognition. Moreover, a spatio-temporal specialization method further optimizes the architectures of the specialized neurons to capture either more spatial or temporal fine-grained information, to better tackle the large range of spatio-temporal variations in the videos. Lastly, we design an Upstream-Downstream Learning algorithm to optimize our model’s dynamic decisions during training, improving the performance of our DSTS module. We obtain state-of-the-art performance on two widely-used fine-grained action recognition datasets.
AB - The goal of fine-grained action recognition is to successfully discriminate between action categories with subtle differences. To tackle this, we derive inspiration from the human visual system which contains specialized regions in the brain that are dedicated towards handling specific tasks. We design a novel Dynamic Spatio-Temporal Specialization (DSTS) module, which consists of specialized neurons that are only activated for a subset of samples that are highly similar. During training, the loss forces the specialized neurons to learn discriminative fine-grained differences to distinguish between these similar samples, improving fine-grained recognition. Moreover, a spatio-temporal specialization method further optimizes the architectures of the specialized neurons to capture either more spatial or temporal fine-grained information, to better tackle the large range of spatio-temporal variations in the videos. Lastly, we design an Upstream-Downstream Learning algorithm to optimize our model’s dynamic decisions during training, improving the performance of our DSTS module. We obtain state-of-the-art performance on two widely-used fine-grained action recognition datasets.
M3 - Conference contribution/Paper
T3 - European Conference on Computer Vision (ECCV)
BT - Dynamic Spatio-Temporal Specialization Learning for Fine-Grained Action Recognition
PB - Springer
ER -