
Electronic data

  • Arbitrary Action_TIP


    Accepted author manuscript, 8.07 MB, PDF document

    Available under license: CC BY-NC: Creative Commons Attribution-NonCommercial 4.0 International License

Links

Text available via DOI:


Arbitrary View Action Recognition via Transfer Dictionary Learning on Synthetic Training Data

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published
Journal publication date: 10/2018
Journal: IEEE Transactions on Image Processing
Issue number: 10
Volume: 27
Number of pages: 15
Pages (from-to): 4709-4723
Publication status: Published
Early online date: 15/05/18
Original language: English

Abstract

Human action recognition is crucial to many practical applications, ranging from human-computer interaction to video surveillance. Most approaches either recognize human actions from a fixed view or require knowledge of the view angle, which is usually not available in practical applications. In this paper, we propose a novel end-to-end framework that jointly learns a view-invariance transfer dictionary and a view-invariant classifier. The result is a dictionary that can project real-world 2D video into a view-invariant sparse representation, together with a classifier that recognizes actions from an arbitrary view.

The main feature of our algorithm is the use of synthetic data to extract view invariance between 3D and 2D videos during the pre-training phase. This guarantees the availability of training data and removes the hassle of collecting real-world videos from specific viewing angles. Additionally, to better describe the actions in 3D videos, we introduce a new feature set, 3D dense trajectories, which effectively encodes trajectory information extracted from 3D videos. Experimental results on the IXMAS, N-UCLA, i3DPost and UWA3DII datasets show improvements over existing algorithms.
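
To give a rough sense of the pipeline the abstract describes (sparse coding against a learned dictionary, followed by a classifier on the sparse codes), here is a minimal Python sketch built from off-the-shelf scikit-learn routines and random placeholder features. The feature dimensions, class count, and the use of DictionaryLearning and LinearSVC are illustrative assumptions; the paper's actual method jointly learns a view-invariance transfer dictionary from paired synthetic 3D and 2D videos, which this toy example does not reproduce.

```python
# Illustrative sketch only: dictionary learning + sparse coding + linear
# classifier, with random placeholder descriptors standing in for the
# trajectory features described in the paper.
import numpy as np
from sklearn.decomposition import DictionaryLearning
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Placeholder training features (e.g. per-clip motion descriptors) and labels.
X_train = rng.standard_normal((200, 128))   # 200 clips, 128-D descriptors
y_train = rng.integers(0, 11, size=200)     # 11 placeholder action classes

# Learn an overcomplete dictionary and encode each clip as a sparse code.
dico = DictionaryLearning(n_components=256, alpha=1.0,
                          transform_algorithm="lasso_lars", random_state=0)
codes_train = dico.fit_transform(X_train)

# Train a linear classifier on the sparse representation.
clf = LinearSVC(C=1.0).fit(codes_train, y_train)

# At test time, clips from an unseen viewpoint would be encoded with the same
# dictionary and classified without knowledge of the view angle.
X_test = rng.standard_normal((20, 128))
codes_test = dico.transform(X_test)
print(clf.predict(codes_test))
```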

Bibliographic note

©2018 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.