
Electronic data

  • Arbitrary Action_TIP


    Accepted author manuscript, 8.07 MB, PDF document

    Available under license: CC BY-NC: Creative Commons Attribution-NonCommercial 4.0 International License

Links

Text available via DOI:


Arbitrary View Action Recognition via Transfer Dictionary Learning on Synthetic Training Data

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published
Journal publication date: 10/2018
Journal: IEEE Transactions on Image Processing
Issue number: 10
Volume: 27
Number of pages: 15
Pages (from-to): 4709-4723
Publication status: Published
Early online date: 15/05/18
Original language: English

Abstract

Human action recognition is crucial to many practical applications, ranging from human-computer interaction to video surveillance. Most approaches either recognize human actions from a fixed view or require knowledge of the view angle, which is usually not available in practical applications. In this paper, we propose a novel end-to-end framework that jointly learns a view-invariance transfer dictionary and a view-invariant classifier. The result is a dictionary that can project real-world 2D video into a view-invariant sparse representation, together with a classifier that recognizes actions from an arbitrary view.

The main feature of our algorithm is the use of synthetic data to extract view invariance between 3D and 2D videos during the pre-training phase. This guarantees the availability of training data and removes the hassle of collecting real-world videos from specific viewing angles. Additionally, to better describe the actions in 3D videos, we introduce a new feature set, 3D dense trajectories, which effectively encodes trajectory information extracted from 3D videos. Experimental results on the IXMAS, N-UCLA, i3DPost and UWA3DII datasets show improvements over existing algorithms.
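
To give a rough sense of the pipeline the abstract describes (sparse coding against a learned dictionary, followed by a classifier on the sparse codes), here is a minimal Python sketch built from off-the-shelf scikit-learn routines and random placeholder features. The feature dimensions, class count, and the use of DictionaryLearning and LinearSVC are illustrative assumptions; the paper's actual method jointly learns a view-invariance transfer dictionary from paired synthetic 3D and 2D videos, which this toy example does not reproduce.

```python
# Illustrative sketch only: dictionary learning + sparse coding + linear
# classifier, with random placeholder descriptors standing in for the
# trajectory features described in the paper.
import numpy as np
from sklearn.decomposition import DictionaryLearning
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Placeholder training features (e.g. per-clip motion descriptors) and labels.
X_train = rng.standard_normal((200, 128))   # 200 clips, 128-D descriptors
y_train = rng.integers(0, 11, size=200)     # 11 placeholder action classes

# Learn an overcomplete dictionary and encode each clip as a sparse code.
dico = DictionaryLearning(n_components=256, alpha=1.0,
                          transform_algorithm="lasso_lars", random_state=0)
codes_train = dico.fit_transform(X_train)

# Train a linear classifier on the sparse representation.
clf = LinearSVC(C=1.0).fit(codes_train, y_train)

# At test time, clips from an unseen viewpoint would be encoded with the same
# dictionary and classified without knowledge of the view angle.
X_test = rng.standard_normal((20, 128))
codes_test = dico.transform(X_test)
print(clf.predict(codes_test))
```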

Bibliographic note

©2018 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.