Action Detection via an Image Diffusion Process

Research output: Working paper (Preprint)

Published

Standard

Action Detection via an Image Diffusion Process. / Foo, Lin Geng; Li, Tianjiao; Rahmani, Hossein et al. 2024.

Author

Foo, Lin Geng; Li, Tianjiao; Rahmani, Hossein et al. / Action Detection via an Image Diffusion Process. 2024.

Bibtex

@techreport{913dbad86b7148a8ad4935a74659d488,
title = "Action Detection via an Image Diffusion Process",
abstract = "Action detection aims to localize the starting and ending points of action instances in untrimmed videos, and predict the classes of those instances. In this paper, we make the observation that the outputs of the action detection task can be formulated as images. Thus, from a novel perspective, we tackle action detection via a three-image generation process to generate starting point, ending point and action-class predictions as images via our proposed Action Detection Image Diffusion (ADI-Diff) framework. Furthermore, since our images differ from natural images and exhibit special properties, we further explore a Discrete Action-Detection Diffusion Process and a Row-Column Transformer design to better handle their processing. Our ADI-Diff framework achieves state-of-the-art results on two widely-used datasets.",
keywords = "cs.CV",
author = "Foo, {Lin Geng} and Tianjiao Li and Hossein Rahmani and Jun Liu",
note = "Accepted to CVPR 2024",
year = "2024",
month = apr,
day = "1",
language = "English",
type = "WorkingPaper",
}

RIS

TY - UNPB

T1 - Action Detection via an Image Diffusion Process

AU - Foo, Lin Geng

AU - Li, Tianjiao

AU - Rahmani, Hossein

AU - Liu, Jun

N1 - Accepted to CVPR 2024

PY - 2024/4/1

Y1 - 2024/4/1

AB - Action detection aims to localize the starting and ending points of action instances in untrimmed videos, and predict the classes of those instances. In this paper, we make the observation that the outputs of the action detection task can be formulated as images. Thus, from a novel perspective, we tackle action detection via a three-image generation process to generate starting point, ending point and action-class predictions as images via our proposed Action Detection Image Diffusion (ADI-Diff) framework. Furthermore, since our images differ from natural images and exhibit special properties, we further explore a Discrete Action-Detection Diffusion Process and a Row-Column Transformer design to better handle their processing. Our ADI-Diff framework achieves state-of-the-art results on two widely-used datasets.

KW - cs.CV

M3 - Preprint

BT - Action Detection via an Image Diffusion Process

ER -