ST-CNN: Spatial-Temporal Convolutional Neural Network for crowd counting in videos

Computing and Communications

Text available via DOI:

https://doi.org/10.1016/j.patrec.2019.04.012
Final published version

Keywords

Crowd analysis, Crowd counting, Spatio-temporal feature, Convolution, Deep learning, Mean square error, Convolutional neural network, Learning techniques, Perspective distortion, Spatial-temporal features, Spatio temporal features, Temporal correlations, Neural networks

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Y. Miao
J. Han
Y. Gao
B. Zhang

Journal publication date	1/07/2019
Journal	Pattern Recognition Letters
Volume	125
Number of pages	6
Pages (from-to)	113-118
Publication Status	Published
Early online date	16/04/19
Original language	English

Abstract

The task of crowd counting and density map estimation from videos is challenging due to severe occlusions, scene perspective distortions and diverse crowd distributions. Conventional deep learning methods for crowd counting process each video frame independently, ignoring the intrinsic temporal correlation among neighboring frames, which keeps performance below the level required by real-world applications. To overcome this shortcoming, a new end-to-end deep architecture named Spatial-Temporal Convolutional Neural Network (ST-CNN) is proposed, which unifies a 2D convolutional neural network (C2D) and a 3D convolutional neural network (C3D) to learn spatial-temporal features in the same framework. On top of that, a merging scheme is applied to the resulting density maps, taking advantage of the spatial and temporal information simultaneously for the crowd counting task. Experimental results on two benchmark data sets, the Mall dataset and the WorldExpo′10 dataset, show that our ST-CNN outperforms the state-of-the-art models in terms of mean absolute error (MAE) and mean squared error (MSE).
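
The abstract does not give the details of the merging scheme or the evaluation protocol, so the following is only a minimal sketch under stated assumptions. It treats a crowd count as the sum over a density map, merges a spatial and a temporal density map with a hypothetical weighted average (the function `merge_density_maps` and its `weight` parameter are illustrative, not the paper's method), and computes MAE and MSE over per-frame counts. MSE is taken literally as the mean of squared errors; some crowd counting papers report its square root under the same name.

```python
import numpy as np


def count_from_density(density_map):
    """Estimated count is the sum of all density values in the map."""
    return float(np.sum(density_map))


def merge_density_maps(spatial_map, temporal_map, weight=0.5):
    """Hypothetical merge: convex combination of two density maps.

    This is an assumption for illustration; the paper's actual
    merging scheme is not described in the abstract.
    """
    return weight * spatial_map + (1.0 - weight) * temporal_map


def mae(predicted_counts, true_counts):
    """Mean absolute error between predicted and true counts."""
    pred = np.asarray(predicted_counts, dtype=float)
    true = np.asarray(true_counts, dtype=float)
    return float(np.mean(np.abs(pred - true)))


def mse(predicted_counts, true_counts):
    """Mean squared error between predicted and true counts."""
    pred = np.asarray(predicted_counts, dtype=float)
    true = np.asarray(true_counts, dtype=float)
    return float(np.mean((pred - true) ** 2))


# Toy data: two 4x4 density maps standing in for the two branches.
spatial = np.full((4, 4), 0.5)    # sums to 8
temporal = np.full((4, 4), 0.75)  # sums to 12
merged = merge_density_maps(spatial, temporal)

# Second frame's predicted count is made up for the example.
predicted = [count_from_density(merged), 21.0]
ground_truth = [9.0, 23.0]

print("MAE:", mae(predicted, ground_truth))
print("MSE:", mse(predicted, ground_truth))
```

With equal weights the merged map sums to 10, so the per-frame errors in this toy example are 1 and 2.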
