The rapid development of convolutional neural networks (CNNs) for computer vision tasks has inspired researchers to bring their potential to embedded and mobile devices. However, CNNs typically require substantial computation and memory, which limits their deployment in such resource-constrained systems. Therefore, how to compress complex networks while maintaining competitive performance has become a focus of attention in recent years. Among network compression techniques, filter pruning methods, which obtain a structured compact model by finding and removing redundant filters, have attracted widespread attention. Inspired by previous dedicated works, this thesis focuses on obtaining a compact model while retaining as much of the original model's performance as possible. In particular, to address the limitations of filter selection in existing popular pruning methods, several novel filter pruning strategies are proposed to find and remove redundant filters more accurately, thereby reducing the performance loss caused by pruning: filter pruning with an attention mechanism (Chapter 3), data-dependent filter pruning guided by LSTM (Chapter 4), and filter pruning with a uniqueness mechanism in the frequency domain (Chapter 5).
This thesis first addresses the filter pruning problem from a global perspective. To this end, we propose a new scheme, termed Pruning Filter with an Attention Mechanism (PFAM). By establishing the dependencies among filters at each layer, PFAM explores the long-term dependence between filters via an attention module in order to choose the to-be-pruned filters. Unlike prior approaches that identify the to-be-pruned filters based solely on their intrinsic properties, the less correlated filters are pruned after the pruning step in the current training epoch and then reconstructed and updated during the subsequent training epoch. Consequently, the compressed network can be obtained without requiring a pre-trained model, since the input data are processed with maximal information retained while the original training strategy is executed.
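To make this idea concrete, the following is a minimal PyTorch sketch of how an attention-style correlation score could drive such soft pruning. The function names, the dot-product attention scoring, and the prune_ratio value are illustrative assumptions, not the exact PFAM formulation.

```python
import torch
import torch.nn.functional as F

def attention_scores(weight):
    """Hypothetical PFAM-style scoring: treat each flattened filter as a
    token and use its total pairwise (dot-product) attention as a measure
    of how strongly it correlates with the other filters in the layer."""
    f = F.normalize(weight.flatten(1), dim=1)   # (out_channels, in*k*k)
    attn = f @ f.t()                            # pairwise similarity
    return attn.sum(dim=1) - attn.diagonal()    # exclude self-attention

def soft_prune_epoch(conv, prune_ratio=0.3):
    """Zero the least-correlated filters after this epoch's pruning step;
    because they are only zeroed, not removed, they can be reconstructed
    and updated during the next training epoch."""
    with torch.no_grad():
        scores = attention_scores(conv.weight)
        n_prune = int(prune_ratio * conv.out_channels)
        idx = torch.argsort(scores)[:n_prune]   # least correlated filters
        conv.weight[idx] = 0.0
        if conv.bias is not None:
            conv.bias[idx] = 0.0
```

Because the pruned filters remain in the computation graph, this prune-then-reconstruct cycle is what removes the dependence on a pre-trained model.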
Next, we observe that most existing pruning algorithms prune filters layer by layer. Specifically, they guide filter pruning at each layer by setting a single global pruning rate, which means that every convolutional layer is treated equally regardless of its depth and width. We argue, however, that the convolutional layers of a network also differ in significance. Accordingly, we propose that selecting appropriate layers for pruning is more reasonable, since greater complexity reduction with less performance loss can be achieved by keeping more filters in the critical layers and removing more filters from the insignificant ones. To this end, long short-term memory (LSTM) is employed to learn the hierarchical characteristics of a network and to generate a global network pruning scheme. On top of that, we present a data-dependent soft pruning strategy named Squeeze-Excitation-Pruning (SEP), which does not physically prune any filters but instead excludes specific kernels from the forward and backward computations according to the pruning scheme. Doing so further reduces the model's performance decline while achieving deep model compression.
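The sketch below illustrates both ingredients under stated assumptions: the LayerRateLSTM module, its per-layer descriptor features, and the mask-selection rule are hypothetical placeholders rather than the thesis's exact design. The point is that an LSTM can emit a keep ratio per layer, and that masking a convolution's output channels removes the corresponding kernels from both the forward and backward passes without deleting them.

```python
import torch
import torch.nn as nn

class LayerRateLSTM(nn.Module):
    """Hypothetical sketch: an LSTM reads a simple descriptor of each
    convolutional layer (e.g. depth index, width) and emits a per-layer
    keep ratio, so layers of differing significance are pruned unequally."""
    def __init__(self, feat_dim=4, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, layer_feats):                      # (1, num_layers, feat_dim)
        h, _ = self.lstm(layer_feats)
        return torch.sigmoid(self.head(h)).squeeze(-1)   # keep ratio per layer

class SEPConv(nn.Module):
    """SEP-style soft pruning: a binary channel mask zeroes selected output
    channels, so the corresponding kernels are excluded from the forward
    pass and receive zero gradients, yet remain in the model."""
    def __init__(self, conv, keep_ratio):
        super().__init__()
        self.conv = conv
        n_keep = max(1, int(keep_ratio * conv.out_channels))
        mask = torch.zeros(conv.out_channels)
        mask[:n_keep] = 1.0   # placeholder choice of which kernels to keep
        self.register_buffer("mask", mask.view(1, -1, 1, 1))

    def forward(self, x):
        # Masked channels yield zero activations, hence zero weight gradients.
        return self.conv(x) * self.mask

# Usage sketch: wrap each conv with its LSTM-assigned keep ratio, e.g.
#   ratios = LayerRateLSTM()(layer_feats)
#   layer_i = SEPConv(conv_i, keep_ratio=float(ratios[0, i]))
```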
Lastly, we transfer the concept of relationships from the filter level to the feature-map level, because feature maps reflect comprehensive information about both the input data and the filters. Hence, we propose Filter Pruning with a Uniqueness Mechanism in the Frequency Domain (FPUM), which guides the filter pruning strategy through the correlations between feature maps. Specifically, we first transform the features to the frequency domain via the Discrete Cosine Transform (DCT). Then, for each feature map, we compute a uniqueness score, which measures the likelihood that it can be replaced by the other maps. This allows us to prune the filters corresponding to low-uniqueness maps without significant performance degradation. In addition, our strategy is more resistant to noise than spatial-domain methods, further enhancing the network's compactness while maintaining performance, since the critical pruning clues are more concentrated after the DCT.
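A minimal PyTorch sketch of this uniqueness computation follows. The orthonormal DCT-II construction is standard; the specific uniqueness definition used here (one minus the maximum cosine similarity to any other map) is an illustrative assumption, not necessarily the thesis's exact score.

```python
import math
import torch
import torch.nn.functional as F

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix (n x n)."""
    i = torch.arange(n, dtype=torch.float32)
    C = torch.cos(math.pi * (2 * i + 1) * i.view(-1, 1) / (2 * n))
    C[0] *= 1 / math.sqrt(2)
    return C * math.sqrt(2 / n)

def uniqueness_scores(feats):
    """feats: (C, H, W) feature maps for one input. Returns a per-map
    uniqueness score: 1 minus the highest cosine similarity (in the DCT
    domain) with any other map, so easily replaced maps score low."""
    _, H, W = feats.shape
    Dh, Dw = dct_matrix(H), dct_matrix(W)
    freq = Dh @ feats @ Dw.t()                  # 2-D DCT of every channel
    v = F.normalize(freq.flatten(1), dim=1)     # (C, H*W) unit vectors
    sim = v @ v.t()                             # pairwise cosine similarity
    sim.fill_diagonal_(-1.0)                    # ignore self-similarity
    return 1.0 - sim.max(dim=1).values

# Filters whose maps score lowest are the pruning candidates:
#   idx = torch.argsort(uniqueness_scores(maps))[:n_prune]
```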