TY - BOOK
T1 - Deep Neural Network Compression with Filter Pruning
AU - Zhang, Shuo
PY - 2023/6/13
Y1 - 2023/6/13
N2 - The rapid development of convolutional neural networks (CNNs) in computer vision tasks has inspired researchers to apply their potential to embedded or mobile devices. However, CNNs typically require a large amount of computation and a large memory footprint, limiting their deployment in those resource-limited systems. Therefore, how to compress complex networks while maintaining competitive performance has become a focus of attention in recent years. On the subject of network compression, filter pruning methods, which achieve a structured compact model by finding and removing redundant filters, have attracted widespread attention. Inspired by previous dedicated works, this thesis focuses on how to obtain a compact model while maximizing the retention of the original model's performance. In particular, to address the limitations in how existing popular pruning methods choose filters, several novel filter pruning strategies are proposed to find and remove redundant filters more accurately, thereby reducing the performance loss caused by pruning: a filter pruning method with an attention mechanism (Chapter 3), data-dependent filter pruning guided by LSTM (Chapter 4), and filter pruning with a uniqueness mechanism in the frequency domain (Chapter 5).
This thesis first addresses the filter pruning issue from a global perspective. To this end, we propose a new scheme, termed Pruning Filter with an Attention Mechanism (PFAM). That is, by establishing the dependency/relationship between filters at each layer, we explore the long-term dependence between filters via an attention module in order to choose the to-be-pruned filters. Unlike prior approaches that identify the to-be-pruned filters simply based on their intrinsic properties, the less correlated filters are first pruned after the pruning step in the current training epoch and then reconstructed and updated during the subsequent training epoch. Thus, the compressed network model can be achieved without the requirement for a pre-trained model, since the input data can be manipulated with the maximum information maintained while the original training strategy is executed.
Next, it is noticed that most existing pruning algorithms seek to prune filters layer by layer. Specifically, they guide filter pruning at each layer by setting a global pruning rate, which means that each convolutional layer is treated equally, without regard to its depth and width. In this situation, we argue that the convolutional layers in a network also have varying degrees of significance. Moreover, we propose that selecting the appropriate layers for pruning is more reasonable, since it can achieve a greater complexity reduction with less performance loss by keeping more filters in the critical layers and removing more filters in the insignificant ones. To do this, long short-term memory (LSTM) is employed to learn the hierarchical properties of a network and to generalize a global network pruning scheme. On top of that, we present a data-dependent soft pruning strategy named Squeeze-Excitation-Pruning (SEP), which does not physically prune any filters but removes the specific kernels involved in calculating the forward and backward propagations according to the pruning scheme. Doing so further reduces the model's performance decline while achieving deep model compression.
Lastly, we transfer the concept of relationship from the filter level to the feature map level, because feature maps can reflect comprehensive information about both the input data and the filters. Hence, we propose Filter Pruning with Uniqueness Mechanism in the Frequency Domain (FPUM), which generates the correlation between feature maps to serve as a guideline for the filter pruning strategy. Specifically, we first transfer features to the frequency domain using the Discrete Cosine Transform (DCT). Then, for each feature map, we compute a uniqueness score, which measures its probability of being replaced by others. Doing so allows us to prune the filters corresponding to the low-uniqueness maps without significant performance degradation. In addition, our strategy is more resistant to noise than spatial-domain methods, further enhancing the network's compactness while maintaining performance, as the critical pruning clues are more concentrated after the DCT.
AB - The rapid development of convolutional neural networks (CNNs) in computer vision tasks has inspired researchers to apply their potential to embedded or mobile devices. However, CNNs typically require a large amount of computation and a large memory footprint, limiting their deployment in those resource-limited systems. Therefore, how to compress complex networks while maintaining competitive performance has become a focus of attention in recent years. On the subject of network compression, filter pruning methods, which achieve a structured compact model by finding and removing redundant filters, have attracted widespread attention. Inspired by previous dedicated works, this thesis focuses on how to obtain a compact model while maximizing the retention of the original model's performance. In particular, to address the limitations in how existing popular pruning methods choose filters, several novel filter pruning strategies are proposed to find and remove redundant filters more accurately, thereby reducing the performance loss caused by pruning: a filter pruning method with an attention mechanism (Chapter 3), data-dependent filter pruning guided by LSTM (Chapter 4), and filter pruning with a uniqueness mechanism in the frequency domain (Chapter 5).
This thesis first addresses the filter pruning issue from a global perspective. To this end, we propose a new scheme, termed Pruning Filter with an Attention Mechanism (PFAM). That is, by establishing the dependency/relationship between filters at each layer, we explore the long-term dependence between filters via an attention module in order to choose the to-be-pruned filters. Unlike prior approaches that identify the to-be-pruned filters simply based on their intrinsic properties, the less correlated filters are first pruned after the pruning step in the current training epoch and then reconstructed and updated during the subsequent training epoch. Thus, the compressed network model can be achieved without the requirement for a pre-trained model, since the input data can be manipulated with the maximum information maintained while the original training strategy is executed.
Next, it is noticed that most existing pruning algorithms seek to prune filters layer by layer. Specifically, they guide filter pruning at each layer by setting a global pruning rate, which means that each convolutional layer is treated equally, without regard to its depth and width. In this situation, we argue that the convolutional layers in a network also have varying degrees of significance. Moreover, we propose that selecting the appropriate layers for pruning is more reasonable, since it can achieve a greater complexity reduction with less performance loss by keeping more filters in the critical layers and removing more filters in the insignificant ones. To do this, long short-term memory (LSTM) is employed to learn the hierarchical properties of a network and to generalize a global network pruning scheme. On top of that, we present a data-dependent soft pruning strategy named Squeeze-Excitation-Pruning (SEP), which does not physically prune any filters but removes the specific kernels involved in calculating the forward and backward propagations according to the pruning scheme. Doing so further reduces the model's performance decline while achieving deep model compression.
Lastly, we transfer the concept of relationship from the filter level to the feature map level, because feature maps can reflect comprehensive information about both the input data and the filters. Hence, we propose Filter Pruning with Uniqueness Mechanism in the Frequency Domain (FPUM), which generates the correlation between feature maps to serve as a guideline for the filter pruning strategy. Specifically, we first transfer features to the frequency domain using the Discrete Cosine Transform (DCT). Then, for each feature map, we compute a uniqueness score, which measures its probability of being replaced by others. Doing so allows us to prune the filters corresponding to the low-uniqueness maps without significant performance degradation. In addition, our strategy is more resistant to noise than spatial-domain methods, further enhancing the network's compactness while maintaining performance, as the critical pruning clues are more concentrated after the DCT.
U2 - 10.17635/lancaster/thesis/2009
DO - 10.17635/lancaster/thesis/2009
M3 - Doctoral Thesis
PB - Lancaster University
ER -