TY - BOOK
T1 - Deep Neural Network Compression with Filter Pruning
AU - Zhang, Shuo
PY - 2023/6/13
Y1 - 2023/6/13
N2 - The rapid development of convolutional neural networks (CNNs) in computer vision tasks has inspired researchers to apply their potential to embedded or mobile devices. However, CNNs typically require a large amount of computation and a large memory footprint, limiting their deployment in those resource-limited systems. Therefore, how to compress complex networks while maintaining competitive performance has become a focus of attention in recent years. On the subject of network compression, filter pruning methods, which achieve a structured compact model by finding and removing redundant filters, have attracted widespread attention. Inspired by previous dedicated works, this thesis focuses on how to obtain a compact model while maximizing the retention of the original model's performance. In particular, to address the limitations in how existing popular pruning methods choose filters, several novel filter pruning strategies are proposed to find and remove redundant filters more accurately, thereby reducing the performance loss caused by pruning: a filter pruning method with an attention mechanism (Chapter 3), data-dependent filter pruning guided by LSTM (Chapter 4), and filter pruning with a uniqueness mechanism in the frequency domain (Chapter 5).
This thesis first addresses the filter pruning issue from a global perspective. To this end, we propose a new scheme, termed Pruning Filter with an Attention Mechanism (PFAM). That is, by establishing the dependency/relationship between filters at each layer, we explore the long-term dependence between filters via an attention module in order to choose the to-be-pruned filters. Unlike prior approaches that identify the to-be-pruned filters simply based on their intrinsic properties, the less correlated filters are first pruned after the pruning step in the current training epoch and then reconstructed and updated during the subsequent training epoch. Thus, the compressed network model can be achieved without the requirement for a pre-trained model, since the input data can be manipulated with the maximum information maintained while the original training strategy is executed.
Next, it is noticed that most existing pruning algorithms seek to prune filters layer by layer. Specifically, they guide filter pruning at each layer by setting a global pruning rate, which means that each convolutional layer is treated equally, without regard to its depth and width. In this situation, we argue that the convolutional layers in a network also have varying degrees of significance. Moreover, we propose that selecting the appropriate layers for pruning is more reasonable, since it can achieve a greater complexity reduction with less performance loss by keeping more filters in the critical layers and removing more filters in the insignificant ones. To do this, long short-term memory (LSTM) is employed to learn the hierarchical properties of a network and to generalize a global network pruning scheme. On top of that, we present a data-dependent soft pruning strategy named Squeeze-Excitation-Pruning (SEP), which does not physically prune any filters but removes the specific kernels involved in calculating the forward and backward propagations according to the pruning scheme. Doing so further reduces the model's performance decline while achieving deep model compression.
Lastly, we transfer the concept of relationship from the filter level to the feature map level, because feature maps can reflect comprehensive information about both the input data and the filters. Hence, we propose Filter Pruning with Uniqueness Mechanism in the Frequency Domain (FPUM), which generates the correlation between feature maps to serve as a guideline for the filter pruning strategy. Specifically, we first transfer features to the frequency domain using the Discrete Cosine Transform (DCT). Then, for each feature map, we compute a uniqueness score, which measures its probability of being replaced by others. Doing so allows us to prune the filters corresponding to the low-uniqueness maps without significant performance degradation. In addition, our strategy is more resistant to noise than spatial-domain methods, further enhancing the network's compactness while maintaining performance, as the critical pruning clues are more concentrated after the DCT.
AB - The rapid development of convolutional neural networks (CNNs) in computer vision tasks has inspired researchers to apply their potential to embedded or mobile devices. However, CNNs typically require a large amount of computation and a large memory footprint, limiting their deployment in those resource-limited systems. Therefore, how to compress complex networks while maintaining competitive performance has become a focus of attention in recent years. On the subject of network compression, filter pruning methods, which achieve a structured compact model by finding and removing redundant filters, have attracted widespread attention. Inspired by previous dedicated works, this thesis focuses on how to obtain a compact model while maximizing the retention of the original model's performance. In particular, to address the limitations in how existing popular pruning methods choose filters, several novel filter pruning strategies are proposed to find and remove redundant filters more accurately, thereby reducing the performance loss caused by pruning: a filter pruning method with an attention mechanism (Chapter 3), data-dependent filter pruning guided by LSTM (Chapter 4), and filter pruning with a uniqueness mechanism in the frequency domain (Chapter 5).
This thesis first addresses the filter pruning issue from a global perspective. To this end, we propose a new scheme, termed Pruning Filter with an Attention Mechanism (PFAM). That is, by establishing the dependency/relationship between filters at each layer, we explore the long-term dependence between filters via an attention module in order to choose the to-be-pruned filters. Unlike prior approaches that identify the to-be-pruned filters simply based on their intrinsic properties, the less correlated filters are first pruned after the pruning step in the current training epoch and then reconstructed and updated during the subsequent training epoch. Thus, the compressed network model can be achieved without the requirement for a pre-trained model, since the input data can be manipulated with the maximum information maintained while the original training strategy is executed.
Next, it is noticed that most existing pruning algorithms seek to prune filters layer by layer. Specifically, they guide filter pruning at each layer by setting a global pruning rate, which means that each convolutional layer is treated equally, without regard to its depth and width. In this situation, we argue that the convolutional layers in a network also have varying degrees of significance. Moreover, we propose that selecting the appropriate layers for pruning is more reasonable, since it can achieve a greater complexity reduction with less performance loss by keeping more filters in the critical layers and removing more filters in the insignificant ones. To do this, long short-term memory (LSTM) is employed to learn the hierarchical properties of a network and to generalize a global network pruning scheme. On top of that, we present a data-dependent soft pruning strategy named Squeeze-Excitation-Pruning (SEP), which does not physically prune any filters but removes the specific kernels involved in calculating the forward and backward propagations according to the pruning scheme. Doing so further reduces the model's performance decline while achieving deep model compression.
Lastly, we transfer the concept of relationship from the filter level to the feature map level, because feature maps can reflect comprehensive information about both the input data and the filters. Hence, we propose Filter Pruning with Uniqueness Mechanism in the Frequency Domain (FPUM), which generates the correlation between feature maps to serve as a guideline for the filter pruning strategy. Specifically, we first transfer features to the frequency domain using the Discrete Cosine Transform (DCT). Then, for each feature map, we compute a uniqueness score, which measures its probability of being replaced by others. Doing so allows us to prune the filters corresponding to the low-uniqueness maps without significant performance degradation. In addition, our strategy is more resistant to noise than spatial-domain methods, further enhancing the network's compactness while maintaining performance, as the critical pruning clues are more concentrated after the DCT.
U2 - 10.17635/lancaster/thesis/2009
DO - 10.17635/lancaster/thesis/2009
M3 - Doctoral Thesis
PB - Lancaster University
ER -