Electronic data

  • Activation_TAI_Final

    Rights statement: ©2022 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

    Accepted author manuscript, 6.92 MB, PDF document

    Available under license: CC BY-NC: Creative Commons Attribution-NonCommercial 4.0 International License

Links

Text available via DOI: https://doi.org/10.1109/TAI.2022.3180272

Delve into Neural Activations: Towards Understanding Dying Neurons

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Standard

Delve into Neural Activations: Towards Understanding Dying Neurons. / Jiang, Ziping; Wang, Yunpeng; Li, Chang-Tsun et al.
In: IEEE Transactions on Artificial Intelligence, Vol. 4, No. 4, 4, 01.08.2023, p. 959-971.

Harvard

Jiang, Z, Wang, Y, Li, C-T, Angelov, P & Jiang, R 2023, 'Delve into Neural Activations: Towards Understanding Dying Neurons', IEEE Transactions on Artificial Intelligence, vol. 4, no. 4, 4, pp. 959-971. https://doi.org/10.1109/TAI.2022.3180272

APA

Jiang, Z., Wang, Y., Li, C-T., Angelov, P., & Jiang, R. (2023). Delve into Neural Activations: Towards Understanding Dying Neurons. IEEE Transactions on Artificial Intelligence, 4(4), 959-971. Article 4. https://doi.org/10.1109/TAI.2022.3180272

Vancouver

Jiang Z, Wang Y, Li C-T, Angelov P, Jiang R. Delve into Neural Activations: Towards Understanding Dying Neurons. IEEE Transactions on Artificial Intelligence. 2023 Aug 1;4(4):959-971. 4. Epub 2022 Jun 9. doi: 10.1109/TAI.2022.3180272

Author

Jiang, Ziping ; Wang, Yunpeng ; Li, Chang-Tsun et al. / Delve into Neural Activations : Towards Understanding Dying Neurons. In: IEEE Transactions on Artificial Intelligence. 2023 ; Vol. 4, No. 4. pp. 959-971.

Bibtex

@article{3ac9e45ba3b749afb78887a7808c93cd,
title = "Delve into Neural Activations: Towards Understanding Dying Neurons",
abstract = "Theoretically, a deep neuron network with nonlinear activation is able to approximate any function, while empirically the performance of the model with different activations varies widely. In this work, we investigate the expressivity of the network from an activation perspective. In particular, we introduce a generalized activation region/pattern to describe the functional relationship of the model with an arbitrary activation function and illustrate its fundamental properties. We then propose a metric named pattern similarity to evaluate the practical expressivity of neuron networks regarding datasets based on the neuron level reaction toward the input. We find an undocumented dying neuron issue that the postactivation value of most neurons remain in the same region for data with different labels, implying that the expressivity of the network with certain activations is greatly constrained. For instance, around 80% of postactivation values of a well-trained Sigmoid net or Tanh net are clustered in the same region given any test sample. This means most of the neurons fail to provide any useful information in distinguishing the data with different labels, suggesting that the practical expressivity of those networks is far below the theoretical. By evaluating our metrics and the test accuracy of the model, we show that the seriousness of the dying neuron issue is highly related to the model performance. At last, we also discussed the cause of the dying neuron issue, providing an explanation of the model performance gap caused by the choice of activation.",
keywords = "Artificial Intelligence, Artificial neural networks, Multi-layer neural network",
author = "Ziping Jiang and Yunpeng Wang and Chang-Tsun Li and Plamen Angelov and Richard Jiang",
note = "{\textcopyright}2022 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.",
year = "2023",
month = aug,
day = "1",
doi = "10.1109/TAI.2022.3180272",
language = "English",
volume = "4",
pages = "959--971",
journal = "IEEE Transactions on Artificial Intelligence",
issn = "2691-4581",
publisher = "IEEE",
number = "4",

}

RIS

TY - JOUR

T1 - Delve into Neural Activations

T2 - Towards Understanding Dying Neurons

AU - Jiang, Ziping

AU - Wang, Yunpeng

AU - Li, Chang-Tsun

AU - Angelov, Plamen

AU - Jiang, Richard

N1 - ©2022 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

PY - 2023/8/1

Y1 - 2023/8/1

N2 - Theoretically, a deep neural network with a nonlinear activation can approximate any function, yet empirically the performance of models with different activations varies widely. In this work, we investigate the expressivity of the network from an activation perspective. In particular, we introduce a generalized activation region/pattern to describe the functional relationship of a model with an arbitrary activation function and illustrate its fundamental properties. We then propose a metric named pattern similarity to evaluate the practical expressivity of neural networks on a given dataset, based on the neuron-level response to the input. We identify an undocumented dying neuron issue: the postactivation values of most neurons remain in the same region for data with different labels, implying that the expressivity of networks with certain activations is greatly constrained. For instance, around 80% of the postactivation values of a well-trained Sigmoid or Tanh net are clustered in the same region for any test sample. This means most neurons fail to provide useful information for distinguishing data with different labels, suggesting that the practical expressivity of those networks falls far below the theoretical bound. By evaluating our metrics alongside the test accuracy of the model, we show that the severity of the dying neuron issue is strongly related to model performance. Finally, we discuss the cause of the dying neuron issue, providing an explanation of the model performance gap caused by the choice of activation.

AB - Theoretically, a deep neural network with a nonlinear activation can approximate any function, yet empirically the performance of models with different activations varies widely. In this work, we investigate the expressivity of the network from an activation perspective. In particular, we introduce a generalized activation region/pattern to describe the functional relationship of a model with an arbitrary activation function and illustrate its fundamental properties. We then propose a metric named pattern similarity to evaluate the practical expressivity of neural networks on a given dataset, based on the neuron-level response to the input. We identify an undocumented dying neuron issue: the postactivation values of most neurons remain in the same region for data with different labels, implying that the expressivity of networks with certain activations is greatly constrained. For instance, around 80% of the postactivation values of a well-trained Sigmoid or Tanh net are clustered in the same region for any test sample. This means most neurons fail to provide useful information for distinguishing data with different labels, suggesting that the practical expressivity of those networks falls far below the theoretical bound. By evaluating our metrics alongside the test accuracy of the model, we show that the severity of the dying neuron issue is strongly related to model performance. Finally, we discuss the cause of the dying neuron issue, providing an explanation of the model performance gap caused by the choice of activation.

KW - Artificial Intelligence

KW - Artificial neural networks

KW - Multi-layer neural network

U2 - 10.1109/TAI.2022.3180272

DO - 10.1109/TAI.2022.3180272

M3 - Journal article

VL - 4

SP - 959

EP - 971

JO - IEEE Transactions on Artificial Intelligence

JF - IEEE Transactions on Artificial Intelligence

SN - 2691-4581

IS - 4

M1 - 4

ER -
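
Illustrative sketch

The abstract above describes a measurement procedure: partition an activation function's output range into regions and check how often a neuron's postactivation value stays in a single region across inputs. The following minimal Python/PyTorch probe illustrates that idea; it is not the authors' implementation. The three-region split of the Sigmoid range at thresholds 0.1 and 0.9, the network shape, and the random data are assumptions made purely for the example, and the paper's label-aware pattern similarity metric is not reproduced here.

# Minimal sketch (not the paper's code): estimate the fraction of "dying"
# neurons in a Sigmoid MLP, i.e. neurons whose postactivation values fall
# in the same region for every input. The region boundaries (0.1, 0.9)
# are an illustrative assumption, not taken from the paper.
import torch
import torch.nn as nn

torch.manual_seed(0)

# A small untrained Sigmoid net standing in for the well-trained models
# studied in the paper.
model = nn.Sequential(
    nn.Linear(20, 64), nn.Sigmoid(),
    nn.Linear(64, 64), nn.Sigmoid(),
    nn.Linear(64, 10),
)

# Capture postactivation values with a forward hook on each Sigmoid layer.
activations = []
for layer in model:
    if isinstance(layer, nn.Sigmoid):
        layer.register_forward_hook(
            lambda _mod, _inp, out: activations.append(out.detach())
        )

x = torch.randn(256, 20)  # stand-in batch of inputs
with torch.no_grad():
    model(x)

def region(a, low=0.1, high=0.9):
    # Assign each postactivation value to a region:
    # 0 = saturated low, 1 = middle, 2 = saturated high.
    return (a > low).long() + (a > high).long()

dead, total = 0, 0
for act in activations:              # act: (batch, neurons)
    r = region(act)
    # A neuron counts as "dying" here if its region is identical
    # for every sample in the batch.
    stuck = (r == r[0]).all(dim=0)
    dead += stuck.sum().item()
    total += stuck.numel()

print(f"fraction of neurons stuck in one region: {dead / total:.2%}")

In this simplified reading, a neuron is "dying" when every input lands in the same region; the roughly 80% figure reported in the abstract refers to well-trained Sigmoid/Tanh nets on real labeled data, which this toy probe on random data does not attempt to reproduce.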