
A 218 GOPS neural network accelerator based on a novel cost-efficient surrogate gradient scheme for pattern classification

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Standard

A 218 GOPS neural network accelerator based on a novel cost-efficient surrogate gradient scheme for pattern classification. / Siddique, Ali; Iqbal, Muhammad Azhar; Aleem, Muhammad et al.
In: Microprocessors and Microsystems, Vol. 99, 104831, 30.06.2023.

Vancouver

Siddique A, Iqbal MA, Aleem M, Islam MA. A 218 GOPS neural network accelerator based on a novel cost-efficient surrogate gradient scheme for pattern classification. Microprocessors and Microsystems. 2023 Jun 30;99:104831. Epub 2023 Apr 22. doi: 10.1016/j.micpro.2023.104831

Bibtex

@article{ca32802bee784f86adbdf25ce6ab91dc,
title = "A 218 GOPS neural network accelerator based on a novel cost-efficient surrogate gradient scheme for pattern classification",
abstract = "The accuracy and hardware efficiency of a neural system depend critically on the choice of activation function. The rectified linear unit (ReLU) is a contemporary activation function that yields high accuracy and allows the construction of efficient neural chips, but it results in many dead neurons, especially at the output layer. This problem is more pronounced in the case of multichannel, multiclass classification, because ReLU cancels out negative values altogether, so the corresponding values cannot be successfully backpropagated. This phenomenon is referred to as the dying ReLU problem. In this article, we present a novel {\textquoteleft}surrogate gradient{\textquoteright} learning scheme that solves the vanishing gradient and dying ReLU problems. To the best of our knowledge, this is the first learning scheme that enables the use of ReLU for all network layers while solving the dying ReLU problem. We also present a high-performance inference engine that uses ReLU-based actuators for all network layers in order to achieve high hardware efficiency. The design is also well suited to online learning, since the derivative of the activation is a constant and can be implemented using low-complexity components. The proposed technique significantly outperforms various contemporary schemes on the CIFAR-10 dataset and yields about 98.39% accuracy on the MNIST dataset while using fewer than 159k synapses. Moreover, the proposed hardware implementation performs about 218 giga operations per second (GOPS) while consuming only about 3.95 slice registers and 25.89 slice look-up tables per synapse on a low-end Virtex-6 FPGA. The system operates at a clock frequency of 93.2 MHz.",
author = "Ali Siddique and Iqbal, {Muhammad Azhar} and Muhammad Aleem and Islam, {Muhammad Arshad}",
year = "2023",
month = jun,
day = "30",
doi = "10.1016/j.micpro.2023.104831",
language = "English",
volume = "99",
journal = "Microprocessors and Microsystems",
issn = "0141-9331",
publisher = "Elsevier",

}

RIS

TY - JOUR

T1 - A 218 GOPS neural network accelerator based on a novel cost-efficient surrogate gradient scheme for pattern classification

AU - Siddique, Ali

AU - Iqbal, Muhammad Azhar

AU - Aleem, Muhammad

AU - Islam, Muhammad Arshad

PY - 2023/6/30

Y1 - 2023/6/30

N2 - The accuracy and hardware efficiency of a neural system depend critically on the choice of activation function. The rectified linear unit (ReLU) is a contemporary activation function that yields high accuracy and allows the construction of efficient neural chips, but it results in many dead neurons, especially at the output layer. This problem is more pronounced in the case of multichannel, multiclass classification, because ReLU cancels out negative values altogether, so the corresponding values cannot be successfully backpropagated. This phenomenon is referred to as the dying ReLU problem. In this article, we present a novel ‘surrogate gradient’ learning scheme that solves the vanishing gradient and dying ReLU problems. To the best of our knowledge, this is the first learning scheme that enables the use of ReLU for all network layers while solving the dying ReLU problem. We also present a high-performance inference engine that uses ReLU-based actuators for all network layers in order to achieve high hardware efficiency. The design is also well suited to online learning, since the derivative of the activation is a constant and can be implemented using low-complexity components. The proposed technique significantly outperforms various contemporary schemes on the CIFAR-10 dataset and yields about 98.39% accuracy on the MNIST dataset while using fewer than 159k synapses. Moreover, the proposed hardware implementation performs about 218 giga operations per second (GOPS) while consuming only about 3.95 slice registers and 25.89 slice look-up tables per synapse on a low-end Virtex-6 FPGA. The system operates at a clock frequency of 93.2 MHz.

AB - The accuracy and hardware efficiency of a neural system depend critically on the choice of activation function. The rectified linear unit (ReLU) is a contemporary activation function that yields high accuracy and allows the construction of efficient neural chips, but it results in many dead neurons, especially at the output layer. This problem is more pronounced in the case of multichannel, multiclass classification, because ReLU cancels out negative values altogether, so the corresponding values cannot be successfully backpropagated. This phenomenon is referred to as the dying ReLU problem. In this article, we present a novel ‘surrogate gradient’ learning scheme that solves the vanishing gradient and dying ReLU problems. To the best of our knowledge, this is the first learning scheme that enables the use of ReLU for all network layers while solving the dying ReLU problem. We also present a high-performance inference engine that uses ReLU-based actuators for all network layers in order to achieve high hardware efficiency. The design is also well suited to online learning, since the derivative of the activation is a constant and can be implemented using low-complexity components. The proposed technique significantly outperforms various contemporary schemes on the CIFAR-10 dataset and yields about 98.39% accuracy on the MNIST dataset while using fewer than 159k synapses. Moreover, the proposed hardware implementation performs about 218 giga operations per second (GOPS) while consuming only about 3.95 slice registers and 25.89 slice look-up tables per synapse on a low-end Virtex-6 FPGA. The system operates at a clock frequency of 93.2 MHz.

U2 - 10.1016/j.micpro.2023.104831

DO - 10.1016/j.micpro.2023.104831

M3 - Journal article

VL - 99

JO - Microprocessors and Microsystems

JF - Microprocessors and Microsystems

SN - 0141-9331

M1 - 104831

ER -
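
Surrogate gradient: illustrative sketch

The abstract describes the core idea of the proposed scheme: ReLU is kept for the forward (inference) path, while backpropagation uses a constant surrogate derivative, so neurons with negative pre-activations still receive an error signal (avoiding the dying ReLU problem) and the backward path can be built from low-complexity hardware. The following is a minimal sketch of that idea in plain NumPy, not the paper's implementation; the constant slope value and the function names are assumptions chosen for illustration.

import numpy as np

# Hypothetical constant surrogate slope; the paper's exact value is not given here.
SURROGATE_SLOPE = 1.0

def relu_forward(z):
    # Standard ReLU used in the forward/inference path.
    return np.maximum(z, 0.0)

def relu_backward_exact(z, grad_out):
    # Exact ReLU derivative: zero wherever z <= 0, which can "kill" neurons.
    return grad_out * (z > 0.0)

def relu_backward_surrogate(z, grad_out):
    # Surrogate derivative: a constant everywhere, so every neuron keeps receiving
    # a gradient and the backward path reduces to a constant multiplier.
    return grad_out * SURROGATE_SLOPE

# Toy comparison on one layer's pre-activations:
z = np.array([-2.0, -0.5, 0.0, 1.5])
grad_from_next_layer = np.ones_like(z)
print(relu_backward_exact(z, grad_from_next_layer))      # [0. 0. 0. 1.]
print(relu_backward_surrogate(z, grad_from_next_layer))  # [1. 1. 1. 1.]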