Using underutilized CPU resources to enhance its reliability

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Standard

Using underutilized CPU resources to enhance its reliability. / Timor, A.; Mendelson, A.; Birk, Y. et al.
In: IEEE Transactions on Dependable and Secure Computing, Vol. 7, No. 1, 01.01.2010, p. 94-109.

Harvard

Timor, A, Mendelson, A, Birk, Y & Suri, N 2010, 'Using underutilized CPU resources to enhance its reliability', IEEE Transactions on Dependable and Secure Computing, vol. 7, no. 1, pp. 94-109. https://doi.org/10.1109/TDSC.2008.31

APA

Timor, A., Mendelson, A., Birk, Y., & Suri, N. (2010). Using underutilized CPU resources to enhance its reliability. IEEE Transactions on Dependable and Secure Computing, 7(1), 94-109. https://doi.org/10.1109/TDSC.2008.31

Vancouver

Timor A, Mendelson A, Birk Y, Suri N. Using underutilized CPU resources to enhance its reliability. IEEE Transactions on Dependable and Secure Computing. 2010 Jan 1;7(1):94-109. Epub 2008 May 16. doi: 10.1109/TDSC.2008.31

Author

Timor, A. ; Mendelson, A. ; Birk, Y. et al. / Using underutilized CPU resources to enhance its reliability. In: IEEE Transactions on Dependable and Secure Computing. 2010 ; Vol. 7, No. 1. pp. 94-109.

Bibtex

@article{cd4340c256d1454999f09f2d278262ba,
title = "Using underutilized CPU resources to enhance its reliability",
abstract = "Soft errors (or Transient faults) are temporary faults that arise in a circuit due to a variety of internal noise and external sources such as cosmic particle hits. Though soft errors still occur infrequently, they are rapidly becoming a major impediment to processor reliability. This is due primarily to processor scaling characteristics. In the past, systems designed to tolerate such faults utilized costly customized solutions, entailing the use of replicated hardware components to detect and recover from microprocessor faults. As the feature size keeps shrinking and with the proliferation of multiprocessor on die in all segments of computer-based systems, the capability to detect and recover from faults is also desired for commodity hardware. For such systems, however, performance and power constitute the main drivers, so the traditional solutions prove inadequate and new approaches are required. We introduce two independent and complementary microarchitecture-level techniques: Double Execution and Double Decoding. Both exploit the typically low average processor resource utilization of modern processors to enhance processor reliability. Double Execution protects the Out-Of-Order part of the CPU by executing each instruction twice. Double Decoding uses a second, low-performance low-power instruction decoder to detect soft errors in the decoder logic. These simple-to-implement techniques are shown to improve the processor's reliability with relatively low performance, power, and hardware overheads. Finally, the resulting excessive reliability can even be traded back for performance by increasing clock rate and/or reducing voltage, thereby improving upon single execution approaches. {\textcopyright} 2006 IEEE.",
keywords = "Double execution, Fault tolerance, Microarchitecture, Soft errors, Superscalar, Transient faults, Clock rate, Commodity hardware, Computer-based system, Cosmic particles, CPU resources, Customized solutions, External sources, Feature sizes, Hardware components, Hardware overheads, Internal noise, Low Power, Micro architectures, Microprocessor faults, Modern processors, New approaches, Out of order, Processor reliability, Processor resources, Soft error, Temporary fault, Computer hardware, Cosmology, Decoding, Error correction, Fault tolerant computer systems, Microprocessor chips, Quality assurance",
author = "A. Timor and A. Mendelson and Y. Birk and Neeraj Suri",
year = "2010",
month = jan,
day = "1",
doi = "10.1109/TDSC.2008.31",
language = "English",
volume = "7",
pages = "94--109",
journal = "IEEE Transactions on Dependable and Secure Computing",
issn = "1545-5971",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "1",
}

RIS

TY - JOUR

T1 - Using underutilized CPU resources to enhance its reliability

AU - Timor, A.

AU - Mendelson, A.

AU - Birk, Y.

AU - Suri, Neeraj

PY - 2010/1/1

Y1 - 2010/1/1

N2 - Soft errors (or Transient faults) are temporary faults that arise in a circuit due to a variety of internal noise and external sources such as cosmic particle hits. Though soft errors still occur infrequently, they are rapidly becoming a major impediment to processor reliability. This is due primarily to processor scaling characteristics. In the past, systems designed to tolerate such faults utilized costly customized solutions, entailing the use of replicated hardware components to detect and recover from microprocessor faults. As the feature size keeps shrinking and with the proliferation of multiprocessor on die in all segments of computer-based systems, the capability to detect and recover from faults is also desired for commodity hardware. For such systems, however, performance and power constitute the main drivers, so the traditional solutions prove inadequate and new approaches are required. We introduce two independent and complementary microarchitecture-level techniques: Double Execution and Double Decoding. Both exploit the typically low average processor resource utilization of modern processors to enhance processor reliability. Double Execution protects the Out-Of-Order part of the CPU by executing each instruction twice. Double Decoding uses a second, low-performance low-power instruction decoder to detect soft errors in the decoder logic. These simple-to-implement techniques are shown to improve the processor's reliability with relatively low performance, power, and hardware overheads. Finally, the resulting excessive reliability can even be traded back for performance by increasing clock rate and/or reducing voltage, thereby improving upon single execution approaches. © 2006 IEEE.

AB - Soft errors (or Transient faults) are temporary faults that arise in a circuit due to a variety of internal noise and external sources such as cosmic particle hits. Though soft errors still occur infrequently, they are rapidly becoming a major impediment to processor reliability. This is due primarily to processor scaling characteristics. In the past, systems designed to tolerate such faults utilized costly customized solutions, entailing the use of replicated hardware components to detect and recover from microprocessor faults. As the feature size keeps shrinking and with the proliferation of multiprocessor on die in all segments of computer-based systems, the capability to detect and recover from faults is also desired for commodity hardware. For such systems, however, performance and power constitute the main drivers, so the traditional solutions prove inadequate and new approaches are required. We introduce two independent and complementary microarchitecture-level techniques: Double Execution and Double Decoding. Both exploit the typically low average processor resource utilization of modern processors to enhance processor reliability. Double Execution protects the Out-Of-Order part of the CPU by executing each instruction twice. Double Decoding uses a second, low-performance low-power instruction decoder to detect soft errors in the decoder logic. These simple-to-implement techniques are shown to improve the processor's reliability with relatively low performance, power, and hardware overheads. Finally, the resulting excessive reliability can even be traded back for performance by increasing clock rate and/or reducing voltage, thereby improving upon single execution approaches. © 2006 IEEE.

KW - Double execution

KW - Fault tolerance

KW - Microarchitecture

KW - Soft errors

KW - Superscalar

KW - Transient faults

KW - Clock rate

KW - Commodity hardware

KW - Computer-based system

KW - Cosmic particles

KW - CPU resources

KW - Customized solutions

KW - External sources

KW - Feature sizes

KW - Hardware components

KW - Hardware overheads

KW - Internal noise

KW - Low Power

KW - Micro architectures

KW - Microprocessor faults

KW - Modern processors

KW - New approaches

KW - Out of order

KW - Processor reliability

KW - Processor resources

KW - Soft error

KW - Temporary fault

KW - Computer hardware

KW - Cosmology

KW - Decoding

KW - Error correction

KW - Fault tolerant computer systems

KW - Microprocessor chips

KW - Quality assurance

U2 - 10.1109/TDSC.2008.31

DO - 10.1109/TDSC.2008.31

M3 - Journal article

VL - 7

SP - 94

EP - 109

JO - IEEE Transactions on Dependable and Secure Computing

JF - IEEE Transactions on Dependable and Secure Computing

SN - 1545-5971

IS - 1

ER -