
Electronic data

  • Submitted ISORC - Real-time Stream Processing

    Rights statement: © 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.

    Accepted author manuscript, 402 KB, PDF document

    Available under license: CC BY: Creative Commons Attribution 4.0 International License

Links

Text available via DOI: https://doi.org/10.1109/ISORC.2016.24


Tolerating transient late-timing faults in cloud-based real-time stream processing

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN, Conference contribution/Paper, peer-review

Published

Standard

Tolerating transient late-timing faults in cloud-based real-time stream processing. / Garraghan, Peter; Perks, Stuart; Ouyang, Xue et al.
2016 IEEE 19th International Symposium on Real-Time Distributed Computing (ISORC). IEEE, 2016. p. 108-115.


Harvard

Garraghan, P, Perks, S, Ouyang, X, McKee, D & Moreno, IS 2016, Tolerating transient late-timing faults in cloud-based real-time stream processing. in 2016 IEEE 19th International Symposium on Real-Time Distributed Computing (ISORC). IEEE, pp. 108-115. https://doi.org/10.1109/ISORC.2016.24

APA

Garraghan, P., Perks, S., Ouyang, X., McKee, D., & Moreno, I. S. (2016). Tolerating transient late-timing faults in cloud-based real-time stream processing. In 2016 IEEE 19th International Symposium on Real-Time Distributed Computing (ISORC) (pp. 108-115). IEEE. https://doi.org/10.1109/ISORC.2016.24

Vancouver

Garraghan P, Perks S, Ouyang X, McKee D, Moreno IS. Tolerating transient late-timing faults in cloud-based real-time stream processing. In 2016 IEEE 19th International Symposium on Real-Time Distributed Computing (ISORC). IEEE. 2016. p. 108-115. doi: 10.1109/ISORC.2016.24

Author

Garraghan, Peter; Perks, Stuart; Ouyang, Xue et al. / Tolerating transient late-timing faults in cloud-based real-time stream processing. 2016 IEEE 19th International Symposium on Real-Time Distributed Computing (ISORC). IEEE, 2016. pp. 108-115

Bibtex

@inproceedings{6913d10185204e06b07383686fbb2f7d,
title = "Tolerating transient late-timing faults in cloud-based real-time stream processing",
abstract = "Real-time stream processing is a frequently deployed application within Cloud datacenters that is required to provision high levels of performance and reliability. Numerous fault-tolerant approaches have been proposed to effectively achieve this objective in the presence of crash failures. However, such systems struggle with transient late-timing faults - a fault classification challenging to effectively tolerate - that manifests increasingly within large-scale distributed systems. Such faults represent a significant threat towards minimizing soft real-time execution of streaming applications in the presence of failures. This work proposes a fault-tolerant approach for QoS-aware data prediction to tolerate transient late-timing faults. The approach is capable of determining the most effective data prediction algorithm for imposed QoS constraints on a failed stream processor at run-time. We integrated our approach into Apache Storm with experiment results showing its ability to minimize stream processor end-to-end execution time by 61% compared to other fault-tolerant approaches. The approach incurs 12% additional CPU utilization while reducing network usage by 44%.",
keywords = "Prediction algorithms, Real-time systems, Fault tolerance, Fault tolerant systems, Transient analysis, Quality of service, Predictive models",
author = "Peter Garraghan and Stuart Perks and Xue Ouyang and David McKee and Moreno, {Ismael Solis}",
note = "{\textcopyright} 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.",
year = "2016",
month = jul,
day = "21",
doi = "10.1109/ISORC.2016.24",
language = "English",
pages = "108--115",
booktitle = "2016 IEEE 19th International Symposium on Real-Time Distributed Computing (ISORC)",
publisher = "IEEE",

}

RIS

TY - GEN

T1 - Tolerating transient late-timing faults in cloud-based real-time stream processing

AU - Garraghan, Peter

AU - Perks, Stuart

AU - Ouyang, Xue

AU - McKee, David

AU - Moreno, Ismael Solis

N1 - © 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.

PY - 2016/7/21

Y1 - 2016/7/21

N2 - Real-time stream processing is a frequently deployed application within Cloud datacenters that is required to provision high levels of performance and reliability. Numerous fault-tolerant approaches have been proposed to effectively achieve this objective in the presence of crash failures. However, such systems struggle with transient late-timing faults - a fault classification challenging to effectively tolerate - that manifests increasingly within large-scale distributed systems. Such faults represent a significant threat towards minimizing soft real-time execution of streaming applications in the presence of failures. This work proposes a fault-tolerant approach for QoS-aware data prediction to tolerate transient late-timing faults. The approach is capable of determining the most effective data prediction algorithm for imposed QoS constraints on a failed stream processor at run-time. We integrated our approach into Apache Storm with experiment results showing its ability to minimize stream processor end-to-end execution time by 61% compared to other fault-tolerant approaches. The approach incurs 12% additional CPU utilization while reducing network usage by 44%.

AB - Real-time stream processing is a frequently deployed application within Cloud datacenters that is required to provision high levels of performance and reliability. Numerous fault-tolerant approaches have been proposed to effectively achieve this objective in the presence of crash failures. However, such systems struggle with transient late-timing faults - a fault classification challenging to effectively tolerate - that manifests increasingly within large-scale distributed systems. Such faults represent a significant threat towards minimizing soft real-time execution of streaming applications in the presence of failures. This work proposes a fault-tolerant approach for QoS-aware data prediction to tolerate transient late-timing faults. The approach is capable of determining the most effective data prediction algorithm for imposed QoS constraints on a failed stream processor at run-time. We integrated our approach into Apache Storm with experiment results showing its ability to minimize stream processor end-to-end execution time by 61% compared to other fault-tolerant approaches. The approach incurs 12% additional CPU utilization while reducing network usage by 44%.

KW - Prediction algorithms

KW - Real-time systems

KW - Fault tolerance

KW - Fault tolerant systems

KW - Transient analysis

KW - Quality of service

KW - Predictive models

U2 - 10.1109/ISORC.2016.24

DO - 10.1109/ISORC.2016.24

M3 - Conference contribution/Paper

SP - 108

EP - 115

BT - 2016 IEEE 19th International Symposium on Real-Time Distributed Computing (ISORC)

PB - IEEE

ER -