Holistic energy and failure aware workload scheduling in Cloud datacenters

Computing and Communications

Associated organisational units

Electronic data

FGCS - Energy-aware Failure-Aware Scheduling - Accepted
Rights statement: This is the author’s version of a work that was accepted for publication in Future Generation Computer Systems. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Future Generation Computer Systems, 78, 3, 2017 DOI: 10.1016/j.future.2017.07.044
Accepted author manuscript, 1.69 MB, PDF document
Available under license: CC BY-NC-ND: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License

Text available via DOI:

https://doi.org/10.1016/j.future.2017.07.044
Final published version

Keywords

Energy efficiency, Thermal management, Reliability, Failures, Workload scheduling, Cloud computing

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Standard

Holistic energy and failure aware workload scheduling in Cloud datacenters. / Li, Xiang; Jiang, Xiaohong; Garraghan, Peter et al.
In: Future Generation Computer Systems, Vol. 78, No. 3, 01.2018, p. 887-900.

Harvard

Li, X, Jiang, X, Garraghan, P & Wu, Z 2018, 'Holistic energy and failure aware workload scheduling in Cloud datacenters', Future Generation Computer Systems, vol. 78, no. 3, pp. 887-900. https://doi.org/10.1016/j.future.2017.07.044

APA

Li, X., Jiang, X., Garraghan, P., & Wu, Z. (2018). Holistic energy and failure aware workload scheduling in Cloud datacenters. Future Generation Computer Systems, 78(3), 887-900. https://doi.org/10.1016/j.future.2017.07.044

Vancouver

Li X, Jiang X, Garraghan P, Wu Z. Holistic energy and failure aware workload scheduling in Cloud datacenters. Future Generation Computer Systems. 2018 Jan;78(3):887-900. Epub 2017 Jul 22. doi: 10.1016/j.future.2017.07.044

Author

Li, Xiang ; Jiang, Xiaohong ; Garraghan, Peter et al. / Holistic energy and failure aware workload scheduling in Cloud datacenters. In: Future Generation Computer Systems. 2018 ; Vol. 78, No. 3. pp. 887-900.

Bibtex

@article{11978dccda0b44eb923a762c0130bdba,

title = "Holistic energy and failure aware workload scheduling in Cloud datacenters",

abstract = "The global uptake of Cloud computing has attracted increased interest within both academia and industry resulting in the formation of large-scale and complex distributed systems. This has led to increased failure occurrence within computing systems that induce substantial negative impact upon system performance and task reliability perceived by users. Such systems also consume vast quantities of power, resulting in significant operational costs perceived by providers. Virtualization – a commonly deployed technology within Cloud datacenters – can enable flexible scheduling of virtual machines to maximize system reliability and energy-efficiency. However, existing work address these two objectives separately, providing limited understanding towards studying the explicit trade-offs towards dependable and energy-efficient compute infrastructure. In this paper, we propose two failure-aware energy-efficient scheduling algorithms that exploit the holistic operational characteristics of the Cloud datacenter comprising the cooling unit, computing infrastructure and server failures. By comprehensively modeling the power and failure profiles of a Cloud datacenter, we propose workload scheduling algorithms Ella-W and Ella-B, capable of reducing cooling and compute energy while minimizing the impact of system failures. A novel and overall metric is proposed that combines energy efficiency and reliability to specify the performance of various algorithms. We evaluate our algorithms against Random, MaxUtil, TASA, MTTE and OBFIT under various system conditions of failure prediction accuracy and workload intensity. Evaluation results demonstrate that Ella-W can reduce energy usage by 29.5% and improve task completion rate by 3.6%, while Ella-B reduces energy usage by 32.7% with no degradation to task completion rate.",

keywords = "Energy efficiency, Thermal management, Reliability, Failures, Workload scheduling, Cloud computing",

author = "Xiang Li and Xiaohong Jiang and Peter Garraghan and Zhaohui Wu",

note = "This is the author{\textquoteright}s version of a work that was accepted for publication in Future Generation Computer Systems. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Future Generation Computer Systems, 78, 3, 2017 DOI: 10.1016/j.future.2017.07.044",

year = "2018",

month = jan,

doi = "10.1016/j.future.2017.07.044",

language = "English",

volume = "78",

pages = "887--900",

journal = "Future Generation Computer Systems",

issn = "0167-739X",

publisher = "Elsevier",

number = "3",

}

RIS

TY - JOUR

T1 - Holistic energy and failure aware workload scheduling in Cloud datacenters

AU - Li, Xiang

AU - Jiang, Xiaohong

AU - Garraghan, Peter

AU - Wu, Zhaohui

N1 - This is the author’s version of a work that was accepted for publication in Future Generation Computer Systems. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Future Generation Computer Systems, 78, 3, 2017 DOI: 10.1016/j.future.2017.07.044

PY - 2018/1

Y1 - 2018/1

AB - The global uptake of Cloud computing has attracted increased interest within both academia and industry resulting in the formation of large-scale and complex distributed systems. This has led to increased failure occurrence within computing systems that induce substantial negative impact upon system performance and task reliability perceived by users. Such systems also consume vast quantities of power, resulting in significant operational costs perceived by providers. Virtualization – a commonly deployed technology within Cloud datacenters – can enable flexible scheduling of virtual machines to maximize system reliability and energy-efficiency. However, existing work address these two objectives separately, providing limited understanding towards studying the explicit trade-offs towards dependable and energy-efficient compute infrastructure. In this paper, we propose two failure-aware energy-efficient scheduling algorithms that exploit the holistic operational characteristics of the Cloud datacenter comprising the cooling unit, computing infrastructure and server failures. By comprehensively modeling the power and failure profiles of a Cloud datacenter, we propose workload scheduling algorithms Ella-W and Ella-B, capable of reducing cooling and compute energy while minimizing the impact of system failures. A novel and overall metric is proposed that combines energy efficiency and reliability to specify the performance of various algorithms. We evaluate our algorithms against Random, MaxUtil, TASA, MTTE and OBFIT under various system conditions of failure prediction accuracy and workload intensity. Evaluation results demonstrate that Ella-W can reduce energy usage by 29.5% and improve task completion rate by 3.6%, while Ella-B reduces energy usage by 32.7% with no degradation to task completion rate.

KW - Energy efficiency

KW - Thermal management

KW - Reliability

KW - Failures

KW - Workload scheduling

KW - Cloud computing

U2 - 10.1016/j.future.2017.07.044

DO - 10.1016/j.future.2017.07.044

M3 - Journal article

VL - 78

SP - 887

EP - 900

JO - Future Generation Computer Systems

JF - Future Generation Computer Systems

SN - 0167-739X

IS - 3

ER -
