
Electronic data

  • SUBMITTED 2019 - PRISM An Experiment Framework for Straggler Analytics in Containerized Clusters

    Rights statement: © ACM, 2019. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in WOC '19: Proceedings of the 5th International Workshop on Container Technologies and Container Clouds 2019 https://dl.acm.org/doi/abs/10.1145/3366615.3368353

    Accepted author manuscript, 297 KB, PDF document

    Available under license: CC BY: Creative Commons Attribution 4.0 International License

Links

Text available via DOI: https://doi.org/10.1145/3366615.3368353


PRISM: An Experiment Framework for Straggler Analytics in Containerized Clusters

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Published

Standard

PRISM: An Experiment Framework for Straggler Analytics in Containerized Clusters. / Lindsay, Dominic; Gill, Sukhpal; Garraghan, Peter.
WoC 2019 Fifth International Workshop on Container Technologies and Container Clouds. ACM, 2019. p. 13-18.


Harvard

Lindsay, D, Gill, S & Garraghan, P 2019, PRISM: An Experiment Framework for Straggler Analytics in Containerized Clusters. in WoC 2019 Fifth International Workshop on Container Technologies and Container Clouds. ACM, pp. 13-18. https://doi.org/10.1145/3366615.3368353

APA

Lindsay, D., Gill, S., & Garraghan, P. (2019). PRISM: An Experiment Framework for Straggler Analytics in Containerized Clusters. In WoC 2019 Fifth International Workshop on Container Technologies and Container Clouds (pp. 13-18). ACM. https://doi.org/10.1145/3366615.3368353

Vancouver

Lindsay D, Gill S, Garraghan P. PRISM: An Experiment Framework for Straggler Analytics in Containerized Clusters. In WoC 2019 Fifth International Workshop on Container Technologies and Container Clouds. ACM. 2019. p. 13-18 doi: 10.1145/3366615.3368353

Author

Lindsay, Dominic ; Gill, Sukhpal ; Garraghan, Peter. / PRISM: An Experiment Framework for Straggler Analytics in Containerized Clusters. WoC 2019 Fifth International Workshop on Container Technologies and Container Clouds. ACM, 2019. pp. 13-18

Bibtex

@inproceedings{dc5d81c680f74916a39de57ec5435d53,
title = "PRISM: An Experiment Framework for Straggler Analytics in Containerized Clusters",
abstract = "Containerized clusters of machines at scale that provision Cloud services are encountering substantive difficulties with stragglers -- whereby a small subset of task executions degrades system performance. Stragglers are an unsolved challenge due to a wide variety of root causes and stochastic behavior. While there have been efforts to mitigate their effects, few works have attempted to empirically ascertain how system operational scenarios precisely influence straggler occurrence and severity. This challenge is further compounded by the difficulties of conducting experiments within real-world containerized clusters. System maintenance and experiment design are often error-prone and time-consuming processes, and a large portion of tools created for workload submission and straggler injection are bespoke to specific clusters, limiting experiment reproducibility. In this paper we propose PRISM, a framework that automates containerized cluster setup, experiment design, and experiment execution. Our framework is capable of deployment, configuration, execution, performance trace transformation and aggregation of containerized application frameworks, enabling scripted execution of diverse workloads and cluster configurations. The framework reduces time required for cluster setup and experiment execution from hours to minutes. We use PRISM to conduct automated experimentation of system operational conditions and identify that straggler manifestation is affected by resource contention, input data size and scheduler architecture limitations.",
keywords = "Straggler, Containers, Datacenters, Clusters",
author = "Dominic Lindsay and Sukhpal Gill and Peter Garraghan",
note = "{\textcopyright} ACM, 2019. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in WOC '19: Proceedings of the 5th International Workshop on Container Technologies and Container Clouds 2019 https://dl.acm.org/doi/abs/10.1145/3366615.3368353",
year = "2019",
month = dec,
day = "1",
doi = "10.1145/3366615.3368353",
language = "English",
isbn = "9781450370332",
pages = "13--18",
booktitle = "WoC 2019 Fifth International Workshop on Container Technologies and Container Clouds",
publisher = "ACM",

}

RIS

TY - GEN

T1 - PRISM: An Experiment Framework for Straggler Analytics in Containerized Clusters

AU - Lindsay, Dominic

AU - Gill, Sukhpal

AU - Garraghan, Peter

N1 - © ACM, 2019. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in WOC '19: Proceedings of the 5th International Workshop on Container Technologies and Container Clouds 2019 https://dl.acm.org/doi/abs/10.1145/3366615.3368353

PY - 2019/12/1

Y1 - 2019/12/1

N2 - Containerized clusters of machines at scale that provision Cloud services are encountering substantive difficulties with stragglers -- whereby a small subset of task executions degrades system performance. Stragglers are an unsolved challenge due to a wide variety of root causes and stochastic behavior. While there have been efforts to mitigate their effects, few works have attempted to empirically ascertain how system operational scenarios precisely influence straggler occurrence and severity. This challenge is further compounded by the difficulties of conducting experiments within real-world containerized clusters. System maintenance and experiment design are often error-prone and time-consuming processes, and a large portion of tools created for workload submission and straggler injection are bespoke to specific clusters, limiting experiment reproducibility. In this paper we propose PRISM, a framework that automates containerized cluster setup, experiment design, and experiment execution. Our framework is capable of deployment, configuration, execution, performance trace transformation and aggregation of containerized application frameworks, enabling scripted execution of diverse workloads and cluster configurations. The framework reduces time required for cluster setup and experiment execution from hours to minutes. We use PRISM to conduct automated experimentation of system operational conditions and identify that straggler manifestation is affected by resource contention, input data size and scheduler architecture limitations.

AB - Containerized clusters of machines at scale that provision Cloud services are encountering substantive difficulties with stragglers -- whereby a small subset of task executions degrades system performance. Stragglers are an unsolved challenge due to a wide variety of root causes and stochastic behavior. While there have been efforts to mitigate their effects, few works have attempted to empirically ascertain how system operational scenarios precisely influence straggler occurrence and severity. This challenge is further compounded by the difficulties of conducting experiments within real-world containerized clusters. System maintenance and experiment design are often error-prone and time-consuming processes, and a large portion of tools created for workload submission and straggler injection are bespoke to specific clusters, limiting experiment reproducibility. In this paper we propose PRISM, a framework that automates containerized cluster setup, experiment design, and experiment execution. Our framework is capable of deployment, configuration, execution, performance trace transformation and aggregation of containerized application frameworks, enabling scripted execution of diverse workloads and cluster configurations. The framework reduces time required for cluster setup and experiment execution from hours to minutes. We use PRISM to conduct automated experimentation of system operational conditions and identify that straggler manifestation is affected by resource contention, input data size and scheduler architecture limitations.

KW - Straggler

KW - Containers

KW - Datacenters

KW - Clusters

U2 - 10.1145/3366615.3368353

DO - 10.1145/3366615.3368353

M3 - Conference contribution/Paper

SN - 9781450370332

SP - 13

EP - 18

BT - WoC 2019 Fifth International Workshop on Container Technologies and Container Clouds

PB - ACM

ER -