Electronic data

  • SUPE-D-20-00042.R1

    Rights statement: The final publication is available at Springer via http://dx.doi.org/10.1007/s11227-020-03241-x

    Accepted author manuscript, 646 KB, PDF document

    Available under license: CC BY-NC: Creative Commons Attribution-NonCommercial 4.0 International License

Links

Text available via DOI: 10.1007/s11227-020-03241-x

Tails in the cloud: a survey and taxonomy of straggler management within large‑scale cloud data centres

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Standard

Tails in the cloud: a survey and taxonomy of straggler management within large‑scale cloud data centres. / Singh Gill, Sukhpal; Ouyang, Xue; Garraghan, Peter.
In: Journal of Supercomputing, Vol. 76, 01.12.2020, p. 10050–10089.

Vancouver

Singh Gill S, Ouyang X, Garraghan P. Tails in the cloud: a survey and taxonomy of straggler management within large‑scale cloud data centres. Journal of Supercomputing. 2020 Dec 1;76:10050–10089. Epub 2020 Mar 12. doi: 10.1007/s11227-020-03241-x

Author

Singh Gill, Sukhpal; Ouyang, Xue; Garraghan, Peter. / Tails in the cloud: a survey and taxonomy of straggler management within large‑scale cloud data centres. In: Journal of Supercomputing. 2020; Vol. 76. pp. 10050–10089.

Bibtex

@article{d7b52525af014ec4a08bd88a8e656765,
title = "Tails in the cloud: a survey and taxonomy of straggler management within large-scale cloud data centres",
abstract = "Cloud computing systems are splitting compute- and data-intensive jobs into smaller tasks to execute them in a parallel manner using clusters to improve execution time. However, such systems at increasing scale are exposed to stragglers, whereby abnormally slow running tasks executing within a job substantially affect job performance completion. Such stragglers are a direct threat towards attaining fast execution of data-intensive jobs within cloud computing. Researchers have proposed an assortment of different mechanisms, frameworks, and management techniques to detect and mitigate stragglers both proactively and reactively. In this paper, we present a comprehensive review of straggler management techniques within large-scale cloud data centres. We provide a detailed taxonomy of straggler causes, as well as proposed management and mitigation techniques based on straggler characteristics and properties. From this systematic review, we outline several outstanding challenges and potential directions of possible future work for straggler research.",
keywords = "Computing, Stragglers, Cloud computing, Straggler management, Distributed systems, Cloud data centres",
author = "{Singh Gill}, Sukhpal and Xue Ouyang and Peter Garraghan",
note = "The final publication is available at Springer via http://dx.doi.org/10.1007/s11227-020-03241-x",
year = "2020",
month = dec,
day = "1",
doi = "10.1007/s11227-020-03241-x",
language = "English",
volume = "76",
pages = "10050--10089",
journal = "Journal of Supercomputing",
issn = "0920-8542",
publisher = "Springer Netherlands",

}

RIS

TY - JOUR

T1 - Tails in the cloud

T2 - a survey and taxonomy of straggler management within large‑scale cloud data centres

AU - Singh Gill, Sukhpal

AU - Ouyang, Xue

AU - Garraghan, Peter

N1 - The final publication is available at Springer via http://dx.doi.org/10.1007/s11227-020-03241-x

PY - 2020/12/1

Y1 - 2020/12/1

AB - Cloud computing systems are splitting compute- and data-intensive jobs into smaller tasks to execute them in a parallel manner using clusters to improve execution time. However, such systems at increasing scale are exposed to stragglers, whereby abnormally slow running tasks executing within a job substantially affect job performance completion. Such stragglers are a direct threat towards attaining fast execution of data-intensive jobs within cloud computing. Researchers have proposed an assortment of different mechanisms, frameworks, and management techniques to detect and mitigate stragglers both proactively and reactively. In this paper, we present a comprehensive review of straggler management techniques within large-scale cloud data centres. We provide a detailed taxonomy of straggler causes, as well as proposed management and mitigation techniques based on straggler characteristics and properties. From this systematic review, we outline several outstanding challenges and potential directions of possible future work for straggler research.

KW - Computing

KW - Stragglers

KW - Cloud computing

KW - Straggler management

KW - Distributed systems

KW - Cloud data centres

U2 - 10.1007/s11227-020-03241-x

DO - 10.1007/s11227-020-03241-x

M3 - Journal article

VL - 76

SP - 10050

EP - 10089

JO - Journal of Supercomputing

JF - Journal of Supercomputing

SN - 0920-8542

ER -