Rights statement: The final publication is available at Springer via http://dx.doi.org/10.1007/s11227-020-03241-x
Accepted author manuscript, PDF document
Available under license: CC BY-NC: Creative Commons Attribution-NonCommercial 4.0 International License
Final published version
Research output: Contribution to Journal/Magazine › Journal article › peer-review
TY - JOUR
T1 - Tails in the cloud
T2 - a survey and taxonomy of straggler management within large-scale cloud data centres
AU - Singh Gill, Sukhpal
AU - Ouyang, Xue
AU - Garraghan, Peter
N1 - The final publication is available at Springer via http://dx.doi.org/10.1007/s11227-020-03241-x
PY - 2020/12/1
Y1 - 2020/12/1
N2 - Cloud computing systems are splitting compute- and data-intensive jobs into smaller tasks to execute them in a parallel manner using clusters to improve execution time. However, such systems at increasing scale are exposed to stragglers, whereby abnormally slow running tasks executing within a job substantially affect job performance completion. Such stragglers are a direct threat towards attaining fast execution of data-intensive jobs within cloud computing. Researchers have proposed an assortment of different mechanisms, frameworks, and management techniques to detect and mitigate stragglers both proactively and reactively. In this paper, we present a comprehensive review of straggler management techniques within large-scale cloud data centres. We provide a detailed taxonomy of straggler causes, as well as proposed management and mitigation techniques based on straggler characteristics and properties. From this systematic review, we outline several outstanding challenges and potential directions of possible future work for straggler research.
AB - Cloud computing systems are splitting compute- and data-intensive jobs into smaller tasks to execute them in a parallel manner using clusters to improve execution time. However, such systems at increasing scale are exposed to stragglers, whereby abnormally slow running tasks executing within a job substantially affect job performance completion. Such stragglers are a direct threat towards attaining fast execution of data-intensive jobs within cloud computing. Researchers have proposed an assortment of different mechanisms, frameworks, and management techniques to detect and mitigate stragglers both proactively and reactively. In this paper, we present a comprehensive review of straggler management techniques within large-scale cloud data centres. We provide a detailed taxonomy of straggler causes, as well as proposed management and mitigation techniques based on straggler characteristics and properties. From this systematic review, we outline several outstanding challenges and potential directions of possible future work for straggler research.
KW - Computing
KW - Stragglers
KW - Cloud computing
KW - Straggler management
KW - Distributed systems
KW - Cloud data centres
U2 - 10.1007/s11227-020-03241-x
DO - 10.1007/s11227-020-03241-x
M3 - Journal article
VL - 76
SP - 10050
EP - 10089
JO - Journal of Supercomputing
JF - Journal of Supercomputing
SN - 0920-8542
ER -