Accepted author manuscript, 882 KB, PDF document
Available under license: CC BY: Creative Commons Attribution 4.0 International License
Final published version
Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review
Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review
}
TY - GEN
T1 - Horus
T2 - An Interference-aware Resource Manager for Deep Learning Systems
AU - Yeung, Gingfung
AU - Borowiec, Damian
AU - Yang, Renyu
AU - Friday, Adrian
AU - Harper, R.H.R.
AU - Garraghan, Peter
PY - 2020/9/29
Y1 - 2020/9/29
N2 - Deep Learning (DL) models are deployed as jobs within machines containing GPUs. These DL systems - ranging from a singular GPU device to machine clusters - require state-of-the-art resource management to increase resource utilization and job throughput. While it has been identified that co-location - multiple jobs co-located within the same GPU - is an effective means to achieve this, such co-location incurs performance interference that directly debilitates DL training and inference performance. Existing approaches to mitigate interference require resource intensive and time consuming kernel profiling ill-suited for runtime scheduling decisions. Current DL system resource management are not designed to deal with these problems. This paper proposes Horus, an interference-aware resource manager for DL systems. Instead of leveraging expensive kernel-profiling, our approach estimates job resource utilization and co-location patterns to determine effective DL job placement to minimize likelihood of interference, as well as improve system resource utilization and makespan. Our analysis shows that interference cause up to 3.2x DL job slowdown. We integrated our approach within the Kubernetes resource manager, and conduct experiments in a DL cluster by training 2,500 DL jobs using 13 different models types. Results demonstrate that Horus is able to outperform other DL resource managers by up to 61.5% for resource utilization and 33.6% for makespan.
AB - Deep Learning (DL) models are deployed as jobs within machines containing GPUs. These DL systems - ranging from a singular GPU device to machine clusters - require state-of-the-art resource management to increase resource utilization and job throughput. While it has been identified that co-location - multiple jobs co-located within the same GPU - is an effective means to achieve this, such co-location incurs performance interference that directly debilitates DL training and inference performance. Existing approaches to mitigate interference require resource intensive and time consuming kernel profiling ill-suited for runtime scheduling decisions. Current DL system resource management are not designed to deal with these problems. This paper proposes Horus, an interference-aware resource manager for DL systems. Instead of leveraging expensive kernel-profiling, our approach estimates job resource utilization and co-location patterns to determine effective DL job placement to minimize likelihood of interference, as well as improve system resource utilization and makespan. Our analysis shows that interference cause up to 3.2x DL job slowdown. We integrated our approach within the Kubernetes resource manager, and conduct experiments in a DL cluster by training 2,500 DL jobs using 13 different models types. Results demonstrate that Horus is able to outperform other DL resource managers by up to 61.5% for resource utilization and 33.6% for makespan.
KW - Machine Learning Systems
KW - Performance Interference
KW - Deep Learning
KW - GPU Scheduling
KW - Cluster resource management
U2 - 10.1007/978-3-030-60239-0_33
DO - 10.1007/978-3-030-60239-0_33
M3 - Conference contribution/Paper
SN - 9783030602383
T3 - Lecture Notes in Computer Science
SP - 492
EP - 508
BT - Algorithms and Architectures for Parallel Processing. ICA3PP 2020
A2 - Qiu, M.
PB - Springer
ER -