Electronic data

  • TPDS_Horus_ging_fung_yeung

    Accepted author manuscript, 4.98 MB, PDF document

    Available under license: CC BY-NC: Creative Commons Attribution-NonCommercial 4.0 International License

Links

Text available via DOI: https://doi.org/10.1109/TPDS.2021.3079202

Horus: Interference-Aware and Prediction-Based Scheduling in Deep Learning Systems

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Standard

Horus: Interference-Aware and Prediction-Based Scheduling in Deep Learning Systems. / Yeung, Ging-Fung; Borowiec, Damian; Yang, Renyu et al.
In: IEEE Transactions on Parallel and Distributed Systems, Vol. 33, No. 1, 21015055, 31.01.2022, p. 88-100.

Harvard

Yeung, G-F, Borowiec, D, Yang, R, Friday, A, Harper, RHR & Garraghan, P 2022, 'Horus: Interference-Aware and Prediction-Based Scheduling in Deep Learning Systems', IEEE Transactions on Parallel and Distributed Systems, vol. 33, no. 1, 21015055, pp. 88-100. https://doi.org/10.1109/TPDS.2021.3079202

APA

Yeung, G.-F., Borowiec, D., Yang, R., Friday, A., Harper, R. H. R., & Garraghan, P. (2022). Horus: Interference-Aware and Prediction-Based Scheduling in Deep Learning Systems. IEEE Transactions on Parallel and Distributed Systems, 33(1), 88-100. Article 21015055. https://doi.org/10.1109/TPDS.2021.3079202

Vancouver

Yeung G-F, Borowiec D, Yang R, Friday A, Harper RHR, Garraghan P. Horus: Interference-Aware and Prediction-Based Scheduling in Deep Learning Systems. IEEE Transactions on Parallel and Distributed Systems. 2022 Jan 31;33(1):88-100. 21015055. Epub 2021 May 11. doi: 10.1109/TPDS.2021.3079202

Author

Yeung, Ging-Fung ; Borowiec, Damian ; Yang, Renyu et al. / Horus : Interference-Aware and Prediction-Based Scheduling in Deep Learning Systems. In: IEEE Transactions on Parallel and Distributed Systems. 2022 ; Vol. 33, No. 1. pp. 88-100.

BibTeX

@article{73c3a50be6824a859b434871b9b583fa,
title = "Horus: Interference-Aware and Prediction-Based Scheduling in Deep Learning Systems",
abstract = "To accelerate the training of Deep Learning (DL) models, clusters of machines equipped with hardware accelerators such as GPUs are leveraged to reduce execution time. State-of-the-art resource managers are needed to increase GPU utilization and maximize throughput. While co-locating DL jobs on the same GPU has been shown to be effective, this can incur interference causing slowdown. In this paper we propose Horus: an interference-aware and prediction-based resource manager for DL systems. Horus proactively predicts GPU utilization of heterogeneous DL jobs extrapolated from the DL model{\textquoteright}s computation graph features, removing the need for online profiling and isolated reserved GPUs. Through micro-benchmarks and job co-location combinations across heterogeneous GPU hardware, we identify GPU utilization as a general proxy metric to determine good placement decisions, in contrast to current approaches which reserve isolated GPUs to perform online profiling and directly measure GPU utilization for each unique submitted job. Our approach promotes high resource utilization and makespan reduction; via real-world experimentation and large-scale trace driven simulation, we demonstrate that Horus outperforms other DL resource managers by up to 61.5% for GPU resource utilization, 23.7–30.7% for makespan reduction and 68.3% in job wait time reduction.",
keywords = "distributed computing, Deep Learning, interference, cloud computing, GPU Scheduling",
author = "Ging-Fung Yeung and Damian Borowiec and Renyu Yang and Adrian Friday and R.H.R. Harper and Peter Garraghan",
year = "2022",
month = jan,
day = "31",
doi = "10.1109/TPDS.2021.3079202",
language = "English",
volume = "33",
pages = "88--100",
journal = "IEEE Transactions on Parallel and Distributed Systems",
issn = "1045-9219",
publisher = "IEEE Computer Society",
number = "1",

}
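
The abstract above describes Horus's core mechanism: predict a job's GPU utilization from its computation-graph features, then use that prediction to make interference-aware placement decisions without online profiling or reserved GPUs. The Python sketch below illustrates that idea only; the Job feature set, the linear predictor, and the headroom score are illustrative stand-ins, not the model or scheduler from the paper.

from dataclasses import dataclass

@dataclass
class Job:
    name: str
    # Hypothetical computation-graph features (e.g., op count, FLOPs,
    # parameter count); the paper defines its own feature set.
    graph_features: tuple[float, ...]

def predict_gpu_utilization(job: Job, weights: tuple[float, ...], bias: float) -> float:
    # Stand-in predictor: a linear model over graph features, clamped to
    # 0-100%, in place of Horus's learned utilization model.
    raw = bias + sum(w * f for w, f in zip(weights, job.graph_features))
    return min(100.0, max(0.0, raw))

def headroom(gpu_util_now: float, predicted: float, cap: float = 100.0) -> float:
    # Interference-aware score: utilization headroom left after co-locating
    # the job on this GPU. Negative means the cap would be exceeded, i.e.,
    # likely interference and slowdown.
    return cap - (gpu_util_now + predicted)

def place(job: Job, gpus: dict[str, float], weights: tuple[float, ...], bias: float) -> str:
    # Greedy scheduler sketch: predict once per job (no online profiling),
    # then pick the GPU with the most remaining headroom.
    predicted = predict_gpu_utilization(job, weights, bias)
    return max(gpus, key=lambda g: headroom(gpus[g], predicted))

# Example: a predicted utilization of 42% fits alongside the 35%-busy GPU
# but would oversubscribe the 80%-busy one.
gpus = {"gpu-0": 35.0, "gpu-1": 80.0}
job = Job("resnet50-train", graph_features=(1.2, 0.8, 3.0))
print(place(job, gpus, weights=(10.0, 5.0, 8.0), bias=2.0))  # gpu-0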

RIS

TY - JOUR

T1 - Horus

T2 - Interference-Aware and Prediction-Based Scheduling in Deep Learning Systems

AU - Yeung, Ging-Fung

AU - Borowiec, Damian

AU - Yang, Renyu

AU - Friday, Adrian

AU - Harper, R.H.R.

AU - Garraghan, Peter

PY - 2022/1/31

Y1 - 2022/1/31

AB - To accelerate the training of Deep Learning (DL) models, clusters of machines equipped with hardware accelerators such as GPUs are leveraged to reduce execution time. State-of-the-art resource managers are needed to increase GPU utilization and maximize throughput. While co-locating DL jobs on the same GPU has been shown to be effective, this can incur interference causing slowdown. In this paper we propose Horus: an interference-aware and prediction-based resource manager for DL systems. Horus proactively predicts GPU utilization of heterogeneous DL jobs extrapolated from the DL model’s computation graph features, removing the need for online profiling and isolated reserved GPUs. Through micro-benchmarks and job co-location combinations across heterogeneous GPU hardware, we identify GPU utilization as a general proxy metric to determine good placement decisions, in contrast to current approaches which reserve isolated GPUs to perform online profiling and directly measure GPU utilization for each unique submitted job. Our approach promotes high resource utilization and makespan reduction; via real-world experimentation and large-scale trace driven simulation, we demonstrate that Horus outperforms other DL resource managers by up to 61.5% for GPU resource utilization, 23.7–30.7% for makespan reduction and 68.3% in job wait time reduction.

KW - distributed computing

KW - Deep Learning

KW - interference

KW - cloud computing

KW - GPU Scheduling

DO - 10.1109/TPDS.2021.3079202

M3 - Journal article

VL - 33

SP - 88

EP - 100

JO - IEEE Transactions on Parallel and Distributed Systems

JF - IEEE Transactions on Parallel and Distributed Systems

SN - 1045-9219

IS - 1

M1 - 21015055

ER -