Troodon: A Machine-Learning based Load-Balancing Application Scheduler for CPU-GPU System

Vice-Chancellor's Office

Text available via DOI:

https://doi.org/10.1016/j.jpdc.2019.05.015
Final published version

Keywords

Heterogeneous system, Scheduling, Device suitability, Load-balancing, Machine Learning

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Standard

Troodon: A Machine-Learning based Load-Balancing Application Scheduler for CPU-GPU System. / Khalid, Yasir Noman; Aleem, Muhammad; Ahmed, Usman et al.
In: Journal of Parallel and Distributed Computing, Vol. 132, 31.10.2019, p. 79-94.

Harvard

Khalid, YN, Aleem, M, Ahmed, U, Islam, MA & Iqbal, MA 2019, 'Troodon: A Machine-Learning based Load-Balancing Application Scheduler for CPU-GPU System', Journal of Parallel and Distributed Computing, vol. 132, pp. 79-94. https://doi.org/10.1016/j.jpdc.2019.05.015

APA

Khalid, Y. N., Aleem, M., Ahmed, U., Islam, M. A., & Iqbal, M. A. (2019). Troodon: A Machine-Learning based Load-Balancing Application Scheduler for CPU-GPU System. Journal of Parallel and Distributed Computing, 132, 79-94. https://doi.org/10.1016/j.jpdc.2019.05.015

Vancouver

Khalid YN, Aleem M, Ahmed U, Islam MA, Iqbal MA. Troodon: A Machine-Learning based Load-Balancing Application Scheduler for CPU-GPU System. Journal of Parallel and Distributed Computing. 2019 Oct 31;132:79-94. Epub 2019 Jun 14. doi: 10.1016/j.jpdc.2019.05.015

Author

Khalid, Yasir Noman ; Aleem, Muhammad ; Ahmed, Usman et al. / Troodon: A Machine-Learning based Load-Balancing Application Scheduler for CPU-GPU System. In: Journal of Parallel and Distributed Computing. 2019 ; Vol. 132. pp. 79-94.

Bibtex

@article{27dacbd79c244ad390477e77d6cc3777,

title = "Troodon: A Machine-Learning based Load-Balancing Application Scheduler for CPU-GPU System",

abstract = "Heterogeneous computing machines consisting of a CPU and one or more GPUs are increasingly being used today because of their higher performance-cost ratio and lower energy consumption. To program such heterogeneous systems, OpenCL has become an industry standard due to the portability across various computing architectures. To exploit the computing capabilities of heterogeneous systems, application developers are porting their cluster and Cloud applications using OpenCL. With the increasing number of such applications, the use of shared accelerating computing devices (such as CPUs and GPUs) should be managed using an efficient load-balancing scheduling heuristic capable of reducing execution time, increasing throughput with high device utilization. Mostly, the OpenCL applications are suited (execute faster) on a specific computing device (CPU or GPU) and with varying data-sizes the speedup obtained by an application on the suitable device varies too. Applications{\textquoteright} mapping to computing devices without considering device suitability and obtainable speedup on a suitable device leads to sub-optimal execution time, lower throughput and load imbalance. Therefore, an application scheduler should consider both the device-suitability and speedup variation for scheduling decisions leading to a reduction in execution time and an increase in throughput. In this paper, we present a novel load-balancing scheduling heuristic named as Troodon that considers machine-learning based device-suitability model that classify OpenCL applications into either CPU suitable or GPU suitable. Moreover, a speedup predictor that predicts the amount of speedup that jobs will obtain when executed on a suitable device is also part of the Troodon. Troodon incorporates the E-OSched scheduling mechanism to map jobs on CPU and GPUs in a load balanced way. This results in reduced applications execution time, increased system throughput, and improved device utilization.
We evaluate the proposed scheduler using a large number of data-parallel applications and compared with several other state-of-the-art scheduling heuristics. The experimental evaluation has demonstrated that the proposed scheduler outperformed the existing heuristics and reduced the application execution time up to 38% with increased system throughput and device utilization.",

keywords = "Heterogeneous system, Scheduling, Device suitability, Load-balancing, Machine Learning",

author = "Khalid, {Yasir Noman} and Muhammad Aleem and Usman Ahmed and Islam, {Muhammad Arshad} and Iqbal, {Muhammad Azhar}",

year = "2019",

month = oct,

day = "31",

doi = "10.1016/j.jpdc.2019.05.015",

language = "English",

volume = "132",

pages = "79--94",

journal = "Journal of Parallel and Distributed Computing",

issn = "0743-7315",

publisher = "Academic Press Inc.",

}

RIS

TY - JOUR

T1 - Troodon: A Machine-Learning based Load-Balancing Application Scheduler for CPU-GPU System

AU - Khalid, Yasir Noman

AU - Aleem, Muhammad

AU - Ahmed, Usman

AU - Islam, Muhammad Arshad

AU - Iqbal, Muhammad Azhar

PY - 2019/10/31

Y1 - 2019/10/31

N2 - Heterogeneous computing machines consisting of a CPU and one or more GPUs are increasingly being used today because of their higher performance-cost ratio and lower energy consumption. To program such heterogeneous systems, OpenCL has become an industry standard due to the portability across various computing architectures. To exploit the computing capabilities of heterogeneous systems, application developers are porting their cluster and Cloud applications using OpenCL. With the increasing number of such applications, the use of shared accelerating computing devices (such as CPUs and GPUs) should be managed using an efficient load-balancing scheduling heuristic capable of reducing execution time, increasing throughput with high device utilization. Mostly, the OpenCL applications are suited (execute faster) on a specific computing device (CPU or GPU) and with varying data-sizes the speedup obtained by an application on the suitable device varies too. Applications’ mapping to computing devices without considering device suitability and obtainable speedup on a suitable device leads to sub-optimal execution time, lower throughput and load imbalance. Therefore, an application scheduler should consider both the device-suitability and speedup variation for scheduling decisions leading to a reduction in execution time and an increase in throughput. In this paper, we present a novel load-balancing scheduling heuristic named as Troodon that considers machine-learning based device-suitability model that classify OpenCL applications into either CPU suitable or GPU suitable. Moreover, a speedup predictor that predicts the amount of speedup that jobs will obtain when executed on a suitable device is also part of the Troodon. Troodon incorporates the E-OSched scheduling mechanism to map jobs on CPU and GPUs in a load balanced way. This results in reduced applications execution time, increased system throughput, and improved device utilization.
We evaluate the proposed scheduler using a large number of data-parallel applications and compared with several other state-of-the-art scheduling heuristics. The experimental evaluation has demonstrated that the proposed scheduler outperformed the existing heuristics and reduced the application execution time up to 38% with increased system throughput and device utilization.

AB - Heterogeneous computing machines consisting of a CPU and one or more GPUs are increasingly being used today because of their higher performance-cost ratio and lower energy consumption. To program such heterogeneous systems, OpenCL has become an industry standard due to the portability across various computing architectures. To exploit the computing capabilities of heterogeneous systems, application developers are porting their cluster and Cloud applications using OpenCL. With the increasing number of such applications, the use of shared accelerating computing devices (such as CPUs and GPUs) should be managed using an efficient load-balancing scheduling heuristic capable of reducing execution time, increasing throughput with high device utilization. Mostly, the OpenCL applications are suited (execute faster) on a specific computing device (CPU or GPU) and with varying data-sizes the speedup obtained by an application on the suitable device varies too. Applications’ mapping to computing devices without considering device suitability and obtainable speedup on a suitable device leads to sub-optimal execution time, lower throughput and load imbalance. Therefore, an application scheduler should consider both the device-suitability and speedup variation for scheduling decisions leading to a reduction in execution time and an increase in throughput. In this paper, we present a novel load-balancing scheduling heuristic named as Troodon that considers machine-learning based device-suitability model that classify OpenCL applications into either CPU suitable or GPU suitable. Moreover, a speedup predictor that predicts the amount of speedup that jobs will obtain when executed on a suitable device is also part of the Troodon. Troodon incorporates the E-OSched scheduling mechanism to map jobs on CPU and GPUs in a load balanced way. This results in reduced applications execution time, increased system throughput, and improved device utilization.
We evaluate the proposed scheduler using a large number of data-parallel applications and compared with several other state-of-the-art scheduling heuristics. The experimental evaluation has demonstrated that the proposed scheduler outperformed the existing heuristics and reduced the application execution time up to 38% with increased system throughput and device utilization.

KW - Heterogeneous system

KW - Scheduling

KW - Device suitability

KW - Load-balancing

KW - Machine Learning

U2 - 10.1016/j.jpdc.2019.05.015

DO - 10.1016/j.jpdc.2019.05.015

M3 - Journal article

VL - 132

SP - 79

EP - 94

JO - Journal of Parallel and Distributed Computing

JF - Journal of Parallel and Distributed Computing

SN - 0743-7315

ER -
