FusionCL: A Machine-Learning Based Approach for OpenCL Kernel Fusion to Increase System Performance

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Standard

FusionCL: A Machine-Learning Based Approach for OpenCL Kernel Fusion to Increase System Performance. / Khalid, Yasir Noman; Aleem, Muhammad; Ahmed, Usman et al.
In: Computing, Vol. 103, No. 10, 31.10.2021, p. 2171-2202.

Harvard

Khalid, YN, Aleem, M, Ahmed, U, Prodan, R, Islam, MA & Iqbal, MA 2021, 'FusionCL: A Machine-Learning Based Approach for OpenCL Kernel Fusion to Increase System Performance', Computing, vol. 103, no. 10, pp. 2171-2202. https://doi.org/10.1007/s00607-021-00958-2

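APA

Khalid, Y. N., Aleem, M., Ahmed, U., Prodan, R., Islam, M. A., & Iqbal, M. A. (2021). FusionCL: A Machine-Learning Based Approach for OpenCL Kernel Fusion to Increase System Performance. Computing, 103(10), 2171-2202. https://doi.org/10.1007/s00607-021-00958-2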

Vancouver

Khalid YN, Aleem M, Ahmed U, Prodan R, Islam MA, Iqbal MA. FusionCL: A Machine-Learning Based Approach for OpenCL Kernel Fusion to Increase System Performance. Computing. 2021 Oct 31;103(10):2171-2202. Epub 2021 Jun 3. doi: 10.1007/s00607-021-00958-2

Author

Khalid, Yasir Noman; Aleem, Muhammad; Ahmed, Usman et al. / FusionCL: A Machine-Learning Based Approach for OpenCL Kernel Fusion to Increase System Performance. In: Computing. 2021; Vol. 103, No. 10. pp. 2171-2202.

Bibtex

@article{f3bf5153ceed46cebcaaa6774e36121b,
title = "FusionCL: A Machine-Learning Based Approach for OpenCL Kernel Fusion to Increase System Performance",
abstract = "Employing general-purpose graphics processing units (GPGPU) with the help of OpenCL has resulted in greatly reducing the execution time of data-parallel applications by taking advantage of the massive available parallelism. However, when a small data size application is executed on GPU there is a wastage of GPU resources as the application cannot fully utilize GPU compute-cores. There is no mechanism to share a GPU between two kernels due to the lack of operating system support on GPU. In this paper, we propose the provision of a GPU sharing mechanism between two kernels that will lead to increasing GPU occupancy, and as a result, reduce execution time of a job pool. However, if a pair of the kernel is competing for the same set of resources (i.e., both applications are compute-intensive or memory-intensive), kernel fusion may also result in a significant increase in execution time of fused kernels. Therefore, it is pertinent to select an optimal pair of kernels for fusion that will result in significant speedup over their serial execution. This research presents FusionCL, a machine learning-based GPU sharing mechanism between a pair of OpenCL kernels. FusionCL identifies each pair of kernels (from the job pool), which are suitable candidates for fusion using a machine learning-based fusion suitability classifier. Thereafter, from all the candidates, it selects a pair of candidate kernels that will produce maximum speedup after fusion over their serial execution using a fusion speedup predictor. The experimental evaluation shows that the proposed kernel fusion mechanism reduces execution time by 2.83× when compared to a baseline scheduling scheme. When compared to state-of-the-art, the reduction in execution time is up to 8%.",
keywords = "Scheduling, Kernel fusion, High-performance computing, Machine learning",
author = "Khalid, {Yasir Noman} and Muhammad Aleem and Usman Ahmed and Radu Prodan and Islam, {Muhammad Arshad} and Iqbal, {Muhammad Azhar}",
year = "2021",
month = oct,
day = "31",
doi = "10.1007/s00607-021-00958-2",
language = "English",
volume = "103",
pages = "2171--2202",
journal = "Computing",
issn = "0010-485X",
publisher = "Springer Wien",
number = "10",

}

RIS

TY - JOUR

T1 - FusionCL

T2 - A Machine-Learning Based Approach for OpenCL Kernel Fusion to Increase System Performance

AU - Khalid, Yasir Noman

AU - Aleem, Muhammad

AU - Ahmed, Usman

AU - Prodan, Radu

AU - Islam, Muhammad Arshad

AU - Iqbal, Muhammad Azhar

PY - 2021/10/31

Y1 - 2021/10/31

N2 - Employing general-purpose graphics processing units (GPGPU) with the help of OpenCL has resulted in greatly reducing the execution time of data-parallel applications by taking advantage of the massive available parallelism. However, when a small data size application is executed on GPU there is a wastage of GPU resources as the application cannot fully utilize GPU compute-cores. There is no mechanism to share a GPU between two kernels due to the lack of operating system support on GPU. In this paper, we propose the provision of a GPU sharing mechanism between two kernels that will lead to increasing GPU occupancy, and as a result, reduce execution time of a job pool. However, if a pair of the kernel is competing for the same set of resources (i.e., both applications are compute-intensive or memory-intensive), kernel fusion may also result in a significant increase in execution time of fused kernels. Therefore, it is pertinent to select an optimal pair of kernels for fusion that will result in significant speedup over their serial execution. This research presents FusionCL, a machine learning-based GPU sharing mechanism between a pair of OpenCL kernels. FusionCL identifies each pair of kernels (from the job pool), which are suitable candidates for fusion using a machine learning-based fusion suitability classifier. Thereafter, from all the candidates, it selects a pair of candidate kernels that will produce maximum speedup after fusion over their serial execution using a fusion speedup predictor. The experimental evaluation shows that the proposed kernel fusion mechanism reduces execution time by 2.83× when compared to a baseline scheduling scheme. When compared to state-of-the-art, the reduction in execution time is up to 8%.

AB - Employing general-purpose graphics processing units (GPGPU) with the help of OpenCL has resulted in greatly reducing the execution time of data-parallel applications by taking advantage of the massive available parallelism. However, when a small data size application is executed on GPU there is a wastage of GPU resources as the application cannot fully utilize GPU compute-cores. There is no mechanism to share a GPU between two kernels due to the lack of operating system support on GPU. In this paper, we propose the provision of a GPU sharing mechanism between two kernels that will lead to increasing GPU occupancy, and as a result, reduce execution time of a job pool. However, if a pair of the kernel is competing for the same set of resources (i.e., both applications are compute-intensive or memory-intensive), kernel fusion may also result in a significant increase in execution time of fused kernels. Therefore, it is pertinent to select an optimal pair of kernels for fusion that will result in significant speedup over their serial execution. This research presents FusionCL, a machine learning-based GPU sharing mechanism between a pair of OpenCL kernels. FusionCL identifies each pair of kernels (from the job pool), which are suitable candidates for fusion using a machine learning-based fusion suitability classifier. Thereafter, from all the candidates, it selects a pair of candidate kernels that will produce maximum speedup after fusion over their serial execution using a fusion speedup predictor. The experimental evaluation shows that the proposed kernel fusion mechanism reduces execution time by 2.83× when compared to a baseline scheduling scheme. When compared to state-of-the-art, the reduction in execution time is up to 8%.

KW - Scheduling

KW - Kernel fusion

KW - High-performance computing

KW - Machine learning

U2 - 10.1007/s00607-021-00958-2

DO - 10.1007/s00607-021-00958-2

M3 - Journal article

VL - 103

SP - 2171

EP - 2202

JO - Computing

JF - Computing

SN - 0010-485X

IS - 10

ER -