Smart multi-task scheduling for OpenCL programs on CPU/GPU heterogeneous platforms

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN, Conference contribution/Paper, peer-review

Published

Standard

Smart multi-task scheduling for OpenCL programs on CPU/GPU heterogeneous platforms. / Wen, Yuan; Wang, Zheng; O'Boyle, Michael.
21st Annual IEEE International Conference on High Performance Computing (HiPC 2014). IEEE, 2014.

Harvard

Wen, Y, Wang, Z & O'Boyle, M 2014, Smart multi-task scheduling for OpenCL programs on CPU/GPU heterogeneous platforms. in 21st Annual IEEE International Conference on High Performance Computing (HiPC 2014). IEEE, 21st annual IEEE International Conference on High Performance Computing (HiPC 2014), India, 17/12/14. https://doi.org/10.1109/HiPC.2014.7116910

APA

Wen, Y., Wang, Z., & O'Boyle, M. (2014). Smart multi-task scheduling for OpenCL programs on CPU/GPU heterogeneous platforms. In 21st Annual IEEE International Conference on High Performance Computing (HiPC 2014). IEEE. https://doi.org/10.1109/HiPC.2014.7116910

Vancouver

Wen Y, Wang Z, O'Boyle M. Smart multi-task scheduling for OpenCL programs on CPU/GPU heterogeneous platforms. In 21st Annual IEEE International Conference on High Performance Computing (HiPC 2014). IEEE. 2014. doi: 10.1109/HiPC.2014.7116910

Author

Wen, Yuan ; Wang, Zheng ; O'Boyle, Michael. / Smart multi-task scheduling for OpenCL programs on CPU/GPU heterogeneous platforms. 21st Annual IEEE International Conference on High Performance Computing (HiPC 2014). IEEE, 2014.

Bibtex

@inproceedings{c4ce9ce754bf431fba9c3ce457e6f28d,
title = "Smart multi-task scheduling for OpenCL programs on CPU/GPU heterogeneous platforms",
abstract = "Heterogeneous systems consisting of multiple CPUs and GPUs are increasingly attractive as platforms for high performance computing. Such platforms are usually programmed using OpenCL, which provides program portability by allowing the same program to execute on different types of device. As such systems become more mainstream, they will move from application-dedicated devices to platforms that need to support multiple concurrent user applications. Here there is a need to determine when and where to map different applications so as to best utilize the available heterogeneous hardware resources. In this paper, we present an efficient OpenCL task scheduling scheme which schedules multiple kernels from multiple programs on CPU/GPU heterogeneous platforms. It does this by determining at runtime which kernels are likely to best utilize a device. We show that speedup is a good scheduling priority function and develop a novel model that predicts a kernel's speedup based on its static code structure. Our scheduler uses this prediction and runtime input data size to prioritize and schedule tasks. This technique is applied to a large set of concurrent OpenCL kernels. We evaluated our approach for system throughput and average turnaround time against competitive techniques on two different platforms: a Core i7/NVIDIA GTX590 platform and a Core i7/AMD Tahiti 7970 platform. For system throughput, we achieve, on average, a 1.21x and 1.25x improvement over the best competitors on the NVIDIA and AMD platforms respectively. Our approach reduces the turnaround time, on average, by at least 1.5x and 1.2x on the NVIDIA and AMD platforms respectively, when compared to alternative approaches.",
author = "Yuan Wen and Zheng Wang and Michael O'Boyle",
year = "2014",
doi = "10.1109/HiPC.2014.7116910",
language = "English",
isbn = "9781479959754",
booktitle = "21st Annual IEEE International Conference on High Performance Computing (HiPC 2014)",
publisher = "IEEE",
note = "21st annual IEEE International Conference on High Performance Computing (HiPC 2014) ; Conference date: 17-12-2014 Through 20-12-2014",
}

RIS

TY - CONF

T1 - Smart multi-task scheduling for OpenCL programs on CPU/GPU heterogeneous platforms

AU - Wen, Yuan

AU - Wang, Zheng

AU - O'Boyle, Michael

PY - 2014

Y1 - 2014

AB - Heterogeneous systems consisting of multiple CPUs and GPUs are increasingly attractive as platforms for high performance computing. Such platforms are usually programmed using OpenCL, which provides program portability by allowing the same program to execute on different types of device. As such systems become more mainstream, they will move from application-dedicated devices to platforms that need to support multiple concurrent user applications. Here there is a need to determine when and where to map different applications so as to best utilize the available heterogeneous hardware resources. In this paper, we present an efficient OpenCL task scheduling scheme which schedules multiple kernels from multiple programs on CPU/GPU heterogeneous platforms. It does this by determining at runtime which kernels are likely to best utilize a device. We show that speedup is a good scheduling priority function and develop a novel model that predicts a kernel's speedup based on its static code structure. Our scheduler uses this prediction and runtime input data size to prioritize and schedule tasks. This technique is applied to a large set of concurrent OpenCL kernels. We evaluated our approach for system throughput and average turnaround time against competitive techniques on two different platforms: a Core i7/NVIDIA GTX590 platform and a Core i7/AMD Tahiti 7970 platform. For system throughput, we achieve, on average, a 1.21x and 1.25x improvement over the best competitors on the NVIDIA and AMD platforms respectively. Our approach reduces the turnaround time, on average, by at least 1.5x and 1.2x on the NVIDIA and AMD platforms respectively, when compared to alternative approaches.

U2 - 10.1109/HiPC.2014.7116910

DO - 10.1109/HiPC.2014.7116910

M3 - Conference contribution/Paper

SN - 9781479959754

BT - 21st Annual IEEE International Conference on High Performance Computing (HiPC 2014)

PB - IEEE

T2 - 21st annual IEEE International Conference on High Performance Computing (HiPC 2014)

Y2 - 17 December 2014 through 20 December 2014

ER -