Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review
TY - GEN
T1 - Portable mapping of data parallel programs to OpenCL for heterogeneous systems
AU - Grewe, D.
AU - Wang, Zheng
AU - O'Boyle, M.F.P.
PY - 2013
Y1 - 2013
N2 - General purpose GPU based systems are highly attractive as they give potentially massive performance at little cost. Realizing such potential is challenging due to the complexity of programming. This paper presents a compiler based approach to automatically generate optimized OpenCL code from data-parallel OpenMP programs for GPUs. Such an approach brings together the benefits of a clear high-level language (OpenMP) and an emerging standard (OpenCL) for heterogeneous multi-cores. A key feature of our scheme is that it leverages existing transformations, especially data transformations, to improve performance on GPU architectures and uses predictive modeling to automatically determine if it is worthwhile running the OpenCL code on the GPU or OpenMP code on the multi-core host. We applied our approach to the entire NAS parallel benchmark suite and evaluated it on two distinct GPU based systems: Core i7/NVIDIA GeForce GTX 580 and Core i7/AMD Radeon 7970. We achieved average (up to) speedups of 4.51× and 4.20× (143× and 67×) respectively over a sequential baseline. This is, on average, a factor 1.63 and 1.56 times faster than a hand-coded, GPU-specific OpenCL implementation developed by independent expert programmers.
KW - application program interfaces
KW - graphics processing units
KW - multiprocessing systems
KW - parallel programming
KW - program compilers
KW - Core i7/AMD Radeon 7970
KW - Core i7/NVIDIA GeForce GTX 580
KW - NAS parallel benchmark suite
KW - OpenCL code
KW - OpenMP code
KW - compiler
KW - data parallel program mapping
KW - data transformations
KW - data-parallel OpenMP programs
KW - general purpose GPU based systems
KW - heterogeneous multicores
KW - heterogeneous systems
KW - high level language
KW - predictive modeling
KW - GPU
KW - OpenCL
KW - Machine Learning Mapping
UR - http://www.scopus.com/inward/record.url?scp=84876937393&partnerID=8YFLogxK
U2 - 10.1109/CGO.2013.6494993
DO - 10.1109/CGO.2013.6494993
M3 - Conference contribution/Paper
SN - 978-1-4673-5524-7
SP - 1
EP - 10
BT - Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)
PB - IEEE
ER -