Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review
TY - GEN
T1 - Portable mapping of data parallel programs to OpenCL for heterogeneous systems
AU - Grewe, D.
AU - Wang, Zheng
AU - O'Boyle, M.F.P.
PY - 2013
Y1 - 2013
N2 - General purpose GPU based systems are highly attractive as they give potentially massive performance at little cost. Realizing such potential is challenging due to the complexity of programming. This paper presents a compiler based approach to automatically generate optimized OpenCL code from data-parallel OpenMP programs for GPUs. Such an approach brings together the benefits of a clear high-level language (OpenMP) and an emerging standard (OpenCL) for heterogeneous multi-cores. A key feature of our scheme is that it leverages existing transformations, especially data transformations, to improve performance on GPU architectures and uses predictive modeling to automatically determine if it is worthwhile running the OpenCL code on the GPU or OpenMP code on the multi-core host. We applied our approach to the entire NAS parallel benchmark suite and evaluated it on two distinct GPU based systems: Core i7/NVIDIA GeForce GTX 580 and Core i7/AMD Radeon 7970. We achieved average (up to) speedups of 4.51× and 4.20× (143× and 67×) respectively over a sequential baseline. This is, on average, a factor 1.63 and 1.56 times faster than a hand-coded, GPU-specific OpenCL implementation developed by independent expert programmers.
KW - application program interfaces
KW - graphics processing units
KW - multiprocessing systems
KW - parallel programming
KW - program compilers
KW - Core i7/AMD Radeon 7970
KW - Core i7/NVIDIA GeForce GTX 580
KW - NAS parallel benchmark suite
KW - OpenCL code
KW - OpenMP code
KW - compiler
KW - data parallel program mapping
KW - data transformations
KW - data-parallel OpenMP programs
KW - general purpose GPU based systems
KW - heterogeneous multicores
KW - heterogeneous systems
KW - high level language
KW - predictive modeling
KW - GPU
KW - OpenCL
KW - Machine Learning Mapping
UR - http://www.scopus.com/inward/record.url?scp=84876937393&partnerID=8YFLogxK
U2 - 10.1109/CGO.2013.6494993
DO - 10.1109/CGO.2013.6494993
M3 - Conference contribution/Paper
SN - 978-1-4673-5524-7
SP - 1
EP - 10
BT - Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)
PB - IEEE
ER -