Automatic and portable mapping of data parallel programs to OpenCL for GPU-based heterogeneous systems

Associated organisational units

Computing and Communications

Text available via DOI:

https://doi.org/10.1145/2677036
Final published version

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Standard

Automatic and portable mapping of data parallel programs to OpenCL for GPU-based heterogeneous systems. / Wang, Zheng; Grewe, Dominik; O'Boyle, Michael.
In: ACM Transactions on Architecture and Code Optimization, Vol. 11, No. 4, 42, 01.2015.

Harvard

Wang, Z, Grewe, D & O'Boyle, M 2015, 'Automatic and portable mapping of data parallel programs to OpenCL for GPU-based heterogeneous systems', ACM Transactions on Architecture and Code Optimization, vol. 11, no. 4, 42. https://doi.org/10.1145/2677036

APA

Wang, Z., Grewe, D., & O'Boyle, M. (2015). Automatic and portable mapping of data parallel programs to OpenCL for GPU-based heterogeneous systems. ACM Transactions on Architecture and Code Optimization, 11(4), Article 42. https://doi.org/10.1145/2677036

Vancouver

Wang Z, Grewe D, O'Boyle M. Automatic and portable mapping of data parallel programs to OpenCL for GPU-based heterogeneous systems. ACM Transactions on Architecture and Code Optimization. 2015 Jan;11(4):42. doi: 10.1145/2677036

Author

Wang, Zheng; Grewe, Dominik; O'Boyle, Michael. / Automatic and portable mapping of data parallel programs to OpenCL for GPU-based heterogeneous systems. In: ACM Transactions on Architecture and Code Optimization. 2015; Vol. 11, No. 4.

Bibtex

@article{78305a1e63f84c4795ddd8c9f174f37c,

title = "Automatic and portable mapping of data parallel programs to OpenCL for GPU-based heterogeneous systems",

abstract = "General-purpose GPU-based systems are highly attractive, as they give potentially massive performance at little cost. Realizing such potential is challenging due to the complexity of programming. This article presents a compiler-based approach to automatically generate optimized OpenCL code from data parallel OpenMP programs for GPUs. A key feature of our scheme is that it leverages existing transformations, especially data transformations, to improve performance on GPU architectures and uses automatic machine learning to build a predictive model to determine if it is worthwhile running the OpenCL code on the GPU or OpenMP code on the multicore host. We applied our approach to the entire NAS parallel benchmark suite and evaluated it on distinct GPU-based systems. We achieved average (up to) speedups of 4.51× and 4.20× (143× and 67×) on Core i7/NVIDIA GeForce GTX580 and Core i7/AMD Radeon 7970 platforms, respectively, over a sequential baseline. Our approach achieves, on average, greater than 10× speedups over two state-of-the-art automatic GPU code generators.",

author = "Zheng Wang and Dominik Grewe and Michael O'Boyle",

year = "2015",

month = jan,

doi = "10.1145/2677036",

language = "English",

volume = "11",

journal = "ACM Transactions on Architecture and Code Optimization",

issn = "1544-3566",

publisher = "Association for Computing Machinery (ACM)",

number = "4",

}

RIS

TY - JOUR

T1 - Automatic and portable mapping of data parallel programs to OpenCL for GPU-based heterogeneous systems

AU - Wang, Zheng

AU - Grewe, Dominik

AU - O'Boyle, Michael

PY - 2015/1

Y1 - 2015/1

N2 - General-purpose GPU-based systems are highly attractive, as they give potentially massive performance at little cost. Realizing such potential is challenging due to the complexity of programming. This article presents a compiler-based approach to automatically generate optimized OpenCL code from data parallel OpenMP programs for GPUs. A key feature of our scheme is that it leverages existing transformations, especially data transformations, to improve performance on GPU architectures and uses automatic machine learning to build a predictive model to determine if it is worthwhile running the OpenCL code on the GPU or OpenMP code on the multicore host. We applied our approach to the entire NAS parallel benchmark suite and evaluated it on distinct GPU-based systems. We achieved average (up to) speedups of 4.51× and 4.20× (143× and 67×) on Core i7/NVIDIA GeForce GTX580 and Core i7/AMD Radeon 7970 platforms, respectively, over a sequential baseline. Our approach achieves, on average, greater than 10× speedups over two state-of-the-art automatic GPU code generators.

U2 - 10.1145/2677036

DO - 10.1145/2677036

M3 - Journal article

VL - 11

JO - ACM Transactions on Architecture and Code Optimization

JF - ACM Transactions on Architecture and Code Optimization

SN - 1544-3566

IS - 4

M1 - 42

ER -
