Exploitation of GPUs for the parallelisation of probably parallel legacy code

Computing and Communications

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Published

Zheng Wang
Daniel Powell
Bjorn Franke
Michael O'Boyle

More...

Publication date	2014
Host publication	Compiler construction: 23rd International Conference, CC 2014, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2014, Grenoble, France, April 5-13, 2014. Proceedings
Editors	Albert Cohen
Place of Publication	Berlin
Publisher	Springer Verlag
Pages	154-173
Number of pages	20
ISBN (electronic)	9783642548079
ISBN (print)	9783642548062
<mark>Original language</mark>	English

Publication series

Name	Lecture Notes in Computer Science
Publisher	Springer
Volume	8409
ISSN (Print)	0302-9743

Abstract

General purpose Gpus provide massive compute power, but are notoriously difficult to program. In this paper we present a complete compilation strategy to exploit Gpus for the parallelisation of sequential legacy code. Using hybrid data dependence analysis combining static and dynamic information, our compiler automatically detects suitable parallelism and generates parallel OpenCl code from sequential programs. We exploit the fact that dependence profiling provides us with parallel loop candidates that are highly likely to be genuinely parallel, but cannot be statically proven so. For the efficient Gpu parallelisation of those probably parallel loop candidates, we propose a novel software speculation scheme, which ensures correctness for the unlikely, yet possible case of dynamically detected dependence violations. Our scheme operates in place and supports speculative read and write operations. We demonstrate the effectiveness of our approach in detecting and exploiting parallelism using sequential codes from the Nas benchmark suite. We achieve an average speedup of 3.2x, and up to 99x, over the sequential baseline. On average, this is 1.42 times faster than state-of-the-art speculation schemes and corresponds to 99% of the performance level of a manual Gpu implementation developed by independent expert programmers.

Research

Associated organisational unit

Links

Text available via DOI:

Keywords