Home > Research > Publications & Outputs > Exploitation of GPUs for the parallelisation of...
View graph of relations

Exploitation of GPUs for the parallelisation of probably parallel legacy code

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSNConference contribution/Paperpeer-review

Published
  • Zheng Wang
  • Daniel Powell
  • Bjorn Franke
  • Michael O'Boyle
Close
Publication date2014
Host publicationCompiler construction: 23rd International Conference, CC 2014, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2014, Grenoble, France, April 5-13, 2014. Proceedings
EditorsAlbert Cohen
Place of PublicationBerlin
PublisherSpringer Verlag
Pages154-173
Number of pages20
ISBN (electronic)9783642548079
ISBN (print)9783642548062
<mark>Original language</mark>English

Publication series

NameLecture Notes in Computer Science
PublisherSpringer
Volume8409
ISSN (Print)0302-9743

Abstract

General purpose Gpus provide massive compute power, but are notoriously difficult to program. In this paper we present a complete compilation strategy to exploit Gpus for the parallelisation of sequential legacy code. Using hybrid data dependence analysis combining static and dynamic information, our compiler automatically detects suitable parallelism and generates parallel OpenCl code from sequential programs. We exploit the fact that dependence profiling provides us with parallel loop candidates that are highly likely to be genuinely parallel, but cannot be statically proven so. For the efficient Gpu parallelisation of those probably parallel loop candidates, we propose a novel software speculation scheme, which ensures correctness for the unlikely, yet possible case of dynamically detected dependence violations. Our scheme operates in place and supports speculative read and write operations. We demonstrate the effectiveness of our approach in detecting and exploiting parallelism using sequential codes from the Nas benchmark suite. We achieve an average speedup of 3.2x, and up to 99x, over the sequential baseline. On average, this is 1.42 times faster than state-of-the-art speculation schemes and corresponds to 99% of the performance level of a manual Gpu implementation developed by independent expert programmers.