
Mapping parallelism to multi-cores: a machine learning based approach

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Published
Publication date: 2009
Host publication: Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '09)
Place of publication: New York, NY, USA
Publisher: ACM
Pages: 75-84
Number of pages: 10
ISBN (print): 978-1-60558-397-6
Original language: English
Event: PPoPP 2009, 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming - Raleigh, N.C., United States
Duration: 14/02/2009 - 18/02/2009

Conference

Conference: PPoPP 2009, 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Country/Territory: United States
City: Raleigh, N.C.
Period: 14/02/09 - 18/02/09

Conference

ConferencePPoPP 2009 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Country/TerritoryUnited States
CityRaleigh, N.C.
Period14/02/0918/02/09

Abstract

The efficient mapping of program parallelism to multi-core processors is highly dependent on the underlying architecture. This paper proposes a portable and automatic compiler-based approach to mapping such parallelism using machine learning. It develops two predictors, a data-sensitive and a data-insensitive predictor, to select the best mapping for parallel programs. They predict the number of threads and the scheduling policy for any given program using a model learnt off-line. By using low-cost profiling runs, they predict the mapping for a new, unseen program across multiple input data sets. We evaluate our approach by selecting parallelism mapping configurations for OpenMP programs on two representative but different multi-core platforms (the Intel Xeon and the Cell processors). Performance of our technique is stable across programs and architectures. On average, it delivers above 96% of the maximum available performance on both platforms. It achieves, on average, a 37% (up to 17.5 times) performance improvement over the OpenMP runtime default scheme on the Cell platform. Compared to two recent prediction models, our predictors achieve better performance with a significantly lower profiling cost.
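
To illustrate the general idea described in the abstract (not the authors' actual implementation), the sketch below shows how a classifier trained off-line on profiling features could predict a parallelism mapping, i.e. a thread count and an OpenMP scheduling policy, for an unseen program. The feature set, training data, and choice of a nearest-neighbour model are hypothetical placeholders; the paper's real model and features are described in the full text.

```python
# Minimal sketch, assuming a generic off-line-trained classifier.
# Features, training data, and model choice are illustrative only.
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical off-line training set: each row holds profiling features
# for one parallel program (e.g. iteration count, memory ops per iteration);
# the label is the best-performing mapping found by exhaustive search.
features = [
    [1_000_000, 4],   # many iterations, light memory traffic
    [2_000, 64],      # few iterations, memory-heavy
    [500_000, 8],
]
best_mapping = [
    (8, "static"),    # (num_threads, OpenMP schedule)
    (2, "dynamic"),
    (8, "static"),
]

# Encode each (threads, schedule) pair as a single class label.
labels = [f"{t}:{s}" for t, s in best_mapping]
model = KNeighborsClassifier(n_neighbors=1).fit(features, labels)

# At deployment time, a low-cost profiling run of a new, unseen program
# yields its feature vector; the trained model predicts a mapping.
new_program_features = [[750_000, 6]]
threads, schedule = model.predict(new_program_features)[0].split(":")
print(f"predicted mapping: {threads} threads, schedule({schedule})")
```

In this sketch the predicted mapping would then be applied at run time, for example via omp_set_num_threads and the chosen schedule clause; how the mapping is actually applied and which model is learnt are detailed in the paper itself.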