
Mapping parallelism to multi-cores: a machine learning based approach

Research output: Contribution in Book/Report/Proceedings › Paper

Published

Publication date: 2009
Host publication: Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '09)
Place of publication: New York, NY, USA
Publisher: ACM
Pages: 75-84
Number of pages: 10
ISBN (Print): 978-1-60558-397-6
Original language: English

Conference

Conference: PPoPP 2009, 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Country: United States
City: Raleigh, N.C.
Period: 14/02/09 – 18/02/09


Abstract

The efficient mapping of program parallelism to multi-core processors is highly dependent on the underlying architecture. This paper proposes a portable, automatic, compiler-based approach to mapping such parallelism using machine learning. It develops two predictors, a data-sensitive and a data-insensitive predictor, to select the best mapping for parallel programs. They predict the number of threads and the scheduling policy for any given program using a model learnt off-line. Using low-cost profiling runs, they predict the mapping for a new, unseen program across multiple input data sets. We evaluate our approach by selecting parallelism mapping configurations for OpenMP programs on two representative but different multi-core platforms (the Intel Xeon and the Cell processors). Performance of our technique is stable across programs and architectures. On average, it delivers above 96% of the maximum available performance on both platforms. It achieves, on average, a 37% (up to 17.5 times) performance improvement over the OpenMP runtime default scheme on the Cell platform. Compared to two recent prediction models, our predictors achieve better performance with a significantly lower profiling cost.
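To illustrate the idea of an off-line-learnt mapping predictor, the sketch below uses a simple nearest-neighbour classifier: it maps low-cost profiling features of a program to a parallelism mapping (thread count and scheduling policy). This is not the paper's actual model; the feature names, training data, and distance metric are illustrative assumptions.

```python
# Minimal sketch (assumed, not the paper's model): a nearest-neighbour
# predictor trained off-line on (profiling features -> best mapping)
# pairs, queried for a new, unseen program.
import math

# Off-line training set: (profiling features, best-known mapping).
# Assumed features: (parallel_fraction, work_per_iteration, load_imbalance)
TRAINING = [
    ((0.95, 120.0, 0.05), (8, "static")),
    ((0.90,  10.0, 0.40), (8, "dynamic")),
    ((0.60,  50.0, 0.10), (4, "static")),
    ((0.30,   5.0, 0.50), (2, "dynamic")),
]

def predict_mapping(features, training=TRAINING):
    """Return (num_threads, schedule) of the nearest training program."""
    def dist(a, b):
        # Euclidean distance over the profiling-feature vectors.
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    _, mapping = min(((dist(features, f), m) for f, m in training),
                     key=lambda t: t[0])
    return mapping
```

A program profiled with features close to a known training point inherits that point's mapping, e.g. `predict_mapping((0.94, 118.0, 0.06))` selects the 8-thread static schedule. The paper's data-sensitive variant would additionally condition the prediction on the input data set.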