Mapping parallelism to multi-cores: a machine learning based approach

Computing and Communications

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Published

Standard

Mapping parallelism to multi-cores: a machine learning based approach. / Wang, Zheng; O'Boyle, Michael F.P.
Proceedings of the 14th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming (PPoPP '09). New York, NY, USA: ACM, 2009. p. 75-84.

Harvard

Wang, Z & O'Boyle, MFP 2009, Mapping parallelism to multi-cores: a machine learning based approach. in Proceedings of the 14th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming (PPoPP '09). ACM, New York, NY, USA, pp. 75-84, PPoPP 2009 14th ACM SIGPLAN symposium on Principles and practice of parallel programming, Raleigh, N.C., United States, 14/02/09. https://doi.org/10.1145/1504176.1504189

APA

Wang, Z., & O'Boyle, M. F. P. (2009). Mapping parallelism to multi-cores: a machine learning based approach. In Proceedings of the 14th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming (PPoPP '09) (pp. 75-84). ACM. https://doi.org/10.1145/1504176.1504189

Vancouver

Wang Z, O'Boyle MFP. Mapping parallelism to multi-cores: a machine learning based approach. In Proceedings of the 14th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming (PPoPP '09). New York, NY, USA: ACM. 2009. p. 75-84 doi: 10.1145/1504176.1504189

Author

Wang, Zheng; O'Boyle, Michael F.P. / Mapping parallelism to multi-cores: a machine learning based approach. Proceedings of the 14th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming (PPoPP '09). New York, NY, USA: ACM, 2009. pp. 75-84

Bibtex

@inproceedings{9731a359ec9647cb8f97f0425d35a82d,

title = "Mapping parallelism to multi-cores: a machine learning based approach",

abstract = "The efficient mapping of program parallelism to multi-core processors is highly dependent on the underlying architecture. This paper proposes a portable and automatic compiler-based approach to mapping such parallelism using machine learning. It develops two predictors: a data sensitive and a data insensitive predictor to select the best mapping for parallel programs. They predict the number of threads and the scheduling policy for any given program using a model learnt off-line. By using low-cost profiling runs, they predict the mapping for a new unseen program across multiple input data sets. We evaluate our approach by selecting parallelism mapping configurations for OpenMP programs on two representative but different multi-core platforms (the Intel Xeon and the Cell processors). Performance of our technique is stable across programs and architectures. On average, it delivers above 96% performance of the maximum available on both platforms. It achieves, on average, a 37% (up to 17.5 times) performance improvement over the OpenMP runtime default scheme on the Cell platform. Compared to two recent prediction models, our predictors achieve better performance with a significantly lower profiling cost.",

keywords = "artificial neural networks, compiler optimization, machine learning, performance modeling, support vector machine",

author = "Wang, Zheng and O'Boyle, {Michael F.P.}",

year = "2009",

doi = "10.1145/1504176.1504189",

language = "English",

isbn = "978-1-60558-397-6",

pages = "75--84",

booktitle = "Proceedings of the 14th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming (PPoPP '09)",

publisher = "ACM",

note = "PPoPP 2009 14th ACM SIGPLAN symposium on Principles and practice of parallel programming ; Conference date: 14-02-2009 Through 18-02-2009",

}

RIS

TY - CONF

T1 - Mapping parallelism to multi-cores: a machine learning based approach

AU - Wang, Zheng

AU - O'Boyle, Michael F.P.

PY - 2009

Y1 - 2009

N2 - The efficient mapping of program parallelism to multi-core processors is highly dependent on the underlying architecture. This paper proposes a portable and automatic compiler-based approach to mapping such parallelism using machine learning. It develops two predictors: a data sensitive and a data insensitive predictor to select the best mapping for parallel programs. They predict the number of threads and the scheduling policy for any given program using a model learnt off-line. By using low-cost profiling runs, they predict the mapping for a new unseen program across multiple input data sets. We evaluate our approach by selecting parallelism mapping configurations for OpenMP programs on two representative but different multi-core platforms (the Intel Xeon and the Cell processors). Performance of our technique is stable across programs and architectures. On average, it delivers above 96% performance of the maximum available on both platforms. It achieves, on average, a 37% (up to 17.5 times) performance improvement over the OpenMP runtime default scheme on the Cell platform. Compared to two recent prediction models, our predictors achieve better performance with a significantly lower profiling cost.

AB - The efficient mapping of program parallelism to multi-core processors is highly dependent on the underlying architecture. This paper proposes a portable and automatic compiler-based approach to mapping such parallelism using machine learning. It develops two predictors: a data sensitive and a data insensitive predictor to select the best mapping for parallel programs. They predict the number of threads and the scheduling policy for any given program using a model learnt off-line. By using low-cost profiling runs, they predict the mapping for a new unseen program across multiple input data sets. We evaluate our approach by selecting parallelism mapping configurations for OpenMP programs on two representative but different multi-core platforms (the Intel Xeon and the Cell processors). Performance of our technique is stable across programs and architectures. On average, it delivers above 96% performance of the maximum available on both platforms. It achieves, on average, a 37% (up to 17.5 times) performance improvement over the OpenMP runtime default scheme on the Cell platform. Compared to two recent prediction models, our predictors achieve better performance with a significantly lower profiling cost.

KW - artificial neural networks

KW - compiler optimization

KW - machine learning

KW - performance modeling

KW - support vector machine

UR - http://www.scopus.com/inward/record.url?scp=67650088253&partnerID=8YFLogxK

U2 - 10.1145/1504176.1504189

DO - 10.1145/1504176.1504189

M3 - Conference contribution/Paper

SN - 978-1-60558-397-6

SP - 75

EP - 84

BT - Proceedings of the 14th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming (PPoPP '09)

PB - ACM

CY - New York, NY, USA

T2 - PPoPP 2009 14th ACM SIGPLAN symposium on Principles and practice of parallel programming

Y2 - 14 February 2009 through 18 February 2009

ER -
