Partitioning streaming parallelism for multi-cores: a machine learning based approach

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN; Conference contribution/Paper; peer-review

Published

Standard

Partitioning streaming parallelism for multi-cores: a machine learning based approach. / Wang, Zheng; O'Boyle, Michael F.P.
Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques (PACT 2010). New York, NY, USA: ACM, 2010. p. 307-318.

Harvard

Wang, Z & O'Boyle, MFP 2010, Partitioning streaming parallelism for multi-cores: a machine learning based approach. in Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques (PACT 2010). ACM, New York, NY, USA, pp. 307-318, PACT 2010 19th International Conference on Parallel Architectures and Compilation Techniques, Vienna, Austria, 11/09/10. https://doi.org/10.1145/1854273.1854313

APA

Wang, Z., & O'Boyle, M. F. P. (2010). Partitioning streaming parallelism for multi-cores: a machine learning based approach. In Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques (PACT 2010) (pp. 307-318). ACM. https://doi.org/10.1145/1854273.1854313

Vancouver

Wang Z, O'Boyle MFP. Partitioning streaming parallelism for multi-cores: a machine learning based approach. In Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques (PACT 2010). New York, NY, USA: ACM. 2010. p. 307-318. doi: 10.1145/1854273.1854313

Author

Wang, Zheng ; O'Boyle, Michael F.P. / Partitioning streaming parallelism for multi-cores: a machine learning based approach. Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques (PACT 2010). New York, NY, USA : ACM, 2010. pp. 307-318

Bibtex

@inproceedings{34274405a27d48ffb11729400328e824,
title = "Partitioning streaming parallelism for multi-cores: a machine learning based approach",
abstract = "Stream-based languages are a popular approach to expressing parallelism in modern applications. The efficient mapping of streaming parallelism to multi-core processors is, however, highly dependent on the program and underlying architecture. We address this by developing a portable and automatic compiler-based approach to partitioning streaming programs using machine learning. Our technique predicts the ideal partition structure for a given streaming application using prior knowledge learned off-line. Using the predictor, we rapidly search the program space (without executing any code) to generate and select a good partition. We applied this technique to standard StreamIt applications and compared against existing approaches. On a 4-core platform, our approach achieves 60% of the best performance found by iteratively compiling and executing over 3000 different partitions per program. We obtain, on average, a 1.90x speedup over the already tuned partitioning scheme of the StreamIt compiler. When compared against a state-of-the-art analytical, model-based approach, we achieve, on average, a 1.77x performance improvement. By porting our approach to an 8-core platform, we are able to obtain 1.8x improvement over the StreamIt default scheme, demonstrating the portability of our approach.",
keywords = "compiler optimization, machine learning, partitioning streaming parallelism",
author = "Zheng Wang and O'Boyle, {Michael F.P.}",
year = "2010",
doi = "10.1145/1854273.1854313",
language = "English",
isbn = "978-1-4503-0178-7",
pages = "307--318",
booktitle = "Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques (PACT 2010)",
publisher = "ACM",
note = "PACT 2010 19th International Conference on Parallel Architectures and Compilation Techniques ; Conference date: 11-09-2010 Through 15-09-2010",

}

RIS

TY - GEN

T1 - Partitioning streaming parallelism for multi-cores: a machine learning based approach

AU - Wang, Zheng

AU - O'Boyle, Michael F.P.

PY - 2010

Y1 - 2010

N2 - Stream-based languages are a popular approach to expressing parallelism in modern applications. The efficient mapping of streaming parallelism to multi-core processors is, however, highly dependent on the program and underlying architecture. We address this by developing a portable and automatic compiler-based approach to partitioning streaming programs using machine learning. Our technique predicts the ideal partition structure for a given streaming application using prior knowledge learned off-line. Using the predictor, we rapidly search the program space (without executing any code) to generate and select a good partition. We applied this technique to standard StreamIt applications and compared against existing approaches. On a 4-core platform, our approach achieves 60% of the best performance found by iteratively compiling and executing over 3000 different partitions per program. We obtain, on average, a 1.90x speedup over the already tuned partitioning scheme of the StreamIt compiler. When compared against a state-of-the-art analytical, model-based approach, we achieve, on average, a 1.77x performance improvement. By porting our approach to an 8-core platform, we are able to obtain 1.8x improvement over the StreamIt default scheme, demonstrating the portability of our approach.

AB - Stream-based languages are a popular approach to expressing parallelism in modern applications. The efficient mapping of streaming parallelism to multi-core processors is, however, highly dependent on the program and underlying architecture. We address this by developing a portable and automatic compiler-based approach to partitioning streaming programs using machine learning. Our technique predicts the ideal partition structure for a given streaming application using prior knowledge learned off-line. Using the predictor, we rapidly search the program space (without executing any code) to generate and select a good partition. We applied this technique to standard StreamIt applications and compared against existing approaches. On a 4-core platform, our approach achieves 60% of the best performance found by iteratively compiling and executing over 3000 different partitions per program. We obtain, on average, a 1.90x speedup over the already tuned partitioning scheme of the StreamIt compiler. When compared against a state-of-the-art analytical, model-based approach, we achieve, on average, a 1.77x performance improvement. By porting our approach to an 8-core platform, we are able to obtain 1.8x improvement over the StreamIt default scheme, demonstrating the portability of our approach.

KW - compiler optimization

KW - machine learning

KW - partitioning streaming parallelism

UR - http://www.scopus.com/inward/record.url?scp=78149235736&partnerID=8YFLogxK

U2 - 10.1145/1854273.1854313

DO - 10.1145/1854273.1854313

M3 - Conference contribution/Paper

SN - 978-1-4503-0178-7

SP - 307

EP - 318

BT - Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques (PACT 2010)

PB - ACM

CY - New York, NY, USA

T2 - PACT 2010 19th International Conference on Parallel Architectures and Compilation Techniques

Y2 - 11 September 2010 through 15 September 2010

ER -
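The abstract describes a two-stage scheme:

1. A model learned off-line predicts the features of an ideal partition for a given program.
2. A search then ranks candidate partitions against that prediction without executing any of them.

The Python sketch below illustrates that idea under loudly stated assumptions, none of which come from the paper or the StreamIt compiler:

- The program and partition features are hypothetical.
- The TRAINING data is hypothetical.
- The k-nearest-neighbour predictor is an assumed choice of model.
- Candidates are restricted to a linear pipeline of filters.
- Every function name is invented for illustration.

```python
import itertools
import math
from typing import Iterator, List, Sequence, Tuple

# HYPOTHETICAL off-line training set: each entry maps a program's features
# to the features of its best-performing partition.
# Program features (assumed): (number of filters, mean work per filter).
# Partition features (assumed): (number of partitions, max/mean load ratio).
TRAINING: List[Tuple[Tuple[float, float], Tuple[float, float]]] = [
    ((4.0, 10.0), (2.0, 1.0)),
    ((8.0, 50.0), (4.0, 1.1)),
    ((16.0, 5.0), (3.0, 1.3)),
]


def predict_ideal(program: Sequence[float], k: int = 1) -> Tuple[float, ...]:
    """k-nearest-neighbour prediction of the ideal partition features."""
    ranked = sorted(TRAINING, key=lambda ex: math.dist(ex[0], program))
    nearest = [target for _, target in ranked[:k]]
    # Average the targets of the k closest training programs.
    return tuple(sum(vals) / k for vals in zip(*nearest))


def pipeline_partitions(n: int) -> Iterator[List[range]]:
    """Yield every split of a linear pipeline of n filters into contiguous segments."""
    for cuts in range(n):
        for positions in itertools.combinations(range(1, n), cuts):
            bounds = (0, *positions, n)
            yield [range(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]


def partition_features(work: Sequence[float], segments: Sequence[range]) -> Tuple[float, float]:
    """Static features of a candidate: segment count and max/mean load ratio."""
    loads = [sum(work[i] for i in seg) for seg in segments]
    mean = sum(loads) / len(loads)
    return (float(len(loads)), max(loads) / mean)


def select_partition(work: Sequence[float]) -> List[range]:
    """Pick the candidate whose static features are closest to the prediction.

    No candidate is compiled or executed; only the model is consulted.
    """
    program = (float(len(work)), sum(work) / len(work))
    ideal = predict_ideal(program)
    return min(
        pipeline_partitions(len(work)),
        key=lambda segs: math.dist(partition_features(work, segs), ideal),
    )
```

For example, `select_partition([10.0, 10.0, 10.0, 10.0])` predicts two balanced segments for a four-filter pipeline. It returns `[range(0, 2), range(2, 4)]`.

Exhaustive enumeration yields 2^(n-1) candidates, so it is only practical for short pipelines. A realistic search over general stream graphs would need to sample or prune the space.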