Partitioning streaming parallelism for multi-cores: a machine learning based approach

Computing and Communications

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Published

Zheng Wang
Michael F.P. O'Boyle


Publication date	2010
Host publication	Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques (PACT 2010)
Place of Publication	New York, NY, USA
Publisher	ACM
Pages	307-318
Number of pages	12
ISBN (print)	978-1-4503-0178-7
Original language	English
Event	PACT 2010 19th International Conference on Parallel Architectures and Compilation Techniques - Vienna, Austria Duration: 11/09/2010 → 15/09/2010

Conference

Conference	PACT 2010 19th International Conference on Parallel Architectures and Compilation Techniques
Country/Territory	Austria
City	Vienna
Period	11/09/10 → 15/09/10

Abstract

Stream-based languages are a popular approach to expressing parallelism in modern applications. The efficient mapping of streaming parallelism to multi-core processors is, however, highly dependent on the program and underlying architecture. We address this by developing a portable and automatic compiler-based approach to partitioning streaming programs using machine learning. Our technique predicts the ideal partition structure for a given streaming application using prior knowledge learned off-line. Using the predictor, we rapidly search the program space (without executing any code) to generate and select a good partition. We applied this technique to standard StreamIt applications and compared against existing approaches. On a 4-core platform, our approach achieves 60% of the best performance found by iteratively compiling and executing over 3000 different partitions per program. We obtain, on average, a 1.90x speedup over the already tuned partitioning scheme of the StreamIt compiler. When compared against a state-of-the-art analytical, model-based approach, we achieve, on average, a 1.77x performance improvement. By porting our approach to an 8-core platform, we obtain a 1.8x improvement over the StreamIt default scheme, demonstrating the portability of our approach.
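The abstract describes a two-step scheme: predict the ideal partition structure from prior knowledge learned off-line, then search candidate partitions and pick one without executing any code. The following is a minimal sketch of that idea under stated assumptions. It uses a 1-nearest-neighbour predictor and plain Euclidean distance over feature vectors. The feature definitions, training pairs, candidate names and function names are hypothetical placeholders, not the authors' implementation or any StreamIt compiler API.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict_ideal_structure(
    training: List[Tuple[Vector, Vector]], program_features: Vector
) -> Vector:
    """1-nearest-neighbour predictor (an assumed model choice): return the
    ideal-partition features recorded off-line for the most similar program."""
    _, ideal = min(training, key=lambda pair: euclidean(pair[0], program_features))
    return ideal


def select_partition(candidates: Dict[str, Vector], predicted: Vector) -> str:
    """Pick the candidate partition whose structural features are closest to
    the predicted ideal structure; no candidate is compiled or executed."""
    return min(candidates, key=lambda name: euclidean(candidates[name], predicted))


# Hypothetical program features: [compute fraction, communication fraction].
# Hypothetical partition features: [number of partitions, max work share per partition].
training = [
    ([0.9, 0.1], [4.0, 0.25]),  # compute-heavy program: 4 balanced partitions
    ([0.2, 0.8], [2.0, 0.50]),  # communication-heavy program: 2 partitions
]
predicted = predict_ideal_structure(training, [0.8, 0.2])

# Hypothetical candidate partitions produced by some partition generator.
candidates = {
    "fuse_all": [1.0, 1.0],
    "split_2": [2.0, 0.5],
    "split_4": [4.0, 0.3],
}
best = select_partition(candidates, predicted)
print(predicted, best)
```

Here the new program is closest to the compute-heavy training example, so the predicted structure is four balanced partitions and the search selects `split_4`. Matching on structural features rather than running each candidate is what makes the search cheap; a real system would need a far richer feature set and candidate generator than this toy.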
