
Partitioning streaming parallelism for multi-cores: a machine learning based approach

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN; Conference contribution/Paper; peer-review

Published
Publication date: 2010
Host publication: Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques (PACT 2010)
Place of Publication: New York, NY, USA
Publisher: ACM
Pages: 307-318
Number of pages: 12
ISBN (print): 978-1-4503-0178-7
Original language: English
Event: PACT 2010 19th International Conference on Parallel Architectures and Compilation Techniques - Vienna, Austria
Duration: 11/09/2010 to 15/09/2010

Conference

Conference: PACT 2010 19th International Conference on Parallel Architectures and Compilation Techniques
Country/Territory: Austria
City: Vienna
Period: 11/09/10 to 15/09/10

Abstract

Stream-based languages are a popular approach to expressing parallelism in modern applications. The efficient mapping of streaming parallelism to multi-core processors is, however, highly dependent on the program and the underlying architecture. We address this by developing a portable and automatic compiler-based approach to partitioning streaming programs using machine learning. Our technique predicts the ideal partition structure for a given streaming application using prior knowledge learned off-line. Using the predictor, we rapidly search the program space (without executing any code) to generate and select a good partition. We applied this technique to standard StreamIt applications and compared it against existing approaches. On a 4-core platform, our approach achieves 60% of the best performance found by iteratively compiling and executing over 3000 different partitions per program. We obtain, on average, a 1.90x speedup over the already tuned partitioning scheme of the StreamIt compiler. When compared against a state-of-the-art analytical, model-based approach, we achieve, on average, a 1.77x performance improvement. By porting our approach to an 8-core platform, we obtain a 1.8x improvement over the StreamIt default scheme, demonstrating the portability of our approach.
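The predict-then-search idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not name the learning model or the features, so the nearest-neighbour predictor, the two-element feature vectors, the training table, and every function name below are hypothetical stand-ins. The sketch only shows the overall shape: a model trained off-line predicts what a good partition should look like, and candidate partitions are then ranked by how close they come to that prediction, without running any of them.

```python
import math
import random

# Hypothetical off-line training data (an assumption, not from the paper):
# (program features) -> (features of the best partition found for that program).
TRAINING = [
    ((12.0, 0.30), (4.0, 0.25)),
    ((40.0, 0.10), (8.0, 0.12)),
    ((25.0, 0.55), (6.0, 0.40)),
]


def predict_ideal(program_features):
    """Stand-in predictor: 1-nearest-neighbour lookup over the training table."""
    _, ideal = min(TRAINING, key=lambda pair: math.dist(pair[0], program_features))
    return ideal


def partition_features(partition):
    """Describe a partition as (number of groups, relative load imbalance)."""
    loads = [sum(group) for group in partition]
    mean = sum(loads) / len(loads)
    imbalance = (max(loads) - mean) / mean if mean else 0.0
    return (float(len(partition)), imbalance)


def random_partition(work, rng):
    """Split a pipeline of per-filter work estimates into contiguous groups."""
    n_cuts = rng.randint(0, len(work) - 1)
    cuts = sorted(rng.sample(range(1, len(work)), n_cuts))
    bounds = [0] + cuts + [len(work)]
    return [work[a:b] for a, b in zip(bounds, bounds[1:])]


def select_partition(work, program_features, n_candidates=200, seed=0):
    """Pick the candidate whose features are closest to the predicted ideal.

    No candidate is executed; ranking uses only the static feature vectors.
    """
    rng = random.Random(seed)
    ideal = predict_ideal(program_features)
    candidates = [random_partition(work, rng) for _ in range(n_candidates)]
    return min(candidates, key=lambda p: math.dist(partition_features(p), ideal))


if __name__ == "__main__":
    work = [5, 3, 8, 2, 7, 4, 6, 1]  # hypothetical per-filter work estimates
    print(select_partition(work, (12.0, 0.30)))
```

The features here are unnormalised, so the group count dominates the distance; a real system would scale features and use a far richer description of both the program and the partition.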