Using machine learning to partition streaming programs

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Standard

Using machine learning to partition streaming programs. / Wang, Zheng; O'Boyle, Michael.
In: ACM Transactions on Architecture and Code Optimization, Vol. 10, No. 3, Article 20, 09.2013.

Harvard

Wang, Z & O'Boyle, M 2013, 'Using machine learning to partition streaming programs', ACM Transactions on Architecture and Code Optimization, vol. 10, no. 3, article 20. https://doi.org/10.1145/2512436

APA

Wang, Z., & O'Boyle, M. (2013). Using machine learning to partition streaming programs. ACM Transactions on Architecture and Code Optimization, 10(3), Article 20. https://doi.org/10.1145/2512436

Vancouver

Wang Z, O'Boyle M. Using machine learning to partition streaming programs. ACM Transactions on Architecture and Code Optimization. 2013 Sep;10(3):20. doi: 10.1145/2512436

Author

Wang, Zheng ; O'Boyle, Michael. / Using machine learning to partition streaming programs. In: ACM Transactions on Architecture and Code Optimization. 2013 ; Vol. 10, No. 3.

Bibtex

@article{1c91079a01c146db886cf262b0be33ed,
title = "Using machine learning to partition streaming programs",
abstract = "Stream-based parallel languages are a popular way to express parallelism in modern applications. The efficient mapping of streaming parallelism to today's multicore systems is, however, highly dependent on the program and underlying architecture. We address this by developing a portable and automatic compiler-based approach to partitioning streaming programs using machine learning. Our technique predicts the ideal partition structure for a given streaming application using prior knowledge learned offline. Using the predictor we rapidly search the program space (without executing any code) to generate and select a good partition. We applied this technique to standard StreamIt applications and compared against existing approaches. On a 4-core platform, our approach achieves 60% of the best performance found by iteratively compiling and executing over 3000 different partitions per program. We obtain, on average, a 1.90× speedup over the already tuned partitioning scheme of the StreamIt compiler. When compared against a state-of-the-art analytical, model-based approach, we achieve, on average, a 1.77× performance improvement. By porting our approach to an 8-core platform, we are able to obtain 1.8× improvement over the StreamIt default scheme, demonstrating the portability of our approach.",
author = "Zheng Wang and Michael O'Boyle",
year = "2013",
month = sep,
doi = "10.1145/2512436",
language = "English",
volume = "10",
journal = "ACM Transactions on Architecture and Code Optimization",
issn = "1544-3973",
publisher = "Association for Computing Machinery (ACM)",
number = "3",
articleno = "20",
}

RIS

TY - JOUR
T1 - Using machine learning to partition streaming programs
AU - Wang, Zheng
AU - O'Boyle, Michael
PY - 2013/9
Y1 - 2013/9
AB - Stream-based parallel languages are a popular way to express parallelism in modern applications. The efficient mapping of streaming parallelism to today's multicore systems is, however, highly dependent on the program and underlying architecture. We address this by developing a portable and automatic compiler-based approach to partitioning streaming programs using machine learning. Our technique predicts the ideal partition structure for a given streaming application using prior knowledge learned offline. Using the predictor we rapidly search the program space (without executing any code) to generate and select a good partition. We applied this technique to standard StreamIt applications and compared against existing approaches. On a 4-core platform, our approach achieves 60% of the best performance found by iteratively compiling and executing over 3000 different partitions per program. We obtain, on average, a 1.90× speedup over the already tuned partitioning scheme of the StreamIt compiler. When compared against a state-of-the-art analytical, model-based approach, we achieve, on average, a 1.77× performance improvement. By porting our approach to an 8-core platform, we are able to obtain 1.8× improvement over the StreamIt default scheme, demonstrating the portability of our approach.

U2 - 10.1145/2512436
DO - 10.1145/2512436
M3 - Journal article
VL - 10
JO - ACM Transactions on Architecture and Code Optimization
JF - ACM Transactions on Architecture and Code Optimization
SN - 1544-3973
IS - 3
M1 - 20
ER -