
Electronic data

  • 1570538287

    Rights statement: ©2019 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

    Accepted author manuscript, 650 KB, PDF document

    Available under license: CC BY-NC: Creative Commons Attribution-NonCommercial 4.0 International License

Links

Text available via DOI: https://doi.org/10.1109/HPCC/SmartCity/DSS.2019.00101

Auto-tuning MPI Collective Operations on Large-Scale Parallel Systems

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Published

Standard

Auto-tuning MPI Collective Operations on Large-Scale Parallel Systems. / Zheng, Wenxu; Fang, Jianbin; Chen, Juan et al.
The 21st IEEE International Conference on High Performance Computing and Communications. IEEE, 2019. p. 670-677.

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review

Harvard

Zheng, W, Fang, J, Chen, J, Pan, X, Wang, H, Huang, C, Sun, X, Tang, T & Wang, Z 2019, Auto-tuning MPI Collective Operations on Large-Scale Parallel Systems. in The 21st IEEE International Conference on High Performance Computing and Communications. IEEE, pp. 670-677. https://doi.org/10.1109/HPCC/SmartCity/DSS.2019.00101

APA

Zheng, W., Fang, J., Chen, J., Pan, X., Wang, H., Huang, C., Sun, X., Tang, T., & Wang, Z. (2019). Auto-tuning MPI Collective Operations on Large-Scale Parallel Systems. In The 21st IEEE International Conference on High Performance Computing and Communications (pp. 670-677). IEEE. https://doi.org/10.1109/HPCC/SmartCity/DSS.2019.00101

Vancouver

Zheng W, Fang J, Chen J, Pan X, Wang H, Huang C et al. Auto-tuning MPI Collective Operations on Large-Scale Parallel Systems. In The 21st IEEE International Conference on High Performance Computing and Communications. IEEE. 2019. p. 670-677 doi: 10.1109/HPCC/SmartCity/DSS.2019.00101

Author

Zheng, Wenxu ; Fang, Jianbin ; Chen, Juan et al. / Auto-tuning MPI Collective Operations on Large-Scale Parallel Systems. The 21st IEEE International Conference on High Performance Computing and Communications. IEEE, 2019. pp. 670-677

Bibtex

@inproceedings{4e89b99d9c974231adc3411956d4894b,
title = "Auto-tuning MPI Collective Operations on Large-Scale Parallel Systems",
abstract = "MPI libraries are widely used in applications of high performance computing. Yet, effective tuning of MPI colletives on large parallel systems is an outstanding challenge. This process often follows a trial-and-error approach and requires expert insights into the subtle interactions between software and the underlying hardware. This paper presents an empirical approach to choose and switch MPI communication algorithms at runtime to optimize the application performance. We achieve this by first modeling offline, through microbenchmarks, to find how the runtime parameters with different message sizes affect the choice of MPI communication algorithms. We then apply the knowledge to automatically optimize new unseen MPI programs. We evaluate our approach by applying it to NPB and HPCC benchmarks on a 384-node computer cluster of the Tianhe-2 supercomputer. Experimental results show that our approach achieves, on average, 22.7% (up to 40.7%) improvement over the default setting.",
author = "Wenxu Zheng and Jianbin Fang and Juan Chen and Xiaodong Pan and Hao Wang and Chun Huang and Xiaole Sun and Tao Tang and Zheng Wang",
note = "{\textcopyright}2019 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.",
year = "2019",
month = oct,
day = "3",
doi = "10.1109/HPCC/SmartCity/DSS.2019.00101",
language = "English",
isbn = "9781728120591",
pages = "670--677",
booktitle = "The 21st IEEE International Conference on High Performance Computing and Communications",
publisher = "IEEE",

}

RIS

TY - GEN

T1 - Auto-tuning MPI Collective Operations on Large-Scale Parallel Systems

AU - Zheng, Wenxu

AU - Fang, Jianbin

AU - Chen, Juan

AU - Pan, Xiaodong

AU - Wang, Hao

AU - Huang, Chun

AU - Sun, Xiaole

AU - Tang, Tao

AU - Wang, Zheng

N1 - ©2019 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

PY - 2019/10/3

Y1 - 2019/10/3

N2 - MPI libraries are widely used in high performance computing applications. Yet, effective tuning of MPI collectives on large parallel systems remains an outstanding challenge. This process often follows a trial-and-error approach and requires expert insight into the subtle interactions between the software and the underlying hardware. This paper presents an empirical approach for choosing and switching MPI communication algorithms at runtime to optimize application performance. We achieve this by first modeling offline, through microbenchmarks, how runtime parameters and message sizes affect the choice of MPI communication algorithm. We then apply this knowledge to automatically optimize new, unseen MPI programs. We evaluate our approach by applying it to the NPB and HPCC benchmarks on a 384-node cluster of the Tianhe-2 supercomputer. Experimental results show that our approach achieves, on average, a 22.7% (up to 40.7%) performance improvement over the default setting.

AB - MPI libraries are widely used in high performance computing applications. Yet, effective tuning of MPI collectives on large parallel systems remains an outstanding challenge. This process often follows a trial-and-error approach and requires expert insight into the subtle interactions between the software and the underlying hardware. This paper presents an empirical approach for choosing and switching MPI communication algorithms at runtime to optimize application performance. We achieve this by first modeling offline, through microbenchmarks, how runtime parameters and message sizes affect the choice of MPI communication algorithm. We then apply this knowledge to automatically optimize new, unseen MPI programs. We evaluate our approach by applying it to the NPB and HPCC benchmarks on a 384-node cluster of the Tianhe-2 supercomputer. Experimental results show that our approach achieves, on average, a 22.7% (up to 40.7%) performance improvement over the default setting.

U2 - 10.1109/HPCC/SmartCity/DSS.2019.00101

DO - 10.1109/HPCC/SmartCity/DSS.2019.00101

M3 - Conference contribution/Paper

SN - 9781728120591

SP - 670

EP - 677

BT - The 21st IEEE International Conference on High Performance Computing and Communications

PB - IEEE

ER -
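
Illustrative example

The abstract describes an approach that selects among MPI collective algorithms at runtime, using knowledge gathered offline through microbenchmarks. The sketch below is a minimal, hypothetical illustration of that general idea in C with MPI; it is not the authors' implementation or the Tianhe-2 MPI library, and the message-size threshold is a placeholder for a value that offline tuning would supply. The wrapper switches between a hand-rolled binomial-tree broadcast (typically favoured for small, latency-bound messages) and the library's default MPI_Bcast.

/*
 * Illustrative sketch only -- not the authors' implementation.
 * Shows the general idea from the abstract: pick an MPI broadcast
 * algorithm at runtime based on message size, with the switch point
 * standing in for a threshold learned offline via microbenchmarks.
 *
 * Build (assuming an MPI toolchain): mpicc tuned_bcast.c -o tuned_bcast
 * Run: mpirun -np 8 ./tuned_bcast
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical switch point in bytes; a real system would load this
 * from a table produced by offline microbenchmarking. */
#define SMALL_MSG_THRESHOLD 8192

/* Binomial-tree broadcast built from point-to-point messages:
 * O(log P) steps, usually attractive for small, latency-bound messages. */
static void bcast_binomial(void *buf, int count, MPI_Datatype type,
                           int root, MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    int rel = (rank - root + size) % size;   /* rank relative to the root */
    int mask = 1;

    /* Receive from the parent in the binomial tree. */
    while (mask < size) {
        if (rel & mask) {
            int src = rank - mask;
            if (src < 0) src += size;
            MPI_Recv(buf, count, type, src, 0, comm, MPI_STATUS_IGNORE);
            break;
        }
        mask <<= 1;
    }

    /* Forward to the children in the binomial tree. */
    mask >>= 1;
    while (mask > 0) {
        if (rel + mask < size) {
            int dst = rank + mask;
            if (dst >= size) dst -= size;
            MPI_Send(buf, count, type, dst, 0, comm);
        }
        mask >>= 1;
    }
}

/* Runtime dispatcher: choose an algorithm based on the message size. */
static void tuned_bcast(void *buf, int count, MPI_Datatype type,
                        int root, MPI_Comm comm)
{
    int type_size;
    MPI_Type_size(type, &type_size);
    long bytes = (long)count * type_size;

    if (bytes <= SMALL_MSG_THRESHOLD)
        bcast_binomial(buf, count, type, root, comm); /* latency-bound case */
    else
        MPI_Bcast(buf, count, type, root, comm);      /* library default */
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Small message: the dispatcher takes the binomial-tree path. */
    int small[16] = {0};
    if (rank == 0) small[0] = 42;
    tuned_bcast(small, 16, MPI_INT, 0, MPI_COMM_WORLD);

    /* Large message: the dispatcher falls back to MPI_Bcast. */
    double *large = malloc(1 << 20);
    if (rank == 0) large[0] = 3.14;
    tuned_bcast(large, (1 << 20) / sizeof(double), MPI_DOUBLE, 0, MPI_COMM_WORLD);

    printf("rank %d: small[0]=%d large[0]=%.2f\n", rank, small[0], large[0]);

    free(large);
    MPI_Finalize();
    return 0;
}

In the paper's setting the decision would cover several collectives and be driven by thresholds and runtime parameters derived from the offline microbenchmark model rather than a single hard-coded constant.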