
Electronic data

  • aspmv

    Rights statement: The final publication is available at Springer via http://dx.doi.org/10.1007/s10766-018-00625-8

    Accepted author manuscript, 1 MB, PDF document

    Available under license: CC BY: Creative Commons Attribution 4.0 International License

Links

Text available via DOI: https://doi.org/10.1007/s10766-018-00625-8


Optimizing Sparse Matrix-Vector Multiplications on an ARMv8-based Many-Core Architecture

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Standard

Optimizing Sparse Matrix-Vector Multiplications on an ARMv8-based Many-Core Architecture. / Chen, Donglin; Fang, Jianbin; Chen, Shizhao et al.
In: International Journal of Parallel Programming, Vol. 47, No. 3, 01.06.2019, p. 418-432.

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Harvard

Chen, D, Fang, J, Chen, S, Xu, C & Wang, Z 2019, 'Optimizing Sparse Matrix-Vector Multiplications on an ARMv8-based Many-Core Architecture', International Journal of Parallel Programming, vol. 47, no. 3, pp. 418-432. https://doi.org/10.1007/s10766-018-00625-8

APA

Chen, D., Fang, J., Chen, S., Xu, C., & Wang, Z. (2019). Optimizing Sparse Matrix-Vector Multiplications on an ARMv8-based Many-Core Architecture. International Journal of Parallel Programming, 47(3), 418-432. https://doi.org/10.1007/s10766-018-00625-8

Vancouver

Chen D, Fang J, Chen S, Xu C, Wang Z. Optimizing Sparse Matrix-Vector Multiplications on an ARMv8-based Many-Core Architecture. International Journal of Parallel Programming. 2019 Jun 1;47(3):418-432. Epub 2019 Jan 1. doi: 10.1007/s10766-018-00625-8

Author

Chen, Donglin ; Fang, Jianbin ; Chen, Shizhao et al. / Optimizing Sparse Matrix-Vector Multiplications on an ARMv8-based Many-Core Architecture. In: International Journal of Parallel Programming. 2019 ; Vol. 47, No. 3. pp. 418-432.

Bibtex

@article{a0c29c717e3249fd96ba18fbf717f7a2,
title = "Optimizing Sparse Matrix-Vector Multiplications on an ARMv8-based Many-Core Architecture",
abstract = "Sparse matrix–vector multiplications (SpMV) are common in scientific and HPC applications but are hard to be optimized. While the ARMv8-based processor IP is emerging as an alternative to the traditional x64 HPC processor design, there is little study on SpMV performance on such new many-cores. To design efficient HPC software and hardware, we need to understand how well SpMV performs. This work develops a quantitative approach to characterize SpMV performance on a recent ARMv8-based many-core architecture, Phytium FT-2000 Plus (FTP). We perform extensive experiments involved over 9500 distinct profiling runs on 956 sparse datasets and five mainstream sparse matrix storage formats, and compare FTP against the Intel Knights Landing many-core. We experimentally show that picking the optimal sparse matrix storage format and parameters is non-trivial as the correct decision requires expert knowledge of the input matrix and the hardware. We address the problem by proposing a machine learning based model that predicts the best storage format and parameters using input matrix features. The model automatically specializes to the many-core architectures we considered. The experimental results show that our approach achieves on average 93% of the best-available performance without incurring runtime profiling overhead.",
keywords = "SpMV , Sparse matrix format , Many-Core, Performance tuning",
author = "Donglin Chen and Jianbin Fang and Shizhao Chen and Chuanfu Xu and Zheng Wang",
note = "The final publication is available at Springer via http://dx.doi.org/10.1007/s10766-018-00625-8",
year = "2019",
month = jun,
day = "1",
doi = "10.1007/s10766-018-00625-8",
language = "English",
volume = "47",
pages = "418--432",
journal = " International Journal of Parallel Programming",
issn = "0885-7458",
publisher = "Springer",
number = "3",

}
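
The abstract above centres on tuning the SpMV kernel y = Ax across sparse storage formats. As a point of reference only (this is not code from the paper), the following is a minimal sketch of SpMV in the widely used CSR (compressed sparse row) format, written in Python with NumPy; the array names row_ptr, col_idx and values are illustrative.

import numpy as np

def spmv_csr(row_ptr, col_idx, values, x):
    # Compute y = A @ x for a matrix A stored in CSR form:
    #   row_ptr : length n_rows + 1; row i owns values[row_ptr[i]:row_ptr[i+1]]
    #   col_idx : column index of each stored non-zero
    #   values  : the non-zero values themselves
    n_rows = len(row_ptr) - 1
    y = np.zeros(n_rows, dtype=values.dtype)
    for i in range(n_rows):
        start, end = row_ptr[i], row_ptr[i + 1]
        # Dot product of row i's non-zeros with the gathered entries of x.
        y[i] = np.dot(values[start:end], x[col_idx[start:end]])
    return y

# Tiny 3x3 example:
# [[10,  0,  2],
#  [ 0,  3,  0],
#  [ 0,  0,  5]]
row_ptr = np.array([0, 2, 3, 4])
col_idx = np.array([0, 2, 1, 2])
values  = np.array([10.0, 2.0, 3.0, 5.0])
x = np.array([1.0, 2.0, 3.0])
print(spmv_csr(row_ptr, col_idx, values, x))   # [16.  6. 15.]

Other formats (for example ELL or COO) trade this row-wise layout for different memory-access patterns, which is why the best choice depends on both the sparsity structure of the input matrix and the target many-core.
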

RIS

TY - JOUR

T1 - Optimizing Sparse Matrix-Vector Multiplications on an ARMv8-based Many-Core Architecture

AU - Chen, Donglin

AU - Fang, Jianbin

AU - Chen, Shizhao

AU - Xu, Chuanfu

AU - Wang, Zheng

N1 - The final publication is available at Springer via http://dx.doi.org/10.1007/s10766-018-00625-8

PY - 2019/6/1

Y1 - 2019/6/1

N2 - Sparse matrix–vector multiplications (SpMV) are common in scientific and HPC applications but are hard to optimize. While ARMv8-based processor IP is emerging as an alternative to the traditional x64 HPC processor design, there has been little study of SpMV performance on such new many-cores. To design efficient HPC software and hardware, we need to understand how well SpMV performs. This work develops a quantitative approach to characterize SpMV performance on a recent ARMv8-based many-core architecture, Phytium FT-2000 Plus (FTP). We perform extensive experiments involving over 9500 distinct profiling runs on 956 sparse datasets and five mainstream sparse matrix storage formats, and compare FTP against the Intel Knights Landing many-core. We show experimentally that picking the optimal sparse matrix storage format and parameters is non-trivial, as the correct decision requires expert knowledge of both the input matrix and the hardware. We address this problem by proposing a machine-learning-based model that predicts the best storage format and parameters from features of the input matrix. The model automatically specializes to the many-core architectures we considered. The experimental results show that our approach achieves, on average, 93% of the best-available performance without incurring runtime profiling overhead.

AB - Sparse matrix–vector multiplications (SpMV) are common in scientific and HPC applications but are hard to optimize. While ARMv8-based processor IP is emerging as an alternative to the traditional x64 HPC processor design, there has been little study of SpMV performance on such new many-cores. To design efficient HPC software and hardware, we need to understand how well SpMV performs. This work develops a quantitative approach to characterize SpMV performance on a recent ARMv8-based many-core architecture, Phytium FT-2000 Plus (FTP). We perform extensive experiments involving over 9500 distinct profiling runs on 956 sparse datasets and five mainstream sparse matrix storage formats, and compare FTP against the Intel Knights Landing many-core. We show experimentally that picking the optimal sparse matrix storage format and parameters is non-trivial, as the correct decision requires expert knowledge of both the input matrix and the hardware. We address this problem by proposing a machine-learning-based model that predicts the best storage format and parameters from features of the input matrix. The model automatically specializes to the many-core architectures we considered. The experimental results show that our approach achieves, on average, 93% of the best-available performance without incurring runtime profiling overhead.

KW - SpMV

KW - Sparse matrix format

KW - Many-Core

KW - Performance tuning

U2 - 10.1007/s10766-018-00625-8

DO - 10.1007/s10766-018-00625-8

M3 - Journal article

VL - 47

SP - 418

EP - 432

JO - International Journal of Parallel Programming

JF - International Journal of Parallel Programming

SN - 0885-7458

IS - 3

ER -
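
The abstract also describes a machine-learning model that predicts the best storage format and parameters from features of the input matrix. The paper's actual feature set, model and format list are not reproduced in this record, so the sketch below is only a hedged illustration of the general idea, using scikit-learn's DecisionTreeClassifier, scipy.sparse, and a handful of hypothetical features (matrix size, density, non-zeros per row); the training labels here are random placeholders standing in for offline profiling results.

import numpy as np
from scipy import sparse
from sklearn.tree import DecisionTreeClassifier

# Placeholder list of candidate formats; the paper evaluates five mainstream
# formats, but this record does not name them, so these are only examples.
FORMATS = ["CSR", "CSC", "COO", "ELL", "HYB"]

def matrix_features(A):
    # Hypothetical per-matrix features; the paper's feature set may differ.
    A = A.tocsr()
    n_rows, n_cols = A.shape
    nnz_per_row = np.diff(A.indptr)
    return [
        n_rows,
        n_cols,
        A.nnz / (n_rows * n_cols),  # density
        nnz_per_row.mean(),
        nnz_per_row.std(),
        nnz_per_row.max(),
    ]

# Toy training set: random matrices plus random "best format" labels.
# In the real setting the labels would come from profiling each candidate
# format on the target many-core.
rng = np.random.default_rng(0)
train = [sparse.random(200, 200, density=d, random_state=i, format="csr")
         for i, d in enumerate(rng.uniform(0.001, 0.05, size=40))]
X_train = np.array([matrix_features(A) for A in train])
y_train = rng.choice(FORMATS, size=len(train))

model = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_train, y_train)

# At run time, predict a format for an unseen matrix from its features alone,
# avoiding per-matrix profiling overhead.
unseen = sparse.random(300, 300, density=0.01, random_state=99, format="csr")
print(model.predict([matrix_features(unseen)])[0])

In practice the training labels would be gathered by profiling each candidate format once per training matrix on the target architecture, so that deployment-time prediction needs only the cheap feature computation, which is how the abstract's "without incurring runtime profiling overhead" claim is achieved.
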