Rights statement: © Authors ACM, 2018. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in CF '18 Proceedings of the 15th ACM International Conference on Computing Frontiers, http://dx.doi.org/10.1145/3203217.3203244
Accepted author manuscript, 1.32 MB, PDF document
Available under license: CC BY-NC: Creative Commons Attribution-NonCommercial 4.0 International License
Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review
TY - GEN
T1 - MOCL
T2 - An Efficient OpenCL Implementation for the Matrix-2000 Architecture
AU - Zhang, Peng
AU - Fang, Jianbin
AU - Yang, Canqun
AU - Tang, Tao
AU - Huang, Chun
AU - Wang, Zheng
N1 - © Authors ACM, 2018. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in CF '18 Proceedings of the 15th ACM International Conference on Computing Frontiers, http://dx.doi.org/10.1145/3203217.3203244
PY - 2018/5/8
Y1 - 2018/5/8
N2 - This paper presents the design and implementation of an Open Computing Language (OpenCL) framework for the Matrix-2000 many-core architecture. This architecture is designed to replace the Intel Xeon Phi accelerators of the TianHe-2 supercomputer. We share our experience and insights on how to design an effective OpenCL system for this new hardware accelerator. We propose a set of new analyses and optimizations to unlock the potential of the hardware. We extensively evaluate our approach using a wide range of OpenCL benchmarks on single and multiple computing nodes. We present our design choices and provide guidance on how to optimize code on the new Matrix-2000 architecture.
AB - This paper presents the design and implementation of an Open Computing Language (OpenCL) framework for the Matrix-2000 many-core architecture. This architecture is designed to replace the Intel Xeon Phi accelerators of the TianHe-2 supercomputer. We share our experience and insights on how to design an effective OpenCL system for this new hardware accelerator. We propose a set of new analyses and optimizations to unlock the potential of the hardware. We extensively evaluate our approach using a wide range of OpenCL benchmarks on single and multiple computing nodes. We present our design choices and provide guidance on how to optimize code on the new Matrix-2000 architecture.
U2 - 10.1145/3203217.3203244
DO - 10.1145/3203217.3203244
M3 - Conference contribution/Paper
SN - 9781450357616
SP - 26
EP - 35
BT - CF '18 Proceedings of the 15th ACM International Conference on Computing Frontiers
PB - ACM
CY - New York
ER -