Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN › Conference contribution/Paper › peer-review
}
TY - GEN
T1 - SuperCache
T2 - A mechanism to minimize the front end latency
AU - Allan, Zhang
AU - Helal, Sumi
PY - 2007
Y1 - 2007
N2 - A modern CPU's pipeline stages can be roughly classified as front end and back end stages. The front end supplies ready (decoded, renamed) instructions and dispatches them to reservation stations, where the back end issues, executes and retires them. The lengthy front end stages, including instruction fetching, decoding, renaming and dispatching, play a key role in overall performance: only an adequate supply of ready instructions allows the back end stages to fully exploit instruction level parallelism (ILP). Reducing front end latency is especially critical for recent deeply pipelined architectures, where the front end is especially long: instruction cache access may take more than one cycle even on a cache hit, let alone a cache miss. On a branch misprediction, the supply/demand equilibrium between the front end and the back end is suddenly disrupted, and the back end often under-utilizes available resources during the long wait until the front end can supply ready instructions from the new branch path to the reservation stations. In this paper, we introduce and evaluate a new mechanism (called SuperCache) that aims to reduce front end latency by enhancing the traditional reservation pool into a SuperCache and recycling retired reservation stations. With the proposed mechanism, our simulations show a significant performance improvement of up to 15%, and even 30%. © 2007 IEEE.
AB - A modern CPU's pipeline stages can be roughly classified as front end and back end stages. The front end supplies ready (decoded, renamed) instructions and dispatches them to reservation stations, where the back end issues, executes and retires them. The lengthy front end stages, including instruction fetching, decoding, renaming and dispatching, play a key role in overall performance: only an adequate supply of ready instructions allows the back end stages to fully exploit instruction level parallelism (ILP). Reducing front end latency is especially critical for recent deeply pipelined architectures, where the front end is especially long: instruction cache access may take more than one cycle even on a cache hit, let alone a cache miss. On a branch misprediction, the supply/demand equilibrium between the front end and the back end is suddenly disrupted, and the back end often under-utilizes available resources during the long wait until the front end can supply ready instructions from the new branch path to the reservation stations. In this paper, we introduce and evaluate a new mechanism (called SuperCache) that aims to reduce front end latency by enhancing the traditional reservation pool into a SuperCache and recycling retired reservation stations. With the proposed mechanism, our simulations show a significant performance improvement of up to 15%, and even 30%. © 2007 IEEE.
KW - Front end
KW - Instruction level parallelism
KW - Latency reduction
KW - Pipeline
KW - Superscalar
KW - Classification (of information)
KW - Computer simulation
KW - Decoding
KW - Instruction level parallelism (ILP)
KW - Buffer storage
U2 - 10.1109/ITNG.2007.189
DO - 10.1109/ITNG.2007.189
M3 - Conference contribution/Paper
SN - 0769527760
SP - 908
EP - 914
BT - Information Technology, 2007. ITNG '07. Fourth International Conference on
PB - IEEE
ER -