Designing and Optimizing the Fetch Unit for a RISC Core

Authors

Computer Engineering and Information Technology Department, Amirkabir University of Technology, Tehran, Iran

Abstract

Despite the extensive deployment of multi-core architectures in the past few years, the design and optimization of a single processing core is still an active field in computing. Moreover, a design procedure that solves the problems of a single processing core makes it possible to apply the resulting solutions to special-purpose processing cores. Instruction fetch, one part of the architectural design, is considered to have the greatest effect on performance. RISC processors, whose architecture offers a high capability for parallelism, need a high instruction fetch width to reach adequate performance. Accurate branch prediction and a low cache miss rate are two key factors in the operation of the fetch unit. In this paper, we design and analyze the fetch unit for a 4-way (4-issue) superscalar processing core. We apply a cost-per-performance design style and a quantitative approach to propose this fetch unit. In addition, the timing constraints of the instruction cache are analyzed in detail so that the proposed fetch unit can be used in a superpipelined system. To solve the timing problem, we apply the division method to the branch prediction tables and the wave pipelining technique to the instruction cache.
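As a rough illustration of why prediction accuracy matters to a fetch unit, the following minimal sketch models a generic table of 2-bit saturating counters indexed by low-order program counter bits. This is a textbook scheme and is not the predictor proposed in the paper; the class name `TwoBitPredictor`, the helper `accuracy`, the table size, and the assumption of 4-byte instructions are all hypothetical choices made only for this example.

```python
class TwoBitPredictor:
    """Generic table of 2-bit saturating counters (illustrative only)."""

    def __init__(self, entries=1024):
        # Table size must be a power of two so the index can be masked.
        assert entries > 0 and entries & (entries - 1) == 0
        self.mask = entries - 1
        # Counter states 0..3; start every entry at 1 (weakly not-taken).
        self.table = [1] * entries

    def _index(self, pc):
        # Assumes 4-byte aligned instructions, so the low 2 bits are dropped.
        return (pc >> 2) & self.mask

    def predict(self, pc):
        # Counter values 2 and 3 predict taken; 0 and 1 predict not-taken.
        return self.table[self._index(pc)] >= 2

    def update(self, pc, taken):
        # Move the counter one step toward the actual outcome, saturating.
        i = self._index(pc)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


def accuracy(predictor, trace):
    """Fraction of correct predictions over a list of (pc, taken) pairs."""
    correct = 0
    for pc, taken in trace:
        if predictor.predict(pc) == taken:
            correct += 1
        predictor.update(pc, taken)
    return correct / len(trace)
```

Feeding `accuracy` a trace of a loop branch that is taken nine times and then falls through shows the expected behavior of this scheme: after warm-up, the predictor misses only the loop exit. In a 4-issue fetch unit, each such miss discards the instructions fetched down the wrong path, which is why the abstract singles out prediction accuracy as a key factor.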

Keywords



Volume 3, Issue 1
Winter and Spring 2010
Pages 13-25
  • Receive Date: 18 December 2008
  • Revise Date: 15 September 2009
  • Accept Date: 21 September 2009