Computer Organization

Lab: Cache Performance

Purpose
Learn how caches affect program performance.
Method
Work in teams of two (or individually) to analyze the effects of different cache organizations using two instruction streams.
Preparation
Read chapter 5 in the textbook.
Files to Use
Instruction sequences attached
What to Hand In
Each team must turn in a written report with diagrams.

The attached diagrams (PDF, Word) show three types of cache: (1) a direct mapped cache with one word per line, (2) a direct mapped cache with four words per line, and (3) a two-way set associative cache with four words per line. The code segments below give two possible instruction streams, each with a variable number of instructions. We will analyze how each of these caches is used with each instruction stream, as described below, to determine how the caches affect the performance of the code.

The caches are all the same size: 128 bytes, or 32 words. Since we consider only instruction caches, the least significant two bits of each address are always zero and are not used to address individual bytes within the words (instructions) in the cache, as they might be in a data cache (which allows loading of individual bytes in a word).
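
As a rough check on the sizes above, the following sketch derives the number of lines, the number of sets, and the address field widths for the three caches. It assumes 32-bit addresses (as in the code segments) and 4-byte words; the function and field names are illustrative only and do not come from the lab materials.

```python
ADDRESS_BITS = 32    # assumption: addresses are 32 bits, as in the code segments
BYTES_PER_WORD = 4   # one instruction per 4-byte word
CACHE_BYTES = 128    # every cache in this lab holds 128 bytes (32 words)


def log2_exact(n):
    """Base-2 logarithm of a power of two, computed without floating point."""
    return n.bit_length() - 1


def geometry(words_per_line, ways):
    """Return line/set counts and address field widths for one cache organization."""
    lines = CACHE_BYTES // (BYTES_PER_WORD * words_per_line)
    sets = lines // ways
    byte_bits = log2_exact(BYTES_PER_WORD)   # always zero for instruction fetches
    word_bits = log2_exact(words_per_line)   # selects a word within a line
    index_bits = log2_exact(sets)            # selects a line (or a set)
    tag_bits = ADDRESS_BITS - byte_bits - word_bits - index_bits
    return {
        "lines": lines,
        "sets": sets,
        "word_bits": word_bits,
        "index_bits": index_bits,
        "tag_bits": tag_bits,
    }


# (words per line, ways) for caches (1), (2) and (3)
for name, shape in {"cache (1)": (1, 1), "cache (2)": (4, 1), "cache (3)": (4, 2)}.items():
    print(name, geometry(*shape))
```

The field widths printed here should agree with the tag and index fields drawn in the attached diagrams; if they do not, trust the diagrams.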

The caches have the following additional characteristics:


Problem 1: Comparing Direct Mapped Caches

Consider cache (1), which is direct mapped and contains 32 lines of one word each. Five bits (bits 2 - 6) of the instruction address are used to index the cache line. The remaining 25 bits (bits 7 - 31) are stored in the tag field of the cache.
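
The split described above can be sketched in a few lines. This is only an illustration: the helper name split_address and its parameters are made up for this example, and the defaults correspond to cache (1) (five index bits, no word-offset bits). The address used is that of the bne instruction when k = 32.

```python
def split_address(addr, index_bits=5, word_bits=0):
    """Split a 32-bit instruction address into (tag, index, word_offset, byte_offset)."""
    byte_offset = addr & 0b11                               # bits 0-1, always zero here
    word_offset = (addr >> 2) & ((1 << word_bits) - 1)      # word within the line
    index = (addr >> (2 + word_bits)) & ((1 << index_bits) - 1)
    tag = addr >> (2 + word_bits + index_bits)              # everything above the index
    return tag, index, word_offset, byte_offset


# Address of "bne $s5, $s6, loop" in code segment 1 when k = 32
bne_addr = int("1001 1000 1111 0000 0000 1100 1110 0000".replace(" ", ""), 2)

tag, index, word_offset, byte_offset = split_address(bne_addr)
print(f"tag = {tag:025b}, index = {index}, byte offset = {byte_offset}")
```

For caches (2) and (3), call the same helper with word_bits=2 and the matching number of index bits.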

  1. Assume that code segment 1 has k = 32 instructions in the loop. The last three instructions in the loop would then have the following addresses:
    1001 1000 1111 0000 0000 1100 1101 1000 <instruction 30>
    1001 1000 1111 0000 0000 1100 1101 1100 addi $s5, $s5, 1
    1001 1000 1111 0000 0000 1100 1110 0000 bne $s5, $s6, loop

    (Assume that instructions are executed in order, including branches, and that there are no delays -- this lab is about cache effects, not delay issues.) Use the diagram of the direct mapped one-word-per-line cache to show the contents of the cache and the values of the tag fields after the first iteration of the loop. Calculate the time (in clock cycles) for the loop to complete 1,000,001 iterations. You may ignore the instructions preceding the loop in your calculation. (Note: do not forget the compulsory cache misses the first time around the loop.)

  2. Consider what happens when one instruction is added to the body of the loop for code segment 1 so that k = 33. Calculate the time (in clock cycles) for the loop to complete 1,000,001 iterations. As the size of the loop is increased one instruction at a time, how does the execution time for the loop increase?
  3. Do part (1), but using cache (2), direct mapped with four words per line. Remember, on a cache miss the whole cache line is replaced!
  4. Do part (2), but using cache (2).
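
One way to check hand calculations for this problem is a small miss counter for a direct mapped cache, sketched below. It is not part of the lab materials. It simulates the first (cold) iteration and one further iteration, then assumes every later iteration misses exactly as often as the second, which holds for a direct mapped cache running a fixed loop body. The parameters hit_cycles and miss_cycles are placeholders for the timing figures given in the lab; no values are assumed for them here.

```python
LOOP_START = 0x98F00C64  # address of <instruction 1> in code segment 1


def run(addresses, lines, words_per_line):
    """Send addresses through a direct mapped cache stored in `lines`; return the misses."""
    misses = 0
    for addr in addresses:
        block = addr // (4 * words_per_line)   # memory block holding this word
        index = block % len(lines)             # the one line the block may occupy
        tag = block // len(lines)
        if lines[index] != tag:                # miss: the whole line is replaced
            lines[index] = tag
            misses += 1
    return misses


def loop_misses(start, k, iterations, num_lines, words_per_line):
    """Total misses for a k-instruction loop body executed `iterations` times."""
    body = [start + 4 * i for i in range(k)]
    lines = [None] * num_lines                   # cache starts empty
    cold = run(body, lines, words_per_line)      # first iteration, compulsory misses
    steady = run(body, lines, words_per_line)    # second iteration, repeated thereafter
    return cold + (iterations - 1) * steady


def total_cycles(accesses, misses, hit_cycles, miss_cycles):
    """Cycle count, where miss_cycles is the full cost of an access that misses."""
    return (accesses - misses) * hit_cycles + misses * miss_cycles


# Cache (1): 32 lines of one word. Cache (2): 8 lines of four words.
for k in (32, 33):
    print(k, loop_misses(LOOP_START, k, 1_000_001, num_lines=32, words_per_line=1),
          loop_misses(LOOP_START, k, 1_000_001, num_lines=8, words_per_line=4))
```

Pass the resulting miss count, together with k times the number of iterations as the access count, to total_cycles once the hit and miss costs are filled in from the lab description.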

Problem 2: Comparing a Direct Mapped Cache to a Set Associative Cache

Code segment 2 is used for this problem and contains a loop and a subroutine call.

  1. Use cache (2), direct mapped with four words per line, and show the contents of the cache after the first iteration of the loop. Calculate the time (in clock cycles) for the loop to complete 1,000,001 iterations.
  2. Use cache (3), the two-way set associative cache, and show the contents of the cache after the first iteration of the loop. Calculate the time (in clock cycles) for the loop to complete 1,000,001 iterations.
  3. Recall that the two-way set associative cache used for part (2) needs a clock cycle that is 10% longer than the clock cycle for the direct mapped cache. Taking this into consideration, how much faster or slower is this code with cache (3), set associative, than with cache (2), direct mapped?
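
The comparison in this problem can be explored with the sketch below, which counts hits and misses for one pass of code segment 2 through either cache. Two points are assumptions rather than facts from the lab: the set associative cache is modeled with least-recently-used (LRU) replacement, and the helper names are invented for this example. Setting ways=1 gives a direct mapped cache. The 10% clock penalty from part (3) appears as the default sa_clock_factor.

```python
LOOP = 0x98F00C60   # first instruction of the loop in code segment 2
FUNC = 0x98F00F68   # first instruction of the function


def segment2_iteration():
    """Addresses fetched, in order, during one iteration of the loop."""
    trace = [LOOP + 4 * i for i in range(5)]         # instructions 1-5, ending with jal
    trace += [FUNC + 4 * i for i in range(15)]       # function, through jr $ra
    trace += [LOOP + 4 * i for i in range(5, 16)]    # instructions 6-16
    return trace


def simulate(addresses, num_sets, ways, words_per_line):
    """Return (hits, misses). ASSUMPTION: LRU replacement within each set."""
    sets = [[] for _ in range(num_sets)]   # each list is ordered oldest first
    hits = misses = 0
    for addr in addresses:
        block = addr // (4 * words_per_line)
        entries = sets[block % num_sets]
        tag = block // num_sets
        if tag in entries:
            hits += 1
            entries.remove(tag)            # re-appended below as most recent
        else:
            misses += 1
            if len(entries) == ways:
                entries.pop(0)             # evict the least recently used line
        entries.append(tag)
    return hits, misses


def relative_time(cycles_sa, cycles_dm, sa_clock_factor=1.10):
    """Run time with cache (3) divided by run time with cache (2)."""
    return cycles_sa * sa_clock_factor / cycles_dm


two_iterations = segment2_iteration() * 2
print("cache (2):", simulate(two_iterations, num_sets=8, ways=1, words_per_line=4))
print("cache (3):", simulate(two_iterations, num_sets=4, ways=2, words_per_line=4))
```

Turning the hit and miss counts into cycle counts still requires the hit and miss costs from the lab description; relative_time then gives a value below 1 when cache (3) is faster overall and above 1 when it is slower.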

Code Segment 1

Address                                  Instruction             Comment
1001 1000 1111 0000 0000 1100 0101 1100  addi $s6, $0, 1000001   # initialize number of iterations
1001 1000 1111 0000 0000 1100 0110 0000  add $s5, $0, $0         # initialize loop counter
loop:
1001 1000 1111 0000 0000 1100 0110 0100  <instruction 1>         # beginning of loop body,
1001 1000 1111 0000 0000 1100 0110 1000  <instruction 2>         # which has a total
1001 1000 1111 0000 0000 1100 0110 1100  <instruction 3>         # of k instructions
...                                      ...
1001 1000 1111 0000 0000 1100 1101 1100  addi $s5, $s5, 1        # instruction k-1
1001 1000 1111 0000 0000 1100 1110 0000  bne $s5, $s6, loop      # instruction k
# end of loop

Code Segment 2

Address                                  Instruction             Comment
1001 1000 1111 0000 0000 1100 0101 0100  add $s4, $0, $0         # initialize total
1001 1000 1111 0000 0000 1100 0101 1000  addi $s6, $0, 1000001   # initialize number of iterations
1001 1000 1111 0000 0000 1100 0101 1100  add $s5, $0, $0         # initialize loop counter
loop:
1001 1000 1111 0000 0000 1100 0110 0000  add $a0, $s0, $0        # first parameter
1001 1000 1111 0000 0000 1100 0110 0100  add $a1, $s1, $0        # second parameter
1001 1000 1111 0000 0000 1100 0110 1000  add $a2, $s2, $0        # third parameter
1001 1000 1111 0000 0000 1100 0110 1100  add $a3, $s3, $0        # fourth parameter
1001 1000 1111 0000 0000 1100 0111 0000  jal function            # function call
1001 1000 1111 0000 0000 1100 0111 0100  add $s4, $s4, $v0       # add result to total
1001 1000 1111 0000 0000 1100 0111 1000  <instruction 7>         # remainder of loop
1001 1000 1111 0000 0000 1100 0111 1100  <instruction 8>         # which has a total
1001 1000 1111 0000 0000 1100 1000 0000  <instruction 9>         # of 16 instructions
...                                      ...
1001 1000 1111 0000 0000 1100 1001 0100  <instruction 14>
1001 1000 1111 0000 0000 1100 1001 1000  addi $s5, $s5, 1        # instruction 15
1001 1000 1111 0000 0000 1100 1001 1100  bne $s5, $s6, loop      # instruction 16
# end of loop
...                                      ...
function:
1001 1000 1111 0000 0000 1111 0110 1000  addi $sp, $sp, -16      # save state
1001 1000 1111 0000 0000 1111 0110 1100  sw $s0, 0($sp)          # instruction B
1001 1000 1111 0000 0000 1111 0111 0000  sw $s1, 4($sp)          # instruction C
1001 1000 1111 0000 0000 1111 0111 0100  sw $s2, 8($sp)          # instruction D
1001 1000 1111 0000 0000 1111 0111 1000  sw $ra, 12($sp)         # instruction E
1001 1000 1111 0000 0000 1111 0111 1100  <instruction F>         # body of subroutine
1001 1000 1111 0000 0000 1111 1000 0000  <instruction G>
1001 1000 1111 0000 0000 1111 1000 0100  <instruction H>
1001 1000 1111 0000 0000 1111 1000 1000  <instruction I>
1001 1000 1111 0000 0000 1111 1000 1100  lw $s0, 0($sp)          # restore state
1001 1000 1111 0000 0000 1111 1001 0000  lw $s1, 4($sp)          # instruction K
1001 1000 1111 0000 0000 1111 1001 0100  lw $s2, 8($sp)          # instruction L
1001 1000 1111 0000 0000 1111 1001 1000  lw $ra, 12($sp)         # instruction M
1001 1000 1111 0000 0000 1111 1001 1100  addi $sp, $sp, 16       # instruction N
1001 1000 1111 0000 0000 1111 1010 0000  jr $ra                  # return