Questions 1-12 are worth 4 points each, questions 13-14 are worth 6 points each,
questions 15-16 are worth 15 points each, and question 17 is worth 10 points.
The three numbered schematic diagrams show possible implementations of a simplified MIPS CPU. For questions 1, 2, and 3, describe the type of implementation shown in the correspondingly numbered diagram, and its characteristics.
1. Describe the implementation of the CPU shown in diagram 1. What are the characteristics of this implementation?
2. Describe the implementation of the CPU shown in diagram 2. What are the characteristics of this implementation?
3. Describe the implementation of the CPU shown in diagram 3. What are the characteristics of this implementation?
4. What changes would need to be made to the pipeline CPU to reduce the branch hazard penalty from three clock cycles to one? Describe with reference to the appropriate diagram, or sketch the changes.
5. What are a cache "hit" and a cache "miss"? How are they detected?
6. How does a set-associative cache differ from a direct-mapped cache? Why are there usually fewer cache misses with a set-associative cache than with a direct-mapped cache if both are the same size?
7. What are two reasons for using virtual memory?
8. How is the logical (virtual) address translated into the physical address in a virtual memory system? How does the TLB (translation lookaside buffer) speed up this process?
9. What is a SIMD parallel processor?
10. How do the processors in a MIMD distributed memory parallel computer communicate?
11. What is an "Instruction Set Architecture"?
12. What is meant by "base" or "displacement" addressing, also called "register indirect with base or displacement"? Give an example of a MIPS instruction using this addressing mode. Give an example of a data structure which might be accessed conveniently using this addressing mode.
13. Identify the circuit (given on a separate sheet), and explain what it does. Where might this circuit be used in a CPU?
14. Identify the circuit (given on a separate sheet), and explain what it does. Where might this circuit be used in a CPU?
15. Assume that we have the instruction mix shown below.
Instruction    Condition          Frequency
Branch/jump    taken              15%
Branch/jump    not taken          15%
store word                        15%
load word      no dependency      10%
load word      data dependency     5%
Consider the pipeline and multicycle architectures studied in class. Assume that for the pipeline architecture, a branch not taken incurs no penalty but a branch taken incurs a three-cycle stall, and that data forwarding resolves all data hazards, except when the instruction following a load word depends on the data loaded, in which case a one-cycle stall occurs. For (a) and (b) assume that memory accesses take one cycle.
a) What is the average CPI (cycles per instruction) for the multicycle CPU?
b) What is the average CPI for the pipeline CPU?
Now suppose that we include cache miss penalties in the calculation. Assume that both instruction cache and data cache misses occur on 2% of the memory accesses (reads and writes) and that a cache miss incurs a penalty of 10 clock cycles. (Note: every instruction is a potential instruction cache miss, but only sw and lw instructions can cause a data cache miss.)
c) What is the average CPI for the multicycle CPU, taking into account cache misses?
d) What is the average CPI for the pipeline CPU, taking into account cache misses?
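The pipeline parts of question 15 can be set up as a base CPI plus the average stall cycles per instruction. The following is a minimal sketch of that arithmetic, not a model answer. It assumes an ideal pipelined CPI of 1 (pipeline fill time ignored) and reads "2% of the memory accesses" as one instruction fetch per instruction plus one data access per lw or sw. The function and constant names are illustrative. The multicycle parts are left out because they also depend on the per-instruction cycle counts of the multicycle design studied in class, which are not stated here.

```python
# Figures taken from the statement of question 15; names are illustrative.
TAKEN_BRANCH_FREQ = 0.15    # branch/jump taken
TAKEN_BRANCH_STALL = 3      # stall cycles per taken branch
LOAD_USE_FREQ = 0.05        # lw followed by a dependent instruction
LOAD_USE_STALL = 1          # stall cycles per load-use dependency
DATA_ACCESS_FREQ = 0.15 + 0.10 + 0.05  # sw + lw (no dependency) + lw (dependency)
MISS_RATE = 0.02            # same rate assumed for instruction and data caches
MISS_PENALTY = 10           # clock cycles per miss


def pipeline_cpi(include_cache_misses=False):
    """Average CPI of the pipeline CPU under the stated assumptions."""
    cpi = 1.0  # assumption: one instruction completes per cycle when nothing stalls
    cpi += TAKEN_BRANCH_FREQ * TAKEN_BRANCH_STALL
    cpi += LOAD_USE_FREQ * LOAD_USE_STALL
    if include_cache_misses:
        # One instruction fetch per instruction, plus a data access for lw/sw.
        accesses_per_instruction = 1.0 + DATA_ACCESS_FREQ
        cpi += accesses_per_instruction * MISS_RATE * MISS_PENALTY
    return cpi


if __name__ == "__main__":
    print(f"pipeline CPI, no cache misses:   {pipeline_cpi():.2f}")
    print(f"pipeline CPI, with cache misses: {pipeline_cpi(True):.2f}")
```

Under these assumptions the two values come out to about 1.50 and 1.76. A different reading of the miss rate, or a different base CPI, would change the result.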
16. Consider the following code segment.
loop: addi $s0, $s0, 1
add $s2, $s2, $s3
lw $s4, list($s0)
add $s4, $s4, $s2
sw $s4, list($s0)
slt $t0, $s0, $s5
bne $t0, $0, loop
a) For the pipeline architecture, identify the data hazards that occur in this code. If data hazards are resolved by stalls (no data forwarding), how many additional cycles do the data hazards add to the execution time?
b) For the pipeline architecture, identify the branch hazards that occur in this code. If branch hazards are resolved by stalls, how many additional cycles do the branch hazards add to the execution time? What is the total time for one iteration of this loop under the conditions of parts (a) and (b), not including pipeline fill time?
c) Suppose that data hazards are resolved by forwarding, except that a data dependency for an instruction immediately following an lw instruction causes a one-cycle stall. Assume that branch hazards cause stalls. How many cycles will one iteration of this loop take?
d) Under the assumptions of part (c), show how this code can be modified so that the logic remains the same but the number of cycles needed is reduced. How many cycles are now needed for the loop?
e) Assume that the branch hazard is reduced by one cycle by requiring that the instruction after a branch is always executed, whether or not the branch is taken, and the compiler satisfies that requirement for the above code by inserting a nop (no operation, wasted cycle) after the branch instruction. Show how the code could be modified so that the logic remains the same, but the number of cycles needed is reduced. How many cycles are needed for the loop if this and the part (d) optimization are made?
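The read-after-write dependences asked about in question 16(a) can be located mechanically. The sketch below is one possible way to do it, under stated assumptions. The operands are decoded by hand from the code segment. A dependence is reported only when the producing instruction is one or two instructions ahead of the consumer, which assumes a five-stage pipeline whose register file is written in the first half of a cycle and read in the second half. The names `PROGRAM`, `HAZARD_WINDOW`, and `raw_hazards` are illustrative. It lists the dependences only and does not count stall cycles.

```python
# Each entry: (assembly text, destination register or None, source registers).
# Operands are decoded by hand from the code segment in question 16.
PROGRAM = [
    ("addi $s0, $s0, 1",  "$s0", ["$s0"]),
    ("add $s2, $s2, $s3", "$s2", ["$s2", "$s3"]),
    ("lw $s4, list($s0)", "$s4", ["$s0"]),
    ("add $s4, $s4, $s2", "$s4", ["$s4", "$s2"]),
    ("sw $s4, list($s0)", None,  ["$s4", "$s0"]),
    ("slt $t0, $s0, $s5", "$t0", ["$s0", "$s5"]),
    ("bne $t0, $0, loop", None,  ["$t0", "$0"]),
]

# Assumption: only producers 1 or 2 instructions ahead cause a hazard.
HAZARD_WINDOW = 2


def raw_hazards(program, window=HAZARD_WINDOW):
    """Return (producer index, consumer index, register) for each RAW hazard."""
    last_writer = {}  # register name -> index of the most recent writer
    hazards = []
    for i, (_, dest, sources) in enumerate(program):
        # Sources are checked before the destination is recorded, so an
        # instruction such as "add $s4, $s4, $s2" sees the older $s4.
        for reg in sources:
            producer = last_writer.get(reg)
            if reg != "$0" and producer is not None and i - producer <= window:
                hazards.append((producer, i, reg))
        if dest is not None:
            last_writer[dest] = i
    return hazards


if __name__ == "__main__":
    for producer, consumer, reg in raw_hazards(PROGRAM):
        print(f"{PROGRAM[consumer][0]:<20} needs {reg} from {PROGRAM[producer][0]}")
```

Tracking only the most recent writer of each register matters here: sw reads the $s4 produced by the add, not the one loaded by lw.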
17. Assume that you are designing a processor that will be used in a MIMD shared memory machine with processors and memory connected on a bus.
a) What issues must you be aware of when designing a cache for this architecture?
b) Suppose you decide on a cache design where any line (block) in the cache can be in any one of three states: invalid, read-only (shared), or read/write (exclusive). Assume further that the cache for each processor has circuitry which enables it to detect messages on the bus initiated by other processors ("bus snooping"). Describe how the cache will change from one state to another according to actions by the processor itself and according to actions by other processors. You may describe this in words or using a finite state machine diagram.
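A finite state machine of the kind asked for in 17(b) can also be written as a transition table. The sketch below shows one common textbook formulation of a three-state write-invalidate snooping protocol. It is an illustration under assumptions, not the only acceptable answer. The event names (`cpu_read`, `cpu_write`, `bus_read`, `bus_write`) and the bus action strings are hypothetical, all events are assumed to refer to the same block, and misses that replace a block with a different one are omitted.

```python
INVALID, SHARED, EXCLUSIVE = "invalid", "shared", "exclusive"

# (current state, event) -> (next state, action taken by this cache or None).
# cpu_* events come from the local processor; bus_* events are snooped
# requests from other processors for the same block.
TRANSITIONS = {
    (INVALID, "cpu_read"):    (SHARED, "place read miss on bus"),
    (INVALID, "cpu_write"):   (EXCLUSIVE, "place write miss on bus"),
    (SHARED, "cpu_read"):     (SHARED, None),
    (SHARED, "cpu_write"):    (EXCLUSIVE, "place invalidate on bus"),
    (SHARED, "bus_read"):     (SHARED, None),
    (SHARED, "bus_write"):    (INVALID, None),
    (EXCLUSIVE, "cpu_read"):  (EXCLUSIVE, None),
    (EXCLUSIVE, "cpu_write"): (EXCLUSIVE, None),
    (EXCLUSIVE, "bus_read"):  (SHARED, "write block back to memory"),
    (EXCLUSIVE, "bus_write"): (INVALID, "write block back to memory"),
}


def next_state(state, event):
    """Return (next state, action); unlisted pairs leave the line unchanged."""
    # An invalid line ignores snooped traffic, so it falls through to the default.
    return TRANSITIONS.get((state, event), (state, None))


if __name__ == "__main__":
    state = INVALID
    for event in ["cpu_read", "cpu_write", "bus_read", "bus_write"]:
        state, action = next_state(state, event)
        print(f"{event:<10} -> {state:<10} {action or ''}")
```

The table form makes it easy to check that every state has a defined response to both local and snooped events.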