Computer Organization

Lab: Optimizing CPU Performance for Pipelines

Purpose
Learn to avoid branch and data hazards by reordering instructions in a program. Learn to reorganize code to optimize pipeline and dual pipeline performance.
Method
Modify assembly programs to avoid branch and data hazards. Modify programs to get optimal performance for a single pipeline and for a dual pipeline.
Preparation
Read chapter 4 in the text.
Files to Use
 
ArraySum.s
What to Hand In
Zip up your entire lab directory, including:
ArraySumP1.s -- no reordering; nops added for all data and branch hazards (assume accelerated branch and data forwarding). Identify every point at which data forwarding is used by inserting a line such as "data forwarded" after any instruction whose result is forwarded to one of the two instructions that follow it.
ArraySumP2.s -- reordered to optimize assuming accelerated branch and data forwarding
ArraySumP3.s -- use loop unrolling to optimize assuming accelerated branch and data forwarding
 
ArraySumD1.s -- reordered (no unrolling) to optimize assuming accelerated branch and data forwarding, using dual pipeline
ArraySumD2.s -- use loop unrolling to optimize assuming accelerated branch and data forwarding, using dual pipeline

A text file summarizing the times for the different programs with the base loop running 1000 iterations, including the time for the single-cycle CPU.
 

Steps

  1. Calculate the time for ArraySum.s to execute on a single-cycle CPU, assuming 1000 iterations (cycle time 800 ps).
  2. Insert the needed nops into ArraySum.s assuming accelerated branch and full data forwarding for the five-stage pipeline. Note that there will always be a nop after any branch, since the pipeline will always stall one cycle for any branch taken. Calculate the time, assuming 1000 iterations (cycle time 200 ps). (Save as ArraySumP1.s)
  3. Modify the ArraySum.s code to optimize assuming accelerated branch and data forwarding. (Save as ArraySumP2.s) Calculate the time, assuming 1000 iterations. Note that a taken branch incurs a one-cycle stall, while a branch that is not taken incurs none. Indicate this by putting a line "stall if taken" after a conditional branch and a nop after an unconditional branch.
  4. Unroll the loop four iterations. Use the same assumptions as for step 3. Rearrange code to minimize the need for nops, assuming data forwarding. Calculate the time, assuming 1000 iterations. (Save as ArraySumP3.s)
  5. Assume that you have a dual pipeline architecture. One pipeline does only loads and stores; the other does all other instructions. Assume accelerated branch and full data forwarding (even across pipelines). Give the best organization of the code with no loop unrolling. Calculate the time, assuming 1000 iterations. (Save as ArraySumD1.s)
  6. Use loop unrolling and register renaming to optimize the code for the dual pipeline described in step 5. Calculate the time, assuming 1000 iterations. (Save as ArraySumD2.s)
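
The time calculations in the steps above can be sketched as a small Python helper. This is only a rough model under stated assumptions, not the required method. The 800 ps and 200 ps cycle times and the five-stage pipeline come from the steps; the function names, the split into setup and loop-body counts, and every instruction count in the usage lines are hypothetical placeholders that you would replace with counts taken from your own versions of ArraySum.s. Whether to add the four cycles needed to fill the pipeline is also an assumption, so it is left as a switch. The model charges the same number of slots to every iteration, so it does not subtract the stall saved when the final loop branch is not taken.

```python
# Hypothetical timing helpers. All instruction counts passed in below are
# placeholders, not values taken from ArraySum.s.

SINGLE_CYCLE_PS = 800    # cycle time given in step 1
PIPELINE_CYCLE_PS = 200  # cycle time given in step 2
PIPELINE_STAGES = 5      # five-stage pipeline from step 2


def single_cycle_time_ps(setup_instrs, loop_instrs, iterations):
    """Single-cycle CPU: every instruction takes one 800 ps cycle."""
    return (setup_instrs + loop_instrs * iterations) * SINGLE_CYCLE_PS


def pipeline_time_ps(setup_slots, loop_slots, iterations, count_fill=True):
    """Pipelined CPU: every issue slot takes one 200 ps cycle.

    A slot is an instruction, a nop, or a stall cycle. For the dual
    pipeline, count one slot per issue packet (a load/store paired with
    one other instruction) instead of one per instruction.
    count_fill adds the (stages - 1) cycles needed to fill the pipeline;
    including it is an assumption, not something the lab states.
    """
    cycles = setup_slots + loop_slots * iterations
    if count_fill:
        cycles += PIPELINE_STAGES - 1
    return cycles * PIPELINE_CYCLE_PS


# Placeholder counts: 3 setup instructions and a 6-instruction loop body.
base = single_cycle_time_ps(3, 6, 1000)

# Same loop with 2 nops per iteration on the five-stage pipeline.
with_nops = pipeline_time_ps(3, 6 + 2, 1000)

# Loop unrolled four times: 1000 base iterations become 250 passes
# through a longer body (14 slots here, again a placeholder).
unrolled = pipeline_time_ps(3, 14, 1000 // 4)

print(base, with_nops, unrolled)
```

With real counts in place, the same three calls give the figures for the summary text file. Note that unrolling divides the iteration count by the unroll factor, so the loop-body count passed in must be the length of the unrolled body.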