
[20/22/22] In this exercise, we will look at how a common vector loop runs on statically and dynamically scheduled versions of the RISC-V pipeline. The loop is the so-called DAXPY loop (discussed extensively in Appendix G), which is the central operation in Gaussian elimination. The loop implements the vector operation Y = a*X + Y for a vector of length 100. Here is the RISC-V code for the loop:


foo:  fld     f2, 0(x1)        ; load X(i)
      fmul.d  f4, f2, f0       ; multiply a*X(i)
      fld     f6, 0(x2)        ; load Y(i)
      fadd.d  f6, f4, f6       ; add a*X(i) + Y(i)
      fsd     f6, 0(x2)        ; store Y(i)
      addi    x1, x1, 8        ; increment X index
      addi    x2, x2, 8        ; increment Y index
      sltiu   x3, x1, done     ; test if done
      bnez    x3, foo          ; loop if not done
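As a reference for what the loop computes, here is a minimal Python sketch of the same Y = a*X + Y update. The function name `daxpy_in_place` and the sample data are illustrative assumptions, not part of the exercise; the comments map each statement to the corresponding instruction in the loop body.

```python
def daxpy_in_place(a, x, y):
    """Overwrite y with a*x + y, one element per loop iteration."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    for i in range(len(x)):      # one pass of the assembly loop body
        xi = x[i]                # fld    : load X(i)
        prod = a * xi            # fmul.d : multiply a*X(i)
        yi = y[i]                # fld    : load Y(i)
        y[i] = prod + yi         # fadd.d, then fsd: add and store Y(i)
    return y


# Illustrative data only (hypothetical values, vector length 100 as in the exercise).
x = [float(i) for i in range(100)]
y = [1.0] * 100
daxpy_in_place(2.0, x, y)
```

The sketch says nothing about timing; it only pins down the data dependences (load X, multiply, load Y, add, store) that the pipeline questions below depend on.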


For parts (a) to (c), assume that integer operations (including loads) issue and complete in 1 clock cycle and that their results are fully bypassed. Use only the FP latencies shown in Figure C.29, and assume that the FP unit is fully pipelined.
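To get a rough feel for how these latency assumptions turn into stalls, the following Python sketch counts read-after-write stalls for the loop under a deliberately simplified in-order model. Everything in it is an assumption for illustration: the names `LATENCY`, `LOOP`, and `raw_stalls` are hypothetical, the FP latency values (6 for multiply, 3 for add) should be checked against Figure C.29, and the model ignores structural hazards, WAW hazards, branch effects, and the fact that a store needs its data later in the pipeline than an ALU operand. It is not a substitute for the timing diagrams the exercise asks for.

```python
# Assumed latencies: intervening cycles required between a producer and a
# dependent consumer. FP values should be verified against Figure C.29.
LATENCY = {"int": 0, "load": 0, "store": 0, "fmul": 6, "fadd": 3}

# (label, functional unit class, destination register or None, source registers)
LOOP = [
    ("fld X",   "load",  "f2", ["x1"]),
    ("fmul.d",  "fmul",  "f4", ["f2", "f0"]),
    ("fld Y",   "load",  "f6", ["x2"]),
    ("fadd.d",  "fadd",  "f6", ["f4", "f6"]),
    ("fsd",     "store", None, ["f6", "x2"]),
    ("addi x1", "int",   "x1", ["x1"]),
    ("addi x2", "int",   "x2", ["x2"]),
    ("sltiu",   "int",   "x3", ["x1"]),
    ("bnez",    "int",   None, ["x3"]),
]


def raw_stalls(program, latency):
    """Return per-instruction stall counts under an in-order, RAW-only model."""
    ready = {}        # register -> earliest cycle a dependent instruction may start
    prev_start = 0    # cycle in which the previous instruction started
    stalls = []
    for _label, unit, dest, srcs in program:
        earliest = prev_start + 1                      # single issue, in order
        start = max([earliest] + [ready.get(r, 0) for r in srcs])
        stalls.append(start - earliest)
        if dest is not None:
            ready[dest] = start + 1 + latency[unit]    # result usable after the latency
        prev_start = start
    return stalls


stalls = raw_stalls(LOOP, LATENCY)
for (label, *_), s in zip(LOOP, stalls):
    print(f"{label:8s} stalls={s}")
```

Changing the order of entries in `LOOP` shows how separating a producer from its consumer with independent instructions reduces the stall count in this model.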


For the scoreboards below, assume that an instruction waiting for a result from another functional unit can pass through read operands in the same clock cycle in which the result is written. Also assume that when an instruction completes WB, an active instruction waiting on the same functional unit can issue in that same clock cycle.


a. [20] For this problem, use the RISC-V pipeline of Section C.5 with the pipeline latencies from Figure C.29, but with a fully pipelined FP unit, so the initiation interval is 1. Draw a timing diagram, similar to Figure C.32, showing the timing of each instruction's execution. How many clock cycles does each loop iteration take, counting from when the first instruction enters the WB stage to when the last instruction enters the WB stage?


b. [20] Statically reorder the instructions to minimize the stalls for this loop, renaming registers where necessary. Use all the same assumptions as in (a). Draw a timing diagram, similar to Figure C.32, showing the timing of each instruction's execution. How many clock cycles does each loop iteration take, counting from when the first instruction enters the WB stage to when the last instruction enters the WB stage?


c. [20] Using the original code above, consider how the instructions would have executed using scoreboarding, a form of dynamic scheduling. Draw a timing diagram, similar to Figure C.32, showing the timing of the instructions through stages IF, IS (issue), RO (read operands), EX (execution), and WR (write result). How many clock cycles does each loop iteration take, counting from when the first instruction enters the WB stage to when the last instruction enters the WB stage?


May 18, 2022