1. Loop Unrolling [46 marks]
Consider the following loop:
loop: l.d f4,0(r1) l1
l.d f6,0(r2) l2
mul.d f4,f4,f0 m1
mul.d f6,f6,f2 m2
add.d f4,f4,f6 a1
s.d f4,0(r1) s1
daddui r1,r1,#-8 sub1
daddui r2,r2,#-8 sub2
bnez r1,loop br
Note: Our default is that FP arithmetics have 4 x-boxes.
a) [10 marks] Using the names 'l1' to 's1' for the first six instructions
in the loop body, draw the flow-dependence graph for these instructions.
Label each arrow with the dependence gap between the producer and the
consumer.
In what follows, focus on three flow-dependence types: i) FP arith to
FP arith, ii) FP arith to FP store, and iii) FP load to FP arith. Denote
the number of m-boxes in memory references by '#m', and the number of
x-boxes in FP arithmetics by '#x'.
b) [6 marks] For each of the three designated flow-dependence types,
indicate the number of stalls in adjacent producer-consumer pairs as
functions of '#m" and '#x'.
c) [10 marks] Suppose #m = 1 and #x = 4. How many stalls occur in one
iteration of the loop if it is executed exactly as written?
d) [10 marks] Unroll the loop twice. If one reschedules the unrolled
loop optimally, how many stalls are left? (Keep the branch as the last
instruction. Show the rescheduled code using the _short_ names).
e) [10 marks] Increase the 'mul-add dependence gap' to 5 cycles, leaving
everything else unchanged. Unroll the loop three times. If one reschedules
the unrolled loop optimally, how many stalls are left? (Keep the branch as
the last instruction. Show the rescheduled code using the _short_ names).
2. Dynamic Instruction Scheduling I [30 marks]
Imagine that reservation stations only track whether floating-point operands
are valid, and that integer operands appear by magic whenever needed.
a) [10 marks] Dispatch instructions 'l1' to 's1' to reservation stations
rs1(l1) to rs6(s1). Show the contents of each reservation station.
Indicate both valid ("value") and pending ("ear") operands in each station.
For the loads, you may make all operand entries 'blank'. That is, mark all
load dependences as resolved. For the store, just invent a regular
single-operand reservation station. The value of 'test' is either "may
issue" or "may not issue". The value of 'free' is either "free" or "not
free".
b) [10 marks] After both loads have completed, but no further action has
occurred, show the contents of each reservation station. As before, you
may mark some reservation stations as 'free'. Syntax: result[rs9(a9)];
val[f12].
c) [10 marks] At this point, dispatch the two loads of the second iteration,
viz., 'l3' and 'l4', into reservation stations 'rs1' and 'rs2'. Moreover,
let instruction 'm1' of the first iteration complete. Show the contents of
each reservation station.
3. Dynamic Instruction Scheduling II [24 marks]
a) [8 marks] Instruction 'op' has been dispatched to reservation station
alpha. What statement must be proved to show that all its flow dependences
are respected?
b) [8 marks] Instructions 'op1' and 'op2' are an antidependent pair. The
earlier 'op1' is dispatched to reservation station alpha. The later 'op2'
is dispatched to reservation station beta. What statement must be proved
to show that the antidependence is respected?
c) [8 marks] Instructions 'op1' and 'op2' are an output-dependent pair. The
earlier 'op1' is dispatched to reservation station alpha. The later 'op2'
is dispatched to reservation station beta. What statement must be proved to
show that the output dependence is respected?