[20/15/20/20] The compiled Linpack performance of the Cray-1
(designed in 1976) was almost doubled by a better compiler in 1989.
Let’s look at a simple example of how this might occur. Consider
the DAXPY-like loop (where k is a parameter to the procedure containing
the loop):
      do 10 i=1,64
        do 10 j=1,64
          Y(k,j) = a*X(i,j) + Y(k,j)
10    continue
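For readers less familiar with Fortran, the loop nest can be transcribed into scalar Python to make the access pattern concrete. This is only a sketch: the function name `daxpy_like`, the 0-based indexing, the list-of-lists layout, and the size parameter `n` are assumptions introduced for illustration and do not appear in the exercise.

```python
def daxpy_like(a, X, Y, k, n=64):
    """Scalar transcription of the Fortran loop nest (0-based indices)."""
    for i in range(n):          # outer loop: do 10 i=1,64
        for j in range(n):      # inner loop: do 10 j=1,64
            # Row k of Y is both read and written on every outer iteration.
            Y[k][j] = a * X[i][j] + Y[k][j]
    return Y


if __name__ == "__main__":
    # Small hypothetical 4x4 case instead of the 64x64 arrays of the exercise.
    n = 4
    X = [[1.0] * n for _ in range(n)]
    Y = [[0.0] * n for _ in range(n)]
    daxpy_like(2.0, X, Y, k=1, n=n)
    print(Y[1])
```

Only row k of Y changes; every other row is left untouched.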
a. [20] Write the straightforward code sequence for just the inner
loop in VMIPS vector instructions.
b. [15] Using the techniques of Section G.6, estimate the performance
of this code on VMIPS by finding T64 in clock cycles. You may assume
that an overhead of Tloop is incurred for each iteration of the outer loop. What limits
the performance?
c. [20] Rewrite the VMIPS code to reduce the performance limitation;
show the resulting inner loop in VMIPS vector instructions. (Hint: Think
about what establishes Tchime; can you affect it?) Find the total time for the
resulting sequence.
d. [20] Estimate the performance of your new version, using the
techniques of Section G.6 and finding T64.
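
Parts b and d both reduce to evaluating a timing model for a vector loop of length n. Assuming the model of Section G.6 has the usual strip-mined form, Tn = ceil(n/MVL) x (Tloop + Tstart) + n x Tchime, the arithmetic can be sketched in Python as below. The function name `vector_time` and its parameter names are assumptions for illustration, and the numbers in the usage lines are placeholders rather than VMIPS values; the start-up and chime figures have to come from the instruction sequence you write.

```python
import math


def vector_time(n, mvl, t_loop, t_start, t_chime):
    """Estimated clock cycles for a vector loop over n elements.

    Assumes the strip-mined model:
        T_n = ceil(n / MVL) * (T_loop + T_start) + n * T_chime
    """
    strips = math.ceil(n / mvl)                 # number of strip-mined passes
    return strips * (t_loop + t_start) + n * t_chime


if __name__ == "__main__":
    # Placeholder parameters, not taken from the exercise.
    print(vector_time(n=64, mvl=64, t_loop=10, t_start=20, t_chime=2))
    print(vector_time(n=65, mvl=64, t_loop=10, t_start=20, t_chime=2))
```

With n = 64 and MVL = 64 there is a single strip, so the n x Tchime term dominates unless the start-up overhead is large, which is why the hint in part c points at Tchime.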