
[20/15/20/20] The compiled Linpack performance of the Cray-1 (designed in 1976) was almost doubled by a better compiler in 1989. Let’s look at a simple example of how this might occur. Consider the DAXPY-like loop (where k is a parameter to the procedure containing the loop):

    do 10 i=1,64
       do 10 j=1,64
          Y(k,j)=a*X(i,j)+Y(k,j)
    10 continue
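
To make the loop's semantics concrete, here is a minimal plain-Python sketch of the loop nest. It assumes (hypothetically) that X and Y are 64×64 arrays stored in Fortran's column-major order with a leading dimension of 64, and it keeps 1-based indices to match the Fortran. Two properties fall out: consecutive j iterations touch elements 64 apart in memory (a non-unit stride), and Y(k,j) does not depend on i, so each Y(k,j) just accumulates a times the sum of column j of X. The helper names offset, loop_nest, and closed_form are illustrative only.

```python
# Plain-Python reference model of the exercise's loop nest.
# Assumption (hypothetical): X and Y are n-by-n arrays stored as flat lists in
# Fortran column-major order with leading dimension n; indices are 1-based.
from typing import List


def offset(i: int, j: int, ld: int) -> int:
    """Flat index of A(i, j) in a column-major array with leading dimension ld."""
    return (i - 1) + (j - 1) * ld


def loop_nest(a: int, X: List[int], Y: List[int], k: int, n: int) -> List[int]:
    """do i=1,n / do j=1,n: Y(k,j) = a*X(i,j) + Y(k,j); updates Y in place."""
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            Y[offset(k, j, n)] = a * X[offset(i, j, n)] + Y[offset(k, j, n)]
    return Y


def closed_form(a: int, X: List[int], Y: List[int], k: int, n: int) -> List[int]:
    """Same result computed directly: Y(k,j) + a * (sum over i of X(i,j))."""
    out = list(Y)
    for j in range(1, n + 1):
        col_sum = sum(X[offset(i, j, n)] for i in range(1, n + 1))
        out[offset(k, j, n)] = Y[offset(k, j, n)] + a * col_sum
    return out


if __name__ == "__main__":
    n, a, k = 64, 3, 5
    X = list(range(n * n))
    Y0 = [1] * (n * n)
    # The loop nest and the closed form agree (integer data, so no rounding).
    assert loop_nest(a, X, list(Y0), k, n) == closed_form(a, X, Y0, k, n)
    # Consecutive j iterations are n elements apart in memory.
    print("stride between Y(k,j) and Y(k,j+1):", offset(k, 2, n) - offset(k, 1, n))
```

Under the column-major assumption, a vector that runs along j (the inner loop) needs a strided load rather than a unit-stride one, which is worth keeping in mind when writing the VMIPS sequence.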

a. [20] Write the straightforward code sequence for just the inner loop in VMIPS vector instructions.

b. [15] Using the techniques of Section G.6, estimate the performance of this code on VMIPS by finding T_64 in clock cycles. You may assume that T_loop of overhead is incurred for each iteration of the outer loop. What limits the performance?
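
For part b, the vector performance model in Section G.6 has the form T_n = ⌈n/MVL⌉ × (T_loop + T_start) + n × T_chime. The sketch below simply evaluates that expression. It assumes MVL = 64, as on VMIPS, so a single strip covers the 64-iteration inner loop, and that the whole nest costs 64 × T_64 given the stated per-outer-iteration T_loop. The parameter values in the example are placeholders, not the figures from Section G.6; substitute the section's own T_loop, T_start, and T_chime.

```python
import math

MVL = 64  # VMIPS maximum vector length


def t_n(n: int, t_loop: int, t_start: int, t_chime: int, mvl: int = MVL) -> int:
    """Cycles for an n-element strip-mined vector loop:
    ceil(n / mvl) * (t_loop + t_start) + n * t_chime."""
    strips = math.ceil(n / mvl)
    return strips * (t_loop + t_start) + n * t_chime


def nest_cycles(outer: int, n: int, t_loop: int, t_start: int, t_chime: int) -> int:
    """Total cycles when the n-element inner loop runs once per outer iteration,
    paying T_loop and T_start again on every outer iteration (assumption)."""
    return outer * t_n(n, t_loop, t_start, t_chime)


if __name__ == "__main__":
    # Placeholder parameters, NOT the values from Section G.6.
    t_loop, t_start, t_chime = 10, 40, 3
    print(t_n(64, t_loop, t_start, t_chime))
    print(nest_cycles(64, 64, t_loop, t_start, t_chime))
```

With n equal to MVL the overhead terms are paid once per inner loop, so for typical parameter values the n × T_chime term is the largest contributor.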

c. [20] Rewrite the VMIPS code to reduce the performance limitation; show the resulting inner loop in VMIPS vector instructions. (Hint: Think about what establishes T_chime; can you affect it?) Find the total time for the resulting sequence.
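
The hint in part c points at T_chime, which is the number of convoys the loop body needs. The sketch below packs a sequence of vector operations into convoys greedily, under two assumptions stated in the code: the machine has a single load/store pipeline plus one multiplier and one adder, and chaining lets dependent operations share a convoy. The operation labels are descriptive placeholders, not exact VMIPS mnemonics. The second sequence shows one possible restructuring, assuming Y(k,1:64) is loaded into a vector register before the outer loop and stored after it, which is legal because Y(k,j) does not depend on i.

```python
from typing import List, Set, Tuple

# (operation label, functional unit). Assumed units: one load/store
# pipeline ("mem"), one multiplier ("mul"), one adder ("add").
Op = Tuple[str, str]


def pack_convoys(seq: List[Op]) -> List[List[str]]:
    """Greedily pack operations, in program order, into convoys.

    A new convoy starts when the next operation needs a functional unit that
    the current convoy already uses (a structural hazard). Dependent
    operations may share a convoy, which assumes chaining.
    """
    convoys: List[List[str]] = []
    used: Set[str] = set()
    for label, unit in seq:
        if not convoys or unit in used:
            convoys.append([])
            used = set()
        convoys[-1].append(label)
        used.add(unit)
    return convoys


# Straightforward body: Y(k,1:64) is reloaded and stored on every outer iteration.
STRAIGHTFORWARD: List[Op] = [
    ("load X(i,1:64)", "mem"),
    ("multiply by a", "mul"),
    ("load Y(k,1:64)", "mem"),
    ("add", "add"),
    ("store Y(k,1:64)", "mem"),
]

# Hypothetical restructuring: Y(k,1:64) stays in a vector register across the
# outer loop, so the per-iteration body has only one memory operation.
RESIDENT_Y: List[Op] = [
    ("load X(i,1:64)", "mem"),
    ("multiply by a", "mul"),
    ("add into Y register", "add"),
]

if __name__ == "__main__":
    for name, seq in [("straightforward", STRAIGHTFORWARD), ("resident Y", RESIDENT_Y)]:
        convoys = pack_convoys(seq)
        print(name, len(convoys), convoys)
```

Under these assumptions the straightforward body needs three convoys, because its three memory operations compete for one pipeline, while the register-resident version needs one per outer iteration, plus a one-time load and store of Y(k,1:64) outside the outer loop.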

d. [20] Estimate the performance of your new version, using the techniques of Section G.6 and finding T_64.
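
If the estimate for part d is to be reported as a rate as well as a cycle count, the only extra input is the flop count, which follows from the loop itself: each of the 64 × 64 updates performs one multiply and one add, so the nest performs 8192 floating-point operations. The sketch below assumes the caller supplies a clock rate in MHz; the cycle counts and clock rate in the example are placeholders, not results.

```python
FLOPS_PER_NEST = 2 * 64 * 64  # one multiply + one add per Y(k,j) update


def mflops(cycles: float, clock_mhz: float, flops: int = FLOPS_PER_NEST) -> float:
    """Sustained rate in MFLOPS: flops per cycle times clock rate in MHz."""
    return flops / cycles * clock_mhz


def speedup(old_cycles: float, new_cycles: float) -> float:
    """How many times faster the new version is than the old one."""
    return old_cycles / new_cycles


if __name__ == "__main__":
    # Placeholder cycle counts and clock rate, for illustration only.
    old, new = 16384, 8192
    print(mflops(old, 500), mflops(new, 500), speedup(old, new))
```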

May 18, 2022
