[20/15/20/20] The compiled Linpack performance of the Cray-1 (designed in 1976) was almost doubled by a better compiler in 1989. Let’s look at a simple example of how this might occur. Consider the DAXPY-like loop (where k is a parameter to the procedure containing the loop):


      do 10 i=1,64
        do 10 j=1,64
          Y(k,j)=a*X(i,j)+Y(k,j)
   10 continue
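To make the loop's data flow concrete, here is a minimal Python sketch of the same computation. It is an illustration only: the function name daxpy_like, the 0-based indexing, and the nested-list representation of X and Y are assumptions, not part of the exercise.

```python
def daxpy_like(a, X, Y, k, n=64):
    """Apply Y[k][j] = a * X[i][j] + Y[k][j] for all i, j (0-based sketch)."""
    for i in range(n):          # outer loop: one pass per row of X
        for j in range(n):      # inner loop: the part a vector unit would execute
            Y[k][j] = a * X[i][j] + Y[k][j]
    return Y


if __name__ == "__main__":
    n = 4  # small size for illustration; the exercise uses 64
    X = [[float(i + j) for j in range(n)] for i in range(n)]
    Y = [[1.0] * n for _ in range(n)]
    daxpy_like(2.0, X, Y, k=1, n=n)
    print(Y[1])
```

Because k is fixed, every outer iteration reads and writes the same row of Y; only the row of X changes. Note also that Fortran stores arrays in column-major order, so stepping j along a fixed row is a non-unit-stride access in the original code.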


a. [20] Write the straightforward code sequence for just the inner loop in VMIPS vector instructions.


b. [15] Using the techniques of Section G.6, estimate the performance of this code on VMIPS by finding T64 in clock cycles. You may assume that an overhead of Tloop is incurred for each iteration of the outer loop. What limits the performance?


c. [20] Rewrite the VMIPS code to reduce the performance limitation; show the resulting inner loop in VMIPS vector instructions. (Hint: Think about what establishes Tchime; can you affect it?) Find the total time for the resulting sequence.


d. [20] Estimate the performance of your new version, using the techniques of Section G.6 and finding T64.
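Parts b through d rest on two steps: grouping vector instructions into convoys, which sets Tchime, and then combining Tchime with the start-up and loop overheads to get the time for a vector of length n. The Python sketch below illustrates both steps under stated assumptions. It assumes the Section G.6 model has the form T_n = ceil(n/MVL) * (Tloop + Tstart) + n * Tchime, that chaining lets dependent instructions share a convoy, and that only structural hazards (two instructions needing the same functional unit) force a new convoy. The function names, the unit labels, and the numeric parameters (MVL = 64, Tloop = 15, and the start-up costs) are illustrative assumptions; check them against the figures in your edition of the text.

```python
import math


def count_convoys(units):
    """Greedily pack instructions into convoys.

    `units` lists, in program order, the functional unit each vector
    instruction needs. A new convoy starts whenever that unit is already
    busy in the current convoy (a structural hazard). Chaining is assumed,
    so data dependences alone do not split a convoy.
    """
    convoys = 0
    busy = set()
    for unit in units:
        if not convoys or unit in busy:
            convoys += 1
            busy = set()
        busy.add(unit)
    return convoys


def vector_time(n, t_chime, t_start, t_loop=15, mvl=64):
    """Assumed model: T_n = ceil(n / MVL) * (T_loop + T_start) + n * T_chime."""
    return math.ceil(n / mvl) * (t_loop + t_start) + n * t_chime


if __name__ == "__main__":
    # Hypothetical inner loop: load, multiply, load, add, store,
    # on a machine with a single memory pipeline.
    sequence = ["mem", "mul", "mem", "add", "mem"]
    t_chime = count_convoys(sequence)
    # Assumed start-up costs: 12 per memory access, 7 multiply, 6 add.
    t_start = 12 + 7 + 12 + 6 + 12
    print(t_chime, vector_time(64, t_chime, t_start))
```

Feeding other unit sequences to count_convoys shows how, in this simplified model, the number of memory accesses per iteration drives Tchime when there is only one memory pipeline.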




May 18, 2022