1) Convert the following pseudocode (the vector triad benchmark from chapter 1 of the book) to C/C++ and run it on the ACI-ICS cluster. Note: run the code on a single core, on both the aci-i and aci-b nodes.
double precision, dimension(N) :: A,B,C,D
double precision :: S,E,MFLOPS
do i=1,N // fill arrays with data
A(i) = 0.d0;B(i) = 1.d0
C(i) = 2.d0; D(i) = 3.d0
enddo
call get_walltime(S) // get start time
do j=1,R
do i=1,N
A(i) = B(i) + C(i) * D(i) // 3 loads, 1 store
enddo
if(A(2).lt.0) call dummy(A,B,C,D) // prevent loop interchange
enddo
call get_walltime(E) // get end time stamp
MFLOPS = R*N*2.d0/((E-S)*1.d6) // calculate MFLOPS
Use the following routine for get_walltime():
#include <stddef.h>   /* NULL */
#include <sys/time.h> /* gettimeofday, struct timeval */
void get_walltime(double* wcTime) {
struct timeval tp;
gettimeofday(&tp, NULL);
*wcTime = (double)(tp.tv_sec + tp.tv_usec/1000000.0);
}
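One possible shape for the converted benchmark is sketched below in C. It is not a reference solution: the function name triad_mflops, its return convention (-1.0 on bad input or failed allocation), and the empty body of dummy are assumptions made for illustration. The arrays are allocated with malloc, and A(2) in the 1-based pseudocode becomes a[1] in 0-based C.

```c
#include <stdlib.h>
#include <sys/time.h>

/* Timer routine from the assignment. */
static void get_walltime(double *wcTime) {
    struct timeval tp;
    gettimeofday(&tp, NULL);
    *wcTime = (double)tp.tv_sec + (double)tp.tv_usec / 1000000.0;
}

/* Never reached at run time (a[1] is always 7.0). It only gives the compiler
   a reason to keep the loop nest as written. Assumption: in a real build this
   would live in a separate source file so it cannot be inlined away. */
void dummy(double *a, double *b, double *c, double *d) {
    (void)a; (void)b; (void)c; (void)d;
}

/* Hypothetical helper: runs the triad r times over arrays of length n and
   returns MFLOPS, or -1.0 if n < 2, r < 1, or an allocation fails. */
double triad_mflops(size_t n, long r) {
    if (n < 2 || r < 1) return -1.0;

    double *a = malloc(n * sizeof *a);
    double *b = malloc(n * sizeof *b);
    double *c = malloc(n * sizeof *c);
    double *d = malloc(n * sizeof *d);
    if (!a || !b || !c || !d) {
        free(a); free(b); free(c); free(d);
        return -1.0;
    }

    /* Fill arrays with data, as in the pseudocode. */
    for (size_t i = 0; i < n; ++i) {
        a[i] = 0.0; b[i] = 1.0; c[i] = 2.0; d[i] = 3.0;
    }

    double s, e;
    get_walltime(&s);                      /* start time */
    for (long j = 0; j < r; ++j) {
        for (size_t i = 0; i < n; ++i)
            a[i] = b[i] + c[i] * d[i];     /* 3 loads, 1 store, 2 flops */
        if (a[1] < 0.0) dummy(a, b, c, d); /* prevent loop interchange */
    }
    get_walltime(&e);                      /* end time */

    /* Same formula as the pseudocode: R*N*2 / ((E-S)*1e6). */
    double mflops = (double)r * (double)n * 2.0 / ((e - s) * 1.0e6);

    free(a); free(b); free(c); free(d);
    return mflops;
}
```

A main function would call triad_mflops once per value of N and print N alongside the returned MFLOPS. Very short runs can fall below the resolution of gettimeofday, so the result is only meaningful when E-S is well above a few microseconds.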
2) Explore appropriate values for R and N and observe the results. Fix R to some value and generate a plot of MFLOPS vs. N that best summarizes your observations.
*** Note: R does not have to be fixed! The MFLOPS calculation already accounts for R, so R may vary with N. ***
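Because the MFLOPS formula divides by elapsed time and multiplies by R, one way to keep every run short is to shrink R as N grows so that R*N stays roughly constant. The sketch below shows that idea together with geometrically spaced values of N, which suit a logarithmic x-axis. The helper names repeats_for and fill_sizes, the target of 1e8 updates, and the growth factor are illustrative assumptions, not values prescribed by the assignment.

```c
#include <stddef.h>

/* Hypothetical helper: repetition count that keeps r*n near target_updates.
   Always returns at least 1 so that very large arrays are still measured. */
long repeats_for(size_t n, double target_updates) {
    if (n == 0) return 1;
    double r = target_updates / (double)n;
    return r < 1.0 ? 1 : (long)r;
}

/* Hypothetical helper: fill sizes[0..count-1] with values that start at
   n_min and grow by `factor` each step (truncated to whole elements). */
void fill_sizes(size_t *sizes, size_t count, size_t n_min, double factor) {
    double n = (double)n_min;
    for (size_t k = 0; k < count; ++k) {
        sizes[k] = (size_t)n;
        n *= factor;
    }
}
```

For example, with a target of 1e8 updates, N = 1000 gives R = 100000, while N = 1e9 gives R = 1. A factor smaller than 2 (for instance 1.2) yields more points per decade and a smoother curve.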
3) Your homework submission should be a zip file containing two files: a PDF write-up and a zip of your code. The short write-up should be 1-2 pages (including a graph) discussing the findings of your benchmarks. Your discussion should tie the benchmark results to the hardware under test. Your source code should include a Makefile.
Expectations:
Your code should use dynamic allocation for the arrays under test. No credit will be given if the code does not compile. Points will be lost if the code does not accurately represent the pseudocode above (which is actually Fortran). This is a high-performance computing course, so the range of values for R and N should reflect that (Big Data!). None of the tests has to run for more than a few minutes.
Your write-up will be evaluated for quality of discussion and the accuracy of the graph.
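When tying the results to the hardware, it can help to convert N into a memory footprint. The benchmark touches four double-precision arrays of length N, so the working set is 4*N*8 bytes. The sketch below does that arithmetic in both directions; the function names are hypothetical, and any cache size passed in is something you would look up for the node under test (for example with lscpu), not a value given by the assignment.

```c
#include <stddef.h>

/* Bytes touched by the triad: four arrays (A, B, C, D) of n doubles each. */
size_t working_set_bytes(size_t n) {
    return 4 * n * sizeof(double);
}

/* Largest n for which all four arrays fit in a cache of cache_bytes. */
size_t max_n_for_cache(size_t cache_bytes) {
    return cache_bytes / (4 * sizeof(double));
}
```

For instance, N = 1000 corresponds to 32000 bytes, and a hypothetical 32 KiB cache would hold the four arrays up to N = 1024. Marking such thresholds on the MFLOPS vs. N plot is one way to relate drops in performance to levels of the memory hierarchy.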