Answer To: 5.1 In this exercise we look at memory locality properties of matrix computation. The following code...
Pulkit answered on Aug 19 2021
5.1 In this exercise we look at memory locality properties of matrix computation. The following code is written in C, where elements within the same row are stored contiguously. Assume each word is a 32-bit integer.
for (I = 0; I < 8; I++)
for (J = 0; J < 8000; J++)
A[I][J] = B[I][0] + A[J][I];
5.1.1 [5] §5.1> How many 32-bit integers can be stored in a 16-byte cache block?
5.1.1 32 bit is 4 bytes.1
Therefor 16 Bytes can store 4x4 bytes=4x32 bit words
i.e., 4 words of 32-bit size can be stores in a cache block.
5.1.2 [5] References to which variables exhibit temporal locality?
5.1.2 Temporal locality defines that the data recently accessed has a higher probability of being accesses again soon. This particular block of data will be accesses again and again and will be stored in the cache for a long time.
If we observe the following line of code,
A[I][J] = B[I][0] + A[J][I], where the value of I is 0 for a greater number of iterations and at the same time, B[I][0] also will have the same number of iterations.
So, variable I and matrix B[I][0] exhibit temporal locality.
5.1.3 [5] References to which variables exhibit spatial locality?
5.1.3 Spatial locality defines that if a data is referred then there is high chances of referring data from the nearby memory locations.
In the above code segment, A[I][J] = B[I][0] + A[J][I], since the same rows are stored in the nearby locations, A[I][J] will be accessed like A[0][0], A[0][1], A[0][2], A[0][3], A[0][4] and so it will exhibit spatial locality.
Locality is affected by both the reference order and data layout. The same computation can also be written below in Matlab, which differs from C by storing matrix elements within the same column contiguously in memory.
for I = 1:8
for J = 1:8000
A(I,J) = B(I,0) + A(J,I);
end
end
5.1.4 [5] How many 16-byte cache blocks are needed to store all 32-bit matrix elements being referenced?
5.1.4 We need to catch 8 elements of B as we are referencing B[I][0] for 8 values of I.
We are referencing A[J][I] for 0<=I<8 and <=J<8000,
Therefore, we are using reference=8000*8 times=64000 times
In the same way for A[I][J] i.e., 64000 times
We have overlapped in the last two cases.
1 case, we reference A[0][0], A[0][1], A[0][2]........, A[0][7999], A[1][0], A[1][1],....., A[7][7999].
2 case, we reference A[0][0], A[1][0], A[2][0],........, A[7999][0], A[0][1], A[1][1], .....A[7999][7]
Therefore, there is overlapping for the elements A[k][l] where 0<=k, l<8. there are 8*8=64 elements like A[k][l].
No. of elements of A needed to be cached=64000+6400-64=127936
Total we need to cache=127936+8=127944 matrix elements
We can store 4 matrix elements inside of a single block
Therefore, to store 127944 matrix elements, we need=127944/4=31986 blocks
5.2.1 [10] For each of these references, identify the binary address, the tag, and the index given a direct-mapped cache with 16 one-word blocks. Also list if each reference is a hit or a miss, assuming the cache is initially empty
5.2.1 Consider the table below. Direct Mapped Cache, 16 1-Word Blocks
Address
Tag
Index
Decimal
Binary (low byte)
Binary (low 4...