Complete all problems and show work.5.1 In this exercise we look at memory locality properties of...

Question

Complete all problems and show work.5.1 In this exercise we look at memory locality properties of matrix computation. The following code is written in C, where elements within the same row are stored contiguously. Assume each word is a 32-bit integer. for (I = 0; I  How many 32-bit integers can be stored in a 16-byte cache block? 5.1.2 [5]  References to which variables exhibit temporal locality? 5.1.3 [5]  References to which variables exhibit spatial locality? Locality is affected by both the reference order and data layout. The same computation can also be written below in Matlab, which differs from C by storing matrix elements within the same column contiguously in memory. for I = 1:8    for J = 1:8000       A(I,J) = B(I,0) + A(J,I);    end end 5.1.4 [5]  How many 16-byte cache blocks are needed to store all 32-bit matrix elements being referenced? 5.2.1 [10]  For each of these references, identify the binary address, the tag, and the index given a direct-mapped cache with 16 one-word blocks. Also list if each reference is a hit or a miss, assuming the cache is initially empty. 5.2.2 [10]  For each of these references, identify the binary address, the tag, and the index given a direct-mapped cache with two-word blocks and a total size of 8 blocks. Also list if each reference is a hit or a miss, assuming the cache is initially empty. 5.2.3 [20]  You are asked to optimize a cache design for the given references. There are three direct-mapped cache designs possible, all with a total of 8 words of data: C1 has 1-word blocks, C2 has 2-word blocks, and C3 has 4-word blocks. In terms of miss rate, which cache design is the best? If the miss stall time is 25 cycles, and C1 has an access time of 2 cycles, C2 takes 3 cycles, and C3 takes 5 cycles, which is the best cache design? There are many different design parameters that are important to a cache's overall performance. Below are listed parameters for different direct-mapped cache designs. Cache Data Size: 32 KiB Cache Block Size: 2 words Cache Access Time: 1 cycle 5.2.4 [15]  Calculate the total number of bits required for the cache listed above, assuming a 32-bit address. Given that total size, find the total size of the closest direct-mapped cache with 16-word blocks of equal size or greater. Explain why the second cache, despite its larger data size, might provide slower performance than the first cache. 5.2.5 [20]  Generate a series of read requests that have a lower miss rate on a 2 KiB 2-way set associative cache than the cache listed above. Identify one possible solution that would make the cache listed have an equal or lower miss rate than the 2 KiB cache. Discuss the advantages and disadvantages of such a solution. 5.2.6 [15]  The formula (Block address) modulo (Number of blocks in the cache) shows the typical method to index a direct-mapped cache. Assuming a 32-bit address and 1024 blocks in the cache, consider a different indexing function, specifically (Block address[31:27] XOR Block address[26:22]). Is it possible to use this to index a direct-mapped cache? If so, explain why and discuss any changes that might need to be made to the cache. If it is not possible, explain why. 5.3 For a direct-mapped cache design with a 32-bit address, the following bits of the address are used to access the cache. 5.3.1 [5]  What is the cache block size (in words)? 5.3.2 [5]  How many entries does the cache have? 5.3.3 [5]  What is the ratio between total bits required for such a cache implementation over the data storage bits? Starting from power on, the following byte-addressed cache references are recorded. 5.3.4 [10]  How many blocks are replaced? 5.3.5 [10]  What is the hit ratio? 5.3.6 [10]  List the final state of the cache, with each valid entry represented as a record of .

Pulkit · Accepted Answer

5.1 In this exercise we look at memory locality properties of matrix computation. The following code is written in C, where elements within the same row are stored contiguously. Assume each word is a 32-bit integer.
for (I = 0; I  How many 32-bit integers can be stored in a 16-byte cache block?
5.1.1 32 bit is 4 bytes.1
Therefor 16 Bytes can store 4x4 bytes=4x32 bit words
i.e., 4 words of 32-bit size can be stores in a cache block.
5.1.2 [5]  References to which variables exhibit temporal locality?
5.1.2 Temporal locality defines that the data recently accessed has a higher probability of being accesses again soon. This particular block of data will be accesses again and again and will be stored in the cache for a long time.
If we observe the following line of code,
A[I][J] = B[I][0] + A[J][I], where the value of I is 0 for a greater number of iterations and at the same time, B[I][0] also will have the same number of iterations.
So, variable I and matrix B[I][0] exhibit temporal locality.
5.1.3 [5]  References to which variables exhibit spatial locality?
5.1.3 Spatial locality defines that if a data is referred then there is high chances of referring data from the nearby memory locations.
In the above code segment, A[I][J] = B[I][0] + A[J][I], since the same rows are stored in the nearby locations, A[I][J] will be accessed like A[0][0], A[0][1], A[0][2], A[0][3], A[0][4] and so it will exhibit spatial locality.
Locality is affected by both the reference order and data layout. The same computation can also be written below in Matlab, which differs from C by storing matrix elements within the same column contiguously in memory.
for I = 1:8
   for J = 1:8000
      A(I,J) = B(I,0) + A(J,I);
   end
end
5.1.4 [5]  How many 16-byte cache blocks are needed to store all 32-bit matrix elements being referenced?
5.1.4 We need to catch 8 elements of B as we are referencing B[I][0] for 8 values of I.
We are referencing A[J][I] for 0 For each of these references, identify the binary address, the tag, and the index given a direct-mapped cache with 16 one-word blocks. Also list if each reference is a hit or a miss, assuming the cache is initially empty
5.2.1 Consider the table below. Direct Mapped Cache, 16 1-Word Blocks
	Address
	Tag
	Index
	
	Decimal
	Binary (low byte) 
	Binary (low 4 bits)
	Decimal
	Binary
	Decimal
	Hit/Miss
	3
	00000011
	0000
	0
	0011
	3
	Miss
	180
	10110100
	1011
	11
	0100
	4
	Miss
	43
	00101011
	0010
	2
	1011
	11
	Miss
	2
	00000010
	0000
	0
	0010
	2
	Miss
	191
	10111111
	1011
	11
	1111
	15
	Miss
	88
	01011000
	0101
	5
	1000
	8
	Miss
	190
	10111110
	1011
	11
	1110
	14
	Miss
	14
	00001110
	0000
	0
	1110
	14
	Miss
	181
	10110101
	1011
	11
	0101
	5
	Miss
	44
	00101100
	0010
	2
	1100
	12
	Miss
	186
	10111010
	1011
	11
	1010
	10
	Miss
	253
	11111101
	1111
	15
	1101
	13
	Miss
5.2.2 [10]  For each of these references, identify the binary address, the tag, and the index given a direct-mapped cache with two-word blocks and a total size of 8 blocks. Also list if each reference is a hit or a miss, assuming the cache is initially empty.
5.2.2 Consider the table below.

5.1 In this exercise we look at memory locality properties of matrix computation. The following code is written in C, where elements within the same row are stored contiguously. Assume each word is a...

Answer To: 5.1 In this exercise we look at memory locality properties of matrix computation. The following code...

Answer To This Question Is Available To Download

Related Questions & Answers

Submit New Assignment