(error in trace0.txt)
It has been brought to my attention that trace0.txt ends with #end instead of #eof as indicated in the writeup. Depending on how you are reading the files, this may or may not cause problems for your code. If you are explicitly checking for the string "#eof" at the end of the file, you may modify the file data/trace0.txt so that it has the correct final line. Grading will be done with a corrected file.
If your program works with the file as-is, you should not need to do anything, but I recommend updating the file anyway, just in case.
CS211 Fall 2021 Programming Assignment V David Menendez Due December 13, 2021 Submit by December 15, 11:59 PM This assignment will provide a better understanding of caches and provide further experience with C programming. Advice Read this document first. Take notes. In particular, read section 1.3 carefully. Start working early. Before you do any coding, make sure you can run the auto-grader, create a Tar archive, and submit that archive to Sakai. You don’t want to discover on the last day that scp doesn’t work on your personal machine. Decide your data structures and algorithms first. Writing out pseudocode is not required, but it may be a good idea. Start working early. You will almost certainly encounter problems you did not anticipate while writing this project. It is much better if you find them in the first week and can ask questions. 1 Cache Simulation A cache is a collection of cache lines, each of which may store a block of memory along with some additional information about the block (for example, which addresses it contains). Each block contains the same number of bytes, known as the block size. The block size will always be a power of two. The cache size is the block size multiplied by the number of cache lines (that is, the additional information is not counted in the cache size). Consider a system with 48-bit addresses and a block size of 16 bytes. Each block will begin with an address divisible by 16. Because 16 = 24, the last 4 bits of an address will determine its offset within a particular block. For example, the address ffff0000abcd (hexadecimal) will have offset d. The remaining 44 bits of the address may be considered a block identifier. If the block size were 8 bytes, then the last 3 bits of the address would be its block offset. The last three bits of ffff000abcd are 101 (binary) or 5 (hexadecimal). The most-significant 45 bits will be the block identifier. (Exercise: write the block identifier in hexadecimal.1) For a cache with a single cache line, that cache line will store the 16 bytes of the block (the data) along with the validity bit and block identifier (the metadata). Each memory access will first check whether the address is part of the block currently in the cache (if any). Since we are only simulating memory reads and writes, you cache will not need to store the actual blocks. 11ffe0001579 1 1.1 Memory operations Your simulator will simulate two memory operations: reading and writing individual bytes. Your program will read a trace file describing addresses to read or write from, and will keep track of which blocks are stored in which cache lines in order to determine when these memory operations result in actual reads and writes to memory. Note that your program only keeps enough information to simulate the behavior of the cache. It does not need to know the actual contents of memory and the blocks stored in the cache lines; this information is, in fact, not available. Your program will simulate a write-through, write-allocate cache. The operations supported are reading and writing from an address A. read A Simulate reading the byte at address A. If the block containing A is already loaded in the cache, this is a cache hit. Otherwise, this is a cache miss: note a memory read and load the block into a chosen cache line (see sections 1.2 and 1.4). write A Simulate writing a byte at address A. If the block containing A is already loaded in the cache, this is a cache hit: note a memory write. Otherwise, this is a cache miss: note a memory read, load the block into a chosen cache line, and then note a memory write. Your program will track the number of cache hits, cache misses, memory reads, and memory writes. Note that loading an address or block into the cache means changing the information about a particular cache line to indicate that it holds a particular block. Since your simulator does not simulate the contents of memory, no data will be loaded or stored from the address itself. 1.1.1 Prefetching Prefetching is a technique used to increase the benefit of spatial locality for a cache. In the operations described above, blocks are read from memory only after a cache miss. If a cache uses prefetching, each cache miss will ensure that the next block is also loaded into the cache. For example, a program accesses address A, which is part of block X. If this access is a cache miss (meaning X is not in the cache), the cache will load block X into the cache. A prefetching cache will also check whether block X + 1 is present in the cache, and load it into the cache if not. Note that the prefetching step is not considered a cache hit or a cache miss. Algorithm 1 illustrates how to simulate a memory read when prefetching. Prefetching occurs when loading blocks into the cache. Because cachesim simulates a write- allocate cache, a cache miss during a write will result in loading the relevant block, which may then result in prefetching the next block. Note that prefetching does not increase or decrease the number of writes. 1.2 Mapping For caches with multiple cache lines, there are several ways of determining which cache lines will store which blocks. Generally, the cache lines will be grouped into one or more sets. Whenever a block is loaded into a cache line, it will always be stored in a cache line belonging to its set. The number of sets will always be a power of two. 2 Algorithm 1 Simulating a memory read in a prefetching cache procedure read(A) X ← block ID for A if X in cache then increment cache hits else increment cache misses increment memory reads set X in cache Y ← X + 1 if Y not in cache then increment memory reads set Y in cache end if end if end procedure Let B be the block size. The least significant log2 B = b bits of an address will be its block offset. If there are S sets, then the next least significant log2 S = s bits of the address will identify its set. The remaining 48− s− b bits are known as the tag and are used to identify the block stored in a particular cache line. For example, for a cache with 16 sets and 16-byte blocks, the address ffff0000abcd is in set 12 (c). The first 40 bits, ffff000ab, are the tag. Your program will simulate three forms of cache: Direct-mapped This is the simplest form of cache, where each set contains one cache line and no decisions need to be made about where a particular block will be stored. n-way associative In this form of cache, each set contains n cache lines, where n is a power of two. The particular block stored in each line is identified by its tag. Fully associative This form of cache puts all the cache lines in a single set. For direct and n-way associative caches, your program will need to derive the number of sets (S) from the associativity A (the number of cache lines per set), the block size B, and the size of the cache C using the relation C = SAB. For fully associative caches, S = 1, but you will need to determine A using the same relation.2 1.3 Calculating block, set, and tag We will simulate a cache in a system with 48-bit addresses. Since int has 32 bits on the iLab machines, you will want to use a long or unsigned long to represent the addresses. Using some other type, such as a string containing hexadecimal digits, is not recommended for efficiency reasons. Your program will need to extract sequences of bits from the addresses it works with. When simulating a cache with block size B = 2b, the least significant b bits of an address (the block offset) 2Can direct-mapped and fully associative caches be considered special cases of n-way associativity? 3 may be ignored. Code such as X >> b will effectively remove the block offset, leaving only the block identifier. The least significant s bits of the block identifier identify the set. To obtain these, recall that 2k − 1, represented in binary, is all zeros except for the last k bits, which are ones. That is, 22 − 1 is 112, 24 − 1 is 11112, and so forth. Recall that 2k may be easily computed by 1 < k. the tag is the portion of the address remaining after removing the block offset and set identifier. note addresses are integers, which are bit vectors. extracting the set index and tag from an address should be done using the bitwise operations provided by c. your program should not use functions from math.h or attempt to manipulate addresses as strings. integer operations are fast. use them! exercise show that the following expression is true if and only if n is a power of two: x != 0 && (x & (x - 1)) == 0 1.4 replacement policies each cache line is either valid, meaning it contains a copy of a block from memory, or invalid, meaning it does not. initially, all cache lines are invalid. as cache misses occur and blocks are loaded into the cache, those lines will become valid. eventually, the cache will need to load a block into a set which has no invalid lines. to load this new block, it will select a line in the set according to its replacement policy and put the new block in that line, replacing the block that was already present.3 we consider two replacement policies in this assignment: fifo (“first-in, first-out”) in this policy, the cache line that was loaded least recently will be replaced. lru (“least-recently used”) in this policy, the cache line that was accessed least recently will be replaced.4 your simulator will implement fifo. implementing lru is left for extra credit. note that each set contains a fixed, finite number of cache lines. to determine the oldest block in the cache, it is sufficient to store the relative age of each cache line. that is, the first time a block is loaded into a cache line, its relative age is set to 0 and the ages of all other valid cache lines are increased by 1. this ensures that every cache line will have a unique age. once all cache lines are valid (i.e., contain data), a sequential search is sufficient to determine the oldest cache line.5 2 program you will write a program cachesim that reads a trace of memory accesses and simulates the behavior of a k.="" the="" tag="" is="" the="" portion="" of="" the="" address="" remaining="" after="" removing="" the="" block="" offset="" and="" set="" identifier.="" note="" addresses="" are="" integers,="" which="" are="" bit="" vectors.="" extracting="" the="" set="" index="" and="" tag="" from="" an="" address="" should="" be="" done="" using="" the="" bitwise="" operations="" provided="" by="" c.="" your="" program="" should="" not="" use="" functions="" from="" math.h="" or="" attempt="" to="" manipulate="" addresses="" as="" strings.="" integer="" operations="" are="" fast.="" use="" them!="" exercise="" show="" that="" the="" following="" expression="" is="" true="" if="" and="" only="" if="" n="" is="" a="" power="" of="" two:="" x="" !="0" &&="" (x="" &="" (x="" -="" 1))="=" 0="" 1.4="" replacement="" policies="" each="" cache="" line="" is="" either="" valid,="" meaning="" it="" contains="" a="" copy="" of="" a="" block="" from="" memory,="" or="" invalid,="" meaning="" it="" does="" not.="" initially,="" all="" cache="" lines="" are="" invalid.="" as="" cache="" misses="" occur="" and="" blocks="" are="" loaded="" into="" the="" cache,="" those="" lines="" will="" become="" valid.="" eventually,="" the="" cache="" will="" need="" to="" load="" a="" block="" into="" a="" set="" which="" has="" no="" invalid="" lines.="" to="" load="" this="" new="" block,="" it="" will="" select="" a="" line="" in="" the="" set="" according="" to="" its="" replacement="" policy="" and="" put="" the="" new="" block="" in="" that="" line,="" replacing="" the="" block="" that="" was="" already="" present.3="" we="" consider="" two="" replacement="" policies="" in="" this="" assignment:="" fifo="" (“first-in,="" first-out”)="" in="" this="" policy,="" the="" cache="" line="" that="" was="" loaded="" least="" recently="" will="" be="" replaced.="" lru="" (“least-recently="" used”)="" in="" this="" policy,="" the="" cache="" line="" that="" was="" accessed="" least="" recently="" will="" be="" replaced.4="" your="" simulator="" will="" implement="" fifo.="" implementing="" lru="" is="" left="" for="" extra="" credit.="" note="" that="" each="" set="" contains="" a="" fixed,="" finite="" number="" of="" cache="" lines.="" to="" determine="" the="" oldest="" block="" in="" the="" cache,="" it="" is="" sufficient="" to="" store="" the="" relative="" age="" of="" each="" cache="" line.="" that="" is,="" the="" first="" time="" a="" block="" is="" loaded="" into="" a="" cache="" line,="" its="" relative="" age="" is="" set="" to="" 0="" and="" the="" ages="" of="" all="" other="" valid="" cache="" lines="" are="" increased="" by="" 1.="" this="" ensures="" that="" every="" cache="" line="" will="" have="" a="" unique="" age.="" once="" all="" cache="" lines="" are="" valid="" (i.e.,="" contain="" data),="" a="" sequential="" search="" is="" sufficient="" to="" determine="" the="" oldest="" cache="" line.5="" 2="" program="" you="" will="" write="" a="" program="" cachesim="" that="" reads="" a="" trace="" of="" memory="" accesses="" and="" simulates="" the="" behavior="" of=""> k. the tag is the portion of the address remaining after removing the block offset and set identifier. note addresses are integers, which are bit vectors. extracting the set index and tag from an address should be done using the bitwise operations provided by c. your program should not use functions from math.h or attempt to manipulate addresses as strings. integer operations are fast. use them! exercise show that the following expression is true if and only if n is a power of two: x != 0 && (x & (x - 1)) == 0 1.4 replacement policies each cache line is either valid, meaning it contains a copy of a block from memory, or invalid, meaning it does not. initially, all cache lines are invalid. as cache misses occur and blocks are loaded into the cache, those lines will become valid. eventually, the cache will need to load a block into a set which has no invalid lines. to load this new block, it will select a line in the set according to its replacement policy and put the new block in that line, replacing the block that was already present.3 we consider two replacement policies in this assignment: fifo (“first-in, first-out”) in this policy, the cache line that was loaded least recently will be replaced. lru (“least-recently used”) in this policy, the cache line that was accessed least recently will be replaced.4 your simulator will implement fifo. implementing lru is left for extra credit. note that each set contains a fixed, finite number of cache lines. to determine the oldest block in the cache, it is sufficient to store the relative age of each cache line. that is, the first time a block is loaded into a cache line, its relative age is set to 0 and the ages of all other valid cache lines are increased by 1. this ensures that every cache line will have a unique age. once all cache lines are valid (i.e., contain data), a sequential search is sufficient to determine the oldest cache line.5 2 program you will write a program cachesim that reads a trace of memory accesses and simulates the behavior of a>