dd
10/24/2020 CS 3113: Project 3 https://cs.ou.edu/~fagg/classes/cs3113_f20/projects/project3.html 1/23 CS 3113: Project 3 File System Implementation (Directory Structures) Introduction Hard disks organize data into fixed-sized blocks. When one wants to fetch a particular byte, the entire block in which that byte lives must be fetched. Likewise, when a single byte must be changed, the entire block is first read into memory, the byte is changed and then the entire block is written back to the disk. As application-level programmers, however, we prefer to think in terms of the file system abstraction: a file is a sequence of bytes of some arbitrary length. It is convenient to program with this abstraction in mind: we would like to be able to read a subsequence of the bytes from a file or write a new subsequence of bytes (either overwriting existing bytes in the file, or appending onto the file itself). During these processes, we prefer not to think in terms of which blocks on the hard disk that our bytes are coming from or going to. In addition, the file system abstraction also provides us with a convenient and logical way of finding files. Specifically, we use directories and subdirectories, along with specific names within a directory that map to specific files. For projects 3 and 4, you will implement a miniature file system, MYFS, that makes the connection between disk blocks and the file system abstraction. We will use a real file on our Linux systems as a virtual hard disk drive. This virtual disk will be accessed one block at a time. Access to the bytes on your virtual disk will be provided by the vdisk code that we provide. We also provide the file system data structure and a few other components. Your job in project 3 is to implement a hierarchical directory structure. In project 4, you will add files (with content!) to the file system. Specifically, as part of project 3, you will: Format the virtual disk with an initial file system. This initial file system will contain a root directory with . and .. already initialized. Provide an API of system calls (we won't actually be doing traps, but instead we will be calling functions in a library). These system calls include functionality such as creating/deleting directories and listing the contents of a directory. These will be implemented in myfs_lib.c/h Provide a set of helper functions that make your API easier to implement. These will be implemented in myfs_lib_support.c/h Objectives By the end of this project, you should be able to: Describe the logical data structures that are used by an OS to represent directories and files. Show how these logical data structures change under standard file system operations. Map these logical data structures onto a block-level data storage device. Manipulate the block-level data storage device under standard file system operations. Overview The diagram below shows the relationship of the different components that we are implementing. Starting from the bottom: vdisk (PROVIDED) implements a block-level storage device. This device allows read/write 10/24/2020 CS 3113: Project 3 https://cs.ou.edu/~fagg/classes/cs3113_f20/projects/project3.html 2/23 operations at the level of individual blocks of bytes. The storage of this block-level device is a Linux file (hence, this is a virtual disk) myfs_lib_support (TO BE IMPLEMENTED, mostly) provides reusable functionality for manipulating different parts of the file system at the block level myfs_lib (TO BE IMPLEMENTED) provides a set of virtual system calls that make up the user API application programs (PROVIDED) include: myfs_inspect: a program for examining the block-level structure of the disk. This program is about debugging and testing your code, and is not a user program myfs_stats: a program that prints out the sizes of the various structures on the disk (including blocks and index nodes) myfs_format: format the virtual disk myfs_list: list an existing file or the contents of an existing directory myfs_mkd: make a directory myfs_rmd: remove a directory Proper Academic Conduct The code solution to this project must be done individually. Do not copy solutions to this problem from the network and do not look at or copy solutions from others. However, you may use the net for inspiration and you may discuss your general approach with others. These sources must be documented in your README file. Logical Representation of the File System Index Nodes Index nodes contain the meta-data for a logical entity that is stored in the file system (either a file or a directory). The data inside the index node include: Index node type is one of: T_UNUSED, T_DIRECTORY, T_FILE, or T_PIPE references: the number of references to the index node by a directory. For index nodes that correspond to directories, references will always be 1. For index nodes that correspond to files, this value can be any positive number. However, when a file is first created, this count will be 1 (more on this in project 4) content: a BLOCK_REFERENCE to the block that contains the information associated with the index node size: for directories, this counts the number of items within the directory. For files, this counts the number of bytes in the file. All index nodes in the file system are referenced with an integer (of type INDEX NODE_REFERENCE). An INDEX NODE_REFERENCE can be the following values: UNALLOCATED_INDEX NODE: refers to an index node that does not exist (we will use it like a NULL pointer) 10/24/2020 CS 3113: Project 3 https://cs.ou.edu/~fagg/classes/cs3113_f20/projects/project3.html 3/23 Other non-negative integers: refer to an index node that is stored on the disk. Directory Entries A single directory entry contains the following information: name: a null-terminated string that is the name of the file or directory relative to the parent directory. Note that this space is fixed in size; any names that are too long to fit in this space are truncated so that they will fit (along with the null termination) index_node_reference: the number of the index node that corresponds to the name. If the directory entry is not used, then this is set to UNALLOCATED_INDEX NODE Blocks A block is a fixed-size unit of storage on the virtual disk. Each block on the disk is referenced using an integer; the type is BLOCK_REFERENCE. A BLOCK_REFERENCE can be one of the following values: UNALLOCATED_BLOCK: refers to a block that does not exist 0 ... n_blocks-1: identifies a unique block on the disk All blocks contain BLOCK_SIZE bytes, which are allocated accordingly: next_block: a BLOCK_REFERENCE to the block that is next in a linked list of blocks. If there is no next block, then this is set to UNALLOCATED_BLOCK content: the remaining bytes in the block (a total of DATA_BLOCK_SIZE bytes) that actually store the data in the block Depending on the context, the data stored in the block can be interpreted in one of several ways: data: bytes within a file volume: the virtual disk volume block index_nodes: an array of N_INDEX_NODES_PER_BLOCK index nodes directory: an array of N_DIRECTORY_ENTRIES_PER_BLOCK directory entries Volume Block The volume block contains several key components: n_blocks: the number of blocks that exist on the virtual disk n_allocated_blocks: the number of blocks that are currently being used to store information on the disk n_allocated_index_nodes: the number of index nodes that are currently being used to represent directories or files block_allocation_table: a bitmapped representation of which blocks are currently allocated There is a total of n_blocks blocks on the virtual disk (this number is no more than MAX_BLOCKS). The block allocation table consists of an array of unsigned chars (bytes), where each block is mapped to one bit in this array. The mapping is as follows: Block 0: byte 0, bit 0 Block 1: byte 0, bit 1 Block 2: byte 0, bit 2 Block 3: byte 0, bit 3 Block 4: byte 0, bit 4 10/24/2020 CS 3113: Project 3 https://cs.ou.edu/~fagg/classes/cs3113_f20/projects/project3.html 4/23 Block 5: byte 0, bit 5 Block 6: byte 0, bit 6 Block 7: byte 0, bit 7 Block 8: byte 1, bit 0 : In general, block i corresponds to byte i/8 and bit i%8 in the table. A bit value of 1 indicates that the block is being used; a bit value of 0 indicates that the block is free to be allocated. Unused bits are left at 0. Index Node Blocks Because index nodes are much smaller than blocks, we pack multiple index nodes into the block. This is represented within an index node block as an array of individual index nodes (a total of N_INDEX_NODES_PER_BLOCK). When the file system is first formatted, one block is allocated to representing index nodes. However, as needs grow, additional blocks are allocate. The list of blocks used for the index nodes is implemented as a linked list of blocks. Directory Blocks Because directory entries are much smaller than a block, we fit N_DIRECTORY_ENTRIES_PER_BLOCK directory entries within the block. As with index nodes, these are implemented as an array of directory entries. When the file system is first formatted, one block is permanently allocated to represent the content of the root directory. In addition, as new directories are created, they are allocated a single block to represent their content. However, as the directory grows in size, more directory blocks are allocated, forming a linked list of blocks. Logical Structure Examples Format Given the following command: ./myfs_format 128 The format command creates a disk that contains 128 blocks. After the format, the logical structure will be as follows: Block 0: Volume block (permanent) Block 1: Index node block for the root directory (permanent) Block 2: Directory block for the root directory (permanent) All other blocks: unused 10/24/2020 CS 3113: Project 3 https://cs.ou.edu/~fagg/classes/cs3113_f20/projects/project3.html 5/23 Notes: The next block for all blocks is initially UNALLOCATED_BLOCK. An index node is one element in an index node block. All unused index nodes must have their content property set to UNALLOCATED_BLOCK. 10/24/2020 CS 3113: Project 3 https://cs.ou.edu/~fagg/classes/cs3113_f20/projects/project3.html 6/23 Debugging is easier if all bytes in each block are set to zero before setting up the data structure. Create Directory Given the following command: ./myfs_mkd foo The logical structure will change to be: 10/24/2020 CS 3113: Project 3 https://cs.ou.edu/~fagg/classes/cs3113_f20/projects/project3.html 7/23 Notes: When allocating a new block, you must always select the smallest available block index. 10/24/2020 CS 3113: Project 3 https://cs.ou.edu/~fagg/classes/cs3113_f20/projects/project3.html 8/23 Creation of a directory in the above example involves the allocation of a single block for the content of the new directory. However, the more general case can include: If there are no available index nodes, then a new index node block must be allocated and appended to the index node block linked list If there are no available entries in the parent directory, then a new directory block must be allocated and appended to the parent directory block linked list When multiple blocks must be allocated during directory creation, they are allocated in the following order: index node block, parent directory block and finally the new directory block. Create Another Directory Given the next command: ./myfs_mkd foo/bar The logical structure will change to be: 10/24/2020 CS 3113: