ISA Simulation: creating your own emulator in C++
Computer Architecture Fall 2020
Assignment 2: ISA emulation
Deadline: Friday, November 6, 2020

In this second assignment you will develop an emulator that is capable of executing simple RISC-V programs. The RISC-V architecture has been chosen because of its simplicity. The main learning objective of this assignment is to acquire a deep understanding of how processors execute instructions, the challenges of pipelining, and how the different categories of instructions are implemented.

We will limit ourselves to the execution of instructions. We will not simulate the detailed characteristics of DDR memory, nor will we model a cache subsystem. In this assignment we assume that a memory load or store can be performed in a single clock cycle. Of course, these aspects could be simulated in detail if the emulator were extended further.

You do not have to start from scratch. A skeleton, or rather a starting point, can be downloaded from BrightSpace. A number of things have already been implemented in this starting point: loading programs in ELF format, a simplified emulation of a memory bus and memory, a register file, and simple “devices” to output characters and to halt the system.

During the development of the emulator it is very important to test your code. You will have to write micro programs (unit tests) to test individual instructions. To help you get started, the starting point contains a test harness, which is described below. To test the emulator in its entirety, a collection of more extensive test programs can be obtained from BrightSpace. These programs increase in complexity: basic.bin, hello.bin, comp.bin and brainfuck.bin (an interpreter for the well-known Brainfuck programming language). These test programs will also be used to determine the correctness score (the achieved level) of your program when grading the assignment.
Background Information

For general theory on Instruction Set Architectures we refer to Appendix A of the textbook (Hennessy and Patterson). Appendix C describes the “classic RISC” pipeline, which executes an instruction in five steps: Instruction Fetch, Instruction Decode, Execute, Memory, and Write Back. In this assignment, we will process instructions according to these five steps. Appendix C.3 describes the implementation of a pipeline for a simple RISC-V processor. A more detailed treatment can be found in Chapter 4 of Computer Organization and Design, also by Hennessy and Patterson.

Detailed information on the RISC-V instruction set can be found in the architecture reference manual: https://content.riscv.org/wp-content/uploads/2017/05/riscv-spec-v2.2.pdf. In this assignment we consider the RV64I instruction set. Particularly useful is the concise instruction list on pages 104-107 (Chapter 19) of that document.

The Big Picture

Essentially, this assignment is about implementing the data path and the necessary control for a simple processor in a C++ code base. Many of the classes that will be implemented resemble hardware components, and these components are connected to each other using getter and setter methods. Consider the data path in Figure 1, as discussed by Hennessy and Patterson. You will find several of these components in the C++ skeleton that is provided to you, such as the register file, the pipeline registers, and the instruction and data memories. Not all of the components have been fully implemented. Because we will start with a non-pipelined design that is later extended to a pipelined design, we will use the pipeline registers in all cases.

Figure 1: Possible data path for a simple RISC-V processor. Source: Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 6th edition, Figure C.19.

The data path consists of five stages. Each clock cycle, one of the stages is executed. This execution is done in two steps: propagate and clock pulse. During the propagate step the (source) pipeline register may be read and the inputs of the different components are set. In effect, the values from the pipeline register propagate through the various components. In the clock pulse step the destination pipeline register is written. Components that contain state (such as the register file) also carry out their writes in this step. So, the overall rule is: pipeline registers can only be read during propagate; any component that contains state (pipeline registers, register file, data memory, PC, etc.) can only be written during clock pulse.

In the source files stages.cc and stages.h you can find the different stages and pipeline registers stubbed out. They are still mostly empty! In these files you will write the code that interconnects the components. For example, for the execute stage you will need to read fields from the id_ex pipeline register (struct ID_EXRegister) and configure the inputs of the ALU during the propagate step. In the clock pulse step of EX you need to set fields in the ex_m pipeline register (struct EX_MRegister) with the output of the ALU and any other data that needs to be passed along to later stages.

To understand how we model components, let us consider the register file as an example. Unlike many other components, an implementation of this component is provided to you. Figure 2 shows the C++ methods on the left and a schematic of the component on the right. The RS1 and RS2 inputs can be set right after instruction decoding, in the propagate phase of the ID stage.
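The propagate/clock-pulse rule can be made concrete with a small sketch of an execute stage. ID_EXRegister, EX_MRegister, ALU and ExecuteStage below are simplified, hypothetical stand-ins, not the skeleton's actual definitions. The toy ALU can only add, which keeps the example short. During propagate the stage reads its source pipeline register and drives the ALU inputs. During clockPulse it only writes its destination pipeline register.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, stripped-down pipeline registers; the skeleton's versions
// will carry more fields (control signals, PC, ...).
struct ID_EXRegister {
  uint64_t readData1 = 0;  // value read from RS1 in ID
  uint64_t readData2 = 0;  // value read from RS2 in ID
  uint8_t rd = 0;          // destination register, passed along for WB
};

struct EX_MRegister {
  uint64_t aluResult = 0;
  uint8_t rd = 0;
};

// Toy combinational ALU that can only add. a_ and b_ model input wires,
// not architectural state, so setting them during propagate is allowed.
class ALU {
 public:
  void setA(uint64_t a) { a_ = a; }
  void setB(uint64_t b) { b_ = b; }
  uint64_t getResult() const { return a_ + b_; }

 private:
  uint64_t a_ = 0;
  uint64_t b_ = 0;
};

class ExecuteStage {
 public:
  ExecuteStage(const ID_EXRegister &in, EX_MRegister &out) : in_(in), out_(out) {}

  // Propagate: the only phase in which the source pipeline register is read.
  void propagate() {
    alu_.setA(in_.readData1);
    alu_.setB(in_.readData2);
    rd_ = in_.rd;  // capture now; do not read in_ again during clockPulse
    propagated_ = true;
  }

  // Clock pulse: write the destination pipeline register, nothing is read
  // from the source pipeline register here.
  void clockPulse() {
    assert(propagated_ && "propagate() must run before clockPulse()");
    out_.aluResult = alu_.getResult();
    out_.rd = rd_;
    propagated_ = false;
  }

 private:
  const ID_EXRegister &in_;  // source register: read-only for this stage
  EX_MRegister &out_;        // destination register
  ALU alu_;
  uint8_t rd_ = 0;
  bool propagated_ = false;
};
```

For example, with readData1 = 40, readData2 = 2 and rd = 5 in id_ex, calling propagate() and then clockPulse() leaves aluResult = 42 and rd = 5 in ex_m. The stage copies rd during propagate rather than reading the ID/EX register in clockPulse. Once the stages run in a pipelined fashion, the previous stage may already have overwritten that register during the same clock pulse.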
In the clock pulse phase, the outputs of the register file can be read and stored in a pipeline register.

    void setRS1(const RegNumber);
    void setRS2(const RegNumber);
    void setRD(const RegNumber);
    void setWriteData(const RegValue);
    void setWriteEnable(bool);
    RegValue getReadData1();
    RegValue getReadData2();
    void clockPulse();

Figure 2: Left: C++ methods of the register file component (listed above). Right: schematic of the register file component, with inputs RS1, RS2, RD and writeData, the control input writeEnable, and outputs readData1 and readData2.

The RD, writeData and writeEnable inputs are to be set in the propagate phase of the write-back stage. During the clock pulse phase of the write-back stage, the clockPulse method is called to trigger the actual write to the register file. Compare this with the lines drawn in the data path of Figure 1!

As can be seen in the data path, there are several multiplexers (muxes) and other components that need to be controlled. The multiplexers need to be told which of their inputs to relay to the output. The ALU needs an input that indicates the operation to be carried out. These are not data inputs (unlike the instruction word, register numbers, or ALU output) but control inputs. So, in addition to the data path, we need control signals, sometimes referred to as the control of the processor. Note that the control lines are not shown in Figure 1.

How do we determine the correct values of the different control signals? They are all encoded in the instruction. When decoding an instruction, we need to derive the control signals that make the data path execute exactly this instruction. In the case of RISC-V, the control signals can be determined from the combination of the opcode and the function code. For the instruction decoding logic this means that we want an initial step that splits the instruction into its different fields: RS1, RS2, RD, opcode, function code and the immediate. We suggest implementing this in the class InstructionDecoder.
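Such a decoder mostly consists of shifts and masks over the fixed RV64I field positions. These are opcode in bits 6:0, rd in 11:7, funct3 in 14:12, rs1 in 19:15, rs2 in 24:20 and funct7 in 31:25. It also sign-extends the immediate, whose bits are scattered differently for each instruction format. The sketch below is one possible implementation, assuming a 32-bit instruction word. The method names (getRS1, getImmI, ...) and the helper signExtend are hypothetical and may differ from what the skeleton expects.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: sign-extend the low `bits` bits of `value` to 64 bits.
inline int64_t signExtend(uint64_t value, unsigned bits) {
  assert(bits > 0 && bits < 64);
  const uint64_t sign = uint64_t{1} << (bits - 1);
  value &= (uint64_t{1} << bits) - 1;  // keep only the low `bits` bits
  return static_cast<int64_t>((value ^ sign) - sign);
}

// Sketch of an instruction decoder for the RV64I base formats.
class InstructionDecoder {
 public:
  void setInstructionWord(uint32_t word) { word_ = word; }

  // Register and function fields sit at the same position in every format.
  uint8_t getOpcode() const { return word_ & 0x7f; }
  uint8_t getRD() const { return (word_ >> 7) & 0x1f; }
  uint8_t getFunct3() const { return (word_ >> 12) & 0x7; }
  uint8_t getRS1() const { return (word_ >> 15) & 0x1f; }
  uint8_t getRS2() const { return (word_ >> 20) & 0x1f; }
  uint8_t getFunct7() const { return (word_ >> 25) & 0x7f; }

  // I-type (loads, ALU-immediate, JALR): imm[11:0] = inst[31:20].
  int64_t getImmI() const { return signExtend(word_ >> 20, 12); }

  // S-type (stores): imm[11:5] = inst[31:25], imm[4:0] = inst[11:7].
  int64_t getImmS() const {
    return signExtend(((word_ >> 25) << 5) | ((word_ >> 7) & 0x1f), 12);
  }

  // B-type (branches): imm[12|10:5] = inst[31:25], imm[4:1|11] = inst[11:7].
  int64_t getImmB() const {
    const uint64_t imm = ((word_ >> 31) & 0x1) << 12 |
                         ((word_ >> 7) & 0x1) << 11 |
                         ((word_ >> 25) & 0x3f) << 5 |
                         ((word_ >> 8) & 0xf) << 1;
    return signExtend(imm, 13);
  }

  // U-type (LUI, AUIPC): inst[31:12] << 12, sign-extended to 64 bits.
  int64_t getImmU() const { return signExtend(word_ & 0xfffff000u, 32); }

  // J-type (JAL): imm[20|10:1|11|19:12] = inst[31:12].
  int64_t getImmJ() const {
    const uint64_t imm = ((word_ >> 31) & 0x1) << 20 |
                         ((word_ >> 12) & 0xff) << 12 |
                         ((word_ >> 20) & 0x1) << 11 |
                         ((word_ >> 21) & 0x3ff) << 1;
    return signExtend(imm, 21);
  }

 private:
  uint32_t word_ = 0;
};
```

For example, the word 0xfff30293 (addi x5, x6, -1) yields opcode 0x13, rd 5, rs1 6 and an I-type immediate of -1. The opcode decides which immediate getter applies, so in a complete design that selection would likely sit alongside the control-signal logic.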
Some of these fields are passed directly to components (such as RS1 to the register file), while others also need to be stored in the pipeline register for use in a later stage (for instance RD).

For the control signals, we suggest implementing a ControlSignals class, which forms a separate component. The instruction decode stage should set the opcode and function code as inputs on this component. After that, the control signals can be obtained through various getters. For example, you will want getters for the ALU operation (to be set as an input on the ALU in EX), for the selectors of the multiplexers in front of the A and B inputs of the ALU, and so on. You will also want to pass the ControlSignals instance on to later stages through the pipeline registers. Figure 3 depicts the interplay between the initial instruction decoder and the control signals component.

Figure 3: Interconnection between the instruction decoder and the control signals component. The decoder takes the instruction word and outputs RS1, RS2, RD, the immediate, the opcode and the function code; the control component takes the opcode and function code and outputs signals such as ALUOp, AInput, BInput, isBranchInstruction, regWriteInput, ...

Development Environment

You should be able to work on this assignment using a Linux, Windows, or macOS machine. You can also use the University Linux environment remotely through ssh. If you work on this assignment on your own computer, please make sure everything compiles and works as expected on the University Linux environment before submitting.

Linux

A modern C++ compiler (at least g++ 5.x or Clang 3.5) is required, as well as Python 3.5 or higher. Any recent Linux distribution should suffice. You can compile using the provided Makefile. You can also work remotely on the University Linux workstations (Ubuntu 16 or 18); the compiler available there is recent enough.
We do recommend using a slightly more recent compiler: gcc 8.2.0 is also available and can be enabled with:

    source /vol/share/software/gcc/8.2.0-profile

Windows

A Visual Studio project file is provided to compile a native Windows executable of the project; see the Windows subdirectory in the skeleton code. However, we strongly recommend using the Windows Subsystem for Linux (WSL), which provides a Linux environment on top of your Windows installation. Within this environment you can use gcc and the Makefile as you would on Linux. More information on installing WSL can be found at: https://docs.microsoft.com/en-us/windows/wsl/install-win10

Skeleton Code

After successfully compiling the skeleton code, try running the emulator with a test program:

    ./rv64-emu ../lab2-test-programs/hello.bin

You will probably notice that the emulator gets stuck in an infinite loop. This is actually the expected result! The given skeleton code is not functional and many essential parts are missing. It is up to you to implement these. We will first briefly describe how the code base is organized.

Structure

The core of the code base can be found in the files stages.h and stages.cc. The five stages of instruction execution are to be implemented in these files; the classes are already stubbed out. In the file processor.cc these stages are constructed, and the method Processor::run() runs the stages in sequence. This run method is in fact the main loop of the program. At the bottom of the definition of class Processor in processor.h you can find member variables that hold the stages, a number of shared components (such as the register file) and the pipeline registers. When a stage needs access to a shared component, a reference to that component is passed to the stage upon construction, as can be seen in the constructor of the Processor class.

Where To Start

We recommend to