Computer engineering assignment: Verilog MIPS programming using Vivado.
Penn State University
School of Electrical Engineering and Computer Science

Project
CMPEN 331 – Computer Organization and Design

Late submissions are not accepted and will receive no credit for the project.

In the project, you only need to implement what was described in the honor option section of lab 5, with the addition of implementation and bitstream generation without errors.

1. Write a report that contains the following:
   i. Your Verilog design code. Use device: Zybo board (XC7Z010-1CLG400C).
   ii. Your Verilog® test bench design code. Add "`timescale 1ns/1ps" as the first line of your test bench file.
   iii. The resulting waveforms, as requested in item 9.
   iv. The design schematics from the Xilinx synthesis of your design. Do not use any area constraints.
   v. Snapshot of the I/O planning.
   vi. Snapshot of the floor planning.
   vii. The design should be free from errors when it is synthesized and implemented and when the bitstream is generated.

The report format will be as follows:

2. REPORT FORMAT: Free form, but it must meet the following:
   a. One report per student.
   b. Have a cover sheet with identification: title, class, your name, etc.
   c. Include an abstract at the beginning of the project report that describes what you are doing in the project.
   d. Include an introduction that explains, with diagrams, the connection between all the stages and the benefit of using this architecture in the computer organization field.
   e. Use Microsoft Word; the report should be uploaded in Word format, not PDF. If you know LaTeX, upload the TeX file in addition to the PDF file.
   f. Single spaced.

The following part is not mandatory, but any student who chooses to do it in addition to the previous part will receive 5 extra points toward the total grade of the course:

In this extra points project, students implement a pipelined CPU using the Xilinx design package for FPGAs. You can use any information available in previous labs if needed.

3. Pipelining
   As described in lab 3
4. Circuits of the Instruction Fetch Stage
   As described in lab 3
5. Circuits of the Instruction Decode Stage
   As described in lab 3
6. Circuits of the Execution Stage
   As described in lab 4
7. Circuits of the Memory Access Stage
   As described in lab 4
8. Circuits of the Write Back Stage
   As described in lab 5
9. Control Hazards and Delayed Branch

A control hazard occurs when a pipelined CPU executes a branch or jump instruction. The target address of a jump instruction (jr, j, or jal) can be determined in the ID stage, and it is written into the PC at the end of the ID stage. But because the pipelined CPU fetches an instruction during every clock cycle, the next instruction is already being fetched during the ID stage of the jump instruction. The control hazard caused by a conditional branch instruction (beq or bne) is more serious than that of a jump instruction, because the condition must be evaluated in addition to calculating the branch target address. Figure 1 shows an example in which the branch is determined in the EXE stage or in the ID stage, respectively.

There are mainly two methods to deal with the instruction(s) that follow a branch or jump instruction. One method is to cancel it (them). The other is to let it (them) be executed. The second method is called a delayed branch. The positions between the location of a jump or branch instruction and the jump or branch target address are called delay slots.
The MIPS (microprocessor without interlocked pipeline stages) ISA (instruction set architecture) adopts a one-delay-slot mechanism: the instruction located in the delay slot is always executed, whether or not the branch is taken, as shown in Figure 2. Figure 2 (a) shows the case where the branch is not taken. Figure 2 (b) shows the case where the branch is taken; t is the branch target address. In both cases, the instruction located at a+4 (the delay slot) is executed.

In order to implement the delayed branch with one delay slot, we must let the conditional branch instructions finish executing in the ID stage. Calculating the branch target address within the ID stage should pose no problem. For checking the condition, we can perform an exclusive OR (XOR) on the two source operands and NOR-reduce the result:

    rsrtequ = ~| (da^db); // (da == db)

where the rsrtequ signal indicates whether da and db are equal. Both da and db should be the most up-to-date data. Referring to Figures 3 and 4, we use the outputs of the multiplexers for internal forwarding as da and db. This is the reason why we put the forwarding in the ID stage instead of the EXE stage.

Because of the delayed branch, the return address of the MIPS jal instruction is PC+8. Figure 5 illustrates the execution of the jal instruction. The instruction located in the delay slot (PC+4) has already been executed before control transfers to a function (or subroutine). The return address should be PC+8, which is written into register $31 in the WB stage by the jal instruction. The return from the subroutine can be done with the instruction jr $31. The jr rs instruction reads the content of register rs and writes it into the PC.

Figure 1 Determining whether a branch is taken or not taken: (a) Branch is determined in EXE stage; (b) Branch is determined in ID stage
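The ID-stage branch logic described above can be sketched as a small combinational module. This is only an illustration under stated assumptions, not the reference design: the module name id_branch and the ports imm and ret_addr are hypothetical, while da, db, rsrtequ, dpc4 and bpc follow the signal names used in the handout and its figures. The branch target is formed in the usual MIPS way, as PC+4 plus the sign-extended immediate shifted left by two.

```verilog
// Hypothetical sketch of the ID-stage branch logic (not the reference design).
module id_branch (
    input  wire [31:0] da,        // rs operand after internal forwarding
    input  wire [31:0] db,        // rt operand after internal forwarding
    input  wire [31:0] dpc4,      // PC+4 of the instruction now in ID
    input  wire [15:0] imm,       // 16-bit immediate field (assumed port name)
    output wire        rsrtequ,   // 1 when da == db
    output wire [31:0] bpc,       // branch target address
    output wire [31:0] ret_addr   // jal return address, PC+8 (assumed port name)
);
    // XOR is all zeros only when every bit matches; NOR-reduce the result.
    assign rsrtequ = ~|(da ^ db);

    // Sign-extend the immediate and convert the word offset to a byte offset.
    wire [31:0] offset = {{14{imm[15]}}, imm, 2'b00};
    assign bpc = dpc4 + offset;

    // With one delay slot, jal returns to the instruction after the slot.
    assign ret_addr = dpc4 + 32'd4;
endmodule
```

For beq the branch is taken when rsrtequ is 1, and for bne when it is 0; that decision belongs to the control unit and is left out of this sketch.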
Figure 2 Delayed branch: (a) Branch is not taken; (b) Branch is taken

Figure 3 Implementation with delayed branch with one delay slot

Figure 4 Mechanism of internal forwarding and pipeline stall

Figure 5 Return address of the function call instruction

For your reference, Figure 6 illustrates the detailed circuit of the pipelined CPU, plus instruction memory and data memory. The PC can be considered as the first pipeline register, located in front of the IF stage, and a register of the register file can be considered as the sixth (last) pipeline register, at the end of the WB stage.

In the IF stage, an instruction is fetched from instruction memory, and the PC is incremented by 4 if the instruction in the ID stage is neither a branch nor a jump instruction and there is no pipeline stall. There are four sources for the next PC:

   pc4: PC+4
   bpc: branch target address of a beq or bne instruction
   da: target address in register of a jr instruction
   jpc: jump target address of a j or jal instruction

The selection of the next PC (npc) is done by a 32-bit 4-to-1 multiplexer whose selection signal is pcsrc (PC source), generated by the control unit in the ID stage.
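The next-PC selection described above can be sketched as a 32-bit 4-to-1 multiplexer. The signal names pc4, bpc, da, jpc, npc and pcsrc come from the handout; the module name next_pc_mux and the particular 2-bit encoding of pcsrc are assumptions made for illustration and should be checked against Figure 6 and your control unit.

```verilog
// Hypothetical sketch of the next-PC multiplexer.
// The pcsrc encoding below is an assumption, not taken from the handout.
module next_pc_mux (
    input  wire [31:0] pc4,    // PC+4
    input  wire [31:0] bpc,    // branch target of beq or bne
    input  wire [31:0] da,     // register target of jr
    input  wire [31:0] jpc,    // jump target of j or jal
    input  wire [1:0]  pcsrc,  // selection signal from the control unit
    output reg  [31:0] npc     // next PC
);
    always @(*) begin
        case (pcsrc)
            2'b00:   npc = pc4;  // sequential execution
            2'b01:   npc = bpc;  // taken conditional branch
            2'b10:   npc = da;   // jr
            2'b11:   npc = jpc;  // j or jal
            default: npc = pc4;  // covers x/z on pcsrc in simulation
        endcase
    end
endmodule
```

Listing all four cases plus a default keeps the block purely combinational, so synthesis should not infer a latch for npc.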
In the ID stage, two register operands are read from the register file based on rs and rt; the immediate (imm) is extended, and the instruction is decoded based on op (and func) by the control unit. The selection signal of the multiplexer for the ALU's input A is named fwda (forward A), and the one for the ALU's input B is named fwdb (forward B). If there is no data hazard, the multiplexer selects the data read from the register file.

The inverse of the stall signal is used as the write enable for the PC and the IF/ID pipeline register (wpcir). The stall signal becomes true when an instruction in the ID stage uses the result of an lw instruction which is in the EXE stage. Thus, the stall signal can be generated by the following Verilog HDL code.

    stall = ewreg & em2reg & (ern!=0) & (i_rs & (ern== rs) |
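The stall expression above is cut off in the source after the rs term. A minimal sketch of how the complete logic might look is given below, assuming by symmetry that the missing part is a matching rt term; the i_rt input, the module names stall_unit and pc_reg, and the clrn reset are hypothetical and do not come from the handout. The comments on ewreg, em2reg and i_rs give the presumed meaning of those signals.

```verilog
// Hypothetical sketch of the load-use stall logic. The rt term and the
// i_rt input are assumed by symmetry; the source expression is truncated.
module stall_unit (
    input  wire       ewreg,   // presumably: EXE-stage instruction writes a register
    input  wire       em2reg,  // presumably: EXE-stage instruction is a load (lw)
    input  wire [4:0] ern,     // destination register number in EXE
    input  wire [4:0] rs,      // rs field of the instruction in ID
    input  wire [4:0] rt,      // rt field of the instruction in ID
    input  wire       i_rs,    // presumably: ID-stage instruction reads rs
    input  wire       i_rt,    // assumed: ID-stage instruction reads rt
    output wire       stall,   // 1 when the pipeline must stall one cycle
    output wire       wpcir    // write enable for the PC and IF/ID register
);
    assign stall = ewreg & em2reg & (ern != 5'd0) &
                   ((i_rs & (ern == rs)) | (i_rt & (ern == rt)));
    assign wpcir = ~stall;
endmodule

// Hypothetical PC register showing how wpcir holds the PC during a stall.
module pc_reg (
    input  wire        clk,
    input  wire        clrn,   // assumed active-low synchronous reset
    input  wire        wpcir,
    input  wire [31:0] npc,
    output reg  [31:0] pc
);
    always @(posedge clk) begin
        if (!clrn)      pc <= 32'd0;
        else if (wpcir) pc <= npc;  // when wpcir is 0 the PC keeps its value
    end
endmodule
```

The check ern != 0 avoids stalling on register $0, which always reads as zero. The IF/ID pipeline register would use wpcir as its write enable in the same way as pc_reg.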