We revisit the code of with branches and jumps.
(a) Identify all basic blocks in the code.
(b) Schedule the code as well as you can using local scheduling only (i.e., within basic blocks) on the VLIW machine of Note that the branch is delayed by one cycle. This delay has not been scheduled in the code.
(c) A simple compiler, designed by a student for a computer science class project, reorganizes the code globally as follows. The student believes that the new code performs better while producing the same result (again, branch delays have not been taken into account in this code):
The student argues that this code is correct because the result in memory will always be the same as for the original code. However, the student has ignored a few things: LW instructions were moved up across stores, which creates memory hazards; instructions causing exceptions were moved up across jumps and branches, which creates control hazards and possible unwanted exceptions.
At least the student decided not to move stores up across branches. (Why is this a good thing?) In any case, a mechanism is needed to detect and correct the problems with memory hazards and exceptions. For this purpose, we use mechanisms and instructions similar to those in the IA-64 ISA.
First we deal with exceptions. Assume first that no instructions raise exceptions except for loads and stores. (Remember that stores can never be speculative.) When a load is elevated across a branch (with its dependent instructions) it becomes a speculative load with opcode LW.s (e.g., LW.s R1,0(R2)). Because the load is now speculative and its execution may not be required, no exception that is visible to the program may be signaled. (For example, a page fault is not visible to the program, but an address misalignment is.) When such an exception occurs on a speculative load, the value returned by the load is often undefined. Thus the destination register is poisoned and its content is replaced by an exception descriptor.
At the location where the LW was in the original code, a check instruction is inserted, with format check.s R1,repair. If the speculative load and its dependent instructions were not supposed to be executed, then the check.s instruction is not executed. On the other hand, if the speculative load instruction was supposed to be executed, the check.s instruction is executed and looks up the register. If the register is valid, all is well and execution proceeds. If the register is poisoned, then the execution jumps to repair code, which essentially re-executes the sequence of elevated instructions, but without the speculative load. At this time the exception is taken.
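The poisoning and check.s behavior described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the class `SpecMachine`, the `Poison` descriptor, the `MisalignedAccess` exception, the 4-byte word size, and the use of a Python callable as the repair code are all hypothetical names and choices made for illustration. They are not part of the exercise or of the IA-64 ISA.

```python
from dataclasses import dataclass

WORD = 4  # assumed word size in bytes (hypothetical)


class MisalignedAccess(Exception):
    """Program-visible exception signaled by a non-speculative load."""


@dataclass(frozen=True)
class Poison:
    """Exception descriptor that replaces the content of a poisoned register."""
    address: int


class SpecMachine:
    """Toy model (an assumption, not the exercise's machine)."""

    def __init__(self, memory):
        self.mem = dict(memory)  # address -> word
        self.reg = {}            # register name -> value or Poison

    def lw(self, rd, offset, rs):
        """LW rd,offset(rs): signals a misalignment exception immediately."""
        addr = self.reg[rs] + offset
        if addr % WORD:
            raise MisalignedAccess(hex(addr))
        self.reg[rd] = self.mem.get(addr, 0)

    def lw_s(self, rd, offset, rs):
        """LW.s rd,offset(rs): defers the exception by poisoning rd."""
        try:
            self.lw(rd, offset, rs)
        except MisalignedAccess:
            self.reg[rd] = Poison(self.reg[rs] + offset)

    def check_s(self, rd, repair):
        """check.s rd,repair: run the repair code only if rd is poisoned."""
        if isinstance(self.reg[rd], Poison):
            repair()


def demo():
    m = SpecMachine({0x100: 7})
    m.reg["R2"] = 0x102        # misaligned address
    m.lw_s("R1", 0, "R2")      # hoisted load: no exception yet, R1 is poisoned
    try:
        # Reached only on the path that needed the load. The repair code
        # repeats the load non-speculatively, so the exception is taken here.
        m.check_s("R1", lambda: m.lw("R1", 0, "R2"))
    except MisalignedAccess as exc:
        return f"exception taken at check.s: {exc}"
    return "no exception"
```

In this sketch the exception surfaces at the check rather than at the hoisted load, which is the property the exercise relies on: if the path containing check.s is never executed, the poisoned register is simply never inspected.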
Second, we deal with memory hazards, using a similar mechanism. When a load is elevated with its dependent instructions across one or more stores and the compiler cannot disambiguate the addresses, the load value becomes speculative and the load becomes LW.a (e.g., LW.a R1,0(R2)). At the location where the load was in the original code, a check instruction is inserted with format check.a 0(R2), repair. This instruction works in conjunction with a small hardware table called the advanced load address table (ALAT): the LW.a inserts its address in the ALAT; if a store to the same address is executed between the LW.a and the check.a, the address is removed from the ALAT and the check.a can detect this. If the value returned by the LW.a was stale, the check.a instruction jumps to fix-up code, which mostly repeats the load and its dependent instructions. Do the following:
(i) add the instructions in the new code to check for misspeculations (both memory and exceptions);
(ii) schedule the new code locally, taking advantage of the branch and jump delay slots;
(iii) schedule the code on the VLIW machine of What is the speedup obtained by this new code on the VLIW machine, for all possible cases? When is the new code with speculative loads better?
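As an aid for the memory-hazard part, the LW.a / check.a interplay with the ALAT described earlier can be sketched as a small simulation. The names `AlatMachine`, `lw_a`, `sw`, and `check_a` are hypothetical, and the ALAT is modeled as an unbounded Python set of addresses, whereas a real table is small and may evict entries. Treat this as an illustration of the idea rather than a model of any actual hardware.

```python
class AlatMachine:
    """Toy model of advanced loads (an assumption, not the exercise's machine)."""

    def __init__(self, memory):
        self.mem = dict(memory)  # address -> word
        self.reg = {}            # register name -> value
        self.alat = set()        # addresses of advanced loads that are still valid

    def lw_a(self, rd, offset, rs):
        """LW.a rd,offset(rs): load early and record the address in the ALAT."""
        addr = self.reg[rs] + offset
        self.reg[rd] = self.mem.get(addr, 0)
        self.alat.add(addr)

    def sw(self, rt, offset, rs):
        """SW rt,offset(rs): a store to a recorded address removes the entry."""
        addr = self.reg[rs] + offset
        self.mem[addr] = self.reg[rt]
        self.alat.discard(addr)

    def check_a(self, offset, rs, repair):
        """check.a offset(rs),repair: run fix-up code if the entry is gone."""
        addr = self.reg[rs] + offset
        if addr not in self.alat:
            repair()


def run(store_base):
    """Hoist a load above a store whose address the compiler cannot resolve."""
    m = AlatMachine({0x100: 1})
    m.reg.update(R2=0x100, R3=store_base, R4=99)
    m.lw_a("R1", 0, "R2")   # load moved up across the store
    m.sw("R4", 0, "R3")     # may or may not alias 0(R2)

    def repair():
        # Fix-up code: repeat the load (and, in general, its dependents).
        m.reg["R1"] = m.mem.get(m.reg["R2"], 0)

    m.check_a(0, "R2", repair)
    return m.reg["R1"]
```

With `run(0x200)` the store does not alias the load and the early value is kept; with `run(0x100)` the store removes the ALAT entry, so the check triggers the fix-up code and the register ends up holding the stored value.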