C++ assignment
CSI 5350: Generating Dataflow Equations This project is a continuation of Projects 1 and 2. You may complete this project in a group of three. • Add a control flow builder to the parser you built for Projects 1 and 2 by adding functions to your C++ classes for AST nodes. The control flow builder computes the control flow graph for each of the functions in the input program and then outputs the control flow graphs in a human readable manner. There are a number of different ways to represent a control flow graph. For instance, it can be represented a list of control flow edges with an edge contains pointers to source and target elementary blocks which are AST nodes. Another way to represent a control flow graph is to add into an elementary block pointers to its successor elementary blocks and pointers to its predecessors. You need test your control flow builder with 5 myC programs each of which contains two or more functions and uses all types of statements. • Augment the parser you built in the above step to to generate dataflow equations. – Simplify myC– by removing function invocations. Function declara- tions should be retained for future extension. – Test and debug your parser to make sure that it generates correct abstract syntax trees; – Choose one of classic dataflow analyses: reaching definitions, avail- able expressions, very busy expressions and live variables; – Add to the parser a function named analyzer that walks the abstract syntax tree and generates a system of dataflow equations for each function in a given myC– program; – You will need to add functions that computes kill and gen sets for elementary blocks. The kill and gen sets for an elementary block may be stored in itself. – Test your analyzer with a suit of 5 test programs in myC–. Submit your project along with the test programs and test results. 1 repanaly.dvi Representation and Analysis of Software Mary Jean Harrold Gregg Rothermel Alex Orso Georgia Tech Oregon State University Georgia Tech 1 Introduction This paper presents some basic techniques for representation and analysis of software. We use the term program to refer to both procedures and monolithic programs. 2 Control Flow Graphs A control flow graph1 (CFG) is a directed graph in which each node represents a basic block and each edge represents the flow of control between basic blocks. To build a CFG we first build basic blocks, and then we add edges that represent control flow between these basic blocks. A basic block is a sequence of consecutive statements in which flow of control enters at the beginning and leaves at the end without halt or possibility of branching except at the end. We can construct the basic blocks for a program using algorithm GetBasicBlocks, shown in Figure 1. algorithm GetBasicBlocks Input. A sequence of program statements. Output. A list of basic blocks with each statement in exactly one basic block. Method. 1. Determine the set of leaders: the first statements of basic blocks. We use the following rules. (a) The first statement in the program is a leader. (b) Any statement that is the target of an conditional or unconditional goto statement is a leader. (c) Any statement that immediately follows a conditional or unconditional goto statement is a leader. Note: control transfer statements such as while, if-else, repeat-until, and switch statements are all “conditional goto statements”. 2. Construct the basic blocks using the leaders. For each leader, its basic block consists of the leader and all statements up to but not including the next leader or the end of the program. Figure 1: Algorithm to construct basic blocks for a program. When we analyze a program’s intermediate code for the purpose of performing compiler optimizations, a basic block usually consists of a maximal sequence of intermediate code statements. When we analyze source code, a basic block consists of a maximal sequence of source code statements. We often find it more convenient in the latter case, however, to just treat each source code statement as a basic block. 1See Reference [2] for additional discussion of control flow graphs. 1 algorithm GetCFG Input. A list of basic blocks for a program where the first block (B1) contains the first program statement. Output. A list of CFG nodes and edges. Method. 1. Create entry and exit nodes; create edge (entry, B1); create edges (Bk, exit) for each basic block Bk that contains an exit from the program. 2. Traverse the list of basic blocks and add a CFG edge from each node Bi to each node Bj if and only if Bj can immediately follow Bi in some execution sequence, that is, if: (a) there is a conditional or unconditional goto statement from the last statement of Bi to the first statement of Bj , or (b) Bj immediately follows Bi in the order of the program, and Bi does not end in an unconditional goto statement. 3. Label edges that represent conditional transfers of control as “T” (true) or “F” (false); other edges are unlabeled. Figure 2: Algorithm to construct the CFG for a program. After we have constructed basic blocks, we can construct the CFG for a program using algorithm GetCFG, shown in Figure 2. The algorithm also works for the case where each source statement is treated as a basic block. To illustrate, consider Figure 3, which gives the code for program Sums on the left and the CFG for Sums on the right. Node numbers in the CFG correspond to statement numbers in Sums: in the graph, we treat each statement as a basic block. Each node that represents a transfer of control (i.e., 4 and 7) has two labeled edges emanating from it; all other edges are unlabeled. In a CFG, if there is an edge from node Bi to node Bj , we say that Bj is a successor of Bi and that Bi is a predecessor of Bj . In the example, node 4 has successor nodes 5 and 12, and node 4 has predecessor nodes 3 and 11. Figure 4 gives another example program Trivial and its CFG. Notice that Trivial contains an if-then-else statement and a switch statement. In the CFG, we have inserted nodes J1 and J2. These nodes, which we call “join” nodes, would not be created by algorithm GetCFG; however, we often find it convenient to insert them, to represent places where flow of control merges following a conditional statement. Notice that we represent the switch predicate as a node with multiple out-edges – one for each case value. Each edge is labeled with the value to which the switch predicate must evaluate in order for that edge to be taken. In CFGs built from source code, where we use statements as basic blocks, we do not create blocks for non-executable structure-delimiting statements such as “begin”, “end”, “else”, “endif”, “endwhile”, “case”, “}”, or “{”. 2 T F T endwhile; F exit entry 1 2 3 4 5 6 7 8 910 11 12 end Sums Program Sums 1. read(n); 2. i = 1; 3. sum = 0; 4. while (i<= n)="" do="" 5.="" sum="0;" 6.="" j="1;" 7.="" while="" (j="">=><= i)="" do="" 8.="" sum="sum" +="" j;="" 9.="" j="j" +="" 1;="" 10.="" write(sum,="" i);="" 11.="" i="i" +="" 1;="" 12.="" write(sum,="" i);="" endwhile;="" figure="" 3:="" program="" segment="" on="" the="" left,="" with="" its="" cfg="" on="" the="" right.="" entry="" exit="" 1="" 2="" 3="" 4="" end="" trivial="" 5="" 6="" 7="" 8="" 9="" j1="" j2="" t="" f="" 1="" default="" 2="" 3="" program="" trivial="" 2.="" if="">=><0) then 1. read(n); 3. write("negative"); else 4. write("positive"); endif; 5. switch (n) case 1: 6. write("one"); 7. write("two"); break; case 2: case 3: 8. write("three"); break; 9. write("other"); default: endswitch; figure 4: program segment on the left, with its cfg on the right. 3 3 data flow information we can classify each reference to a variable in a program as a definition or a use. a definition of a variable occurs whenever a variable gets a value. for example, in figure 3, there is a definition of variable n in statement 1 due to the input statement, and there is a definition of variable sum in statement 5 because of the assignment statement. a use of a variable occurs whenever the value of the variable is fetched. uses are further classified as either computation uses (c-uses) or predicate uses (p-use) [11]. a computation use occurs whenever a variable either directly affects the computation being performed or is output. a predicate-use directly affects the control flow through the program but only indirectly affects a computation. in figure 3, there are c-uses of sum and j in statement 8 because their values directly affect the value of sum computed in that statement. likewise, there are c-uses of sum and i in statement 10 because their values directly affect the program output at that statement. on the other hand, the uses of j and i in statement 7 directly affect the flow of control from that statement and thus, are p-uses. by inspecting each statement (basic block) in a program, we can identify the definitions and uses in that statement (basic block). this identification is called local data flow analysis. although local data flow information can be useful for activities such as local optimization, many important compiler and software engineering tasks require information about the flow of data across statements (basic blocks). data flow analysis that computes information then="" 1.="" read(n);="" 3.="" write("negative");="" else="" 4.="" write("positive");="" endif;="" 5.="" switch="" (n)="" case="" 1:="" 6.="" write("one");="" 7.="" write("two");="" break;="" case="" 2:="" case="" 3:="" 8.="" write("three");="" break;="" 9.="" write("other");="" default:="" endswitch;="" figure="" 4:="" program="" segment="" on="" the="" left,="" with="" its="" cfg="" on="" the="" right.="" 3="" 3="" data="" flow="" information="" we="" can="" classify="" each="" reference="" to="" a="" variable="" in="" a="" program="" as="" a="" definition="" or="" a="" use.="" a="" definition="" of="" a="" variable="" occurs="" whenever="" a="" variable="" gets="" a="" value.="" for="" example,="" in="" figure="" 3,="" there="" is="" a="" definition="" of="" variable="" n="" in="" statement="" 1="" due="" to="" the="" input="" statement,="" and="" there="" is="" a="" definition="" of="" variable="" sum="" in="" statement="" 5="" because="" of="" the="" assignment="" statement.="" a="" use="" of="" a="" variable="" occurs="" whenever="" the="" value="" of="" the="" variable="" is="" fetched.="" uses="" are="" further="" classified="" as="" either="" computation="" uses="" (c-uses)="" or="" predicate="" uses="" (p-use)="" [11].="" a="" computation="" use="" occurs="" whenever="" a="" variable="" either="" directly="" affects="" the="" computation="" being="" performed="" or="" is="" output.="" a="" predicate-use="" directly="" affects="" the="" control="" flow="" through="" the="" program="" but="" only="" indirectly="" affects="" a="" computation.="" in="" figure="" 3,="" there="" are="" c-uses="" of="" sum="" and="" j="" in="" statement="" 8="" because="" their="" values="" directly="" affect="" the="" value="" of="" sum="" computed="" in="" that="" statement.="" likewise,="" there="" are="" c-uses="" of="" sum="" and="" i="" in="" statement="" 10="" because="" their="" values="" directly="" affect="" the="" program="" output="" at="" that="" statement.="" on="" the="" other="" hand,="" the="" uses="" of="" j="" and="" i="" in="" statement="" 7="" directly="" affect="" the="" flow="" of="" control="" from="" that="" statement="" and="" thus,="" are="" p-uses.="" by="" inspecting="" each="" statement="" (basic="" block)="" in="" a="" program,="" we="" can="" identify="" the="" definitions="" and="" uses="" in="" that="" statement="" (basic="" block).="" this="" identification="" is="" called="" local="" data="" flow="" analysis.="" although="" local="" data="" flow="" information="" can="" be="" useful="" for="" activities="" such="" as="" local="" optimization,="" many="" important="" compiler="" and="" software="" engineering="" tasks="" require="" information="" about="" the="" flow="" of="" data="" across="" statements="" (basic="" blocks).="" data="" flow="" analysis="" that="" computes="">0) then 1. read(n); 3. write("negative"); else 4. write("positive"); endif; 5. switch (n) case 1: 6. write("one"); 7. write("two"); break; case 2: case 3: 8. write("three"); break; 9. write("other"); default: endswitch; figure 4: program segment on the left, with its cfg on the right. 3 3 data flow information we can classify each reference to a variable in a program as a definition or a use. a definition of a variable occurs whenever a variable gets a value. for example, in figure 3, there is a definition of variable n in statement 1 due to the input statement, and there is a definition of variable sum in statement 5 because of the assignment statement. a use of a variable occurs whenever the value of the variable is fetched. uses are further classified as either computation uses (c-uses) or predicate uses (p-use) [11]. a computation use occurs whenever a variable either directly affects the computation being performed or is output. a predicate-use directly affects the control flow through the program but only indirectly affects a computation. in figure 3, there are c-uses of sum and j in statement 8 because their values directly affect the value of sum computed in that statement. likewise, there are c-uses of sum and i in statement 10 because their values directly affect the program output at that statement. on the other hand, the uses of j and i in statement 7 directly affect the flow of control from that statement and thus, are p-uses. by inspecting each statement (basic block) in a program, we can identify the definitions and uses in that statement (basic block). this identification is called local data flow analysis. although local data flow information can be useful for activities such as local optimization, many important compiler and software engineering tasks require information about the flow of data across statements (basic blocks). data flow analysis that computes information>