Look at files
CSE340F22 PROJECT 2 IMPLEMENTATION GUIDANCE Rida Bazzi This is not meant to be a detailed implementation guide, but I address major implementation issues that you are likely to encounter. By now you should know that reading everything carefully is essential and can save you a lot of time. You should have a plan and understand how the various pieces will fit together before you start coding. You can do Task 1 without worrying how task 2 or 3 will be implemented. You can do Task 2 without worrying how Task 3 will be implemented because Task 3 will build on the abstract syntax tree generated by Task 1 and can you that as a starting point. Do not delay asking for help. If your implementation seems to be getting too complicated, especially conceptually, you should step back and simplify. Do not end up like this guy: Parsing Concerns Operator precedence parsing is at the heart of all the tasks for this project, so you really cannot afford not doing it right. It is very important that you have a complete understanding of how operator precedence parsing works before writing a single line of code. In project 1 some students tried recursive descent parsing by trial-and-error instead of systematically following the rules for predictive parsing that I covered in class. That did not always go well and in the best case, is too time consuming. Parsing by trial-and-error will not work for this project and you really need to know what you are doing. As a start, you should have studied the operator precedence parsing notes before reading this document. For this project, operator precedence parsing will be invoked for parsing expressions. There are four places where that occurs: ID [.] = expr; ID = expr; ID [expr] = expr; OUTPUT ID[expr]; So, the function to parsing assignment statements might look something like this (this code is not meant to be copied and pasted. It is only for illustration purposes): parse_assign_stmt() { expect(ID) t1 = lexer.peek(1); t2 = lexer.peek(2); if ( t1t2 == [. ) // array access with . { expect(LBRAC); expect(DOT); expect(RBRAC); } else if (t1 == [ ) // array access but not . { expect(LBRAC); parse_expr(); expect(RBRAC); } // at this point we have parsed the [.] or [expr] // or we are dealing with the case ID = // in all cases, the next token must be EQUAL expect(EQUAL); parse_expr(); expect(SEMICOLON); } As you can see, the recursive-descent part of the parsing is really not that involved. OUTPUT statements can also be handled in a similar manner. It is the operator precedence parsing where the bulk of your work is done. Next we are going to examine the call to parse_expr()more carefully. parse_expr(): what is the end-of-input symbol? As we have seen in the notes on operator precedence parsing, we start with a stack containing $ and the input is followed by the same $ symbol. In our project, parse_expr()is called in three settings. In one setting, the part of the input to be parsed as an expression is followed by SEMICOLON. In two other settings, the input is followed by RBRAC. They are highlighted below: ID[.] = expr ; (1) ID = expr ; (2) ID[ expr ] = expr ; (3) (4) OUTPUT ID[ expr ] ; (5) To handle these different situations, the parse_expr() needs to identify when the end of input is reached for the call. The cases outlined above can be problematic in the case of RBRAC in variable- access or OUTPUT statement because it can also appear as part of expr itself. So, instead of RBRAC, we can use RBRAC EQUAL to indicate end of input in the case of variable access and RBRAC SEMICOLON in the case of OUTPUT statement To handle these cases, I suggest the following: 1. Instead of using GetToken()and peek() directly inside parse_expr(), wrap the call inside functions get_symbol()and peek_symbol() respectively. The functions peek_symbol() returns $ (or EOE = End Of Expression as one student suggested to me after class) in the following cases: 1. If the next token is SEMICOLON ; cases (1) , (2) and (4) 2. If the next token is RBRAC and the token after that is EQUAL ; case (3) 3. If the next token is RBRAC and the token after that is SEMICOLON ; case (5) 2. The functions return the next token in all other cases. Having a wrapper function will make your code easier to work with parse_expr(): parsing table and stack contents As we have seen in class, the parsing table is convenient to check the parsing precedence relationship for two terminals (this includes $). 1. We need the table to determine the relationship between the terminal closest to the top of the stack and the next input symbol. 2. We also need the table to determine the relationship between the terminal on the top of the stack and the last popped terminal when we are popping the stack to identify the RHS of a rule. We start by discussing the format of the stack both terminals and non-terminals (expr) are stored on the stack, so the stack entries should make the distinction between these two cases. For that, you can can have a structure struct stackNode { snodeType type; // enum type can be EXPR or TERM union { struct exprNode *expr; // this is used when the type is EXPR Token term; // this is used when the type is TERM } } The union declaration is used like a structure when only one field is needed for a particular instance. It allows to use the same memory locations for the two fields, which results in less memory usage. I also like to use union instead of struct because by declaring the fields as fields of a union, you are documenting the fact that for a given stackNode, you are only either using the expr field or the term field. The type field tells you which case it is. Now, the stack will contain stackNode structs that can accommodate both kinds of content. The stack itself is a vector of stack nodes vector
stack; or, even better, you can use a proper stack data structure from the C++ library. The treeNode struct can look something like this Struct exprNode { int operator; // enum type: ID_OPER, PLUS_OPER, MINUS__OPER, DIV_OPER // ARRAY_ELEM_OPER, WHOLE_ARRAY_OPER int type; // type of expression SCALAR, ARRAY, ARRAYDDECL, or ERROR // these types are discussed later under type checking union { struct { // operator = ID String varName; int line_no; } id; struct { // operator = PLUS_OPER, MINUS_OPER // MULT_OPET or ARRAY_ELEM_OPER struct treeNode *left; struct treeNode *right; } child; struct { // operator = WHOLE_ARRAY_OPER String arrayName; int line_no; } array; } } Warning do NOT cut and paste from the document. It is not complete Stack Content Example Example If we consider the execution from Notes_4, we have the following stack content on page 62 (assuming all variables are declared as SCALAR variable). Notice how the parentheses are not included in the tree because they are really not needed. The parentheses are in the input to tell us how to group c+d. Once we parsed it and we stored in the abstract syntax tree (AST) correctly grouped, there is not need for parentheses. type = TERM term = {EOF, “”, 0} type = EXPR expr = type = EXPR term = {PLUS, “”, 1} type = EXPR expr = operator = ID type = SCALAR id.varName = a id.ine_no = 1 operator = MULT type = SCALAR child.left = child.right= operator = ID type = SCALAR id.varName = b id.ine_no = 1 operator = ID type = SCALAR id.varName = c id.ine_no = 1 operator = ID type = SCALAR id.varName = d id.ine_no = 1 operator = PLUS type = SCALAR child.left = child.right= a + b * (c + d) $ a + * b + c d parse_expr(): stack operations So far, we discussed the structures that will be needed to represent the stack contents which can accommodate both kinds of content (terminals and expressions). Now, we look at the stack operations that we need. 1. Peek at the terminal closest to the top this is either the top of the stack or just below the top of the stack 2. Shift a token on the stack 3. Reduce This one is more involved. There are two parts to this: 1. You should write a function that pops elements from the stack until the top of stack has a terminal that is <. the="" last="" popped="" terminal.="" 2.="" check="" if="" the="" popped="" elements="" match="" the="" rhs="" of="" one="" of="" the="" rules.="" 3.="" build="" the="" abstract="" syntax="" tree="" after="" the="" reduction="" i="" discuss="" each="" of="" these="" points="" on="" the="" next="" page="" parse_expr():="" peek="" at="" the="" terminal="" closest="" to="" the="" top="" the="" implementation="" of="" this="" is="" straightforward.="" if="" the="" top="" of="" the="" stack="" is="" terminal,="" that="" is="" what="" the="" function="" returns="" otherwise,="" if="" the="" top="" of="" the="" stack="" is="" not="" a="" terminal,="" you="" can="" look="" at="" the="" entry="" just="" below="" the="" top="" of="" stack="" otherwise="" parse_expr():="" shift="" a="" token="" on="" the="" stack="" this="" is="" also="" straightforward.="" when="" you="" determine="" that="" the="" token="" needs="" to="" be="" shifted,="" you="" need="" to="" consume="" the="" token="" from="" the="" input="" using="" get_symbol()and="" create="" a="" stacknode="" that="" contains="" the="" token="" information.="" the="" type="" of="" the="" node="" should="" be="" term.="" after="" you="" create="" the="" stacknode,="" you="" should="" push="" it="" on="" the="" stack.="" rida="" bazzi="" rida="" bazzi="" rida="" bazzi="" parse_expr():="" reduce:="" popping="" elements="" and="" matching="" righthand="" side="" of="" rules="" popping="" the="" elements="" for="" this="" functionality,="" you="" need="" to="" implement="" a="" loop="" that="" pops="" the="" elements="" of="" the="" stack="" until="" the="" top="" of="" the="" stack="" is="" a="" terminal="" top="" such="" that="" table[top][last="" popped="" terminal]=""><. the popped elements are supposed to represent the righthand side of one of the rules for expr. so, we need 1. a way to represent the righthand sides of rules and 2. a way to compare the sequence of popped elements with the righthand sides and determine if the sequence of popped elements is a valid righthand side. for each rule, you can store the righthand side as a vector of strings or the whole righthand side as a string (and maybe store it in a map). here are some example representations the="" popped="" elements="" are="" supposed="" to="" represent="" the="" righthand="" side="" of="" one="" of="" the="" rules="" for="" expr.="" so,="" we="" need="" 1.="" a="" way="" to="" represent="" the="" righthand="" sides="" of="" rules="" and="" 2.="" a="" way="" to="" compare="" the="" sequence="" of="" popped="" elements="" with="" the="" righthand="" sides="" and="" determine="" if="" the="" sequence="" of="" popped="" elements="" is="" a="" valid="" righthand="" side.="" for="" each="" rule,="" you="" can="" store="" the="" righthand="" side="" as="" a="" vector="" of="" strings="" or="" the="" whole="" righthand="" side="" as="" a="" string="" (and="" maybe="" store="" it="" in="" a="" map).="" here="" are="" some="" example="">