CS 280 Spring 2023
Programming Assignment 1
Building a Lexical Analyzer for the SPL Language
February 16, 2023
Due Date: Sunday, March 5, 2023, 23:59
Total Points: 20

In this programming assignment, you will build a lexical analyzer for a small programming language called Simple Perl-Like (SPL), and a program to test it. This assignment will be followed by two other assignments to build a parser and an interpreter for the SPL language. Although we are not concerned with the syntax definitions of the language in this assignment, we introduce them ahead of Programming Assignment 2 in order to determine the language terminals: reserved words, constants, identifiers, and operators. The syntax definitions of the SPL language are given below using EBNF notation. The details of the meanings (i.e., semantics) of the language constructs will be given later.

1. Prog ::= StmtList
2. StmtList ::= Stmt ; { Stmt ; }
3. Stmt ::= AssignStmt | WriteLnStmt | IfStmt
4. WriteLnStmt ::= WRITELN ( ExprList )
5. IfStmt ::= IF ( Expr ) '{' StmtList '}' [ ELSE '{' StmtList '}' ]
6. AssignStmt ::= Var = Expr
7. Var ::= NIDENT | SIDENT
8. ExprList ::= Expr { , Expr }
9. Expr ::= RelExpr [ ( -eq | == ) RelExpr ]
10. RelExpr ::= AddExpr [ ( -lt | -gt | < | > ) AddExpr ]
11. AddExpr ::= MultExpr { ( + | - | . ) MultExpr }
12. MultExpr ::= ExponExpr { ( * | / | ** ) ExponExpr }
13. ExponExpr ::= UnaryExpr { ^ UnaryExpr }
14. UnaryExpr ::= [ ( - | + ) ] PrimaryExpr
15. PrimaryExpr ::= IDENT | SIDENT | NIDENT | ICONST | RCONST | SCONST | ( Expr )

Based on the language definitions, the lexical rules of the language and the tokens assigned to the terminals are as follows:

1. The language has general identifiers, referred to by the IDENT terminal, which are defined as a word that starts with a letter or an underscore '_', followed by zero or more letters, digits, or underscore '_' characters. Note that all identifiers are case sensitive.
It is defined as:
   IDENT := [Letter _] {( Letter | Digit | _ )}
   Letter := [a-z A-Z]
   Digit := [0-9]
2. The language variables are either numeric scalar variables or string scalar variables. A numeric variable starts with a "$" followed by an IDENT, while a string variable starts with a "@" followed by an IDENT. Their definitions are as follows:
   NIDENT := $ IDENT
   SIDENT := @ IDENT
3. An integer constant is referred to by the ICONST terminal, which is defined as one or more digits. It is defined as:
   ICONST := [0-9]+
4. A real constant is a fixed-point real number referred to by the RCONST terminal, which is defined as one or more digits followed by a decimal point (dot) and zero or more digits. It is defined as:
   RCONST := ([0-9]+)\.([0-9]*)
   For example, 12.0, 0.2, and 2. are accepted as real constants, but .2 and 2.45.2 are not. Note that ".2" is recognized as a dot (the CAT operator) followed by the integer constant 2.
5. A string literal is referred to by the SCONST terminal, which is defined as a sequence of characters delimited by single quotes, all of which must appear on the same line. For example, 'Hello to CS 280.' is a string literal, while "Hello to CS 280." and 'Hello to CS 280." are not.
6. The reserved words of the language are: writeln, if, else. These reserved words have the following tokens, respectively: WRITELN, IF, ELSE.
7. The operators of the language are: +, -, *, /, ^, =, (, ), {, }, ==, >, <, . (dot), ** (repeat), -eq, -lt, and -gt. These operators are for add, subtract, multiply, divide, exponent, assignment, left parenthesis, right parenthesis, numeric equality, numeric greater than, numeric less than, string concatenation, string repetition, string equality, string less-than, and string greater-than operations, respectively. They have the following tokens, respectively (the tokens for parentheses and braces are listed in rule 8): PLUS, MINUS, MULT, DIV, EXPONENT, ASSOP, NEQ, NGTHAN, NLTHAN, CAT, SREPEAT, SEQ, SLTHAN, and SGTHAN. Note that the string comparison operators -eq, -lt, and -gt are not case sensitive.
8. The semicolon, comma, left parenthesis, right parenthesis, left brace, and right brace characters are terminals with the following tokens: SEMICOL, COMMA, LPAREN, RPAREN, LBRACES, and RBRACES, respectively.
9. A comment is defined by all the characters following the character "#" to the end of the line. A recognized comment is skipped and does not have a token.
10. White spaces are skipped. However, white spaces between tokens are used to improve readability and can be used as one way to delimit tokens.
11. An error will be denoted by the ERR token.
12. End of file will be denoted by the DONE token.

Lexical Analyzer Requirements:

A header file, lex.h, is provided for you. It contains the definitions of the LexItem class and of an enumerated type of token symbols, called Token, and the declarations of three functions to be implemented. These are:

   extern ostream& operator<<(ostream& out, const LexItem& tok);
   extern LexItem id_or_kw(const string& lexeme, int linenum);
   extern LexItem getNextToken(istream& in, int& linenum);

You must use the header file that is provided. You may not change it.

I. You will write the lexical analyzer function, called getNextToken, in the file "lex.cpp". The getNextToken function must have the following signature:

   LexItem getNextToken(istream& in, int& linenumber);

The first argument to getNextToken is a reference to an istream object that the function should read from. The second argument to getNextToken is a reference to an integer that contains the current line number. getNextToken should update this integer every time it reads a newline from the input stream. getNextToken returns a LexItem object. A LexItem is a class that contains a token, a string for the lexeme, and the line number as data members.

Note that the getNextToken function performs the following:
   1. Any error detected by the lexical analyzer should result in a LexItem object being returned with the ERR token, and the lexeme value equal to the string recognized when the error was detected.
   2. Note also that both ERR and DONE are unrecoverable. Once the getNextToken function returns a LexItem object for either of these tokens, you shouldn't call getNextToken again.
   3. Tokens may be separated by spaces, but in most cases are not required to be. For example, the input characters "3+7" and the input characters "3 + 7" will both result in the sequence of tokens ICONST PLUS ICONST. Similarly, the input characters 'hello' 'world' and the input characters 'hello''world' will both result in the token sequence SCONST SCONST.

II. You will implement the id_or_kw() function. The id_or_kw function accepts a reference to a string of a general identifier lexeme (i.e., keyword, IDENT, SIDENT, or NIDENT) and a line number, and returns a LexItem object. It searches for the lexeme in a directory that maps the string value of a keyword to its corresponding Token value, and it returns a LexItem object containing the keyword Token if it is found. Otherwise, it returns a LexItem object containing a token for one of the possible types of identifiers (i.e., IDENT, SIDENT, or NIDENT).

III. You will implement the overloaded function operator<<. The operator<< function accepts a reference to an ostream object and a reference to a LexItem object, and returns a reference to the ostream object. The operator<< function should print out the string value of the token in the tok object.
If the token is one of IDENT, NIDENT, SIDENT, ICONST, RCONST, or SCONST, it will print out its token followed by its lexeme between parentheses. See the example in the slides.

Testing Program Requirements:

It is recommended to implement the lexical analyzer in one source file, and the main test program in another source file. The testing program is a main() function that takes several command line flags. The notations for input flags are as follows:

● -v (optional): if present, every token is printed out when it is seen, followed by its lexeme between parentheses.
● -nconst (optional): if present, prints out all the unique numeric constants (i.e., integer or real) in numeric order.
● -sconst (optional): if present, prints out all the unique string constants in alphabetical order.
● -ident (optional): if present, prints out all of the unique identifiers in alphabetical order.
● A filename argument must be passed to the main function. Your program should open the file and read from that filename.

Note, your testing program should apply the following rules:

1. The flag arguments (arguments that begin with a dash) may appear in any order, and may appear multiple times. Only the last appearance of the same flag is considered.
2. There can be at most one file name specified on the command line. If more than one filename is provided, the program should print on a new line the message "only one file name is allowed." and it should stop running. If no file name is provided, the program should print on a new line the message "no specified input file." and then stop running.
3. If an unrecognized flag is present, the program should print on a new line the message "unrecognized flag {arg}", where {arg} is whatever flag was given. Then the program should stop running.
4. If the program cannot open a filename that is given, the program should print on a new line the message "cannot open the file arg", where arg is the filename given. Then the program should stop running.
5. If the getNextToken function returns ERR, the program should print "error in line N ({lexeme})", where N is the line number of the token in the input file and lexeme is its corresponding lexeme, and then it should stop running. For example, for a file that contains an invalid real constant, such as .15, in line 1 of the file, the program should print the message:
   error in line 1 (.15)
6. The program should repeatedly call getNextToken until it returns DONE or ERR. If it returns DONE, the program prints the list of all tokens if "-v" is specified, followed by the summary information, then handles the flags "-ident", "-nconst", and "-sconst" in this order. The summary information is as follows:
   lines: L
   total tokens: M
   identifiers: N
   numbers: O
   strings: P
   where L is the number of input lines, M is the number of tokens (not counting DONE), N is the number of identifier tokens (e.g., IDENT, NIDENT, and SIDENT), O is the number of numeric constants, and P is the number of string literals. If the file is empty, the value of L is zero, and the following output message is displayed:
   lines: 0
   empty file.
7. If the -v option is present, the program should print each token as it is read and recognized, one token per line. The output format for the token is the token name in all capital letters (for example, the token LPAREN should be printed out as the string LPAREN). In the case of the tokens IDENT and NCONST, the token name should be followed by a space and the lexeme in parentheses. In the case of the token SCONST, the token name should be followed by a space and the lexeme between single quotes (''). For example, if an identifier "$circle" and a string literal 'the center of the circle through these points is' are recognized, the -v output for them would be:
   NIDENT ($circle)
   SCONST 'the center of the circle through these points is'
8. The -sconst option should cause the program to print the label strings: on a line by itself, followed by every unique string constant found, one string per line with single quotes, in alphabetical order. If there are no SCONSTs in the input, then nothing is printed.
9. The -nconst option should cause the program to print the label numbers: on a line by itself, followed by every unique numeric constant found, one number per line, in numeric order.
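
As an illustration of the ICONST and RCONST lexical rules, here is a minimal sketch of how a scanner might tell the two apart with one character of lookahead. The names NumKind, NumResult, readDigits, and scanNumber are hypothetical and are not part of lex.h; a real getNextToken would return a LexItem instead. Treating a second dot (as in 2.45.2) as an error, and returning the text read so far as its lexeme, is one possible reading of the rule and is an assumption.

```cpp
#include <cassert>
#include <cctype>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical stand-ins for illustration only; lex.h defines the real Token and LexItem.
enum class NumKind { ICONST, RCONST, ERR };

struct NumResult {
    NumKind kind;
    std::string lexeme;
};

// Appends consecutive digits from `in` to `lexeme`, leaving the first non-digit unread.
static void readDigits(std::istream& in, std::string& lexeme) {
    while (std::isdigit(in.peek())) {
        lexeme += static_cast<char>(in.get());
    }
}

// Scans a numeric constant. Precondition: the next character of `in` is a digit.
//   ICONST := [0-9]+
//   RCONST := ([0-9]+)\.([0-9]*)
NumResult scanNumber(std::istream& in) {
    assert(std::isdigit(in.peek()));
    std::string lexeme;
    readDigits(in, lexeme);                  // integer part: one or more digits
    if (in.peek() != '.') {
        return {NumKind::ICONST, lexeme};    // no dot follows: integer constant
    }
    lexeme += static_cast<char>(in.get());   // consume the decimal point
    readDigits(in, lexeme);                  // fraction part: zero or more digits
    if (in.peek() == '.') {
        // Assumption: a second dot (e.g. 2.45.2) is reported as an error.
        lexeme += static_cast<char>(in.get());
        return {NumKind::ERR, lexeme};
    }
    return {NumKind::RCONST, lexeme};
}

// Convenience overload for trying the scanner on a string.
NumResult scanNumber(const std::string& text) {
    std::istringstream in(text);
    return scanNumber(in);
}
```

Under these assumptions, scanNumber("42;") yields an ICONST with lexeme 42 and leaves the semicolon unread, and scanNumber("2. ") yields an RCONST with lexeme "2.". Using peek() rather than get() followed by putback() is a design choice that keeps the stream position at the first character that is not part of the number.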
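
The keyword lookup described for id_or_kw can be sketched as follows. This is not the required implementation: Tok, Item, and classifyIdent are hypothetical, simplified stand-ins for the Token type, the LexItem class, and the id_or_kw function from lex.h, whose exact definitions are not shown in this handout. The sketch also assumes that keywords are matched exactly as written (lowercase), since identifiers are case sensitive.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, simplified stand-ins; the real Token and LexItem come from lex.h.
enum class Tok { WRITELN, IF, ELSE, IDENT, NIDENT, SIDENT };

struct Item {
    Tok token;
    std::string lexeme;
    int line;
};

// Classifies an identifier-like lexeme as a keyword, NIDENT ($...), SIDENT (@...),
// or plain IDENT. Precondition: the lexeme is not empty.
Item classifyIdent(const std::string& lexeme, int line) {
    assert(!lexeme.empty());
    // Map from keyword spelling to its token (assumption: exact, lowercase match).
    static const std::map<std::string, Tok> keywords = {
        {"writeln", Tok::WRITELN},
        {"if", Tok::IF},
        {"else", Tok::ELSE},
    };
    auto it = keywords.find(lexeme);
    if (it != keywords.end()) {
        return {it->second, lexeme, line};   // reserved word
    }
    if (lexeme[0] == '$') {
        return {Tok::NIDENT, lexeme, line};  // numeric scalar variable
    }
    if (lexeme[0] == '@') {
        return {Tok::SIDENT, lexeme, line};  // string scalar variable
    }
    return {Tok::IDENT, lexeme, line};       // general identifier
}
```

With this sketch, classifyIdent("if", 3) produces the IF token, while classifyIdent("If", 3) produces IDENT because the lookup is case sensitive. A std::map keeps the keyword table in one place, so adding a reserved word later would only mean adding one entry.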
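
The command-line rules for the testing program can be sketched as a small parsing function. Options and parseArgs are hypothetical names that are not part of the assignment's provided files, and the message strings are copied as they appear in this handout. The sketch returns the error message instead of printing it, so that main() can decide how to print it and stop.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical container for the parsed command line; not part of lex.h.
struct Options {
    bool verbose = false;   // -v
    bool nconst = false;    // -nconst
    bool sconst = false;    // -sconst
    bool ident = false;     // -ident
    std::string filename;
    std::string error;      // empty when the arguments are valid
};

// Parses the arguments that follow the program name.
Options parseArgs(const std::vector<std::string>& args) {
    Options opt;
    for (const std::string& arg : args) {
        if (!arg.empty() && arg[0] == '-') {
            // Flags may repeat and appear in any order; setting a bool is idempotent.
            if (arg == "-v") opt.verbose = true;
            else if (arg == "-nconst") opt.nconst = true;
            else if (arg == "-sconst") opt.sconst = true;
            else if (arg == "-ident") opt.ident = true;
            else {
                opt.error = "unrecognized flag " + arg;
                return opt;
            }
        } else if (!opt.filename.empty()) {
            opt.error = "only one file name is allowed.";
            return opt;
        } else {
            opt.filename = arg;
        }
    }
    if (opt.filename.empty()) {
        opt.error = "no specified input file.";
    }
    // Postcondition: either a file name was found or an error is reported.
    assert(!opt.filename.empty() || !opt.error.empty());
    return opt;
}
```

In a main(int argc, char* argv[]) function, the arguments could be passed as std::vector<std::string>(argv + 1, argv + argc). Because each flag only sets a boolean, repeating a flag has the same effect as giving it once, which is consistent with considering only its last appearance.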
Mar 06, 2023