C++ Programming Assignment
CS 280 Fall 2020
Programming Assignment 1
February 22, 2021
Due Date: Wednesday, March 10, 2021, 23:59
Total Points: 20

In this programming assignment, you will build a lexical analyzer for a small programming language and a program to test it. This assignment will be followed by two other assignments to build a parser and an interpreter for the same language. Although we are not concerned with the syntax definitions of the language in this assignment, we introduce them ahead of Programming Assignment 2 in order to show the language's reserved words, constants, and operators. The syntax definitions of a small Fortran-like programming language are given below in EBNF notation. The details of the meanings (i.e., semantics) of the language constructs will be given later on.

Prog = PROGRAM IDENT {Decl} {Stmt} END PROGRAM IDENT
Decl = Type : VarList
Type = INTEGER | REAL | CHAR
VarList = Var {, Var}
Stmt = AssignStmt | IfStmt | PrintStmt | ReadStmt
PrintStmt = PRINT , ExprList
IfStmt = IF (LogicExpr) THEN {Stmt} END IF
AssignStmt = Var = Expr
ReadStmt = READ , VarList
ExprList = Expr {, Expr}
Expr = Term {(+|-) Term}
Term = SFactor {(*|/) SFactor}
SFactor = Sign Factor | Factor
LogicExpr = Expr (== | <) Expr
Var = IDENT
Sign = + | -
Factor = IDENT | ICONST | RCONST | SCONST | (Expr)

Based on the language definitions, the lexical rules of the language and the tokens assigned to terminals are as follows:

1. The language has identifiers, referred to by the IDENT terminal, which are defined to be a letter followed by zero or more letters or digits. It is defined as:
IDENT := Letter {(Letter|Digit)}
Letter := [a-z A-Z]
Digit := [0-9]
The token for an identifier is IDENT.
2. Integer constants, referred to by the ICONST terminal, are defined as one or more digits. It is defined as:
ICONST := [0-9]+
The token for an integer constant is ICONST.
3. Real constants, referred to by the RCONST terminal, are defined as zero or more digits followed by a decimal point (dot) and one or more digits. It is defined as:
RCONST := ([0-9]*)\.([0-9]+)
The token for a real constant is RCONST. For example, real number constants such as 12.0 and .2 are accepted, but 2. is not.
4. String literals, referred to by the SCONST terminal, are defined as a sequence of characters delimited by single or double quotes, all of which must appear on the same line. The assigned token for a string constant is SCONST. For example, "Hello to CS 280." and 'Hello to CS 280.' are string literals. There are no escape characters. However, a string delimited by single quotes can contain a double-quote character, and similarly a string delimited by double quotes can contain a single quote. For example, "Welcome to Smith's home" and 'Welcome to "Smith" home' are acceptable strings.
5. The reserved words of the language are: PROGRAM, END, PRINT, READ, IF, THEN, INTEGER, REAL, CHAR. These reserved words have the following tokens, respectively: PROGRAM, END, PRINT, READ, IF, THEN, INTEGER, REAL, and CHAR.
6. The operators of the language are: +, -, *, /, //, =, (, ), ==, <. These operators are for add, subtract, multiply, divide, concatenate, assignment, left parenthesis, right parenthesis, equality, and less-than operations. They have the following tokens, respectively: PLUS, MINUS, MULT, DIV, CONCAT, ASSOP, LPAREN, RPAREN, EQUAL, LTHAN.
7. The colon and comma characters are terminals with the following tokens: COLON, COMA.
8. A comment is defined by all the characters following the exclamation mark "!" to the end of the line. A comment does not extend beyond one line. A recognized comment is ignored and does not have a token.
9. White spaces are skipped. However, white spaces between tokens are used to improve readability and can be used as one way to delimit tokens.
10. An error will be denoted by the ERR token.
11. End of file will be denoted by the DONE token.

Lexical Analyzer Requirements:
You will write a lexical analyzer function, called getNextToken, and a driver program for testing it. The getNextToken function must have the following signature:

LexItem getNextToken(istream& in, int& linenumber);

The first argument to getNextToken is a reference to an istream object that the function should read from. The second argument is a reference to an integer that contains the current line number; getNextToken should update this integer every time it reads a newline from the input stream. getNextToken returns a LexItem object. A LexItem is a class that contains a token, a string for the lexeme, and the line number as data members.

A header file, lex.h, is provided for you. It contains a definition of the LexItem class and a definition of an enumerated type of token symbols, called Token. You must use the header file that is provided; you may not change it.

Note that the getNextToken function performs the following:
1. Any error detected by the lexical analyzer should result in a LexItem object being returned with the ERR token, and the lexeme value equal to the string recognized when the error was detected.
2. Both ERR and DONE are unrecoverable. Once the getNextToken function returns a LexItem object for either of these tokens, you should not call getNextToken again.
3. Tokens may be separated by spaces, but in most cases are not required to be. For example, the input characters "3+7" and the input characters "3 + 7" will both result in the sequence of tokens ICONST PLUS ICONST. Similarly, the input characters "hello" "world" and the input characters "hello""world" will both result in the token sequence SCONST SCONST.

Testing Program Requirements:
It is recommended to implement the lexical analyzer in one source file and the main test program in another source file. The testing program is a main() function that takes several command-line flags. The notations for each input flag are as follows:
● -v (optional): if present, every token is printed when it is seen, followed by its lexeme between parentheses.
● -iconsts (optional): if present, prints out all the unique integer constants in numeric order.
● -rconsts (optional): if present, prints out all the unique real constants in numeric order.
● -sconsts (optional): if present, prints out all the unique string constants in alphabetical order.
● -ids (optional): if present, prints out all of the unique identifiers in alphabetical order.
● filename: a file name argument must be passed to the main function. Your program should open the file and read from it.

Your testing program should apply the following rules:
1. The flag arguments (arguments that begin with a dash) may appear in any order and may appear multiple times. Only the last one is considered.
2. There can be at most one file name specified on the command line. If more than one file name is provided, the program should print on a new line the message "only one file name allowed" and stop running. If no file name is provided, the program should print on a new line the message "no specified input file name found" and stop running.
3. No other flags are permitted. If an unrecognized flag is present, the program should print on a new line the message "unrecognized flag {arg}", where {arg} is whatever flag was given, and stop running.
4. If the program cannot open a file name that is given, it should print on a new line the message "cannot open the file {arg}", where {arg} is the file name given, and stop running.
5. If getNextToken returns ERR, the program should print "error in line N ({lexeme})", where N is the line number of the token in the input file and lexeme is its corresponding lexeme, and then stop running. For example, if line 1 of the file contains an invalid real constant such as 15., the program should print the message: error in line 1 (15)
6. The program should repeatedly call getNextToken until it returns DONE or ERR. If it returns DONE, the program prints summary information, then handles the flags -sconsts, -iconsts, -rconsts, and -ids, in that order. The summary information is as follows:
Lines: L
Tokens: N
where L is the number of input lines and N is the number of tokens (not counting DONE). If L is zero, no further lines are printed.
7. If the -v option is present, the program should print each token as it is read and recognized, one token per line. The output format for a token is the token name in all capital letters (for example, the token LPAREN should be printed out as the string LPAREN). In the case of the tokens IDENT, ICONST, RCONST, and SCONST, the token name should be followed by a space and the lexeme in parentheses.
For example, if the identifier "circle" and the string literal "The center of the circle through these points is" are recognized, the -v output for them would be:
IDENT (circle)
SCONST (The center of the circle through these points is)
8. The -sconsts option should cause the program to print the label "strings:" on a line by itself, followed by every unique string constant found, one string per line without double quotes, in alphabetical order. If there are no SCONSTs in the input, then nothing is printed.
9. The -iconsts option should cause the program to print the label
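A minimal sketch of getNextToken, written as a hand-coded scanner that follows lexical rules 1-11 above. Because lex.h is not reproduced in the handout, the sketch declares hypothetical stand-ins for Token and LexItem (including the accessor names GetToken, GetLexeme, and GetLinenum); in the assignment the provided lex.h must be used unchanged, and its enumerator order and accessors may differ. It also assumes that reserved words are matched case-insensitively (as in Fortran) and that CONCAT is written //. On an invalid real constant such as 15. it reports the characters consumed so far (15.), which should be compared with the error example in testing rule 5.

```cpp
#include <cassert>
#include <cctype>
#include <istream>
#include <map>
#include <sstream>
#include <string>

using namespace std;

// HYPOTHETICAL stand-ins for the provided lex.h (not reproduced in the handout).
// The real header must be used unchanged; its enumerator order and accessors may differ.
enum Token {
    PROGRAM, END, PRINT, READ, IF, THEN, INTEGER, REAL, CHAR,
    IDENT, ICONST, RCONST, SCONST,
    PLUS, MINUS, MULT, DIV, CONCAT, ASSOP, LPAREN, RPAREN, EQUAL, LTHAN,
    COLON, COMA, ERR, DONE
};

class LexItem {
    Token token;
    string lexeme;
    int lnum;
public:
    LexItem(Token t, const string& lx, int l) : token(t), lexeme(lx), lnum(l) {}
    Token GetToken() const { return token; }
    const string& GetLexeme() const { return lexeme; }
    int GetLinenum() const { return lnum; }
};

LexItem getNextToken(istream& in, int& linenumber) {
    // ASSUMPTION: reserved words are matched case-insensitively (Fortran-like).
    static const map<string, Token> reserved = {
        {"PROGRAM", PROGRAM}, {"END", END}, {"PRINT", PRINT}, {"READ", READ},
        {"IF", IF}, {"THEN", THEN}, {"INTEGER", INTEGER}, {"REAL", REAL},
        {"CHAR", CHAR}};
    char ch;
    while (in.get(ch)) {
        if (ch == '\n') { ++linenumber; continue; }           // count every newline
        if (isspace(static_cast<unsigned char>(ch))) continue; // rule 9: skip white space
        if (ch == '!') {                                       // rule 8: comment to end of line
            while (in.get(ch) && ch != '\n') {}
            if (in) ++linenumber;                              // the newline that ended it
            continue;
        }
        if (isalpha(static_cast<unsigned char>(ch))) {         // rules 1 and 5
            string lex(1, ch);
            while (isalnum(in.peek())) lex += static_cast<char>(in.get());
            string key;
            for (char c : lex) key += static_cast<char>(toupper(static_cast<unsigned char>(c)));
            auto it = reserved.find(key);
            return LexItem(it != reserved.end() ? it->second : IDENT, lex, linenumber);
        }
        if (isdigit(static_cast<unsigned char>(ch)) || ch == '.') {  // rules 2 and 3
            string lex(1, ch);
            bool real = (ch == '.');
            while (isdigit(in.peek())) lex += static_cast<char>(in.get());
            if (!real && in.peek() == '.') {
                real = true;
                lex += static_cast<char>(in.get());
                while (isdigit(in.peek())) lex += static_cast<char>(in.get());
            }
            if (!real) return LexItem(ICONST, lex, linenumber);
            // At least one digit must follow the dot: ".2" is accepted, "2." is not.
            if (!isdigit(static_cast<unsigned char>(lex.back())))
                return LexItem(ERR, lex, linenumber);
            return LexItem(RCONST, lex, linenumber);
        }
        if (ch == '"' || ch == '\'') {                         // rule 4: same-line string
            const char delim = ch;
            string lex;
            while (in.get(ch) && ch != delim) {
                if (ch == '\n') return LexItem(ERR, string(1, delim) + lex, linenumber);
                lex += ch;
            }
            if (!in) return LexItem(ERR, string(1, delim) + lex, linenumber);  // hit EOF
            return LexItem(SCONST, lex, linenumber);           // lexeme without quotes
        }
        switch (ch) {                                          // rules 6 and 7
        case '+': return LexItem(PLUS, "+", linenumber);
        case '-': return LexItem(MINUS, "-", linenumber);
        case '*': return LexItem(MULT, "*", linenumber);
        case '/':                                              // ASSUMPTION: CONCAT is "//"
            if (in.peek() == '/') { in.get(); return LexItem(CONCAT, "//", linenumber); }
            return LexItem(DIV, "/", linenumber);
        case '=':
            if (in.peek() == '=') { in.get(); return LexItem(EQUAL, "==", linenumber); }
            return LexItem(ASSOP, "=", linenumber);
        case '(': return LexItem(LPAREN, "(", linenumber);
        case ')': return LexItem(RPAREN, ")", linenumber);
        case '<': return LexItem(LTHAN, "<", linenumber);
        case ':': return LexItem(COLON, ":", linenumber);
        case ',': return LexItem(COMA, ",", linenumber);
        default:  return LexItem(ERR, string(1, ch), linenumber);
        }
    }
    return LexItem(DONE, "", linenumber);                      // rule 11
}
```

Each call skips white space and comments, then dispatches on the first remaining character; peek() supplies the one character of lookahead needed for == and //, and for deciding whether a digit run continues into a real constant.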
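The command-line rules of the testing program can be sketched as below. Options, parseArgs, Collected, and printSet are hypothetical names, not part of lex.h, and the message strings are copied from testing rules 2 and 3 as they appear here, so their exact capitalization should be checked against the grader. Each flag only sets a boolean, so repeating a flag is harmless, and ordered std::set containers give numeric order for integer and real constants and alphabetical (byte-wise, so uppercase before lowercase) order for strings and identifiers.

```cpp
#include <cassert>
#include <iostream>
#include <set>
#include <sstream>
#include <string>
#include <vector>

using namespace std;

// Hypothetical holder for the parsed command line.
struct Options {
    bool v = false, iconsts = false, rconsts = false, sconsts = false, ids = false;
    string filename;
};

// Applies testing rules 1-3 to the arguments after argv[0].
// Returns "" on success, otherwise the message to print before stopping.
string parseArgs(const vector<string>& args, Options& opt) {
    bool haveFile = false;
    for (const string& a : args) {
        if (!a.empty() && a[0] == '-') {
            // Rule 1: flags may repeat in any order; setting a flag twice is harmless.
            if (a == "-v") opt.v = true;
            else if (a == "-iconsts") opt.iconsts = true;
            else if (a == "-rconsts") opt.rconsts = true;
            else if (a == "-sconsts") opt.sconsts = true;
            else if (a == "-ids") opt.ids = true;
            else return "unrecognized flag " + a;                  // rule 3
        } else if (haveFile) {
            return "only one file name allowed";                   // rule 2
        } else {
            opt.filename = a;
            haveFile = true;
        }
    }
    if (!haveFile) return "no specified input file name found";    // rule 2
    return "";
}

// Unique values collected while scanning; std::set keeps them sorted and unique.
struct Collected {
    set<int> iconsts;            // numeric order (lexeme converted with stoi)
    set<double> rconsts;         // numeric order (lexeme converted with stod)
    set<string> sconsts, ids;    // alphabetical (byte-wise) order
};

// Prints a label line and then one value per line; prints nothing for an empty set.
template <class T>
void printSet(ostream& out, const string& label, const set<T>& values) {
    if (values.empty()) return;
    out << label << '\n';
    for (const T& x : values) out << x << '\n';
}
```

In main, a nonempty result from parseArgs would be printed on its own line before returning; otherwise the file is opened (rule 4) and getNextToken is called in a loop (rule 6), inserting each ICONST, RCONST, SCONST, and IDENT lexeme into the matching set. Converting integer lexemes before inserting matters: as strings, "10" would sort before "9".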