Ask me if anything is unclear.
HW 2/Homework #2 - Flex.pdf

HOMEWORK #2: Lexical Analyzer using Flex
Due Date: Friday, March 19th, 11:59.59pm

Description:
For this assignment, you will write a lexical analyzer using flex, in order to recognize a variety of token types. Your program should output information about each lexeme it encounters.

Tokens:
Your lexical analyzer should recognize the following tokens from the PUCK-21.1 language:
● Integers (INTCONST) are non-empty sequences of digits optionally preceded with either a ‘+’ or ‘-’ sign. (e.g. 3, -12, +001, -1230).
● Decimal (DECCONST) numbers are Integers followed by a period ‘.’, followed by a non-empty sequence of digits. (e.g. 3.14, 00.01, 123.0).
● Scientific (SCICONST) numbers are Decimal numbers followed by the character ‘E’, followed by a non-zero integer. (e.g. 12.0E4, 1.23E-6).
● Hexadecimal (HEXCONST) numbers are non-empty sequences of hexadecimal digits (i.e. decimal digits or the characters ‘A’, ‘B’, ‘C’, ‘D’, ‘E’ or ‘F’) followed by the suffix ‘H’. (e.g. 12AD0H, 123H, 1A2B3CH).
● Binary (BINCONST) numbers are non-empty sequences of the digits ‘0’ and ‘1’ followed by the suffix ‘B’. (e.g. 10110B, 101B, 001100B).
● Keywords (KEYWORD) are specific strings that form the language. For this homework we will consider the following keywords: ‘WHILE’, ‘ELSE’, ‘IF’, and ‘PRINT’.
● String literals (STRCONST) are sequences of non-whitespace characters enclosed in double quote marks (‘"’). (e.g. "555.ABC.#$%!", "ProgroTron", "Be-Happy!"). Note: spaces are not allowed in string literals.
● Character literals (CHCONST) are two hexadecimal digits followed by the suffix ‘X’. (e.g. 12X, AFX, FFX).
● Identifiers (IDENT) are strings that consist of a letter followed by zero or more letters, digits or the underscore, and that are not keywords, hexadecimal numbers or character literals. (e.g. x, size, name, p3, r_val).
● Operators (OPERATOR) are the symbols ‘+’, ‘-’, ‘*’, ‘/’, ‘<’, ‘>’, ‘=’, ‘&’ and ‘#’.
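The token definitions above map fairly directly onto regular expressions. As a rough illustration only, the sketch below uses C++ `std::regex` (not flex) to classify one already-isolated lexeme. The function name `classify`, the order of the rules, the reading of "non-zero integer" as "an integer containing at least one non-zero digit", and the choice to accept an empty string literal are all assumptions, not part of the assignment. A real flex scanner also has to split the input by longest match, which this sketch does not attempt.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns the token name for one complete lexeme,
// or "?" when no definition matches. Rules are tried top to bottom, so
// keywords, character literals and hex numbers win over identifiers,
// as the IDENT definition requires.
std::string classify(const std::string& lexeme) {
    static const std::vector<std::pair<std::string, std::regex>> rules = {
        {"KEYWORD",  std::regex(R"(WHILE|ELSE|IF|PRINT)")},
        {"CHCONST",  std::regex(R"([0-9A-F]{2}X)")},
        {"HEXCONST", std::regex(R"([0-9A-F]+H)")},
        {"BINCONST", std::regex(R"([01]+B)")},
        // Assumption: "non-zero integer" means at least one non-zero digit.
        {"SCICONST", std::regex(R"([+-]?[0-9]+\.[0-9]+E[+-]?0*[1-9][0-9]*)")},
        {"DECCONST", std::regex(R"([+-]?[0-9]+\.[0-9]+)")},
        {"INTCONST", std::regex(R"([+-]?[0-9]+)")},
        {"IDENT",    std::regex(R"([A-Za-z][A-Za-z0-9_]*)")},
        // Assumption: an empty string literal ("") is accepted.
        {"STRCONST", std::regex(R"re("[^\s"]*")re")},
        {"OPERATOR", std::regex(R"([-+*/<>=&#])")},
    };
    for (const auto& rule : rules) {
        // regex_match succeeds only if the whole lexeme matches the pattern.
        if (std::regex_match(lexeme, rule.second)) {
            return rule.first;
        }
    }
    return "?";
}
```

With these rules, `classify("ABCH")` gives `HEXCONST` while `classify("FFF")` gives `IDENT`, because a hexadecimal number needs the `H` suffix. In a flex file the same priorities would normally come from rule order and longest-match behaviour rather than from a loop like this one.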

Your lexical analyzer should also identify and ignore comments, which start with the character ‘%’ and run to the end of the line. Your lexical analyzer should also keep track of the number of lines processed.

Submission:
Submit through the UNIX systems using the command: cssubmit 3500 102 2
Put your name in your source file. Submit a single file called ‘mylexer.l’. Your file will be compiled, run and tested using the following chain of commands:

flex mylexer.l
g++ lex.yy.c -lfl -o lexer.ex
./lexer.ex < inputfilename

Output:
The output of your lexical analyzer should match the sample output, although whitespace will be ignored when grading.

Sample Input and Output:

Input
PRINT some while input + -1234 %what about this?
*/- 0123 -99 + x camelCase &&^ 44.4E3321 %%% yet another comment
WHILE IF flex proc 203.978 -22.4 + "30x2" ' ! ABCH FFF 123.456 %% here be dragons.
1+2=3+4>t 00B 1010101 B "a bc" 12X banana 5 &@ 12.53E231 2B or not toBE1 111B 78E / -42.. "another_str_constant"

Output
#0: TOKEN: KEYWORD LEXEME: PRINT
#1: TOKEN: IDENT LEXEME: some
#2: TOKEN: IDENT LEXEME: while
#3: TOKEN: IDENT LEXEME: input
#4: TOKEN: OPERATOR LEXEME: +
#5: TOKEN: INTCONST LEXEME: -1234
#6: TOKEN: OPERATOR LEXEME: *
#7: TOKEN: OPERATOR LEXEME: /
#8: TOKEN: OPERATOR LEXEME: -
#9: TOKEN: INTCONST LEXEME: 0123
#10: TOKEN: INTCONST LEXEME: -99
#11: TOKEN: OPERATOR LEXEME: +
#12: TOKEN: IDENT LEXEME: x
#13: TOKEN: IDENT LEXEME: camelCase
#14: TOKEN: OPERATOR LEXEME: &
#15: TOKEN: OPERATOR LEXEME: &
#16: TOKEN: ? LEXEME: ^
#17: TOKEN: SCICONST LEXEME: 44.4E3321
#18: TOKEN: KEYWORD LEXEME: WHILE
#19: TOKEN: KEYWORD LEXEME: IF
#20: TOKEN: IDENT LEXEME: flex
#21: TOKEN: IDENT LEXEME: proc
#22: TOKEN: DECCONST LEXEME: 203.978
#23: TOKEN: DECCONST LEXEME: -22.4
#24: TOKEN: OPERATOR LEXEME: +
#25: TOKEN: STRCONST LEXEME: "30x2"
#26: TOKEN: ? LEXEME: '
#27: TOKEN: ? LEXEME: !
#28: TOKEN: HEXCONST LEXEME: ABCH
#29: TOKEN: IDENT LEXEME: FFF
#30: TOKEN: DECCONST LEXEME: 123.456
#31: TOKEN: INTCONST LEXEME: 1
#32: TOKEN: INTCONST LEXEME: +2
#33: TOKEN: OPERATOR LEXEME: =
#34: TOKEN: INTCONST LEXEME: 3
#35: TOKEN: INTCONST LEXEME: +4
#36: TOKEN: OPERATOR LEXEME: >
#37: TOKEN: IDENT LEXEME: t
#38: TOKEN: BINCONST LEXEME: 00B
#39: TOKEN: INTCONST LEXEME: 1010101
#40: TOKEN: IDENT LEXEME: B
#41: TOKEN: ? LEXEME: "
#42: TOKEN: IDENT LEXEME: a
#43: TOKEN: IDENT LEXEME: bc
#44: TOKEN: ? LEXEME: "
#45: TOKEN: CHCONST LEXEME: 12X
#46: TOKEN: IDENT LEXEME: banana
#47: TOKEN: INTCONST LEXEME: 5
#48: TOKEN: OPERATOR LEXEME: &
#49: TOKEN: ? LEXEME: @
#50: TOKEN: SCICONST LEXEME: 12.53E231
#51: TOKEN: INTCONST LEXEME: 2
#52: TOKEN: IDENT LEXEME: B
#53: TOKEN: IDENT LEXEME: or
#54: TOKEN: IDENT LEXEME: not
#55: TOKEN: IDENT LEXEME: toBE1
#56: TOKEN: BINCONST LEXEME: 111B
#57: TOKEN: INTCONST LEXEME: 78
#58: TOKEN: IDENT LEXEME: E
#59: TOKEN: OPERATOR LEXEME: /
#60: TOKEN: INTCONST LEXEME: -42
#61: TOKEN: ? LEXEME: .
#62: TOKEN: ? LEXEME: .
#63: TOKEN: STRCONST LEXEME: "another_str_constant"
10 lines processed.

Hint: You can use #include <iomanip> and the setw() stream manipulator to get nice column formatting.

Hint.l

/* A starting flex file */

/* ---- PROLOGUE ---- */
%{
#include <iostream>
using namespace std;

int no_lines = 0;   /* number of newlines seen so far */
%}

/* ---- DEFINITIONS ---- */
%option noyywrap
DIGIT [0-9]

%%
    /* ---- REGULAR EXPRESSIONS ---- */
    /* Comments in the rules section are indented so flex does not read them as patterns. */

[ \t]           ;
\n              { no_lines++; }
{DIGIT}+        { cout << "found a number: " << yytext << endl; }
[a-zA-Z0-9]+    { cout << "found a string: " << yytext << endl; }

%%
/* ---- EPILOGUE ---- */

int main()
{
    cout << "hello flex!" << endl;
    yylex();
    cout << "done!" << endl;
    return 0;
}

HW 2/sampleinput.txt

PRINT some while input + -1234 %what about this?
*/- 0123 -99 + x camelCase &&^ 44.4E3321 %%% yet another comment
WHILE IF flex proc 203.978 -22.4 + "30x2" ' ! ABCH FFF 123.456 %% here be dragons.
1+2=3+4>t 00B 1010101 B "a bc" 12X banana 5 &@ 12.53E231 2B or not toBE1 111B 78E / -42..
"another_str_constant"

HW 2/sampleoutput.txt

#0: TOKEN: KEYWORD LEXEME: PRINT
#1: TOKEN: IDENT LEXEME: some
#2: TOKEN: IDENT LEXEME: while
#3: TOKEN: IDENT LEXEME: input
#4: TOKEN: OPERATOR LEXEME: +
#5: TOKEN: INTCONST LEXEME: -1234
#6: TOKEN: OPERATOR LEXEME: *
#7: TOKEN: OPERATOR LEXEME: /
#8: TOKEN: OPERATOR LEXEME: -
#9: TOKEN: INTCONST LEXEME: 0123
#10: TOKEN: INTCONST LEXEME: -99
#11: TOKEN: OPERATOR LEXEME: +
#12: TOKEN: IDENT LEXEME: x
#13: TOKEN: IDENT LEXEME: camelCase
#14: TOKEN: OPERATOR LEXEME: &
#15: TOKEN: OPERATOR LEXEME: &
#16: TOKEN: ? LEXEME: ^
#17: TOKEN: SCICONST LEXEME: 44.4E3321
#18: TOKEN: KEYWORD LEXEME: WHILE
#19: TOKEN: KEYWORD LEXEME: IF
#20: TOKEN: IDENT LEXEME: flex
#21: TOKEN: IDENT LEXEME: proc
#22: TOKEN: DECCONST