Toy Language:empty/* Program */ -> programbeginend
The following grammar (BNF/Context-free grammar) describes our toy language. Comments in the grammar are delimited by /* */. Non-terminals are enclosed in angle brackets. Terminals include keywords (in lower case) and operators. Lexer rules are specified in uppercase. Terminals and operators are specified in bold font. The word (italicized) implies an empty rule. Rules inside curly braces are optional. The grammar is free of left-recursion and suitable to be used in a recursive descent parser.
-> IDENTIFIER
->
-> | | empty
/* Global String Declaration */
-> string := ;
-> STRINGLITERAL
/* Variable Declaration */
-> ;
-> float | int
-> | void
->
-> , | empty
/* Function Paramater List */
-> | empty
->
-> , | empty
/* Function Declarations */
-> | empty
-> function ( ) begin
end
->
/* Statement List */
-> | empty
-> | |
-> | | |
/* Basic Statements */
-> ;
-> :=
-> read ( );
-> write ( );
-> return ;
/* Expressions */
->
-> | empty
->
-> | empty
-> |
-> ( {} )
->
-> , | empty
-> () | | INTLITERAL | FLOATLITERAL
-> + | -
-> * | /
/* Complex Statements and Condition */
-> if ( ) endif
-> elseif ( ) |
empty
-> | true | false
-> <>> | == | != | <=>>=
-> do while ( );
An IDENTIFIER token begins with a letter and is followed by any number of letters and numbers. IDENTIFIERS are case sensitive.
INTLITERAL: integer number
ex) 0, 123, 678
FLOATLITERAL: floating point number available in two different format
yyyy.xxxxxx or .xxxxxxx
ex) 3.141592 , .1414 , .0001 , 456.98
STRINGLITERAL: any sequence of characters except '"'
between '"' and '"'
ex) "Hello world!" , "****==****" , "this is an example string"
COMMENT: Starts with "##" and lasts till the end of line
ex) ## This is a comment
...
parse_tree.out
This file should have the parse tree printed. The parse tree should be printed according to its horizontal levels. These levels are according to the depth of each node in the final parse tree and the order in which the nodes are parsed while building the parse tree should not affect how they are printed. The first line should have the root node, the next line should have root's child(ren), the next line should have the child(ren) of root's child(ren) and so on. Each node is given a unique id, starting from 1. For ids you don't have to follow any specific pattern/order; they should just be unique. For each node, its id, rule name if it is a non-terminal or text of the terminal, and parent's id are printed in this format. (For root node, parent's id is 0). For empty rule, print a special symbol '@'. There should be 8 spaces between two nodes' descriptions.
Format:
(id) name [parent's id]
program test
begin
string var := "global"
end
Example test program:
Example left-most derivation (just for your understanding, your program will not print it):
=> program begin end
program test begin end
program test begin end
program test begin end
program test begin end
program test begin string := ; end
program test begin string var := ; end
program test begin string var := "global" ; end
program test begin string var := "global" ; end
program test begin string var := "global" ; end
Example parse tree (just for your understanding, your program will not print it like this)
Example output in parse_tree.out:
(1) [0]
(2) program [1] (3) [1] (4) begin [1] (5) [1] (6) end [1]
(7) test [3] (8) [5] (9) [5]
(10) [8] (11) [8] (12) @ [9]
(13) string [10] (14) [10] (15) := [10] (16) [10] (17) ; [10] (18) @ [11]
(19) var [14] (20) "global" [16]
In this sample output, I have assigned unique ids to nodes in horizontal level order, but in your program you can assign them differently (actually it would be easier to assign them while parsing nodes).
symbol_table.out
This file should have the symbol table printed. Symbol table is initialized with identifiers by the lexer. The parser adds the type information corresponding to each identifier. Symbol tables store name and type of variables. For string variables they additionally store the value of the string. For functions they also store their parameters. Below is the output format for each symbol table:
Symbol table
name kind [variable/function] type [params]
name kind [variable] type value
...
Where is either global, local to a function, or local to a block. For global, use "GLOBAL", for function, use function name, for blocks, use "Block X" where X is a counter that increments when the parser encounters a new block.
The entries should be printed in the same order they are found in the program.
Sample Usage:
Suppose after compiling the main class is program.class. It will be run as:
Ø java program input.toy
where input.toy is the file that contains the source code of the test program (as the Fibonacci program given above). Your program should read name (or complete path) of the file provided on the command line. If no command line argument is provided, your program should print a message that it requires a file name in the command line and exit.
Part-1:
The program should first print your name and Red ID and then either print the error messages on the console if there are any lexical errors in the program or it should print success and create a file in the same directory named: lexer.out
Part-2:
The program should first print your name and Red ID and then either print the error messages on the console if there are any lexical errors or syntax errors in the program or it should print success and create three files in the same directory named: lexer.out, parse_tree.out, symbol_table.out
Part-2:
To submit part-2 you will follow the same instructions, the only difference would be that this time you will create Program1b directory in assignments directory and copy all the files there.
Grading:
Assignment 1a: 30 points
Assignment 1b: 70 points
Assignment 1a Breakdown:
Handling of valid programs and printing correct output in lexer.out: 24 points
Handling of invalid programs and printing error messages: 6 points
Assignment 1b Breakdown:
Correctly printing lexer.out for valid programs and errors for invalid programs: 5 points
Correctly printing parsing information in parse_tree.out for valid programs: 30 points
Printing error messages (parsing errors) for programs that are syntactically invalid: 10 points
Correctly printing symbol table information in symbol_table.out: 20 points
Printing declaration errors for programs that have them 5 point
=>