Assignment3 is due at9pm Tuesday, February19. It is worth 40 points.
Procedures
This assignment is to be done individually.
Turn in answers to the exercises below on theUA Blackboard Learnsite, under Assignment3 for this class.
- Your answers should consist of the answer to ExerciseA (submitted as an attached text/PDF file, Text Submission, or Comments), and the source code for Exercise B (file
lexit.lua
).
- Your homework submission may not be examined immediately. If you have questions,e-mail me.
Exercises (40 pts total)
Exercise A — Running a Forth Program
Purpose
In this exercise you will make sure you can execute Forth code.
Instructions
Get the filecheck_forth.fs
from the class Git repository. This is a Forth source file for a complete program. When it is executed, this program prints “Secret message #3:
”. Below that, it prints a secret message. Run the program. What is the secret message?
Note. As always, there are several ways to run this program. One of them is to start up the GNU version of Forth (Gforth), and type the following.
[Interactive Forth]
include check_forth.fs
Exercise B — Lexer in Lua
Purpose
In this exercise you will write a Lua module that does lexical analysis.
In the next assignment, you will build a parser on top of your lexer. And in a later assignment, you will build an interpreter that uses the output from your parser.
Instructions
Write a Lua modulelexit
, contained in the filelexit.lua
. Your module should do lexical analysis using the state-machine approach.
Be sure to follow theCoding Standards.
- The interface of module
lexit
is very similar to that of modulelexer
, which was written in class—with some differences, to be covered shortly. In particular,lexit
exports:
- Function
lex
(i.e.,lexit.lex
), which takes a string parameter and allows for-in iteration through lexemes in the passed string.
- Numerical constants representing lexeme categories. These constants are shown in a table below.
- Table
catnames
, which maps lexeme category numbers to printable strings.
- The interface of module
lexit
differs from that oflexer
as follows:
- Lexing is done based on theLexeme Specificationin this document (below), not the one distributed in class.
- The exported numerical constants and category names are different.
- Module
lexit
should export nothing other than tablecatnames
, seven constants representing lexeme categories, and functionlex
. You may write anything you want in the source code for the module, as long as it islocal
and not exported.
The following properties of modulelexer
should hold for modulelexit
as well.
- At each iteration, the iterator function returns a pair: a string, which is the string form of the lexeme, and a number representing the category of the lexeme.
- The number mentioned in the previous point is suitable as a key for table
catnames
.
- The iteration ends when there are no further lexemes.
The correspondence between lexeme category numbers and category names/strings should be as follows.
Category Number |
Named Constant |
Printable Form |
---|
1 |
lexit.KEY
|
Keyword
|
2 |
lexit.ID
|
Identifier
|
3 |
lexit.NUMLIT
|
NumericLiteral
|
4 |
lexit.STRLIT
|
StringLiteral
|
5 |
lexit.OP
|
Operator
|
6 |
lexit.PUNCT
|
Punctuation
|
7 |
lexit.MAL
|
Malformed
|
Thus, the following code should work.
[Lua]
lexit = require "lexit" program = "x = 3 # Set a variable\n write(x+4, cr)\n" for lexstr, cat in lexit.lex(program) do print(lexstr, lexit.catnames[cat]) end
Lexeme Specification
You will write a lexer that is to be part of an interpreter for a programming language calledJerboa.
Whitespacecharacters are blank, tab, vertical-tab, new-line, carriage-return, form-feed. Except forStringLiterallexemes, no lexeme may contain a whitespace character. So a whitespace character, or any contiguous group of whitespace characters, is a separator between lexemes. However, pairs of lexemes are not required to be separated by whitespace.
Acommentbegins with pound sign (#
) occurring outside aStringLiterallexeme, and ends at a newline character or the end of the input, whichever comes first. There are no other kinds of comments. Any character at all may occur in a comment.
Comments are treated by the lexer as whitespace: they are not part of lexemes and are not passed on to the caller.
Legalcharacters outside comments andStringLiterallexemes are whitespace and printable ASCII (values 32 [blank] to 126 [tilde]). Any other characters outside comments andStringLiterallexemes areillegal.
Once a lexeme has begun, the maximal-munch rule is followed,except in the following special case. If a lexeme begins with “+
” or “-
”, and it is not the first lexeme in the input, and the previous lexeme was anIdentifier,NumericLiteral, right parenthesis (“)
”), right bracket (“]
”),true
, orfalse
, then this lexeme is a one-characterOperator.
There are seven lexeme categories:Keyword,Identifier,NumericLiteral,StringLiteral,Operator,Punctuation,Malformed.
Keyword
- One of the following twelve:
cr
def
else
elseif
end
false
if
readnum
return
true
while
write
Note. The reserved words are the same as theKeywordlexemes.
Identifier
- A letter or underscore (
_
), followed by zero or more characters that are all letters, digits, or underscores, and not a keyword. (Examples: “myvar
”, “_
”, “_x_37_
”.)
NumericLiteral
- A sequence of one or more digits, possibly preceded by an optional plus-sign (
+
) or minus-sign (-
), and possibly followed by an optionalexponent.Anexponentis the letter “e
” or “E
” followed by an optional “+
”, and then one or more digits.
Notes. ANumericLiteralcannot contain a dot (
.
). A minus sign is not legal in an exponent. A plus sign is legal, and optional, in an exponent. An exponent must contain at least one digit.
For example, here are some validNumericLiterallexemes.
1234 -23 00900 +123e+7 -00E00 3e888
The following arenotvalidNumericLiterallexemes.
+3e e 123E+ +-3 1.23 123e-7
The first string above is aNumericLiteral(
+3
) followed by anIdentifier(
e
). The second is anIdentifier(
e
). The third is aNumericLiteral(
123
), anIdentifier(
E
), and anOperator(
+
). The fourth is anOperator(
+
) and aNumericLiteral(
-3
). The fifth is aNumericLiteral(
1
), aPunctuation(
.
), and aNumericLiteral(
23
). The last is aNumericLiteral(
123
), anIdentifier(
e
), and aNumericLiteral(
-7
).
StringLiteral
- A single quote (
'
) or double quote ("
), followed by zero or more characters that are not the same as the initial quote mark and not newline, followed by a duplicate of the initial quote mark. AStringLiteralends with the same character it began with. The beginning and ending quote marks are both part of the lexeme. Any character at all may occur between the quotes of aStringLiteral, except for newline and the quote that began the literal. There are no backslash escapes. (Examples: “'Hello'
”, “"bye-bye"
”, “'abc"abc'
”.)
Operator
- One of the following seventeen:
&&
||
!
==
!=
>
>=
+
-
*
/
%
[
]
=
Punctuation
- Any single legal character that is not whitespace, not part of a comment, and not part of any valid lexeme in one of the other categories, includingMalformed. (Examples: “
&
”, “(
”.)
Malformed
- There are two kinds ofMalformedlexemes:bad characterandbad string.
Abad characteris any single character that is illegal, that is not part of a comment or aStringLiterallexeme that began earlier.
Abad stringis essentially a partialStringLiteralin which either a newline character or the end of the input is reached before the ending quote mark. It begins with a single or double quote mark that is not part of a comment or aStringLiterallexeme that began earlier, and continues to either a newline character (which is included in the lexeme) or the end of the input, without the matching ending quote mark appearing. (Examples: “'a-b-c
”, “"xyz
”, both either with a newline appended or ending at the end of the input.)
Note. The two kinds ofMalformedlexemes are presented to the caller in the same way: they are both simplyMalformed.
Test Program
A test programis available in the Git repository:lexit_test.lua
. If you compile and run this program (unmodified!) with your code, then it will test whether your code works properly.
Do not turn in the test program.
Notes
- You will use your code from Assignment3 again in Assignment4. It will also need to work with your code in Assignment6. Do an extra good job!
- SeeThoughts on Assignment3, on the lecture slides for February11.