Assignment 1 (The Scanner)
120 points
DUE: Wed Sep 9 at 5PM PST
Because the output of your program will first be preprocessed by an automatic comparison program before being handed to a human being, please follow the formatting instructions and examples carefully. The results your program produces will need to look exactly like the target. Please do not embellish with extra titles or other text such as "run complete" or "CS445 output". The testing facility of the submit script will help you get this annoying detail right. Thanks for your patience.
Use both Flex and Bison to build a scanner for the C- language. The scanner will be named c- (note the lowercase: c- will be the compiler for the language C-, which is written in uppercase). It will read and process a stream of tokens from the file named by the first argument to the c- command, or from standard input if the filename argument is not present. This means the call to c- on the C- code in file filename is defined as:
c- {filename}
So
c- filename
works, as would
cat filename | c-
and
c- < filename
It will produce a stream of tokens as its output as described below. (Pretesting your answer will help ensure compliance.) It will be constructed using flex and bison to run on the machine wormulon, or one of its clones, where the grading will occur.
Note that in C-, as in C and C++, a newline is not an element of the grammar and is merely whitespace. This was not true for the calculator program we did (will do) in class.
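The input selection described above (a filename argument when present, standard input otherwise) can be sketched in plain C. This is only an illustration: the helper name open_input is hypothetical and not part of the assignment, and how you report a file that cannot be opened is left to you.

```c
#include <stdio.h>

/* Hypothetical helper (the name is an assumption, not part of the assignment):
 * choose the stream the scanner should read from.
 * Returns the file named by argv[1] when an argument is present,
 * stdin when it is not, or NULL if the named file cannot be opened. */
FILE *open_input(int argc, char *argv[])
{
    if (argc > 1) {
        return fopen(argv[1], "r");   /* covers: c- filename */
    }
    return stdin;                     /* covers: cat filename | c-  and  c- < filename */
}
```

In a flex-generated scanner the returned stream would typically be assigned to the global yyin before scanning starts, since flex reads from yyin and defaults to standard input when it is left unset.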
An example of the output for these C- statements:
if (v==0)
return u;
pi = 3.14159;
x=true and y;
fred(x++, y[3]);
is:
Line 1 Token: IF
Line 1 Token: (
Line 1 Token: ID Value: v
Line 1 Token: EQ
Line 1 Token: NUM Value: 0
Line 1 Token: )
Line 2 Token: RETURN
Line 2 Token: ID Value: u
Line 2 Token: ;
Line 3 Token: ID Value: pi
Line 3 Token: =
Line 3 Token: NUM Value: 3
ERROR(3):Invalid token '.'
Line 3 Token: NUM Value: 14159
Line 3 Token: ;
Line 4 Token: ID Value: x
Line 4 Token: =
Line 4 Token: LOGIC Value: T
Line 4 Token: AND
Line 4 Token: ID Value: y
Line 4 Token: ;
Line 5 Token: ID Value: fred
Line 5 Token: (
Line 5 Token: ID Value: x
Line 5 Token: INC
Line 5 Token: ,
Line 5 Token: ID Value: y
Line 5 Token: [
Line 5 Token: NUM Value: 3
Line 5 Token: ]
Line 5 Token: )
Line 5 Token: ;
Numbers are printed as numbers (using %d or %i format), IDs as strings, boolean true as T and boolean false as F. Again, the pretest will let you know where your format is off.
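The output lines in the example follow a small number of patterns, which can be sketched as format strings in C. The function names below are hypothetical, and the exact spacing is copied from the example as it appears on this page, so treat it as an assumption and let the pretest confirm it.

```c
#include <stdio.h>

/* Hypothetical formatting helpers; names and signatures are illustrative only.
 * Each writes one output line (without the trailing newline) into buf. */

/* Token with no value, e.g. "Line 1 Token: IF" or "Line 1 Token: (". */
int format_plain(char *buf, size_t n, int line, const char *type)
{
    return snprintf(buf, n, "Line %d Token: %s", line, type);
}

/* ID token: the value is printed as a string. */
int format_id(char *buf, size_t n, int line, const char *name)
{
    return snprintf(buf, n, "Line %d Token: ID Value: %s", line, name);
}

/* NUM token: the value is printed with %d. */
int format_num(char *buf, size_t n, int line, int value)
{
    return snprintf(buf, n, "Line %d Token: NUM Value: %d", line, value);
}

/* LOGIC token: true prints as T, false prints as F. */
int format_logic(char *buf, size_t n, int line, int truth)
{
    return snprintf(buf, n, "Line %d Token: LOGIC Value: %c", line, truth ? 'T' : 'F');
}

/* Invalid character, following the ERROR line shown in the example. */
int format_error(char *buf, size_t n, int line, char bad)
{
    return snprintf(buf, n, "ERROR(%d):Invalid token '%c'", line, bad);
}
```

Keeping the format strings in one place like this makes it easier to adjust spacing once the pretest shows where the output differs from the target.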
(Hint: also note that IDs are not what you think they are. They are not exactly like in C++.)
The type of any single character token is printed as the character itself. The type of any multicharacter token is printed as follows:
!=        NEQ
+=        PASSIGN
++        INC
-=        MASSIGN
--        DEC
<=        LEQ
==        EQ
>=        GEQ
and       AND
bool      BOOLEAN
break     BREAK
else      ELSE
false     LOGIC
if        IF
int       INT
not       NOT
or        OR
return    RETURN
true      LOGIC
void      VOID
while     WHILE
an id     ID
a number  NUM
Note that you can use whatever internal symbols you want, but the output must print token types as above for comparison.
If you have tests you really think are important or just cool, please send them to me and I will consider adding them to the test suite.
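One way to keep internal symbols separate from the printed names is a lookup table from lexeme to printed token type. The sketch below is an assumption about how you might organize it, not a required design; the struct and function names are hypothetical. It covers only the fixed multicharacter tokens listed above and deliberately does not try to classify IDs or numbers.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: a fixed lexeme and the type name printed for it. */
struct token_name {
    const char *lexeme;
    const char *printed;
};

/* The fixed multicharacter tokens from the list above. */
static const struct token_name token_names[] = {
    {"!=", "NEQ"},      {"+=", "PASSIGN"},   {"++", "INC"},
    {"-=", "MASSIGN"},  {"--", "DEC"},       {"<=", "LEQ"},
    {"==", "EQ"},       {">=", "GEQ"},       {"and", "AND"},
    {"bool", "BOOLEAN"},{"break", "BREAK"},  {"else", "ELSE"},
    {"false", "LOGIC"}, {"if", "IF"},        {"int", "INT"},
    {"not", "NOT"},     {"or", "OR"},        {"return", "RETURN"},
    {"true", "LOGIC"},  {"void", "VOID"},    {"while", "WHILE"},
};

/* Return the printed type for a fixed multicharacter token,
 * or NULL when the lexeme is not in the table (a single character,
 * an ID, a number, or an invalid token, which are handled elsewhere). */
const char *printed_type(const char *lexeme)
{
    size_t count = sizeof token_names / sizeof token_names[0];
    size_t i;
    for (i = 0; i < count; i++) {
        if (strcmp(lexeme, token_names[i].lexeme) == 0) {
            return token_names[i].printed;
        }
    }
    return NULL;
}
```

For example, printed_type("+=") yields PASSIGN and printed_type("true") yields LOGIC, while printed_type("fred") yields NULL because identifiers are not in the table.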
| Robert Heckendorn | Last updated: |