Assignment 2 (Construct the Syntax Tree)
250 points
DUE: Sat Oct 10 at 5pm PDT (see warning below)
This homework consists of these tasks:
void printTree(FILE *, TreeNode *);
Your main program will contain:
yyparse();
printTree(stdout, syntaxTree);
The first line calls the parser, which will store the tree in the global called syntaxTree, which is defined as:
TreeNode *syntaxTree;
| Do not be fooled. This is a . Do not put off this assignment. It is complicated and detailed. |
These tasks are described further below.
When done with this assignment you will have created code that will recognize legal C- programs and generate the first pass at the tree.
The parser will be named c- just like last time. It will read and process a stream of tokens from a filename given as the first argument to the c- command OR from standard input if the filename argument is not present.
It will now also take the -d option as a first argument. I recommend using the getopt routine since this will handle UNIX arguments in a uniform and standard way. The -d option turns on the yydebug flag by setting it to 1.
For example: c- -d sort.c- should run the c- compiler on the program sort.c- and give details of the parsing that is going on, while c- sort.c- should simply run the c- compiler.
For this assignment your compiler should record the line number and string representation of the last token scanned in global variables. These global variables are for adding arguments to yyerror. Rewrite your yyerror routine to print a message as in this error message:
printf("ERROR lineno(%d):%s. I got: %s\n", lineno, msg, lastScannedToken);
The msg is passed into the yyerror routine as we will
discuss in class.
You can write out the error message using any method you like but the
content of the error message must be exactly like the above.
To get this all to work nicely,
turn on verbose error messaging with this macro definition:
#define YYERROR_VERBOSE
We will continue to improve on the invocation line and error reporting as our compiler gets more sophisticated. HINT: As we will discuss in class, it is important that you allocate a new string for each token as it is scanned to avoid the problem of referring to a reusable buffer.
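Here is one possible sketch of the two globals and the yyerror routine. The names lineno and lastScannedToken come from the printf above; the helper recordToken is hypothetical, and the idea is that each scanner rule would call it with the scanner's text buffer and current line.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int lineno = 1;                    /* line of the last token scanned */
char *lastScannedToken = NULL;     /* private copy of its text */

/* Hypothetical helper, called by the scanner for every token.
   The scanner reuses its text buffer, so we keep our own copy.
   The previous copy is deliberately not freed here: tree nodes
   built from earlier tokens may still point at it. */
void recordToken(const char *text, int line)
{
    char *copy = malloc(strlen(text) + 1);
    if (copy == NULL) {
        fprintf(stderr, "Out of memory while scanning line %d\n", line);
        exit(1);
    }
    strcpy(copy, text);
    lastScannedToken = copy;
    lineno = line;
}

/* Error routine called by the generated parser with its message. */
void yyerror(const char *msg)
{
    printf("ERROR lineno(%d):%s. I got: %s\n", lineno, msg, lastScannedToken);
}
```

Because recordToken copies the text, overwriting the scanner's buffer with the next token does not change what an earlier error message or tree node sees.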
For the parsing part of the assignment modify your Bison grammar to parse C- code. A good approach is to initially forget about the syntax tree part of the assignment. If you get the right grammar into your compiler it will successfully parse any C- program. A program that simply recognizes whether a program is legal or not is called a recognizer. When you build your bison grammar directly from the one supplied you will find that you have the dangling else problem. There are several ways to fix this problem. I will discuss one in class.
Coding restriction: Do not attempt to fix dangling else with associativity declarations such as %left. Do not fix any other problem with your grammar by using the %expect feature of Bison. This causes Bison to ignore some number of parsing conflicts and me to deduct points from your assignment. Really, you can do this without this "feature". I expect your parser to compile without any parser errors.
Now that your recognizer is working, let's look at the syntax tree I want you to produce. As we will discuss in class, the tree is an abbreviated portion of the parse tree containing the parts we are interested in. Here is a sample TreeNode that I used:
typedef struct treeNode
{
    struct treeNode *child[MAXCHILDREN];   // children of the node
    struct treeNode *sibling;              // siblings for the node
    int lineno;                            // line number for errors
    NodeKind nodekind;                     // type of node
    union                                  // subtype of type
    {
        DeclKind decl;                     // used when DeclK
        StmtKind stmt;                     // used when StmtK
        ExpKind exp;                       // used when ExpK
    } kind;
    union                                  // relevant data in type -> attr
    {
        OpKind op;                         // type of token (same as in bison)
        int val;                           // used when ConstantK
        char *name;                        // used when IdK
    } attr;
    ExpType expType;                       // used when ExpK for type checking
    int size;                              // used for size of array
    bool isArray;                          // is this an array
} TreeNode;
This design is stolen straight from the book. This way you can
use the one in the book as an example to work from. Ours has to
have extra features and node types. We will discuss this in detail in
class.
To encode the program as a tree you need to make the right nodes at the right steps in the parsing. When you need to make a node you will use routines you write, similar to the newStmtNode function in util.c for the Tiny language in the book. These nodes will be passed up the tree and assembled as in the Tiny example in the book.
Coding restriction: Do not use YYSTYPE as used in the book. This subverts features that are there to help you. I will discuss how to use this to your advantage. I will also talk about how to use this:
%union {
    ExpType type;
    int number;
    TokenData tokenData;
    TreeNode *tree;
}
For example, given this C- program as input:
int max(int x, y) {
    int z;
    if (x>y) z=x;
    else z=y;
    return z;
}
you should get the following output from your c-. The // comments are
not part of the output; they explain what you are seeing.
Function max returns type int [line: 1] // this is the declaration node for a function
Child: 1 // it has 2 children. Child 1 is the list of parameters.
Param x of type int [line: 1] // the first parameter is x of type int
Sibling: 1 // the parameters are tied together as a linked list of siblings
Param y of type int [line: 1] // the second parameter is y of type int
Child: 2 // the second child of the function declaration is the statements
Compound [line: 1] // the body of a function is treated as a compound statement
Child: 1 // the first child of a compound statement is a list of declarations
Var z of type int [line: 2] // z is declared of type int
Child: 2 // the second child of a compound statement is a list of statements
If [line: 3] // the if node has two or three children
Child: 1 // the first child is the test
Op: > [line: 3] // a relational operator > applied
Child: 1 // to the two children
Id: x [line: 3] // the first of which is x
Child: 2
Id: y [line: 3] // the second is y
Child: 2 // the second child of the if is the then clause
Assign: = [line: 3] // z = x
Child: 1
Id: z [line: 3]
Child: 2
Id: x [line: 3]
Child: 3 // the third child of the if is the else clause
Assign: = [line: 4] // z = y
Child: 1
Id: z [line: 4]
Child: 2
Id: y [line: 4]
Sibling: 1 // the second statement in the body of the compound statement
Return [line: 5] // return which takes as its only child
Child: 1
Id: z [line: 5] // the variable z
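The traversal behind output like this can be sketched as follows. This is only an illustration under stated assumptions: the node is reduced to a hypothetical label field standing in for the nodekind, kind, and attr fields, MAXCHILDREN is assumed to be 3, and the three-space indentation is my own choice, not the required layout. What it does show is the numbering of children and siblings from 1 and the skipping of NULL children.

```c
#include <stdio.h>

#define MAXCHILDREN 3   /* assumption: at most three children per node */

/* Reduced node: label is a hypothetical stand-in for the text that
   the real printTree would build from nodekind, kind and attr. */
typedef struct treeNode {
    struct treeNode *child[MAXCHILDREN];
    struct treeNode *sibling;
    int lineno;
    const char *label;
} TreeNode;

static void printNode(FILE *out, TreeNode *t, int depth);

static void printIndent(FILE *out, int depth)
{
    for (int i = 0; i < depth; i++) fputs("   ", out);
}

/* Print a node and every sibling chained after it. */
static void printList(FILE *out, TreeNode *t, int depth)
{
    for (int n = 0; t != NULL; t = t->sibling, n++) {
        if (n > 0) {                      /* the first node is not a sibling */
            printIndent(out, depth);
            fprintf(out, "Sibling: %d\n", n);
        }
        printNode(out, t, depth);
    }
}

/* Print one node, then each child that is present. */
static void printNode(FILE *out, TreeNode *t, int depth)
{
    printIndent(out, depth);
    fprintf(out, "%s [line: %d]\n", t->label, t->lineno);
    for (int i = 0; i < MAXCHILDREN; i++) {
        if (t->child[i] != NULL) {        /* optional children may be NULL */
            printIndent(out, depth + 1);
            fprintf(out, "Child: %d\n", i + 1);
            printList(out, t->child[i], depth + 1);
        }
    }
}

void printTree(FILE *out, TreeNode *tree)
{
    printList(out, tree, 0);
}
```

Keeping the sibling loop separate from the child loop mirrors the data structure: a child slot holds the head of a list, and the list is walked through the sibling pointers.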
In the cases where there is an optional expression or statement, the
corresponding child pointer is set to NULL (i.e. 0).
For example, compound statements might not have any declarations, so child[0]
would be set to NULL. Return optionally takes an expression. If there isn't an
expression then child[0] is NULL. The while statement might not
have a body, for example while (searching()); in which case child[1]
is NULL. The default for unneeded children and siblings is always NULL.
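A node-creation routine in the spirit of the book's newStmtNode might look like the sketch below. It uses a reduced TreeNode with only the fields needed here, assumes MAXCHILDREN is 3, and adds a hypothetical addSibling helper for building the linked lists of parameters, declarations, and statements.

```c
#include <stdio.h>
#include <stdlib.h>

#define MAXCHILDREN 3   /* assumption: at most three children per node */

typedef enum { DeclK, StmtK, ExpK } NodeKind;
typedef enum { IfK, WhileK, CompoundK, ReturnK, BreakK } StmtKind;

/* Reduced version of the TreeNode: only what this sketch needs. */
typedef struct treeNode {
    struct treeNode *child[MAXCHILDREN];
    struct treeNode *sibling;
    int lineno;
    NodeKind nodekind;
    union { StmtKind stmt; } kind;
} TreeNode;

/* Allocate a statement node with every link defaulted to NULL. */
TreeNode *newStmtNode(StmtKind kind, int lineno)
{
    TreeNode *t = malloc(sizeof(TreeNode));
    if (t == NULL) {
        fprintf(stderr, "Out of memory at line %d\n", lineno);
        exit(1);
    }
    for (int i = 0; i < MAXCHILDREN; i++) t->child[i] = NULL;
    t->sibling = NULL;
    t->lineno = lineno;
    t->nodekind = StmtK;
    t->kind.stmt = kind;
    return t;
}

/* Hypothetical helper: append node to the end of a sibling list
   and return the head of the list (node itself if list is empty). */
TreeNode *addSibling(TreeNode *list, TreeNode *node)
{
    if (list == NULL) return node;
    TreeNode *t = list;
    while (t->sibling != NULL) t = t->sibling;
    t->sibling = node;
    return list;
}
```

Since the constructor sets every child and the sibling to NULL, a grammar action only has to fill in the links that are actually present, and an omitted optional part needs no extra code.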
The question is which node's line number do you use to issue the error? For example, if there is a problem with a big long hairy while statement we will tag the error where the while token is. One could have used the line number from the test, but that could become tricky if the test goes over multiple lines. A clear decision on the major tokens is given below.
Here are node types and where the line is said to be:
//declarations
VarK at the ID
FuncK at the ID
ParamK at the ID
//statements
IfK at the IF
WhileK at the WHILE
CompoundK at the {
ReturnK at the RETURN
BreakK at the BREAK
//operators
OpK at the operator
ConstantK at the constant
IdK at the ID
AssignK at the =
CallK at the ID
So, in a declaration of a variable
the declaration node of type VarK is said to
be on the line where the ID token was found.
"If" statements are on the line where the IF token was found, etc.
HINT: The yacc code in the book is a good example of how to connect the nodes you create. The node creation code there is a good model for how to create nodes and print a tree. Use your notes from class on how to put the rest of it together.
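One way to get the line positions listed above is to carry the line number with each token and copy it into the node, rather than reading the scanner's current line when the node is built. The sketch below assumes a layout for TokenData (the type is named in the %union but not defined here) and uses a hypothetical constructor newStmtNodeAt on a reduced node type.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical layout for TokenData: an assumption for illustration. */
typedef struct {
    int linenum;      /* line on which the token was scanned */
    char *tokenstr;   /* private copy of the token text */
} TokenData;

typedef enum { IfK, WhileK, CompoundK, ReturnK, BreakK } StmtKind;

/* Reduced node: only the fields this sketch needs. */
typedef struct treeNode {
    int lineno;
    StmtKind stmt;
} TreeNode;

int lineno = 1;   /* the scanner's current line */

/* Build a statement node positioned at a specific token. */
TreeNode *newStmtNodeAt(StmtKind kind, const TokenData *tok)
{
    TreeNode *t = malloc(sizeof(TreeNode));
    if (t == NULL) {
        fprintf(stderr, "Out of memory at line %d\n", tok->linenum);
        exit(1);
    }
    t->lineno = tok->linenum;   /* the token's line, not the global lineno */
    t->stmt = kind;
    return t;
}
```

By the time the parser has seen a whole while statement the scanner may be several lines past the WHILE token, so the global lineno would tag the node with the wrong line; the token's own record does not have that problem.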
If you have tests you really think are important or just cool please send them to me and I will consider adding them to the test suite.
| Robert Heckendorn | Last updated: |