CS445 Introduction to Compilers
Assignment 4
(Error Handling and Catch-up Assignment)
180 points
DUE: Wed Nov 11 at 5PM PST

If you have fallen behind, this is your big chance to catch up! To pass the course you need to keep up with the work, and with this assignment I want to get everyone on track. The main credit for this course at the end of the semester is the compiler itself. I want to be able to congratulate everyone on having built their own compiler! Note that this assignment needs to be done quickly so we can work on the final code gen phase.

There are several parts to this assignment.

  • nice yyerror and error count
  • insert the error token into your grammar
  • fake out the semantic section with I/O routines below

    Nicer Errors

    yyerror catches all the messages that come back from the parser. Here are some examples of syntax error messages that come into yyerror from the parser if YYERROR_VERBOSE is set:

    Invalid input character
    syntax error, unexpected '+'
    syntax error, unexpected ELSE
    syntax error, unexpected '=', expecting '('
    syntax error, unexpected '=', expecting WHILE
    syntax error, unexpected ID, expecting '(' 
    syntax error, unexpected ID, expecting WHILE
    syntax error, unexpected '+', expecting ',' or ';'
    syntax error, unexpected '/', expecting BOOLEAN or INT or VOID 
    syntax error, unexpected ID, expecting $end or BOOLEAN or INT or VOID 
    
    We will put in a line number and format the message like in the semantic analysis section. Below is the code that translates these messages. It increments numWarnings and numErrors, but it does NOT declare those counters or report them. You need to add that.
    // // // // // // // // // // // // 
    // Write a nice error message
    // 
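    #define YYERROR_VERBOSE
    #include <stdio.h>    // printf, sprintf
    #include <string.h>   // strstr, strlen, strcpy
    #ifndef EOS
    #define EOS '\0'      // end-of-string marker, in case it is not defined elsewhere
    #endif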
    
    const char *printableChar="@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_";
    char printableBuffer[5];
    char *printable(unsigned int c)
    {
        if (c<32) {
            printableBuffer[0]='^';
            printableBuffer[1]=printableChar[c];
            printableBuffer[2]=EOS;
        }
        else if (c<127) {
            printableBuffer[0]=c;
            printableBuffer[1]=EOS;
        }
        else if (c==127) {
            strcpy(printableBuffer, "DEL");
        }
        else {
            sprintf(printableBuffer, "0x%02x", c&0xff);
        }
        return printableBuffer;
    }
    
    
    void yyerror(const char *msg)
    {
        if (strstr(msg, "Invalid input character")) {
            printf("WARNING(%d): %s: %s.  Character ignored.\n", 
                   line, 
                   msg, 
                   printable((unsigned int)*lastToken));
            numWarnings++;
        }
        else {
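            const char *unex = strstr(msg, "unexpected ");
            // Guard: messages such as "memory exhausted" contain no "unexpected ".
            if (unex != NULL) unex += strlen("unexpected ");
            if ((unex != NULL) && (*unex != '\'') && 
                ((strstr(unex, "ID")==unex) || 
                 (strstr(unex, "NUM")==unex))
                ) {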
                printf("ERROR(%d): S%s but got the string \"%s\"\n", line, msg+1, lastToken);
            }
            else {
                printf("ERROR(%d): S%s\n", line, msg+1);
            }
            numErrors++;
        }
    }
    
    

    Error Count

    There will be two global variables counting the number of errors and the number of warnings. At the end of assignment 3 we only had errors. Now we will also have warnings for practice. Warnings will not stop the successful compilation of the program. Keep count of both kinds and report at the end of the compile as in this example:
    Number of warnings: 2
    Number of errors: 1
    
    I will explain shortly where you get errors and warnings and what happens as a result.
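
    A minimal sketch of the counters and the final report, assuming the counters are named numWarnings and numErrors as in yyerror above. The helpers formatSummary and printSummary are hypothetical names, not part of the assignment; call something like printSummary once, after parsing and semantic analysis are done.

```c
#include <assert.h>
#include <stdio.h>

// Global counters, named as in yyerror above.
int numWarnings = 0;
int numErrors = 0;

// Hypothetical helper: write the two-line summary into buf.
// Returns the number of characters snprintf would have written.
int formatSummary(char *buf, size_t size, int warnings, int errors)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size,
                    "Number of warnings: %d\nNumber of errors: %d\n",
                    warnings, errors);
}

// Hypothetical helper: print the summary to stdout at the end of the compile.
void printSummary(void)
{
    char buf[80];
    formatSummary(buf, sizeof buf, numWarnings, numErrors);
    fputs(buf, stdout);
}
```

    Keeping the formatting in one place makes it harder for the report to drift from the exact wording the grader compares against.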

    Compiler Phases

    Our compiler now has three phases: lexical analysis, syntax analysis, and semantic analysis. Each phase can produce its own errors. Here is the description of what you need to do:

    Inserting Error Tokens

    We want to add error tokens so syntactic analysis continues past errors. We do this to help the user get as much useful information about their program as we can in one compile.
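
    To illustrate the idea behind a production such as error ';', here is a rough sketch in plain C. It is not how the generated parser is implemented (the real recovery is table-driven and also pops parser states); it only shows the "discard input until a synchronizing token, then resume" step. The one-character-per-token stream and the name skipToSync are assumptions made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical token stream: one char per token, '\0' marks end of input.
// Returns the index of the first token after the synchronizing ';',
// or the index of the terminator if no ';' remains.
size_t skipToSync(const char *tokens, size_t pos)
{
    assert(tokens != NULL);
    while (tokens[pos] != '\0' && tokens[pos] != ';') {
        pos++;              // discard tokens until the synchronizing token
    }
    if (tokens[pos] == ';') {
        pos++;              // consume the ';' itself, as in error ';'
    }
    return pos;
}
```

    After the skip, parsing continues with the next statement, which is why one compile can report errors on several different lines.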

    Add the following error productions to your grammar:

    | type_specifier error ';'                     {yyerrok; }                              
    | type_specifier error                         { }                                      
    
    : var_decl_list ',' var_decl_id                { yyerrok;}                              
    | var_decl_list error var_decl_id              {yyerrok; }                              
    | var_decl_list error                          { }                                      
    | error                                        { }                                      
    
    | ID '[' error                                 { }                                      
    | error ']'                                    {yyerrok; }                              
    | error                                        { }                                      
    
    | type_specifier ID '(' error                  { }                                      
    | error ')' statement                          {yyerrok; }                              
    | error                                        { }                                      
    
    : param_list ';' param_type_list               { yyerrok;}                              
    | param_list error param_type_list             {yyerrok; }                              
    | param_list error                             { }                                      
    | error                                        { }                                      
    
    : param_id_list ',' param_id                   { yyerrok;}                              
    | param_id_list error param_id                 {yyerrok; }                              
    | param_id_list error                          { }                                      
    | error                                        { }                                      
    
    | ID '[' error                                 { }                                      
    | error ']'                                    {yyerrok; }                              
    | error                                        { }                                      
    
    : '{' local_declarations statement_list '}'    {yyerrok;}
    | '{' error                                    { }
    |   error '}'                                  {yyerrok; }
    
    | statement_list error                         { }                                      
    
    | matched error                                {$$ = $1;}                               
    | unmatched error                              {$$ = $1;}                               
    
    | IF '(' error                                 { }                                      
    | WHILE '(' error                              { }                                      
    | error ')' matched ELSE matched               {yyerrok; }                              
    | error                                        { }                                      
    
    | IF '(' expression ')' error ELSE unmatched   {yyerrok; }                              
    | IF '(' error                                 { }                                      
    | WHILE '(' error                              { }                                      
    | error ')' statement                          {yyerrok; }                              
    | error                                        { }                                      
    
    : expression ';'                               {yyerrok;}                               
    | ';'                                          {yyerrok;}                               
    | error ';'                                    {yyerrok; }                              
    
    : RETURN ';'                                   {yyerrok;}                               
    | RETURN expression ';'                        {yyerrok;}                               
    | RETURN error                                 { }                                      
    
    : BREAK ';'                                    {yyerrok;}                               
    
    | error INC                                    {yyerrok; }                              
    | error DEC                                    {yyerrok; }                              
    | var error expression                         {yyerrok; }                              
    | var error                                    { }                                      
    
    | ID '[' error                                 { }                                      
    | error ']'                                    {yyerrok; }                              
    
    | error                                        { }                                      
    | error OR or_expression                       {yyerrok; }                              
    | simple_expression OR error                   { }                                      
    
    | error                                        { }                                      
    | error AND unary_rel_expression               {yyerrok; }                              
    | or_expression AND error                      { }                                      
    
    | NOT error                                    { }                                      
    | error                                        { }                                      
    
    | error relop additive_expression              {yyerrok; }                              
    | additive_expression relop error              { }                                      
    
    | error sumop term                             {yyerrok; }                              
    | additive_expression sumop error              { }                                      
    | error                                        { }                                      
    
    | error mulop unary_expression                 {yyerrok; }                              
    | term mulop error                             { }                                      
    | error                                        { }                                      
    
    | unaryop error                                { }                                      
    
    : '(' expression ')'                           {yyerrok;}                               
    | '(' error                                    { }                                      
    | error ')'                                    {yyerrok; }                              
    | '*' error                                    { }                                      
    | error                                        { }                                      
    
    : ID '(' args ')'                              {yyerrok;}                               
    | error '(' args ')'                           {yyerrok; }                              
    | ID '(' error                                 { }                                      
    
    : arg_list ',' expression                      {yyerrok; }                              
    | error ',' expression                         {yyerrok; }                              
    
    
    The only action provided is the yyerrok. You can add whatever other actions you like in the braces.

    The result of adding these tokens will be that you may have many shift/reduce conflicts and reduce/reduce conflicts.

    You may have to adjust your grammar a little to get these to fit. On the final grading assignment you will be graded on the actual error messages you generate, not where you put your error tokens.

    When you run your compiler on various error scenarios, you will see the moment a production hits an error, because it will print one of the special messages above that begins with ERROR, so you can see what's going on. This code is to be left in for the submission of your assignment, and its output will be compared against the expected output. For debugging you may want the output to go to stdout sometimes and stderr other times. When you turn in your assignment, make the output go to stdout.
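
    One possible way to switch between stdout and stderr while debugging is to route every message through a single stream selector. This is only a sketch: debugToStderr, messageStream, and reportError are hypothetical names, and your yyerror would need to use fprintf on that stream instead of printf.

```c
#include <assert.h>
#include <stdio.h>

// Hypothetical debug switch: nonzero sends messages to stderr while debugging.
// Leave it at 0 for the version you turn in, so everything goes to stdout.
int debugToStderr = 0;

// Hypothetical helper: pick the stream every message is written to.
FILE *messageStream(void)
{
    return debugToStderr ? stderr : stdout;
}

// Hypothetical helper: print one error line in the ERROR(line): text format.
void reportError(int line, const char *text)
{
    assert(text != NULL);
    fprintf(messageStream(), "ERROR(%d): %s\n", line, text);
}
```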

    Example Input and Output

     
      1 int dog () 
      2 {
      3   int max, k;
      4   bool n[20^k];
      5
      6   if =;
      7   max = 2x0;
      8   +*+;
      9   else;
     10   2=2; 
     11   !if-+-+;
     12   999 999 if;
     13   dog(while@#);
     14   (3+4;
     15   3+4);
     16 }
     17
     18 int elk( { {
    
    WARNING(4): Invalid input character: ^.  Character ignored.
    ERROR(4): Syntax error, unexpected ID, expecting ']' but got the string "k"
    ERROR(6): Syntax error, unexpected '=', expecting '('
    ERROR(7): Syntax error, unexpected ID, expecting ';' but got the string "x0"
    ERROR(8): Syntax error, unexpected '+'
    ERROR(8): Syntax error, unexpected ';'
    ERROR(9): Syntax error, unexpected ELSE
    ERROR(10): Syntax error, unexpected '=', expecting ';'
    WARNING(11): Invalid input character: !.  Character ignored.
    ERROR(11): Syntax error, unexpected '-', expecting '('
    ERROR(11): Syntax error, unexpected '+'
    ERROR(12): Syntax error, unexpected NUM, expecting ';' but got the string "999"
    ERROR(13): Syntax error, unexpected WHILE
    WARNING(13): Invalid input character: @.  Character ignored.
    WARNING(13): Invalid input character: #.  Character ignored.
    ERROR(14): Syntax error, unexpected ';', expecting ')'
    ERROR(15): Syntax error, unexpected ')', expecting ';'
    ERROR(18): Syntax error, unexpected '{', expecting BOOLEAN or INT or VOID or ')'
    Number of warnings: 4
    Number of errors: 14
    

    The Catch-up Part

    You will be retested on the final tests from assignment 3 as well as some files with horrible syntax errors. This means if you ran perfectly last time you need do nothing to get credit for that. If you fixed some stuff from last time then you get credit for that.

    Produce Better Results than Me

    The above error tokens are by no means perfect. I will award extra points for improving my error token insertions. Possible improvements:
    1. Better catching of all errors without being redundant.
    2. Catching all the same errors but producing substantial reductions in shift/reduce and reduce/reduce conflicts.
    I am the final judge of what is a good enough improvement to get extra points.

    Example Runs

    Coming soon: some example test runs.

    Submission

    Homework will be submitted as an uncompressed tar file to the homework submission page. You can submit as many times as you like. The LAST file you submit BEFORE the deadline will be the one graded. For all submissions you will receive email showing how your file performed on the pre-grade tests. The grading program will use more extensive tests, and those results will be mailed to you when they are run.

    If you have tests you really think are important or just cool please send them to me and I will consider adding them to the test suite.


    Robert Heckendorn    Last updated: Apr 8, 2007 22:40