CS445 IAQ (Infrequently Asked Questions)

Q: Is there a maximum length to variable names in C and C++?

A: It depends on the standard with which the compiler complies.

Jack Applin (expert in C standards) replies:
There are several standards:

C89    The implementation shall treat at least the first 31 characters
       of an internal name (a macro name or an identifier that does not
       have external linkage) as significant.  Corresponding lower-case
       and upper-case letters are different.  The implementation may
       further restrict the significance of an external name (an
       identifier that has external linkage) to six characters and may
       ignore distinctions of alphabetical case for such names.
       These limitations on identifiers are all implementation-defined.

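C99    The rules are more involved.  The minimum limits rise to 63
       significant initial characters for an internal identifier or
       macro name and 31 for an external identifier, and each universal
       character name in an external identifier counts as 6 or 10
       characters toward that limit.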

C++98  An identifier is an arbitrarily long sequence of letters and
       digits. Each universal-character-name in an identifier shall
       designate a character whose encoding in ISO 10646 falls into one
       of the ranges specified in Annex E. Upper- and lower-case
       letters are different. All characters are significant.

C++ says "arbitrarily long", but there may be further restrictions
on program line length, which will have the effect of limiting
the length of identifiers.
A test with g++ (i686-apple-darwin9-g++-4.0.1) treats two 33-character names that differ only in the final character as distinct:
#include <stdio.h>

int main()
{
    int abcdefghijabcdefghijabcdefghijabx;
    int abcdefghijabcdefghijabcdefghijaby;

    abcdefghijabcdefghijabcdefghijabx = 1;
    abcdefghijabcdefghijabcdefghijaby = 2;
    printf("%d %d\n", 
           abcdefghijabcdefghijabcdefghijabx, 
           abcdefghijabcdefghijabcdefghijaby);

    return 0;
}

Q: How long is an integer constant? When is overflow decided?

A: "overflow" is decided at scan time.

Consider this test program in C++:
int main()
{
    1099511627776;                // 2^40 in default long type
    1099511627776LL;              // 2^40 in long long int type
    1208925819614629174706176;    // 2^80 in default long type
    1208925819614629174706176LL;  // 2^80 in long long int type

    return 0;
}
Using g++ (i686-apple-darwin9-g++-4.0.1) with no options, an unsuffixed constant that does not fit in int is treated as long, and long defaults to 32 bits:
z.cpp:3: error: integer constant is too large for 'long' type
z.cpp:5:5: warning: integer constant is too large for its type
z.cpp:6:5: warning: integer constant is too large for its type
Only line 4 is accepted without complaint, because it is explicitly specified as long long and 2^40 fits in 64 bits. Adding the compile option -m64 makes long default to 64 bits, and the output becomes:
z.cpp:5:5: warning: integer constant is too large for its type
z.cpp:6:5: warning: integer constant is too large for its type
Note: Some of these diagnostics are errors and some are warnings. Here is my guess. When the constant is scanned it is read into a 64-bit location, which works for line 3. It then fails to fit at some later phase of the compile, when it is converted to the default size for long. On lines 5 and 6 the constant does not even fit when it is read in, and the scanner, oddly enough, issues only a warning.

Q: What happens in a character class with a trailing or leading minus?

A: The minus is taken literally. Regular expressions on BSD-based OS X comply with IEEE Std 1003.2 (``POSIX.2'') regular expressions.

To quote from the man page for RE_FORMAT(7):
...  To include a literal `]' in the list, make it the first character
(following a possible `^').  To include a literal `-', make it the
first or last character, or the second endpoint of a range.  To use a
literal `-' as the first endpoint of a range, enclose it in `[.' and
`.]' to make it a collating element (see below).  With the exception
of these and some combinations using `[' (see next paragraphs), all
other special characters, including `\', lose their special
significance within a bracket expression.  ...
Tests using grep and flex on my machine confirm compliance for the case of leading and trailing minuses. Noncompliant systems may do something else, of course.