Operator Precedence of Unary Operators

My Programming Ruby book (2nd edition) also lists unary operators as having higher precedence than assignment.

The unary operator IS being given highest precedence. The line is parsed as ~ (a = 1) because decomposing the line into valid syntax takes priority over everything else, including using the simple variable 'a' as the expression the unary operator operates on.

If the Ruby parser could have made something valid of the rest of the line, it would have used (~ a), but there is no valid rule that matches = something, only lvalue '=' rvalue.

You can regard "valid syntax" as the top priority, then simple values, constant and variable names and then the standard operators under that.

Do unary operators have higher precedence than the cast expression?

Let's focus on the operators you are analyzing in your test. The logical NOT operator ! and the cast operator (type) do have the same precedence.

As you can see in this table, they both have priority #2.

To understand what's going on, you also need to take associativity into account. Associativity determines the order in which operators of the same priority are applied.

The operators we are talking about have right-to-left associativity.

I'll copy your test for clarity:

#include <stdio.h>

int main(void)
{
    /* We take advantage of the truncation when casting int to char (only the
       lower 8 bits of the int remain). If the cast executes first, the value
       is 0, so the final output is 1. Otherwise, the final result is 0. */
    int test = 512;
    printf("test1-1 is %d\n",!(char)test); /*test1-1 is 1*/
    printf("test1-2 is %d\n",(char)!test); /*test1-2 is 0*/

    getchar();
    return 0;
}

In !(char)test, right-to-left associativity means that the cast is performed first. The value 512 is truncated to the size of a char, which yields 0.
Then the logical negation is applied, which yields 1.
In (char)!test, right-to-left associativity means that the logical negation is performed first. !512 evaluates to 0.
Then the cast is applied, which again yields 0.

For these reasons, you actually get the expected results.
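If it helps, the two groupings can be spelled out as explicit function calls. This is only a sketch in Python, not C: to_char and logical_not are hypothetical helpers I made up to mimic the cast and the ! operator, and to_char assumes an 8-bit char where the cast simply keeps the low 8 bits.

```python
def to_char(value: int) -> int:
    """Mimic a C cast to an 8-bit char by keeping only the low 8 bits (assumption)."""
    return value & 0xFF


def logical_not(value: int) -> int:
    """Mimic C's ! operator: 1 if the operand is zero, otherwise 0."""
    return 1 if value == 0 else 0


test = 512

# !(char)test: the cast expression is the operand of !, so the cast happens first.
test1_1 = logical_not(to_char(test))

# (char)!test: the negation is the operand of the cast, so the negation happens first.
test1_2 = to_char(logical_not(test))

print("test1-1 is", test1_1)  # test1-1 is 1
print("test1-2 is", test1_2)  # test1-2 is 0
```

Written this way, the operand nearest the variable is always the innermost call, which is all that "right-to-left" means for a chain of prefix operators.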

Does it make sense for unary operators to be associative?

It's just an artefact of the way that the associativity is derived from the grammar.

The reason that addition is left-associative is that one of the productions for additive-expression is additive-expression + multiplicative-expression, with the additive-expression on the left. So when you see:

a + b + c

this must be equivalent to (a + b) + c, because the only way to match the production is with a + b as the additive-expression and c as the multiplicative-expression. a on its own is an additive-expression, but b + c is not a multiplicative-expression and so a + b + c doesn't match the production if we try to take a as the additive-expression.

If you haven't before, I recommend that you read through the "Expressions" chapter ignoring the semantics: look only at the grammar productions. Then you'll see just how it is that precedence and associativity are defined by the grammar. The big trick is that every "high-precedence" type of expression IS-A "lower-precedence" type of expression. So every multiplicative-expression is an additive-expression, but not vice-versa, and this is what makes multiplication "bind tighter" than addition.
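To see the grammar doing the work, here is a minimal sketch of a recursive-descent parser in Python that follows the shape of those two productions. The function names and the tiny tokenizer are my own, it only knows + and *, and it has no error handling; it is an illustration, not how any real C++ compiler is written.

```python
import re


def tokenize(source: str) -> list:
    """Split the input into identifiers, numbers, and the operators + and *."""
    return re.findall(r"[A-Za-z_]\w*|\d+|[+*]", source)


def parse_additive(tokens: list) -> str:
    # additive-expression:
    #     multiplicative-expression
    #     additive-expression + multiplicative-expression
    # The left-recursive production becomes a loop that folds to the left.
    left = parse_multiplicative(tokens)
    while tokens and tokens[0] == "+":
        tokens.pop(0)
        right = parse_multiplicative(tokens)
        left = f"({left} + {right})"
    return left


def parse_multiplicative(tokens: list) -> str:
    # Every operand of + is a multiplicative-expression, so * binds tighter.
    left = parse_primary(tokens)
    while tokens and tokens[0] == "*":
        tokens.pop(0)
        right = parse_primary(tokens)
        left = f"({left} * {right})"
    return left


def parse_primary(tokens: list) -> str:
    # A single identifier or number.
    return tokens.pop(0)


def group(source: str) -> str:
    """Return the input with its grouping made explicit by parentheses."""
    return parse_additive(tokenize(source))


print(group("a + b + c"))  # ((a + b) + c)
print(group("a + b * c"))  # (a + (b * c))
```

Nothing in the code mentions "precedence" or "associativity"; both fall out of which function calls which, just as they fall out of which non-terminal appears where in the grammar.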

Prefix unary operators are defined in the grammar like: unary-expression: ++ cast-expression and so on, with the operator on the left for prefix and on the right for postfix. In other words, we "insert the parentheses" on the left for postfix and on the right for prefix. That is, we can say that the grouping is left-to-right for postfix operators and right-to-left for prefix operators. And indeed the C++ standard says exactly that (5.2/1 and 5.3/1 in C++03). It might be an abuse of terminology or at least a new coinage to refer to this unary grouping as "associativity". But it's not a major one since it's obvious what must be meant.

The only difference here between binary and unary operators is that the syntax would still make sense if the binary operators grouped in the opposite direction, so that a - b - c meant a - (b - c). It would be surprising but would not otherwise affect the language. With unary operators it would be more than surprising to group !!a as (!!)a: the language would also have to supply a meaning for the sub-expression !!, which currently it doesn't have. A functional language could give it a meaning: !! might mean the function composed from ! and !, i.e. the same operation as static_cast<bool>(), but C++ has no concept of composing functions or operators. The reason C++ doesn't need to supply that meaning is that ! "groups right-to-left". Which (because of the big trick in the grammar) is just another way of saying that !! is not a syntactically correct expression so is never a sub-expression of anything.

So yes, it does make sense to say that prefix operators group right-to-left and postfix operators group left-to-right. But it's also "obvious" that it must be this way around, because of other things we know about the C++ language.
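The same thing can be observed in any language whose parser you can poke at from a script. As a rough illustration (Python, not C++, with Python's prefix not standing in for !), the standard ast module shows that two prefix operators nest from the right, and that the operators on their own are not an expression. The helper is_valid_expression is a name I made up for this sketch.

```python
import ast

# Parse "not not a" and look at how the two prefix operators are nested.
outer = ast.parse("not not a", mode="eval").body
inner = outer.operand

# The outermost node is the first `not`; its operand is the second `not`,
# whose operand is the name `a`. In other words: not (not a).
print(type(outer.op).__name__)     # Not
print(type(inner.op).__name__)     # Not
print(type(inner.operand).__name__)  # Name


def is_valid_expression(source: str) -> bool:
    """Report whether the source parses as a single expression."""
    try:
        ast.parse(source, mode="eval")
        return True
    except SyntaxError:
        return False


print(is_valid_expression("not not a"))  # True
print(is_valid_expression("not not"))    # False
```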

Btw, I think that technically speaking in C++ at least, postfix ++ is not a unary operator. It's a postfix operator. But that really doesn't matter except that it's the terminology in the standard, because obviously it is an operator and it has one operand, so is "unary" in English.

Combining unary operators with different precedence

Ok, let me suggest a possible erroneous grammar based on your sketch:

low_postfix:
    mid_infix
|   low_postfix "<-"
mid_infix:
    high_postfix
|   mid_infix '+' high_postfix
high_postfix:
    term
|   high_postfix "++"
term:
    ID
|   '(' expr ')'

It should be clear just looking at those productions that var <- ++ is not part of the language. The only things that can be used as an operand to ++ are terms and other applications of ++. var <- is neither of these things.

On the other hand, var ++ <- is fine, because the operand to <- can be a mid_infix which can be a high_postfix which is an application of the ++ operator.

If the intention were to allow both of those postfix sequences, then that grammar is incorrect.
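If you'd rather check that than take my word for it, here is a rough Python recognizer that follows those productions one function per non-terminal. It is a sketch under a few assumptions of mine: expr is taken to mean low_postfix, ID is any identifier, and the tokenizer silently ignores characters it doesn't know. The class and function names are made up for the illustration.

```python
import re

# "++" must be tried before "+" so that it is matched as one token.
TOKEN = re.compile(r"\+\+|<-|\+|\(|\)|[A-Za-z_]\w*")


class Recognizer:
    def __init__(self, source: str):
        self.tokens = TOKEN.findall(source)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        token = self.peek()
        if token is None or (expected is not None and token != expected):
            raise SyntaxError(f"unexpected {token!r}")
        self.pos += 1
        return token

    def low_postfix(self):
        # low_postfix: mid_infix | low_postfix "<-"
        self.mid_infix()
        while self.peek() == "<-":
            self.take()

    def mid_infix(self):
        # mid_infix: high_postfix | mid_infix '+' high_postfix
        self.high_postfix()
        while self.peek() == "+":
            self.take()
            self.high_postfix()

    def high_postfix(self):
        # high_postfix: term | high_postfix "++"
        self.term()
        while self.peek() == "++":
            self.take()

    def term(self):
        # term: ID | '(' expr ')'   (expr assumed to be low_postfix)
        if self.peek() == "(":
            self.take("(")
            self.low_postfix()
            self.take(")")
        else:
            token = self.take()
            if not (token[0].isalpha() or token[0] == "_"):
                raise SyntaxError(f"expected ID, got {token!r}")


def accepts(source: str) -> bool:
    """Return True if the whole input is a low_postfix."""
    recognizer = Recognizer(source)
    try:
        recognizer.low_postfix()
    except SyntaxError:
        return False
    return recognizer.peek() is None


print(accepts("var ++ <-"))    # True
print(accepts("var <- ++"))    # False
print(accepts("(var <-) ++"))  # True
```

The last line shows the usual escape hatch: parentheses turn anything back into a term, so the rejected sequence is only rejected when written without them.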

A version of that cascade is present in the Python grammar (albeit using prefix operators) which is why not - False is OK, but - not False is a syntax error. I'm reluctant to call that a bug because it may have been intentional. (Really, neither of those expressions makes much sense.) We could disagree about the value of such an intention but not on SO, which prefers to avoid opinionated discussions.

Note that what we might call "strict precedence" in this grammar and the Python grammar is by no means restricted to combinations of unary operators. Here's another one which you have likely never tried:

$ python3 -c 'print(41 + not False)'
  File "<string>", line 1
    print(41 + not False)
                 ^
SyntaxError: invalid syntax
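The same checks can be scripted with the built-in compile function, which makes it easy to try other combinations. This is just a small sketch; parses is a helper name I made up.

```python
def parses(source: str) -> bool:
    """Report whether Python accepts the source as an expression."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False


for source in ("not - False", "- not False", "41 + not False", "41 + (not False)"):
    verdict = "ok" if parses(source) else "syntax error"
    print(f"{source!r}: {verdict}")
```

Only the first and last expressions are accepted; the parenthesized form works because a parenthesized expression is an atom, which is allowed as an operand of +.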

So, how can we fix that?

On some level, it would be nice to be able to just write an unambiguous grammar which conveyed our intention. And it is certainly possible to write an unambiguous grammar, which would convey the intention to bison. But it's at least an open question as to whether it would convey anything to a human reader, because the massive clutter of multiple rules necessary in order to keep track of what is and is not an acceptable grouping would be pretty daunting.

On the other hand, it's dead simple to do with bison/yacc precedence declarations. We just list the operators in order, and the parser generator resolves all the ambiguities accordingly. [See Note 1 below]

Here's a similar grammar to the above, with precedence declarations. (I left the actions in place in case you want to play with it, although it's by no means a Reproducible Example; the infrastructure it relies upon is much bigger than the grammar itself, and of little use to anyone other than me. So you'll have to define the three functions and fill in some of the bison type declarations. Or just delete the AST functions and use your own.)

%left ','
%precedence "<-"                                                        
%precedence "->" 
%left '+'
%left '*'                                                               
%precedence NEG
%right "++" '('
%%
expr: expr ',' expr                { $$ = make_binop(OP_LIST, $1, $3); }
    | "<-" expr                    { $$ = make_unop(OP_LARR, $2); }
    | expr "->"                    { $$ = make_unop(OP_RARR, $1); }
    | expr '+' expr                { $$ = make_binop(OP_ADD, $1, $3); }
    | expr '*' expr                { $$ = make_binop(OP_MUL, $1, $3); }
    | '-' expr          %prec NEG  { $$ = make_unop(OP_NEG, $2); }
    | expr '(' expr ')' %prec '('  { $$ = make_binop(OP_CALL, $1, $3); }
    | "++" expr                    { $$ = make_unop(OP_PREINC, $2); }
    | expr "++"                    { $$ = make_unop(OP_POSTINC, $1); }
    | VALUE                        { $$ = make_ident($1); }
    | '(' expr ')'                 { $$ = $2; }

A couple of notes:

I used %prec NEG on the unary minus production in order to separate that production from the subtraction production. I also used a %prec declaration to modify the precedence of the call production (the default would be ')'), although in this particular case that's unnecessary. It is necessary to put '(' into the precedence list, though. ( is the lookahead symbol which is used in precedence comparisons.
For many unary operators, I used bison's %precedence declaration in the precedence list, rather than %right or %left. Really, there is no such thing as associativity with unary operators, so I think that it's more self-documenting to use %precedence, which doesn't resolve conflicts involving reductions and shifts in the same precedence level. However, even though there is no such thing as associativity between unary operators, the nature of the precedence resolution algorithm is that you can put prefix operators and postfix operators in the same precedence level and choose whether the postfix or prefix operators have priority by using %right or %left, respectively. %right is almost always correct. I did that with ++, because I was a bit lazy by the time I got to that point.

This does "work" (I think). It certainly resolves all the conflicts; bison happily produces a parser without warnings. And the tests that I tried worked at least as I expected them to:

? a++->
=> [-> [++/post a]]
? a->++
=> [++/post [-> a]]
? 3*f(a)+2
=> [+ [* 3 [CALL f a]] 2]
? 3*f(a)->+2
=> [+ [-> [* 3 [CALL f a]]] 2]
? 2+<-f(a)*3
=> [+ 2 [<- [* [CALL f a] 3]]]
? 2+<-f(a)*3->
=> [+ 2 [<- [-> [* [CALL f a] 3]]]]

But there are some expressions where the operator precedence, while "correct", might not be easily explained to a novice user. For example, although the arrow operators look somewhat like parentheses, they don't group that way. Furthermore, the decision as to which of the two operators has higher precedence seems to me to be totally arbitrary (and indeed I might have done it differently from what you expected). Consider:

? <-2*f(a)->+3
=> [<- [+ [-> [* 2 [CALL f a]]] 3]]
? <-2+f(a)->*3
=> [<- [* [-> [+ 2 [CALL f a]]] 3]]
? 2+<-f(a)->*3
=> [+ 2 [<- [* [-> [CALL f a]] 3]]]

There's also something a bit odd about how the arrow operators override normal operator precedence, so that you can't just drop them into a formula without changing its meaning:

? 2+f(a)*3
=> [+ 2 [* [CALL f a] 3]]
? 2+f(a)->*3
=> [* [-> [+ 2 [CALL f a]]] 3]

If that's your intention, fine. It's your language.

Note that there are operator precedence problems which are not quite so easy to solve by just listing operators in precedence order. Sometimes it would be convenient for a binary operator to have different binding power on the left- and right-hand sides.

A classic (but perhaps controversial) case is the assignment operator, if it is an operator. Assignment must associate to the right (because parsing a = b = 0 as (a = b) = 0 would be ridiculous), and the usual expectation is that it greedily accepts as much to the right as possible. If assignment had consistent precedence, then it would also accept as much to the left as possible, which seems a bit strange, at least to me. If a = 2 + b = 7 is meaningful, my intuitions say that its meaning should be a = (2 + (b = 7)) [Note 2]. That would require differential precedence, which is a bit complicated but not unheard of. C solves this problem by restricting the left-hand side of the assignment operators to (syntactic) lvalues, which cannot be binary operator expressions. But in C++, it really does mean a = ((2 + b) = 7), which is semantically valid if 2 + b has been overloaded by a function which returns a reference.
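For what it's worth, here is a minimal sketch of what "differential precedence" could look like, written as a tiny Pratt-style parser in Python rather than with bison. Each operator gets a pair of binding powers (left, right). The numbers, the table names CONSISTENT and DIFFERENTIAL, and the parse function are all my own illustrative choices, and there is no error handling.

```python
import re


def parse(source: str, binding_power: dict) -> str:
    """Return the input with its grouping made explicit by parentheses."""
    tokens = re.findall(r"\w+|[=+*]", source)
    pos = 0

    def expression(min_bp: int) -> str:
        nonlocal pos
        left = tokens[pos]  # an operand: identifier or number
        pos += 1
        while pos < len(tokens):
            op = tokens[pos]
            left_bp, right_bp = binding_power[op]
            if left_bp < min_bp:
                break  # the operator binds too weakly to take `left` as its operand
            pos += 1
            right = expression(right_bp)
            left = f"({left} {op} {right})"
        return left

    return expression(0)


# '=' is right-associative and has the lowest precedence on both sides.
CONSISTENT = {"=": (2, 1), "+": (10, 11), "*": (20, 21)}

# '=' binds tightly on its left but accepts as much as possible on its right.
DIFFERENTIAL = {"=": (30, 1), "+": (10, 11), "*": (20, 21)}

print(parse("a = b = 0", CONSISTENT))        # (a = (b = 0))
print(parse("a = 2 + b = 7", CONSISTENT))    # (a = ((2 + b) = 7))
print(parse("a = 2 + b = 7", DIFFERENTIAL))  # (a = (2 + (b = 7)))
```

The only difference between the two tables is the left binding power of =, which is exactly the knob that a single entry in a precedence list doesn't give you.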

Notes

1. Precedence declarations do not really add any power to the parser generator. The languages it can produce a parser for are exactly the same languages; it produces the same sort of parsing machine (a pushdown automaton); and it is at least theoretically possible to take that pushdown automaton and reverse engineer a grammar out of it. (In practice, the grammars produced by this process are usually monstrous. But they exist.)
   All that the precedence declarations do is resolve parsing conflicts (typically in an ambiguous grammar) according to some user-supplied rules. So it's worth asking why it's so much simpler with precedence declarations than by writing an unambiguous grammar.
   The simple hand-waving answer is that precedence rules only apply when there is a conflict. If the parser is in a state where only one action is possible, that's the action which remains, regardless of what the precedence rules might say. In a simple expression grammar, an infix operator followed by a prefix operator is not at all ambiguous: the prefix operator must be shifted, because there is no reduce action for a partial sequence ending with an infix operator.
   But when we're writing a grammar, we have to specify explicitly what constructs are possible at each point in the grammar, which we usually do by defining a bunch of non-terminals, each corresponding to some parsing state. An unambiguous grammar for expressions already has split the expression non-terminal into a cascading series of non-terminals, one for each operator precedence value. But unary operators do not have the same binding power on both sides (since, as noted above, one side of the unary operator cannot take an operand). That means that a binary operator could well be able to accept a unary operator for one of its operands, and not be able to accept the same unary operator for its other operand. Which in turn means that we need to split all of our non-terminals again, corresponding to whether the non-terminal appears on the left or the right side of a binary operator.
   That's a lot of work, and it's really easy to make a mistake. If you're lucky, the mistake will result in a parsing conflict; but equally it could result in the grammar not being able to recognise a particular construct which you would never think of trying, but which some irate language user feels is an absolute necessity. (Like 41 + not False.)
2. It's possible that my intuitions have been permanently marked by learning APL at a very early age. In APL, all operators associate to the right, basically without any precedence differences.

Unary operator precedence in java

I presume it meant, that x++ is done before ++x.

No, it doesn't mean that. I've previously blogged about why I find viewing precedence in terms of execution order problematic. It may well be valid in a computer-science-strict way, but it ends up confusing people.

I find it much easier to think about it as grouping. Likewise I think it's easiest to think of associativity as like precedence, but for operators of equal precedence. Evaluation order is always left-to-right.

So this:

int b = --var1 + ++var1 + var1++;

is grouped as if it were written:

int b = (--var1) + (++var1) + (var1++);

That's then equivalent to:

int b = ((--var1) + (++var1)) + (var1++);

(Due to binary + having left-to-right associativity.)

That's then effectively:

int tmp1 = --var1;       // tmp1 = 9, var1 = 9
int tmp2 = ++var1;       // tmp2 = 10, var1 = 10
int tmp3 = tmp1 + tmp2;  // tmp3 = 19
int tmp4 = var1++;       // tmp4 = 10, var1 = 11
int tmp5 = tmp3 + tmp4;  // tmp5 = 29

int b = tmp5;            // b = 29

... which confirms what you've observed.
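If you want to step through that expansion outside Java, here is a rough emulation in Python, which also evaluates operands left to right. Python has no ++ or --, so the Var class and its pre_dec, pre_inc and post_inc methods are hypothetical stand-ins I wrote for the sketch, and the starting value of 10 is an assumption taken from the comments in the expansion above.

```python
class Var:
    """Hypothetical stand-in for a Java int variable with ++ and -- operators."""

    def __init__(self, value: int):
        self.value = value

    def pre_dec(self) -> int:
        # --var: decrement, then yield the new value.
        self.value -= 1
        return self.value

    def pre_inc(self) -> int:
        # ++var: increment, then yield the new value.
        self.value += 1
        return self.value

    def post_inc(self) -> int:
        # var++: yield the old value, then increment.
        old = self.value
        self.value += 1
        return old


var1 = Var(10)

# Grouped exactly like ((--var1) + (++var1)) + (var1++),
# with the operands evaluated left to right.
b = (var1.pre_dec() + var1.pre_inc()) + var1.post_inc()

print(b)           # 29
print(var1.value)  # 11
```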

Of course, you should always try to avoid having code as convoluted as this in the first place...

To demonstrate the evaluation order part, consider this expression:

int result = a() + b() * c();

Precedence means that's grouped as:

int result = a() + (b() * c());

That's equivalent to:

// Evaluate LHS of +, which is just a()
int tmp1 = a();

// Evaluate RHS of +, which is b() * c()
// First evaluate LHS of *, which is b()
int tmp2 = b(); 
// Next evaluate RHS of *, which is c()
int tmp3 = c();
// Now we can evaluate *:
int tmp4 = tmp2 * tmp3;

// Now we can evaluate +:
result = tmp1 + tmp4;

As we're executing methods, we can observe that execution order really easily:

public class Test {
    public static void main(String[] args) throws Exception {
        int result = a() + b() * c();
    }

    public static int a() {
        System.out.println("a()");
        return 3;
    }

    public static int b() {
        System.out.println("b()");
        return 4;
    }

    public static int c() {
        System.out.println("c()");
        return 5;
    }
}

That prints:

a()
b()
c()

... which confirms the execution order shown in my expansion.

If unary operators have near the highest priority, then why is the order of evaluation of the # and ## operators unspecified?

Chapter 19 in the C++17 standard is titled "Preprocessing directives". It explains how the preprocessor works.

As the name suggests, the preprocessor runs before the rest of C or C++'s rules are applied. So operator precedence does not apply; these are not operators resulting in expressions. The preprocessing "operators" # and ## within a #define macro definition are not parts of the C or C++ language. They're parts of the C/C++ preprocessor; they are not "unary operators" as defined in section 8.3 of the C++17 standard.

During preprocessor evaluation and macro manipulation, there are no expressions. There is only a sequence of tokens, for which the macro system defines a couple of transformation operators (namely, # and ##). The grammar of C and C++ is not yet involved in the process.

So the question is moot: their evaluation order is unspecified because they have no relationship to regular C or C++ operators, and the standard says that their order is unspecified.
