Does Not Evaluating the Expression to Which Sizeof Is Applied Make It Legal to Dereference a Null or Invalid Pointer Inside Sizeof in C++

Does not evaluating the expression to which sizeof is applied make it legal to dereference a null or invalid pointer inside sizeof in C++?

I believe this is currently underspecified in the standard, like many issues such as What is the value category of the operands of C++ operators when unspecified?. I don't think it was intentional, like hvd points outs it is probably obvious to the committee.

In this specific case I think we have the evidence to show what the intention was. From GB 91 comment from the Rapperswil meeting which says:

It is mildly distasteful to dereference a null pointer as part of our specification, as we are playing on the edges of undefined behaviour. With the addition of the declval function template, already used in these same expressions, this is no longer necessary.

and suggested an alternate expression, it refers to this expression which is no longer in the standard but can be found in N3090:

noexcept(*(U*)0 = declval<U>())

The suggestion was rejected since this does not invoke undefined behavior since it is unevaluated:

There is no undefined behavior because the expression is an unevaluated operand. It's not at all clear that the proposed change would be clearer.

This rationale applies to sizeof as well since it's operands are unevaluated.

I say underspecified but I wonder if this is covered by section 4.1 [conv.lval] which says:

The value contained in the object indicated by the lvalue is the rvalue result. When an lvalue-to-rvalue conversion occurs
within the operand of sizeof (5.3.3) the value contained in the referenced object is not accessed, since that operator
does not evaluate its operand.

It says the value contained is not accessed, which if we follow the logic of issue 232 means there is no undefined behavior:

In other words, it is only the act of "fetching", of lvalue-to-rvalue conversion, that triggers the ill-formed or undefined behavior

This is somewhat speculative since the issue is not settled yet.

Is dereferencing null pointer valid in sizeof operation

Why does this work?

This works because sizeof is a compile time construct, with the exception of variable length arrays is not evaluated at all. If we look at the C99 draft standard section 6.5.3.4 The sizeof operator paragraph 2 says(emphasis mine):

[...] The size is determined from the type of the operand. The result is an integer. If the type of the operand is a variable length array type, the operand is evaluated; otherwise, the operand is not evaluated and the result is an integer constant.

we also see the following example in paragraph 5 which confirms this:

double *dp = alloc(sizeof *dp);
       ^^^                ^
                          |                                 
                          This is not the use of uninitialized pointer

At compile time the type of the expression with be determined in order to compute the result. We can further demonstrate this with the following example:

int x = 0 ;
printf("%zu\n", sizeof( x++ ));

which won't increment x, which is pretty neat.

Update

As I note in my answer to Why does sizeof(x++) not increment x? there is an exception to sizeof being a compile time operation and that is when it's operand is a variable length array(VLA). Although I did not previously point it out the quote from 6.5.3.4 above does say this.

Although in C11 as opposed to C99 it is unspecified whether sizeof is evaluated or not in this case.

Also, note there is a C++ version of this quesiton: Does not evaluating the expression to which sizeof is applied make it legal to dereference a null or invalid pointer inside sizeof in C++?.

Is sizeof(*ptr) undefined behavior when pointing to invalid memory?

In most cases, you will find that sizeof(*x) does not actually evaluate *x at all. And, since it's the evaluation (de-referencing) of a pointer that invokes undefined behaviour, you'll find it's mostly okay. The C11 standard has this to say in 6.5.3.4. The sizeof operator /2 (my emphasis in all these quotes):

The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. The size is determined from the type of the operand. The result is an integer. If the type of the operand is a variable length array type, the operand is evaluated; otherwise, the operand is not evaluated and the result is an integer constant.

This is identical wording to the same section in C99. C89 had slightly different wording because, of course, there were no VLAs at that point. From 3.3.3.4. The sizeof operator:

The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. The size is determined from the type of the operand, which is not itself evaluated. The result is an integer constant.

So, in C, for all non-VLAs, no dereferencing takes place and the statement is well defined. If the type of *x is a VLA, that's considered an execution-phase sizeof, something that needs to be worked out while the code is running - all others can be calculated at compile time. If x itself is the VLA, it's the same as the other cases, no evaluation takes place when using *x as an argument to sizeof().

C++ has (as expected, since it's a different language) slightly different rules, as shown in the various iterations of the standard:

First, C++03 5.3.3. Sizeof /1:

The sizeof operator yields the number of bytes in the object representation of its operand. The operand is either an expression, which is not evaluated, or a parenthesized type-id.

In, C++11 5.3.3. Sizeof /1, you'll find slightly different wording but the same effect:

The sizeof operator yields the number of bytes in the object representation of its operand. The operand is either an expression, which is an unevaluated operand (Clause 5), or a parenthesized type-id.

C++11 5. Expressions /7 (the above mentioned clause 5) defines the term "unevaluated operand" as perhaps one of the most useless, redundant phrases I've read for a while, but I don't know what was going through the mind of the ISO people when they wrote it:

In some contexts ([some references to sections detailing those contexts - pax]), unevaluated operands appear. An unevaluated operand is not evaluated.

C++14/17 have the same wording as C++11 but not necessarily in the same sections, as stuff was added before the relevant parts. They're in 5.3.3. Sizeof /1 and 5. Expressions /8 for C++14 and 8.3.3. Sizeof /1 and 8. Expressions /8 for C++17.

So, in C++, evaluation of *x in sizeof(*x) never takes place, so it's well defined, provided you follow all the other rules like providing a complete type, for example. But, the bottom line is that no dereferencing is done, which means it does not cause a problem.

You can actually see this non-evaluation in the following program:

#include <iostream>
#include <cmath>

int main() {
    int x = 42;
    std::cout << x << '\n';

    std::cout << sizeof(x = 6) << '\n';
    std::cout << sizeof(x++) << '\n';
    std::cout << sizeof(x = 15 * x * x + 7 * x - 12) << '\n';
    std::cout << sizeof(x += sqrt(4.0)) << '\n';

    std::cout << x << '\n';
}

You might think that the final line would output something vastly different to 42 (774, based on my rough calculations) because x has been changed quite a bit. But that is not actually the case since it's only the type of the expression in sizeof that matters here, and the type boils down to whatever type x is.

What you do see (other than the possibility of different pointer sizes on lines other than the first and last) is:

evaluation of expression which is used with sizeof

Your array is not a variable-length array. A variable length array is an array whose size is not a constant expression. For example, data is a variable-length array in the following:

int i = 10;
char data[i];

To see an example of a code that has sizeof evaluate its operand, try something like this:

#include <stdio.h>

int main(void)
{
    int i = 41;
    printf("i: %d\n", i);
    printf("array size: %zu\n", sizeof (char[i++]));
    printf("i now: %d\n", i);
    return 0;
}

It prints:

i: 41
array size: 41
i now: 42

Is the operand of `sizeof` evaluated with a VLA?

Yes, this causes undefined behaviour.

In N1570 6.5.3.4/2 we have:

The sizeof operator yields the size (in bytes) of its operand, which may be an
expression or the parenthesized name of a type. The size is determined from the type of the operand. The result is an integer. If the type of the operand is a variable length array type, the operand is evaluated; otherwise, the operand is not evaluated and the result is an integer constant.

Now we have the question: is the type of *bar a variable length array type?

Since bar is declared as pointer to VLA, dereferencing it should yield a VLA. (But I do not see concrete text specifying whether or not it does).

Note: Further discussion could be had here, perhaps it could be argued that *bar has type double[100] which is not a VLA.

Supposing we agree that the type of *bar is actually a VLA type, then in sizeof *bar, the expression *bar is evaluated.

bar is indeterminate at this point. Now looking at 6.3.2.1/1:

if an lvalue does not designate an object when it is evaluated, the
behavior is undefined

Since bar does not point to an object (by virtue of being indeterminate), evaluating *bar causes undefined behaviour.

Is sizeof(void()) a legal expression?

void() is a function type (it's a function which takes no arguments and returns nothing), so it's not a valid type in sizeof().

sizeof(2147483648) is 8 bytes while sizeof(2147483647+1) is 4 bytes

The rules of integer constant in C is that an decimal integer constant has the first type in which it can be represented to in: int, long, long long.

2147483648

does not fit into an int into your system (as the maximum int in your system is 2147483647) so its type is a long (or a long long depending on your system). So you are computing sizeof (long) (or sizeof (long long) depending on your system).

2147483647

is an int in your system and if you add 1 to an int it is still an int. So you are computing sizeof (int).

~~Note that sizeof(2147483647+1) invokes undefined behavior in your system as INT_MAX + 1 overflows and signed integer overflows is undefined behavior in C.~~

Note that while generally 2147483647+1 invokes undefined behavior in your system (INT_MAX + 1 overflows and signed integer overflows is undefined behavior in C), sizeof(2147483647+1) does not invoke undefined behavior as the operand of sizeof in this case is not evaluated.

Is `sizeof(T)` with an incomplete type a valid substitution-failure as per the C++ Standard?

"Shall not be applied" means that it would normally be ill-formed. In an SFINAE context, if something would normally be ill-formed due to resulting in "an invalid type or expression", this becomes a substitution failure, as long as it is in the "immediate context" (C++20 [temp.deduct]/8) and not otherwise excluded from SFINAE (e.g. see p9 regarding lambda expressions).

There is no difference between "invalid" and "ill-formed" in this context. p8 explicitly says: "An invalid type or expression is one that would be ill-formed, with a diagnostic required, if written using the substituted arguments." This wording has been present since C++11. However, in C++03, invalid expressions were not substitution failures. This is the famous "expression SFINAE" feature that was added in C++11, after compiler implementers were sufficiently convinced that they would be able to implement it.

There is no rule in the standard that says that sizeof expressions are an exception to the SFINAE rules, so as long as an invalid sizeof expression occurs in the immediate context, SFINAE applies.

The "immediate context" has still not been explicitly defined in the standard. An answer by Jonathan Wakely, a GCC dev, explains the intent. Eventually, someone might get around to formally defining it in the standard.

However, the case of incomplete types, the problem is that this technique is very dangerous. First, if the completeness check is performed twice in the same translation unit on the same type, the instantiation is only performed once; this implies that the second time it's checked, the result of the check will still be false, because the is_type_complete_v<T> will simply refer to the previous instantiation. Chen's post appears to simply be wrong about this: GCC, Clang, and MSVC all behave the same way. See godbolt. It's possible that the behaviour was different on an older version of MSVC.

Second, if there is cross-translation-unit variance: that is, is_type_complete_v<T> is instantiated in one translation unit and is false, and is instantiated in another translation unit and is true there, the program is ill-formed NDR. See C++20 [temp.point]/7.

For this reason, completeness checks are generally not done; instead, library implementers either say that you are allowed to pass incomplete types to their templates and they will work properly, or that you must pass a complete type but the behaviour is undefined if you violate this requirement, as it cannot be reliably checked at compile time.

One creative way around the template instantiation rules is to use a macro with __COUNTER__ to make sure that you have a fresh instantiation every time you use the type trait, and you have to define the is_type_complete_v template with internal linkage, to avoid the issue of cross-TU variance. I got this technique from this answer. Unfortunately, __COUNTER__ is not in standard C++, but this technique should work on compilers that support it.

(I looked into whether the C++20 source_location feature can replace the non-standard __COUNTER__ in this technique. I think it can't, because IS_COMPLETE may be referenced from the same line and column but within two different template instantiations that somehow both decide to check the same type, which is incomplete in one and complete in the other.)

Is sizeof new int; undefined behavior?

The warning doesn't state that it's UB; it merely says that the context of use, namely sizeof, won't trigger the side effects (which in case of new is allocating memory).

[expr.sizeof]
The sizeof operator yields the number of bytes occupied by a non-potentially-overlapping object of the type of its operand. The operand is either an expression, which is an unevaluated operand ([expr.prop]), or a parenthesized type-id.

The standard also helpfully explains what that means:

[expr.context] (...) An unevaluated operand is not evaluated.

It's a fine, although a weird way to write sizeof(int*).

Does Not Evaluating the Expression to Which Sizeof Is Applied Make It Legal to Dereference a Null or Invalid Pointer Inside Sizeof in C++