Is sizeof(*ptr) Undefined Behavior When Pointing to Invalid Memory

Is sizeof(*ptr) undefined behavior when pointing to invalid memory?

In most cases, you will find that sizeof(*x) does not actually evaluate *x at all. And, since it's the evaluation (de-referencing) of a pointer that invokes undefined behaviour, you'll find it's mostly okay. The C11 standard has this to say in 6.5.3.4. The sizeof operator /2 (my emphasis in all these quotes):

The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. The size is determined from the type of the operand. The result is an integer. If the type of the operand is a variable length array type, the operand is evaluated; otherwise, the operand is not evaluated and the result is an integer constant.

This is identical wording to the same section in C99. C89 had slightly different wording because, of course, there were no VLAs at that point. From 3.3.3.4. The sizeof operator:

The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. The size is determined from the type of the operand, which is not itself evaluated. The result is an integer constant.

So, in C, for all non-VLAs, no dereferencing takes place and the statement is well defined. If the type of *x is a VLA, that's considered an execution-phase sizeof, something that needs to be worked out while the code is running - all others can be calculated at compile time. If x itself is the VLA, it's the same as the other cases, no evaluation takes place when using *x as an argument to sizeof().


C++ has (as expected, since it's a different language) slightly different rules, as shown in the various iterations of the standard:

First, C++03 5.3.3. Sizeof /1:

The sizeof operator yields the number of bytes in the object representation of its operand. The operand is either an expression, which is not evaluated, or a parenthesized type-id.

In C++11 5.3.3. Sizeof /1, you'll find slightly different wording but the same effect:

The sizeof operator yields the number of bytes in the object representation of its operand. The operand is either an expression, which is an unevaluated operand (Clause 5), or a parenthesized type-id.

C++11 5. Expressions /7 (the above mentioned clause 5) defines the term "unevaluated operand" as perhaps one of the most useless, redundant phrases I've read for a while, but I don't know what was going through the mind of the ISO people when they wrote it:

In some contexts ([some references to sections detailing those contexts - pax]), unevaluated operands appear. An unevaluated operand is not evaluated.

C++14/17 have the same wording as C++11 but not necessarily in the same sections, as stuff was added before the relevant parts. They're in 5.3.3. Sizeof /1 and 5. Expressions /8 for C++14 and 8.3.3. Sizeof /1 and 8. Expressions /8 for C++17.

So, in C++, evaluation of *x in sizeof(*x) never takes place, so it's well defined, provided you follow all the other rules like providing a complete type, for example. But, the bottom line is that no dereferencing is done, which means it does not cause a problem.

You can actually see this non-evaluation in the following program:

#include <cmath>
#include <iostream>

int main() {
    int x = 42;
    std::cout << x << '\n';

    // None of these operands is evaluated; only their type (int) matters.
    std::cout << sizeof(x = 6) << '\n';
    std::cout << sizeof(x++) << '\n';
    std::cout << sizeof(x = 15 * x * x + 7 * x - 12) << '\n';
    std::cout << sizeof(x += std::sqrt(4.0)) << '\n';

    std::cout << x << '\n';  // x was never modified
}

You might think that the final line would output something vastly different to 42 (774, based on my rough calculations) because x has been changed quite a bit. But that is not actually the case since it's only the type of the expression in sizeof that matters here, and the type boils down to whatever type x is.

What you do see (allowing for a different size of int on the lines other than the first and last) is:

42
4
4
4
4
42

Is storing an invalid pointer automatically undefined behavior?

I have the C Draft Standard here, and it makes it undefined by omission. It defines the behavior of ptr + I (where I is an integer expression) in 6.5.6/8 only for these cases:

  • If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression.
  • Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object.

Your case does not fit either of these. Your array is not large enough for -1 to move the pointer to a different array element, and neither the original pointer nor the result points one past the end.

Does sizeof(* struct pointer) give you the value of structure

I tried this piece of code in C using gcc compiler and that gave me the size of the structure itself

As it should. The argument to the unary sizeof operator can be either an expression or a parenthesized type name. All expressions in C have a type. What sizeof is meant to do is look at the type the expression would evaluate to, and give the size of that type.

Since *ptr is an expression of the type struct sample, the result should be the size of struct sample. It's also worth mentioning that generally the expression doesn't even need to be evaluated, the size can be determined statically (unless you are dealing with a VLA).

You can rely on that behavior, because that's what the C standard specifies it should do. It will always give the size of the structure. You can't rely on the size being the same across compilers, but you can be sure of the behavior sizeof will have.

Since your own example uses pointers, one place where this behavior can be useful is described in "Do I cast the result of malloc?", where you'd feed malloc the size of the dereferenced pointer, instead of the type:

struct sample * ptr = malloc(sizeof *ptr);

The rationale being that you don't need to repeat the type name twice (thrice in C++, where you'd need a cast as well), helping you to avoid subtle mistakes that may come when refactoring.

Does not evaluating the expression to which sizeof is applied make it legal to dereference a null or invalid pointer inside sizeof in C++?

I believe this is currently underspecified in the standard, like many other issues, such as "What is the value category of the operands of C++ operators when unspecified?". I don't think that was intentional; as hvd points out, it is probably obvious to the committee.

In this specific case I think we have the evidence to show what the intention was. It comes from the GB 91 comment from the Rapperswil meeting, which says:

It is mildly distasteful to dereference a null pointer as part of our specification, as we are playing on the edges of undefined behaviour. With the addition of the declval function template, already used in these same expressions, this is no longer necessary.

The comment suggested an alternate expression. It refers to this expression, which is no longer in the standard but can be found in N3090:

noexcept(*(U*)0 = declval<U>())

The suggestion was rejected because the expression is unevaluated and so does not invoke undefined behavior:

There is no undefined behavior because the expression is an unevaluated operand. It's not at all clear that the proposed change would be clearer.

This rationale applies to sizeof as well, since its operand is unevaluated.

I say underspecified but I wonder if this is covered by section 4.1 [conv.lval] which says:

The value contained in the object indicated by the lvalue is the rvalue result. When an lvalue-to-rvalue conversion occurs within the operand of sizeof (5.3.3) the value contained in the referenced object is not accessed, since that operator does not evaluate its operand.

It says the value contained is not accessed, which if we follow the logic of issue 232 means there is no undefined behavior:

In other words, it is only the act of "fetching", of lvalue-to-rvalue conversion, that triggers the ill-formed or undefined behavior

This is somewhat speculative since the issue is not settled yet.

sizeof(*p) are my results undefined?

cout << sizeof(p) / sizeof(*p) << endl;

is perfectly legal, but it is not useful code. In C++, sizeof is evaluated at compile time. If the size can't be determined at compile time, you will get a compiler error.

The surprise will be the result. You won't get 10, because p is a pointer and sizeof(p) is the size of the pointer itself, not of the array it points to. To get 10, you'll need to use:

cout << sizeof(x) / sizeof(x[0]) << endl;

Need assistance in understanding this code using malloc and pointers

I'm assuming we're talking about C here. The answer is different for C++.

1) is entirely off. ptr is a pointer to an int, that's all. It's uninitialized, so it has no deterministic value. Dereferencing it is undefined behaviour -- you will most certainly not get 0 out! The pointer also will most likely not point to 0. The size of ptr is sizeof(ptr), or sizeof(int*); nothing else. (At best you know that this is no larger than sizeof(void*).)

2/3) In C, never cast the result of malloc: int * p = malloc(sizeof(int) * 10);. The code allocates enough memory for 10 integers, i.e. 10 times the size of a single integer; the return value of the call is a pointer to that memory.

Confusion regarding pointer size

Object pointers (i.e. pointers to anything besides a function) are the same size on most systems you're likely to come across, but there's no guarantee of that. Even when the pointers are the same size, the types that they point to are not.

For example, on a 64-bit Windows system, pointers are typically 8 bytes in size. In your example you have a char * and an int *, which are most likely both 8 bytes. The difference here is that dereferencing a char * will read/write 1 byte while dereferencing an int * will read/write 4 bytes (assuming an int is 32 bits).

Assuming little endian byte ordering, a looks like this in memory:

    ------------------
  a | 44 | 1 | 0 | 0 |
    ------------------

Both ptr and intptr contain the address of a. When dereferencing ptr, which is of type char *, it only looks at the first byte. In contrast, when dereferencing intptr, which is of type int *, it looks at all 4 bytes.

Undefined behavior with pointer arithmetic on dynamically allocated memory

From C99, 7.20.3 - Memory management functions (emphasis mine):

The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object and then used to access such an object or an array of such objects in the space allocated (until the space is explicitly deallocated).

This implies that the allocated memory can be accessed as an array of char (as per your example), and so pointer arithmetic is well defined.


