Pointer to Array of Unspecified Size "(*P)[]" Illegal in C++ But Legal in C

Dan Saks wrote about this in 1995, during the lead up to C++ standardisation:

The committees decided that functions such as this, that accept a
pointer or reference to an array with unknown bound, complicate
declaration matching and overload resolution rules in C++. The
committees agreed that, since such functions have little utility and
are fairly uncommon, it would be simplest to just ban them. Hence, the
C++ draft now states:

If the type of a parameter includes a type of the form pointer to
array of unknown bound of T or reference to array of unknown bound of
T, the program is ill-formed.
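
For contrast, here is a minimal C sketch of the kind of function the quote describes. The name sum_unknown_bound and the separate length parameter n are illustrative assumptions, not taken from Saks's article. C accepts this declaration, whereas the draft rule quoted above makes it ill-formed in C++.

```c
#include <stddef.h>

/* Hypothetical example: `p` points to an array of int whose bound is
   unknown here, so the caller passes the element count separately. */
int sum_unknown_bound(int (*p)[], size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; ++i)
        total += (*p)[i];   /* dereference to the array, then index it */
    return total;
}
```

A caller holding int a[4] would write sum_unknown_bound(&a, 4); in C, the type int (*)[4] is compatible with int (*)[].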

Is it legal to treat a pointer like an array?

First things first, and being completely blunt:

Your mental model is wrong! It is imperative that you correct your misconceptions now, before you're in too deep.

char* s copies an array,

This is a misconception. s is a pointer to a char. It could be a single char or a whole array. The exact type of the underlying object is lost when taking an address.

Nothing is copied, though! It's just a pointer to "wherever" (waves around with arms), and everyone involved (you, the compiler, other programmers) is in an unspoken and unwritten agreement to be nice and not do something stupid, like passing in a pointer that the function will later use in an invalid way.

this is ok because an array name is it's first element memory address.

Arrays don't have names! Symbols do. A symbol that designates an array decays, in most expressions, to a pointer to the array's first element. This decay is why you can write char somearray[123]; char *p = somearray; without taking its address.
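
A minimal sketch of that decay, and of one context where it does not happen. The function names are made up for illustration; somearray is the array from the sentence above.

```c
#include <stddef.h>

/* Returns 1 if the decayed pointer equals the address of element 0. */
int decays_to_first_element(void)
{
    char somearray[123] = {0};
    char *p = somearray;            /* no & needed: the array decays */
    return p == &somearray[0];
}

/* sizeof is one context without decay: it reports the size of the
   whole array, not the size of a pointer. */
size_t whole_array_size(void)
{
    char somearray[123];
    return sizeof somearray;        /* 123, because sizeof(char) is 1 */
}
```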

why do we treat the pointer s as an array in the for cycle?

Because we can. More specifically, because of this thing called "pointer arithmetic". The expression s + 1 results in a pointer to the element immediately after the one s points to. It works for any integer offset (within the value range of ptrdiff_t), as long as the result stays within the array.

When you write a_pointer[i] in C, it literally translates into *(a_pointer + i) (that's not hyperbole; the C standard defines the subscript operator that way). So by writing a_pointer[i] you're telling the compiler: "assume that a_pointer points into an array object and that a_pointer + i is still inside the bounds of that array object; with that assumption, dereference that location and produce the value there."
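
The equivalence can be checked directly. This is only a sketch with hypothetical function names; the caller is assumed to pass a pointer and index that stay inside one array.

```c
#include <stddef.h>

/* Returns 1 when all three spellings read the same element.
   The caller must guarantee that p + i stays inside the array. */
int subscript_forms_agree(const int *p, size_t i)
{
    return p[i] == *(p + i)     /* this is how [] is defined */
        && i[p] == p[i];        /* addition commutes, so this is legal too */
}

/* s + 1 points to the next element, not to the next byte. */
const int *next_element(const int *s)
{
    return s + 1;
}
```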

However, the results of pointer arithmetic are defined only if the resulting pointer stays within the bounds of the array object or points exactly one element past its end.

Do pointer arithmetic on a pointer that's not taken from an array? Undefined!

Generate a pointer that's before the start of an array or more than one element past its end? Undefined!
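
One way to respect these limits is to check the offset before forming the pointer. The helper below is a hypothetical sketch, not a standard function. It relies on the C rule that a pointer one element past the end of an array may be formed and compared, but not dereferenced.

```c
#include <stddef.h>

/* Returns base + i if that pointer is inside the array or one past its
   end, and NULL otherwise. `len` is the element count of the array. */
const int *checked_advance(const int *base, size_t len, size_t i)
{
    if (i > len)
        return NULL;    /* base + i would be undefined: do not form it */
    return base + i;    /* i == len yields the one-past-the-end pointer,
                           which may be compared but not dereferenced */
}
```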

My problem is that I consider them "as an int variable",

They're not! Technically, pointers may be implemented by unicorn dust and magic. There are a few very specific rules for them when it comes to intermingling them with numbers. In the C programming language these rules are (simplified):

  • Pointers can be converted to integers of type uintptr_t and back.

  • The numeric value 0 translates to the null pointer, and null pointers translate to the numeric value 0.

  • Null pointers are invalid and hence must not be dereferenced.

  • Pointers can be subtracted from each other, resulting in an integer of type ptrdiff_t whose value is the distance in elements between the two pointers, assuming that both point into the same array object. Written in "types": ⟪ptrdiff_t⟫ = ⟪pointer A⟫ - ⟪pointer B⟫; only arithmetically valid rearrangements of this are valid.

  • You can't add pointers

  • You can't multiply pointers

  • There is no mandate that numeric representations of pointers can be used for pointer arithmetic. I.e. you must not assume that (pointer_A - pointer_B) == k*((uintptr_t)pointer_A - (uintptr_t)pointer_B) for any value of k.
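
Two of the rules above, sketched in C. The function names are hypothetical, and the sketch assumes the implementation provides uintptr_t, which the C standard makes optional.

```c
#include <stddef.h>
#include <stdint.h>

/* Distance in elements; both pointers must point into the same array. */
ptrdiff_t element_distance(const int *a, const int *b)
{
    return a - b;
}

/* Round trip through uintptr_t (an optional type, assumed available).
   The converted-back pointer compares equal to the original. */
int survives_round_trip(void *p)
{
    uintptr_t n = (uintptr_t)p;
    void *q = (void *)n;
    return q == p;
}
```

Note that the sketch never does arithmetic on n itself; only the round trip is relied upon.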

since memory address are integers in an hexadecimal format (right?),

Huh?!? This is not how things work.

Yes, you can use integers to address memory locations. No, you don't have to write them as hexadecimals. Hexadecimal is just a different number base, and 0xF == 15 == 0o17 == 0b1111. These days we usually write addresses in hexadecimal because it aligns nicely with our current computer architectures' word sizes being powers of 2. One hexadecimal digit equals 4 bits. But there are other architectures that use different word sizes, and on those other number bases are better suited.
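
In C source the same comparison looks slightly different from the notation above, because C writes octal with a leading 0 rather than 0o. A tiny sketch with a hypothetical function name:

```c
/* Returns 1: the three literals denote the same value. C spells octal
   with a leading 0 (017); the 0b prefix for binary is standard only
   since C23, so it is left out of this sketch. */
int same_value_in_every_base(void)
{
    return 0xF == 15 && 15 == 017;
}
```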

And that still assumes linear address spaces. There are, however, also computer architectures that support segmented address spaces. As a matter of fact, it is very likely that the machine you're reading this on is such a computer. If it's using a CPU made by Intel or AMD, it actually understands segmented addresses: https://en.wikipedia.org/wiki/X86_memory_segmentation

In the x86 segmented address space, an address actually consists of two numbers, i.e. it forms a vector. This means that if you're compiling a C program to run in a segmented address space environment, pointer types will no longer be simple single-valued numbers. C still requires them to be convertible to uintptr_t though, ponder on that!

Pointer casting with unknown array size in C++

First, I suggest you don't use a runtime size for an array in C/C++, unless you use an STL vector as the array. So instead of:

int i = 5;

you must use:

const int i = 5;

unless you use a vector, which is safer and better than built-in arrays.

how can I cast void pointer to a 2d array (array of pointers to arrays of ints), when I dont know array size at compile time? Is it somehow possible?

If we are talking about C-style built-in arrays, it is not possible!

Why is it not possible?
Because the C/C++ compiler does not know your array's size or bounds. Casting your 2D array to a 1D array is possible, though, which is why the tab2 array can access the first five elements of your array. In terms of memory layout, the C/C++ compiler really cannot tell the difference between

int a[3][3]

and

int a[3*3]

So you must know at least one dimension of your array:

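int main() {
    const int i = 3, j = 4;   // both bounds are compile-time constants

    // 3 x 4 array; the 12th element is zero-initialized
    int tab1[i][j] = {1,2,3,4,5,6,7,8,9,10,11};

    // cast to void pointer
    void *p = (void *)tab1;

    // cast back: i and the total count (12) give the second dimension
    auto a = (int (*)[i][12/i])p;
    return 0;
}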

In the above example, I know i and the total count (12), and I calculate the second dimension from them.
I use the auto keyword, which lets the compiler infer the data type.
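
When neither dimension is known at compile time, a common fallback is to compute the index by hand. This is a minimal sketch, assuming the buffer holds rows * cols contiguous int values stored row by row; element_at is a hypothetical helper name, not something from the question.

```c
#include <stddef.h>

/* Hypothetical helper: `p` points to rows * cols contiguous ints stored
   row by row; only `cols` is needed to locate element (r, c). */
int element_at(const void *p, size_t cols, size_t r, size_t c)
{
    const int *flat = (const int *)p;   /* view the buffer as 1D */
    return flat[r * cols + c];          /* skip r full rows, then c */
}
```

With 12 values laid out as 3 rows of 4, element_at(buf, 4, 1, 2) reads the seventh value in the buffer.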

Const correctness for array pointers?

There is no way to do it except with a cast. This is a significant drawback of passing arrays in this way.

Here is a similar thread where the C rules are compared to the C++ rules. We could conclude from this comparison that the C rules are not so well designed, because your use case is valid but C doesn't allow the implicit conversion. Another such example is conversion of T ** to T const * const *; this is safe but is not allowed by C.

Note that since n is not a constant expression, then int n, int (*arr)[n] does not have any added type safety compared to int n, int *arr. You still know the length (n), and it is still silent undefined behaviour to access out of bounds, and silent undefined behaviour to pass an array that is not actually length n.

This technique has more value when passing non-VLA arrays, where the compiler must report it if you pass a pointer to an array of the wrong length.
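
A sketch of the fixed-length case together with the cast discussed above. Both function names are hypothetical. The length 4 is part of the parameter type, so passing a pointer to an array of another length is diagnosed by the compiler.

```c
/* Takes a pointer to an array of exactly 4 const ints. */
int sum4(const int (*arr)[4])
{
    int total = 0;
    for (int i = 0; i < 4; ++i)
        total += (*arr)[i];
    return total;
}

/* Hypothetical caller: the array is not const, so the qualifier is
   added with an explicit cast, as the answer describes. */
int sum4_of_mutable(int (*arr)[4])
{
    return sum4((const int (*)[4])arr);
}
```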

C: Allowed to assign any array to pointer to array of incomplete type

The initialization in char (*Durr)[] = &Arr; requires Durr pointing to an array of type compatible with the type of Arr.

According to "6.7.6.2 Array declarators" (n1570)

6 For two array types to be compatible, both shall have compatible element types, and if
both size specifiers are present, and are integer constant expressions, then both size
specifiers shall have the same constant value.

Because the array type pointed to by Durr is incomplete (it has no size specifier), the rule above makes the two array types compatible, so the compiler should not give an error or warning for this initialization.
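
The compatibility rule can be seen in a few lines. Arr and Durr are the names from the question; the array length, its contents, and the function name are assumptions made for this sketch.

```c
/* Returns the first character, read through a pointer to an array of
   unknown bound. */
char first_through_incomplete(void)
{
    char Arr[5] = "Durr";       /* complete type: char[5] */
    char (*Durr)[] = &Arr;      /* char[5] is compatible with char[] */
    /* char (*Bad)[3] = &Arr;      both sizes present and 5 != 3, so
                                   the types are not compatible */
    return (*Durr)[0];
}
```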

Why is undefined size array in struct allowed?

The latter is called a "flexible array member", which is a special case for structures. The last member of a struct is allowed to have no size.

As a special case, the last element of a structure with more than one named member may have an incomplete array type; this is called a flexible array member. In most situations, the flexible array member is ignored. In particular, the size of the structure is as if the flexible array member were omitted except that it may have more trailing padding than the omission would imply. However, when a . (or ->) operator has a left operand that is (a pointer to) a structure with a flexible array member and the right operand names that member, it behaves as if that member were replaced with the longest array (with the same element type) that would not make the structure larger than the object being accessed; the offset of the array shall remain that of the flexible array member, even if this would differ from that of the replacement array. If this array would have no elements, it behaves as if it had one element but the behavior is undefined if any attempt is made to access that element or to generate a pointer one past it.

Also see example 20.

The former is a normal array and it's not allowed to have zero size. See 6.7.6.2 Array declarators.

If they delimit an expression (which specifies the size of an array), the expression shall have an integer type. If the expression is a constant expression, it shall have a value greater than zero.
In other words, the language standard says so.
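
A typical use of a flexible array member is to allocate the structure and its trailing array in one block. This is a minimal sketch with a hypothetical struct buffer type; it does not check whether the size computation overflows.

```c
#include <stdlib.h>

/* Hypothetical type: `data` is the flexible array member. */
struct buffer {
    size_t len;
    int data[];     /* must be the last member; no size given */
};

/* Allocates a buffer with room for n ints, all set to 0.
   Returns NULL if the allocation fails. */
struct buffer *buffer_new(size_t n)
{
    struct buffer *b = malloc(sizeof *b + n * sizeof b->data[0]);
    if (b != NULL) {
        b->len = n;
        for (size_t i = 0; i < n; ++i)
            b->data[i] = 0;
    }
    return b;
}
```

The caller releases the whole object with a single free(b).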

Why are arrays of references illegal?

To answer your question about the standard, I can cite the C++ Standard §8.3.2/4:

There shall be no references to references, no arrays of references, and no pointers to references.

That's because references are not objects and need not occupy memory, so they have no address of their own. You can think of them as aliases for objects. Declaring an array of nothing does not make much sense.

Is &arr[size] valid?

It's not valid. You bolded "result is an lvalue referring to the object or function to which the expression points" in your question. That's exactly the problem. array + size is a valid pointer value that does not point to an object. Therefore, your quote about *(array + size) does not specify what the result refers to, and that then means there is no requirement for &*(array + size) to give the same value as array + size.

In C, this was considered a defect and fixed so that the spec now says in &*ptr, neither & nor * gets evaluated. C++ hasn't yet received fixed wording. It's the subject of a very old still active DR: DR #232. The intent is that it is valid, just as it is in C, but the standard doesn't say so.
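
For the C side of this, a short sketch with a hypothetical function name. It relies on the C rule that in &arr[size] neither the & nor the implied * is evaluated, so the expression is equivalent to arr + size. It says nothing about C++, where the wording is still open as described above.

```c
#include <stddef.h>

/* Sums `size` ints. `end` is written as &arr[size]; in C this is the
   same as arr + size, and the element there is never read. */
int sum_to_end(const int *arr, size_t size)
{
    const int *end = &arr[size];
    int total = 0;
    for (const int *p = arr; p != end; ++p)
        total += *p;
    return total;
}
```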


