Is It Legal to Index into a Struct

Is it legal to index into a struct?

It is illegal (see note 1 below). That is undefined behavior in C++.

You are taking the members in an array fashion, but here is what the C++ standard says (emphasis mine):

[dcl.array/1]: ...An object of array type contains a contiguously allocated non-empty set of N
subobjects of type T...

But, for members, there's no such contiguous requirement:

[class.mem/17]: ...Implementation alignment requirements might cause two adjacent
members not to be allocated immediately after each other...

While the above two quotes should be enough to hint at why indexing into a struct as you did is not defined behavior under the C++ standard, let's pick one example and look at the expression (&thing.a)[2]. Regarding the subscript operator:

[expr.post//expr.sub/1]:
A postfix expression followed by an expression in square brackets is a
postfix expression. One of the expressions shall be a glvalue of type
“array of T” or a prvalue of type “pointer to T” and the other shall
be a prvalue of unscoped enumeration or integral type. The result is
of type “T”. The type “T” shall be a completely-defined object type.
The expression E1[E2] is identical (by definition) to *((E1)+(E2))

Digging into the last sentence of the above quote, here is what the standard says about adding an integral type to a pointer type:

[expr.add/4]: When an expression that has integral type is added to or subtracted from a
pointer, the result has the type of the pointer operand. If the
expression P points to element x[i] of an array object x
with n elements, the expressions P + J and J + P (where J has
the value j) point to the (possibly-hypothetical) element x[i + j]
if 0 ≤ i + j ≤ n; otherwise, the behavior is undefined. ...

Note the array requirement in the if clause, and the otherwise that follows it in the above quote. The expression (&thing.a)[2] does not qualify for the if clause, because thing.a is not part of an array object that has an element at index 2. Hence, undefined behavior.


On a side note: I have experimented extensively with the code and its variations on various compilers, and they do not introduce any padding here (it works). Even so, from a maintenance point of view the code is extremely fragile. You should still assert that the implementation allocated the members contiguously before doing this, and stay in bounds :-). But it is still undefined behavior.

Some viable workarounds (with defined behavior) have been provided by other answers.



As rightly pointed out in the comments, [basic.lval/8], which was in my previous edit, doesn't apply. Thanks @2501 and @M.M.

1: See @Barry's answer to this question for the one legal case where you can access the thing.a member of the struct via this pattern.

Referencing struct fields in C with square brackets and an index instead of . and ->?

I got pretty close with this construct:

((char**)&tw)[0];

As an example:

#include <stdio.h>

int main(void)
{
    typedef struct
    {
        char * string1;
        char * string2;
    } TWO_WORDS;

    TWO_WORDS tw = {"Hello", "World"};

    /* Treats the struct as an array of char*; relies on there being no padding. */
    printf("String1: %s\n", ((char**)&tw)[0]);
    printf("String2: %s\n", ((char**)&tw)[1]);

    return 0;
}

It is not guaranteed to work, as the compiler may add padding between fields. (Many compilers provide a #pragma, such as #pragma pack, that suppresses padding of structs.)

To answer each of your questions:

  • is this part of the C standard? NO

  • do I have to cast the struct to an array first? YES

  • what about fields which are different sizes in memory
    This can be done with even more "evil" casting and pointer-math

  • what about fields which are different types but the same size?
    This can be done with even more "evil" casting and pointer-math

  • can you do pointer arithmetic within a structure?
    Yes (not guaranteed to always work as you might expect, but a structure is just a piece of memory that you can access with pointers and pointer-math)

In a structure, is it legal to use one array field to access another one?

Would it be legal to write s.a[6] and expect it to be equal to s.b[2]?

No, because accessing an array out of bounds invokes undefined behaviour in C and C++.

C11 J.2 Undefined behavior

  • Addition or subtraction of a pointer into, or just beyond, an array object and an integer type produces a result that points just beyond
    the array object and is used as the operand of a unary * operator that
    is evaluated (6.5.6).

  • An array subscript is out of range, even if an object is apparently accessible with the given subscript (as in the lvalue expression
    a[1][7] given the declaration int a[4][5]) (6.5.6).

C++ standard draft section 5.7 Additive operators paragraph 5 says:

When an expression that has integral type is added to or subtracted
from a pointer, the result has the type of the pointer operand. If the
pointer operand points to an element of an array object, and the array
is large enough, the result points to an element offset from the
original element such that the difference of the subscripts of the
resulting and original array elements equals the integral expression.
[...] If both the pointer operand and the result point to elements
of the same array object, or one past the last element of the array
object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined.

Is it legal to overrun one element of a struct to view another?

EDIT: As pointed out by others, this is not legal, as it results in undefined behaviour. I've removed this sentence from my answer.

This has the potential to result in undefined behaviour. You've allocated a memory chunk 10 ints long in the struct abc, so indexing into item 5 (the 6th) will take you to y[0], as you've noted, in THIS specific case.

Where you can run into problems is when the C compiler lays out the structure in a way that you do not expect. This is called structure padding, and it follows from data alignment. When the computer accesses memory in your data structure, it prefers each member to sit at an address that is a multiple of that member's alignment, so the compiler may insert unused bytes between members. Let's use an example:

struct abc {
    int a;
    char b;
    int c;
};

What do you expect the size of this struct to be? Assuming an int is 32 bits and a char is 8 bits, the total size should be 32 + 8 + 32 = 72 bits. However, you will find that on many systems this structure is actually 96 bits in size. The reason is that char b is followed by an additional 24 bits of padding so that int c starts at an aligned offset.

This can be extremely confusing when you declare a structure in two different places, and one is packed while the other is padded due to compile-time options or configuration.

Look up structure padding, structure packing, and data alignment for more information.

Is it legal to access struct members via offset pointers from other struct members?

Introduction: The standard is inadequate in this area, and there are decades of history of argument on this topic and on strict aliasing, with no convincing resolution or proposal to fix it.

This answer reflects my view rather than any imposition of the Standard.


Firstly: it's generally agreed that the code in your first code sample is undefined behaviour due to accessing outside the bounds of an array via direct pointer arithmetic.

The rule is C11 6.5.6/8. It says that indexing from a pointer must remain within "the array object" (or one past the end). It doesn't say which array object, but it is generally agreed that in the case int *p = &foo.a; "the array object" is foo.a, and not any larger object of which foo.a is a subobject.



Secondly: it's generally agreed that both of your union examples are correct. The standard explicitly says that any member of a union may be read; whatever the contents of the relevant memory location are, they are interpreted as the type of the union member being read.


You suggest that the union being correct implies that the first code should be correct too, but it does not. The issue is not with specifying the memory location read; the issue is with how we arrived at the expression specifying that memory location.

Even though we know that &foo.a + 1 and &foo.b are the same memory address, it's valid to access an int through the second and not valid to access an int through the first.

It's generally agreed that you can access the int by computing its address in other ways that don't break the 6.5.6/8 rule, e.g.:

((int *)((char *)&foo + offsetof(foo, b)))[0]

or

((int *)((uintptr_t)&foo.a + sizeof(int)))[0]



It's not generally agreed on whether ((int *)&foo)[1] is valid. Some say it's basically the same as your first code, since the standard says "a pointer to a structure object, suitably converted, points to its initial member". Others say it's basically the same as my (char *) example above because it follows from the specification of pointer casting. A few even claim it's a strict aliasing violation because it aliases a struct as an array.

Maybe relevant is N2090 - Pointer provenance proposal. This does not directly address the issue, and doesn't propose a repeal of 6.5.6/8.


