Why Do C and C++ Support Memberwise Assignment of Arrays Within Structs, But Not Generally

Why do C and C++ support memberwise assignment of arrays within structs, but not generally?

Here's my take on it:

The Development of the C Language offers some insight into the evolution of the array type in C:

  • http://cm.bell-labs.com/cm/cs/who/dmr/chist.html

I'll try to outline the array thing:

C's forerunners B and BCPL had no distinct array type. A declaration like:

auto V[10] (B)
or
let V = vec 10 (BCPL)

would declare V to be an (untyped) pointer, initialized to point to an unused region of 10 "words" of memory. B already used * for pointer dereferencing and had the [] shorthand notation: V[i] meant *(V+i), just as in C/C++ today. However, V is not an array; it is still a pointer that has to point to some memory. This caused trouble when Dennis Ritchie tried to extend B with struct types. He wanted arrays to be part of the structs, as in C today:

struct {
int inumber;
char name[14];
};

But with the B/BCPL concept of arrays as pointers, this would have required the name field to contain a pointer that had to be initialized at runtime to point to a memory region of 14 bytes within the struct. The initialization/layout problem was eventually solved by giving arrays special treatment: the compiler would track the location of arrays in structures, on the stack, etc. without actually requiring the pointer to the data to materialize, except in expressions that involve the arrays. This treatment allowed almost all B code to still run and is the source of the "arrays convert to pointers if you look at them" rule. It is a compatibility hack, which turned out to be very handy, because it allowed arrays of open size and so on.
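The layout described above can be checked directly. The following is a minimal sketch, not code from the history paper: the type name dir_entry and the function name_is_stored_inline are made up for illustration, and the struct mirrors the one quoted earlier.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical directory-entry type modelled on the struct in the text. */
struct dir_entry {
    int  inumber;
    char name[14];
};

/* Returns 1 if the array's bytes live directly inside the struct object. */
int name_is_stored_inline(void)
{
    struct dir_entry e = { 1, "file.txt" };

    /* The member occupies exactly 14 bytes; no pointer is stored in it. */
    assert(sizeof e.name == 14);

    const char *base  = (const char *)&e;
    const char *first = e.name;   /* the array converts to &e.name[0] here */

    /* The pointer value is computed from the struct's address and a fixed offset. */
    return first == base + offsetof(struct dir_entry, name);
}
```

The pointer only comes into existence when e.name is used in an expression; nothing in the object itself holds an address.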

And here's my guess as to why arrays can't be assigned: since arrays were pointers in B, you could simply write:

auto V[10];
V=V+5;

to rebase an "array". This was now meaningless, because the base of an array variable was no longer an lvalue. So this assignment was disallowed, which helped to catch the few programs that did this rebasing on declared arrays. And then this notion stuck: as arrays were never designed to be first-class citizens of the C type system, they were mostly treated as special beasts that become pointers if you use them. And from a certain point of view (which ignores that C arrays are a botched hack), disallowing array assignment still makes some sense: an open array or an array function parameter is treated as a pointer without size information. The compiler doesn't have the information to generate an array assignment for them, and the pointer assignment was required for compatibility reasons. Introducing array assignment for declared arrays would have introduced bugs through spurious assignments (is a=b a pointer assignment or an element-wise copy?) and other trouble (how do you pass an array by value?) without actually solving a problem. Just make everything explicit with memcpy!

/* Example of how array assignment would make things even weirder in C/C++
if we don't want to break existing code.
It's actually better to leave things as they are...
*/
typedef int vec[3];

void f(vec a, vec b)
{
vec x,y;
a=b; // pointer assignment
x=y; // NEW! element-wise assignment
a=x; // pointer assignment
x=a; // NEW! element-wise assignment
}

This didn't change when a revision of C in 1978 added struct assignment ( http://cm.bell-labs.com/cm/cs/who/dmr/cchanges.pdf ). Even though records were distinct types in C, it was not possible to assign them in early K&R C. You had to copy them member-wise or with memcpy, and you could pass only pointers to them as function parameters. Assignment (and parameter passing) was now simply defined as the memcpy of the struct's raw memory, and since this couldn't break existing code it was readily adopted. As an unintended side effect, this implicitly introduced some kind of array assignment, but it happened inside a structure, so it couldn't really introduce problems with the way arrays were used.
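A minimal sketch of that side effect, assuming a hypothetical wrapper type vec3 and helper names that are not from the original answer: the array member is copied by plain struct assignment, while the bare-array equivalent has to be spelled out with memcpy.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical wrapper type: the array is a member, so it travels with the struct. */
struct vec3 {
    int v[3];
};

/* Copies src into *dst with plain struct assignment. */
void copy_by_assignment(struct vec3 *dst, const struct vec3 *src)
{
    *dst = *src;   /* copies all three elements of v */
}

/* Bare arrays cannot be assigned; the parameters here are really pointers. */
void copy_bare_array(int dst[3], const int src[3])
{
    memcpy(dst, src, 3 * sizeof src[0]);
}
```

After copy_by_assignment, the destination holds its own copy of the elements, so later changes to the source do not affect it.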

Why struct assignment works with arrays in structs

Why after years of C programming am I having an existential language crisis about a mechanism I have used but never understood?

You always misunderstood arrays and now this has brought it to light :)

The actual rules are:

  1. Arrays are different to pointers; there is no "implied pointer" or anything in an array. The storage in memory for an array consists of exactly the cells with the array contents and nothing more.

  2. When you use the array's identifier in an expression, then the value of that expression is a (temporary) pointer to the array's first element. (With a handful of exceptions that I omit for brevity).

    2a. (In case this was unclear) Expressions have values, and the value of an expression does not require storage. For example, in the code f(1 + 1), the value 2 is a value but it is not in an object and, conceptually, it is not stored anywhere. The pointer mentioned above is the same sort of value.

The reason you cannot write:

data2 = data;

is that Rule 2 kicks in: the value of the right-hand side is a pointer, and the assignment operation is not defined between an array and a pointer. (It wouldn't know how many units to copy.)

The language designers could have added another exception to Rule 2 so that if the array is the sole right-hand operand of = then value conversion doesn't occur, and the array is assigned by value. That would be a consistent rule and the language would work. But they didn't.

The structure assignment does not trigger Rule 2 so the array is happily copied.

In fact they could have done away with Rule 2 entirely, and the language would still have worked. But then you would need to write puts(&s[0]); instead of puts(s); and so on. When designing C (incorporating BCPL, which I think had a similar rule), they opted to include Rule 2, presumably because the benefits appeared to outweigh the negatives at the time.
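Rule 2 and one of its exceptions can be observed in a few lines. This is only a sketch: the names data and data2 follow the snippet above, while N and copy_ints are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { N = 4 };   /* hypothetical array length */

/* Demonstrates the conversion rule; returns the element count of the copy. */
size_t copy_ints(void)
{
    int data[N]  = { 1, 2, 3, 4 };
    int data2[N] = { 0 };

    /* Rule 2: in most expressions the array name yields a pointer to element 0. */
    int *p = data;
    assert(p == &data[0]);

    /* sizeof is one of the exceptions: it sees the whole array, not a pointer. */
    assert(sizeof data == N * sizeof(int));

    /* data2 = data; would not compile, so the bytes are copied explicitly. */
    memcpy(data2, data, sizeof data);
    assert(data2[3] == 4);

    return sizeof data2 / sizeof data2[0];
}
```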

Why shouldn't we use structure assignment to copy a structure with a flexible array member?

The size of the structure doesn’t count the FAM. The compiler has no way to know how big the FAM is. Consequently, any copy ignores the FAM. Since that’s very seldom the desired behaviour, don’t use structure copying on structures with a FAM.
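One common way to copy such a structure is to allocate and copy the header and the payload together. The sketch below assumes a hypothetical msg type that records its own payload length; msg_new and msg_clone are illustrative names, not a standard API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical message type with a flexible array member (C99). */
struct msg {
    size_t len;      /* number of bytes stored in text */
    char   text[];   /* not counted in sizeof(struct msg) */
};

/* Allocates a message holding a copy of s; returns NULL on failure. */
struct msg *msg_new(const char *s)
{
    size_t len = strlen(s) + 1;
    struct msg *m = malloc(sizeof *m + len);
    if (m != NULL) {
        m->len = len;
        memcpy(m->text, s, len);
    }
    return m;
}

/* Deep copy: *dst = *src would copy only the header, so copy both parts. */
struct msg *msg_clone(const struct msg *src)
{
    size_t total = sizeof *src + src->len;
    struct msg *dst = malloc(total);
    if (dst != NULL)
        memcpy(dst, src, total);
    return dst;
}
```

The caller owns both allocations and releases each with free. msg_clone relies on src having been allocated with at least sizeof *src + src->len bytes, as msg_new does.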

Why is assigning an array to a matrix illegal?

The explanation is very simple.

Two main reasons:

  1. In C you can't assign arrays.
    The only non-scalar types for which assignment is allowed in C are structs and unions.
  2. The initialization of global variables requires constant expressions.
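Both points have straightforward workarounds, sketched here under assumed names (ROWS, COLS, set_row, and table are all hypothetical): copy a row with memcpy at run time, and give a file-scope matrix a constant initializer.

```c
#include <assert.h>
#include <string.h>

enum { ROWS = 2, COLS = 3 };   /* hypothetical dimensions */

/* Copies one row into the matrix; matrix[r] = row would not compile. */
void set_row(int matrix[ROWS][COLS], int r, const int row[COLS])
{
    memcpy(matrix[r], row, COLS * sizeof row[0]);
}

/* At file scope, the contents must come from constant expressions. */
int table[ROWS][COLS] = {
    { 1, 2, 3 },
    { 4, 5, 6 },
};
```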

Why can I assign structs but not compare them

Per the comp.lang.c FAQ:

There is no good way for a compiler to implement structure comparison (i.e. to support the == operator for structures) which is consistent with C's low-level flavor. A simple byte-by-byte comparison could founder on random bits present in unused "holes" in the structure (such padding is used to keep the alignment of later fields correct). A field-by-field comparison might require unacceptable amounts of repetitive code for large structures. Any compiler-generated comparison could not be expected to compare pointer fields appropriately in all cases: for example, it's often appropriate to compare char * fields with strcmp rather than ==.

If you need to compare two structures, you'll have to write your own function to do so, field by field.
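A hand-written comparison of the kind the FAQ recommends might look like the sketch below. The person type and person_equal function are hypothetical; the layout is chosen so that padding is likely after the first field, and the char * field is compared with strcmp.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical record: padding is likely between initial and age. */
struct person {
    char        initial;
    int         age;
    const char *name;
};

/* Field-by-field equality; the string is compared by content, not by address. */
bool person_equal(const struct person *a, const struct person *b)
{
    return a->initial == b->initial
        && a->age == b->age
        && strcmp(a->name, b->name) == 0;
}
```

Because only the named fields are read, the result does not depend on whatever bits happen to sit in the padding.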

Why doesn't this array of structure assignment compile?

For assignment, put the type name Point in parentheses before the brace-enclosed list to make it a compound literal (it looks like a cast, but is not one):

pt[0] = (Point){100,200};

This is equivalent to

{
Point temp = {100,200};
pt[0] = temp;
}

P.S. Compound literals are not available in old, strictly C89-compliant compilers. GCC provides them as an extension in C89 mode, and in C99 they are a core language feature.

Passing arrays within structure

Both approaches work: passing the struct object itself, or passing only its address. Note that a bare array cannot be passed by value in C; an array argument is converted to a pointer to its first element, so only an array wrapped in a struct is copied in its entirety. Passing a pointer is less costly and more flexible than passing the whole object.

Passing pointers is:

Less costly because when a struct variable is passed by value, its entire memory content is copied to a new location (typically on the stack). Because structs and arrays typically hold large amounts of related data, the saving can be substantial: the amount of data copied determines how long the copy takes and how much memory it uses.

Passing a pointer to either data type, no matter how much data the variable contains, costs only the size of a pointer: typically 4 bytes when targeting 32-bit addressing and 8 bytes when targeting 64-bit addressing.

More flexible because, for these data types in particular, designing your code to pass pointers lets you add struct members or array elements without changing the prototypes of the functions that accept them as arguments. For example, the following function prototype:

void acceptStructPointer(S *data);

will accept either of the following struct definitions without impact:

typedef struct {
int val[10];
}S;

Or:

typedef struct {
int val[10];
float b[100];
char string[100];
}S;

Additionally, when memory needs are not known until run time, for example when reading from a database or when spawning multiple instances of socket sessions, passing pointers means that memory can be sized according to actual run-time needs:

void acceptStructPointer(S **data)
{
...
/* assign through the pointer so the caller sees the new buffer */
*data = malloc(someDemand*sizeof(S));
if(*data)
{
....

The following is a small code snippet showing in particular the size/speed advantage of passing pointers. Note that the larger, and/or more complex the data object, the bigger the advantage becomes in terms of run-time speed and memory usage.

#include <stdio.h>

#define ARY_SIZE 10

typedef struct {
    int val[ARY_SIZE];
} S;

// struct
S sData = {{1,2,3,4,5,6,7,8,9,0}};
// pointer to struct
S *pSdata = NULL;

// array
int aData[ARY_SIZE] = {9,8,7,6,5,4,3,2,1,0};
// pointer to first array element
int *pAdata = NULL;

void acceptPointerVariables(S *pA, int *pD);
void acceptNonPointerVariables(S a, int d[]);

int main(void)
{
    pSdata = &sData;
    pAdata = &aData[0];

    // sizeof yields a size_t, so print it with %zu
    printf("Size of struct sData: %zu\n", sizeof(sData));
    printf("Size of pointer pSdata: %zu\n", sizeof(pSdata));
    printf("Size of array aData: %zu\n", sizeof(aData));
    printf("Size of pointer pAdata: %zu\n", sizeof(pAdata));

    // passing pointers: only two addresses are copied
    acceptPointerVariables(pSdata, pAdata);
    // passing the struct by value copies it; aData still converts to a pointer
    acceptNonPointerVariables(sData, aData);
    return 0;
}

void acceptPointerVariables(S *pA, int *pD)
{
    for(int i=0;i<ARY_SIZE;i++)
    {
        printf("Value of struct val element %d: %d\n", i, pA->val[i]);
        printf("Value of array element %d: %d\n", i, pD[i]);
    }
}

// "int d[]" in a parameter list means "int *d"; only the struct a is a copy
void acceptNonPointerVariables(S a, int d[])
{
    for(int i=0;i<ARY_SIZE;i++)
    {
        printf("Value of struct val element %d: %d\n", i, a.val[i]);
        printf("Value of array element %d: %d\n", i, d[i]);
    }
}
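What is and is not copied can also be observed from the caller's side. The following is a small sketch with made-up names (Box, bump_copy, bump_through_pointer, bump_array): a struct passed by value is modified only in the callee's copy, while a struct passed by pointer and an array argument are modified in place.

```c
#include <assert.h>

/* Hypothetical struct wrapping an array, similar to S above. */
typedef struct {
    int val[10];
} Box;

/* b is a full copy of the caller's struct; the change is lost on return. */
void bump_copy(Box b)
{
    b.val[0] += 1;
}

/* Only an address is passed; the caller's struct is modified. */
void bump_through_pointer(Box *b)
{
    b->val[0] += 1;
}

/* "int a[]" is really "int *a", so the caller's array is modified. */
void bump_array(int a[])
{
    a[0] += 1;
}
```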

Are Structs with Struct-Arrays value or reference-based?

In .NET, a structure is a value type and a class is a reference type. That never changes.

If you have a local variable in a method in your code, when that code is executed, space is allocated for that variable on the stack. If that variable is a value type then the structure instance will be stored in the variable itself while, if the variable is a reference type, space will be allocated for the object on the heap and a reference to that object will be stored in the variable.

When an object is created, whether on the stack or the heap, that object contains its member variables. If the object is created on the stack then the member variables exist on the stack and if the object is created on the heap then the member variables exist on the heap. Whether those member variables exist on the stack or the heap, they still behave exactly as value types and reference types always do, i.e. the value type variables contain the objects and the reference type variables contain references to objects created on the heap.

If you have a structure with a member variable that is an array then the structure will behave like value types always do, i.e. the object will be stored in the variable, wherever that variable happens to be. The array field will contain a reference to an array created on the heap. If the array is of a value type then the array will contain the element objects themselves while an array that is of a reference type will contain references to objects stored elsewhere on the heap.

It's pretty simple really:

  • Local variables are stored on the stack.
  • Member variables are stored within the object, wherever that is stored.
  • Value type objects are stored in the variable, wherever that is stored.
  • Reference type objects are stored on the heap and a reference to them is stored in the variable, wherever that is stored.

