Why Reference Size Is Always 4 Bytes - C++

The standard is pretty clear on sizeof (C++11, 5.3.3/2):

When applied to a reference or a reference type, the result is the
size of the referenced type.

So if you really were taking sizeof(double&), the compiler would be telling you that sizeof(double) is 4.

Update: So, what you really are doing is applying sizeof to a class type. In that case,

When applied to a class, the result is the number of bytes in an
object of that class [...]

So we know that the presence of the reference inside A causes it to take up 4 bytes. That's because even though the standard does not mandate how references are to be implemented, the compiler still has to implement them somehow. This somehow might be very different depending on the context, but for a reference member of a class type the only approach that makes sense is sneaking in a double* behind your back and calling it a double& in your face.

So if your architecture is 32-bit (in which pointers are 4 bytes long), that would explain the result.

Just keep in mind that the concept of a reference is not tied to any specific implementation. The standard allows the compiler to implement references however it wants.
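
For instance, here is a minimal sketch (the class name A and its single reference member are assumptions, since the original class isn't shown) that makes the typical pointer-under-the-hood implementation visible:

#include <cstdio>

struct A {
    double& ref;   // assumed: a class whose only member is a double reference
};

int main() {
    double d = 0.0;
    A a{d};
    // On a typical implementation the reference is stored as a hidden pointer,
    // so sizeof(A) matches sizeof(double*): 4 bytes on a 32-bit target and
    // 8 on a 64-bit one. The standard does not guarantee this layout, though.
    printf("%zu %zu\n", sizeof(A), sizeof(double*));
    return 0;
}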

Why does sizeof(MACRO) give an output of 4 bytes when MACRO does not hold any memory location?

sizeof(max) is replaced by the preprocessor with sizeof('A'). sizeof('A') is the same as sizeof(int), and the latter is 4 on your platform.

For the avoidance of doubt, 'A' is an int constant in C, not a char. (Note that in C++ 'A' is a char literal, and sizeof(char) is fixed at 1 by the standard.)
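
A quick way to see the difference for yourself (the macro is assumed to be defined as 'A', matching the question):

#include <cstdio>

#define max 'A'   // the macro from the question

int main() {
    // Compiled as C++, 'A' is a char literal, so this prints 1.
    // The same code compiled as C prints sizeof(int) (4 on the asker's
    // platform), because character constants have type int in C.
    printf("%zu\n", sizeof(max));
    return 0;
}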

Why is the char buffer content size 4 bytes?

Diagnosis

In the second example, buff is a char buffer, plain char is a signed type on your machine, and you're storing negative values in buff, so when they're converted to int in the call to printf(), they become negative integers (of small magnitude) printed in hex.

ISO/IEC 9899:2018

The quotations below are actually from an online HTML draft of C11, not C18, but AFAIK these details have not changed between C90, C99, C11 and C18 anyway.

The standard says that the plain char type is equivalent to either signed char or unsigned char.

§6.2.5 Types ¶15:

The three types char, signed char, and unsigned char are collectively called the character types. The implementation shall define char to have the same range, representation, and behavior as either signed char or unsigned char.45)

45) CHAR_MIN, defined in <limits.h>, will have one of the values 0 or SCHAR_MIN, and this can be used to distinguish the two options. Irrespective of the choice made, char is a separate type from the other two and is not compatible with either.

§6.3.1.1 Boolean, characters and integers ¶2,3:

2 The following may be used in an expression wherever an int or unsigned int may be used:

  • An object or expression with an integer type (other than int or unsigned int) whose integer conversion rank is less than or equal to the rank of int and unsigned int.
  • A bit-field of type _Bool, int, signed int, or unsigned int.

If an int can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions.58) All other types are unchanged by the integer promotions.

3 The integer promotions preserve value including sign. As discussed earlier, whether a "plain" char is treated as signed is implementation-defined.

58) The integer promotions are applied only: as part of the usual arithmetic conversions, to certain argument expressions, to the operands of the unary +, -, and ~ operators, and to both operands of the shift operators, as specified by their respective subclauses.

§6.5.2.2 Function calls ¶6,7:

6 If the expression that denotes the called function has a type that does not include a prototype, the integer promotions are performed on each argument, and arguments that have type float are promoted to double. These are called the default argument promotions. If the number of arguments does not equal the number of parameters, the behavior is undefined. If the function is defined with a type that includes a prototype, and either the prototype ends with an ellipsis (, ...) or the types of the arguments after promotion are not compatible with the types of the parameters, the behavior is undefined. If the function is defined with a type that does not include a prototype, and the types of the arguments after promotion are not compatible with those of the parameters after promotion, the behavior is undefined, except for the following cases:

  • one promoted type is a signed integer type, the other promoted type is the corresponding unsigned integer type, and the value is representable in both types;
  • both types are pointers to qualified or unqualified versions of a character type or void.

7 If the expression that denotes the called function has a type that does include a prototype, the arguments are implicitly converted, as if by assignment, to the types of the corresponding parameters, taking the type of each parameter to be the unqualified version of its declared type. The ellipsis notation in a function prototype declarator causes argument type conversion to stop after the last declared parameter. The default argument promotions are performed on trailing arguments.

Exegesis

Note the last two sentences of §6.5.2.2 ¶7: when the char values are promoted by the 'integer promotions', they are promoted to a (signed) int, and the negative values remain negative. Since an int is 4 bytes on your platform, and all the machines you're likely to have available use two's-complement arithmetic, the three most significant bytes of the value will each be 0xFF.

Prescription

To always print 2-digit hex for the characters, use %.2X (or %.2x if you prefer; you can also use either %02X or %02x) and pass either (unsigned char)rbuff[r_n-1] or rbuff[r_n-1] & 0xFF as the argument (using the variables from the first example). Or, using the variables from the second example:

printf("%.2X\n", (unsigned char)buff[i]);
printf("%.2X\n", buff[i] & 0xFF);

Why does a reference variable inside a class always take 4 bytes irrespective of type? (on a 32-bit system)

Those four bytes are the reference. A reference is just a pointer internally, and pointers are typically 4 bytes on a 32-bit system, irrespective of the data type, because a pointer is just an address, not the value itself.
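
A minimal sketch illustrating that (the class names are made up for the example):

#include <cstdio>

struct RefToChar   { char&   r; };
struct RefToInt    { int&    r; };
struct RefToDouble { double& r; };

int main() {
    // Each reference is typically stored as a pointer, so all three classes
    // have the same size: 4 bytes on a 32-bit system, 8 on a 64-bit one,
    // regardless of the referenced type.
    printf("%zu %zu %zu\n",
           sizeof(RefToChar), sizeof(RefToInt), sizeof(RefToDouble));
    return 0;
}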

How does the size of this structure come out to be 4 bytes?

Source: http://geeksforgeeks.org/?p=9705

In sum: the compiler packs the bits as tightly as possible (that's what bit-fields are for) without compromising on alignment.

A variable's data alignment deals with the way its data is stored in the memory banks. For example, the natural alignment of int on a 32-bit machine is 4 bytes. When a data type is naturally aligned, the CPU fetches it in the minimum number of read cycles.

Similarly, the natural alignment of short int is 2 bytes. It means a short int can be stored in the bank 0 – bank 1 pair or the bank 2 – bank 3 pair. A double requires 8 bytes and occupies two rows in the memory banks. Any misalignment of a double will force more than two read cycles to fetch the data.

Note that a double variable will be allocated on an 8-byte boundary on a 32-bit machine and requires two memory read cycles. On a 64-bit machine, depending on the number of banks, a double variable will be allocated on an 8-byte boundary and requires only one memory read cycle.

So the compiler introduces an alignment requirement for every structure: it is that of the largest member of the structure. If you remove the char from your struct, you will still get 4 bytes.

In your struct, the char is 1-byte aligned. It is followed by an int bit-field; a plain int would be 4-byte aligned, but you declared a bit-field of only 8 bits.

8 bits = 1 byte, and a char can sit on any byte boundary, so char + int:8 = 2 bytes of data. That is not a multiple of the struct's 4-byte alignment, so the compiler adds 2 bytes of padding to maintain the 4-byte boundary.

For it to be 8 bytes, you would have to declare an actual int (4 bytes) and a char (1 byte). That's 5 bytes of data, which again misses the alignment boundary, so the struct is padded to 8 bytes.
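
A small sketch of the three cases discussed above (member names are assumptions; the sizes shown are what a typical GCC/Clang target produces, not something the standard guarantees):

#include <cstdio>

struct WithBitfield {   // char + 8-bit int bit-field: 2 bytes of data,
    char c;             // padded to the int alignment of 4 -> size 4
    int  i : 8;
};

struct WithInt {        // char + full int: 5 bytes of data,
    char c;             // padded to the next multiple of 4 -> size 8
    int  i;
};

struct NoChar {         // bit-field only: still rounded up to 4 bytes
    int  i : 8;
};

int main() {
    printf("%zu %zu %zu\n",
           sizeof(WithBitfield), sizeof(WithInt), sizeof(NoChar)); // typically "4 8 4"
    return 0;
}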

What I have commonly done in the past to control the padding is to place fillers inside my structs so that they always maintain the 4-byte boundary. So if I have a struct like this:

struct s {
    int id;
    char b;
};

I am going to insert filler bytes as follows:

struct d {
    int id;
    char b;
    char temp[3];   // explicit filler to reach the 4-byte boundary
};

That would give me a struct with a size of 4 bytes + 1 byte + 3 bytes = 8 bytes! This way I can ensure that my struct is padded the way I want it, especially if I transmit it somewhere over the network. Also, if I ever change my implementation (say, by saving this struct into a binary file), the fillers were there from the beginning, so as long as I maintain my initial structure, all is well!
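
If you depend on that layout (for example when writing the struct to a file or the network), it's worth checking it at compile time; a small sketch, repeating the definition so the snippet stands alone:

#include <cstddef>

struct d {
    int  id;
    char b;
    char temp[3];   // explicit filler to reach the 4-byte boundary
};

// Compile-time checks that the explicit filler really produces the intended
// 8-byte layout on this platform.
static_assert(sizeof(d) == 8, "struct d is not 8 bytes");
static_assert(offsetof(d, temp) == 5, "filler does not start right after b");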

Finally, you can read this post on C Structure size with bit-fields for more explanation.

What is array to pointer decay?

It's said that arrays "decay" into pointers. A C++ array declared as int numbers[5] cannot be re-pointed, i.e. you can't say numbers = 0x5a5aff23. More importantly, the term decay signifies loss of type and dimension: numbers decays into int* by losing the dimension information (count 5), and the type is no longer int [5]. The decay doesn't happen everywhere; for example, it doesn't occur when the array is the operand of sizeof or of the address-of operator &, or when it binds to a reference to an array.

If you're passing an array by value, what you're really doing is copying a pointer: a pointer to the array's first element is copied into the parameter (whose type should also be a pointer to the array element's type). This works due to the array's decaying nature; once decayed, sizeof no longer gives the complete array's size, because the expression has essentially become a pointer. This is why it's preferred (among other reasons) to pass by reference or pointer.

Three ways to pass in an array1:

void by_value(const T* array)   // const T array[] means the same
void by_pointer(const T (*array)[U])
void by_reference(const T (&array)[U])

The last two will give proper sizeof info, while the first one won't, since the array argument decays when it is assigned to the parameter.

1 The constant U should be known at compile-time.
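
A minimal sketch of the three forms, assuming T is int, U is 5, and dropping const for brevity:

#include <cstdio>

void by_value(int* array)          { printf("%zu\n", sizeof(array)); }  // size of a pointer
void by_pointer(int (*array)[5])   { printf("%zu\n", sizeof(*array)); } // 5 * sizeof(int)
void by_reference(int (&array)[5]) { printf("%zu\n", sizeof(array)); }  // 5 * sizeof(int)

int main() {
    int numbers[5] = {1, 2, 3, 4, 5};
    by_value(numbers);       // numbers decays to int*
    by_pointer(&numbers);    // address of the whole array, type int (*)[5]
    by_reference(numbers);   // binds to the array itself, no decay
    return 0;
}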

Why isn't the size of my array 4 bytes in C

Ok, I'll take this one question at a time...

Why isn't the result of sizeof(foo) 4?

It's because you've only set the size of the foo array to 3 in the first statement, char foo[3]. There would be no reason to expect 4 as a result when you've explicitly defined the bound as 3 chars, which is 3 bytes.

Why isn't the result 1?

You're correct in saying that in some cases foo is equivalent to &foo[0]; the most relevant of those cases is when the array is passed to a function, where it decays to a pointer to its first element. However, sizeof is an operator, not a function, and an array used as its operand does not decay: the result is computed at compile time from the array's type, char [3], giving 3 * sizeof(char) = 3, not 1 and not the size of a pointer.
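
A short sketch of the difference (the helper function take is made up for the example):

#include <cstdio>

void take(char *p) {
    // Inside a function the parameter is just a pointer, so this is
    // sizeof(char*): typically 4 on a 32-bit system, 8 on a 64-bit one.
    printf("%zu\n", sizeof(p));
}

int main() {
    char foo[3];
    printf("%zu\n", sizeof(foo));     // 3: sizeof sees the array type char[3]
    printf("%zu\n", sizeof(foo[0]));  // 1: a single char
    take(foo);                        // here foo does decay to &foo[0]
    return 0;
}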


