Why sizeof of Built-in Types Except char Is Compiler-Dependent in C & C++

Why is sizeof of built-in types, except char, compiler-dependent in C & C++?

The reason is largely that C is portable to a much wider variety of platforms. There are many reasons why the different data types have turned out to be the various sizes they are on various platforms, but at least historically, int has been adapted to be the platform's native word size. On the PDP-11 it was 16 bits (and long was originally invented for 32-bit numbers), while some embedded platform compilers even have 8-bit ints. When 32-bit platforms came around and started having 32-bit ints, short was invented to represent 16-bit numbers.

Nowadays, most 64-bit architectures use 32-bit ints simply to be compatible with the large base of C programs that were originally written for 32-bit platforms, but there have been 64-bit C compilers with 64-bit ints as well, not least of which were some early Cray platforms.

Also, in the earlier days of computing, floating-point formats and sizes were generally far less standardized (IEEE 754 didn't come around until 1985), which is why floats and doubles are even less well-defined than the integer data types. They generally don't even presume the presence of such peculiarities as infinities, NaNs or signed zeroes.

Furthermore, it should perhaps be said that a char is not defined to be 1 byte, but rather to be whatever sizeof returns 1 for, which is not necessarily 8 bits. (For completeness, it should perhaps be added here, also, that "byte" as a term is not universally defined to be 8 bits; there have been many historical definitions of it, and in the context of the ANSI C standard, a "byte" is actually defined to be the smallest unit of storage that can store a char, whatever the nature of char.)
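A quick way to see this on a given implementation (a minimal sketch, assuming a C99-or-later compiler) is to print sizeof(char), which is 1 by definition, next to CHAR_BIT from <limits.h>, which tells you how many bits that one "byte" actually contains:

#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* sizeof(char) is 1 by definition; CHAR_BIT is the number of bits
       in that one byte, which must be at least 8 but may be more. */
    printf("sizeof(char) = %zu\n", sizeof(char));
    printf("CHAR_BIT     = %d\n", CHAR_BIT);
    return 0;
}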

There are also architectures such as the 36-bit PDP-10 and the 18-bit PDP-7 that have run C programs. They may be quite rare these days, but they do help explain why C data types are not defined in terms of 8-bit units.

Whether this, in the end, really makes the language "more portable" than languages like Java can perhaps be debated, but it would sure be suboptimal to run Java programs on 16-bit processors, and quite weird indeed on 36-bit processors. It is perhaps fair to say that it makes the language more portable, but programs written in it less portable.

EDIT: In reply to some of the comments, I just want to append, as an opinion piece, that C as a language is unlike languages like Java/Haskell/Ada that are more or less "owned" by a corporation or standards body. There is ANSI C, sure, but C is more than ANSI C; it's a living community, and there are many implementations that aren't ANSI-compatible but are "C" nevertheless. Arguing whether implementations that use 8-bit ints are C is similar to arguing whether Scots is English, in that it's mostly pointless. They use 8-bit ints for good reasons, no one who knows C well enough would be unable to reason about programs written for such compilers, and anyone who writes C programs for such architectures would want their ints to be 8 bits.

Is the C sizeof() operator dependent on the host computer?

Yes, for sure: the size of a structure can change depending on the operating system.
That is one of the main reasons to use sizeof(), to make sure you get the same functionality on different operating systems.
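For example (a minimal sketch; struct record and its members are hypothetical), allocating with sizeof rather than a hard-coded byte count keeps the code correct even when padding or type sizes differ from one platform or compiler to another:

#include <stdlib.h>

struct record {      /* hypothetical example structure */
    char tag;
    int  value;
};

int main(void)
{
    /* sizeof already accounts for whatever padding this particular
       compiler inserted, so the allocation is correct everywhere. */
    struct record *r = malloc(sizeof *r);
    if (r == NULL)
        return 1;
    free(r);
    return 0;
}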

Why are the values returned by sizeof() compiler dependent?

Compilers might perform data structure alignment for a target architecture if needed. It might be done purely to improve the runtime performance of the application, or in some cases it is required by the processor (i.e. the program will not work if the data is not aligned).

For example, most (but not all) SSE2 instructions require data to be aligned on a 16-byte boundary. To put it simply, everything in computer memory has an address. Let's say we have a simple array of doubles, like this:

double data[256];

In order to use SSE2 instructions that require 16-byte alignment, one must make sure that the address &data[0] is a multiple of 16.
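One way to guarantee that (a sketch, assuming a C11 compiler; GCC and Clang also accept the older __attribute__((aligned(16))) extension) is the alignas specifier:

#include <stdalign.h>
#include <stdint.h>
#include <stdio.h>

/* C11 alignas forces 16-byte alignment, so &data[0] is a multiple
   of 16, which the aligned SSE2 load/store instructions require. */
static alignas(16) double data[256];

int main(void)
{
    printf("%d\n", (int)((uintptr_t)&data[0] % 16)); /* prints 0 */
    return 0;
}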

The alignment requirements differ from one architecture to another. On x86_64, it is recommended that all structures larger than 16 bytes align on 16-byte boundaries. In general, for the best performance, align data as follows:

  • Align 8-bit data at any address
  • Align 16-bit data to be contained within an aligned four-byte word
  • Align 32-bit data so that its base address is a multiple of four
  • Align 64-bit data so that its base address is a multiple of eight
  • Align 80-bit data so that its base address is a multiple of sixteen
  • Align 128-bit data so that its base address is a multiple of sixteen
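For reference, the natural alignment of each type on your own platform can be queried with C11 alignof (the values in the comment are typical for x86_64 Linux and are an assumption, not a guarantee):

#include <stdalign.h>
#include <stdio.h>

int main(void)
{
    /* Typical x86_64 Linux output: 1, 2, 4, 8, 16. */
    printf("char:        %zu\n", alignof(char));
    printf("short:       %zu\n", alignof(short));
    printf("int:         %zu\n", alignof(int));
    printf("double:      %zu\n", alignof(double));
    printf("long double: %zu\n", alignof(long double));
    return 0;
}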

Interestingly enough, most x86_64 CPUs work with both aligned and unaligned data. However, if the data is not aligned properly, the CPU may execute the code significantly more slowly.

When the compiler takes this into consideration, it may align members of the structure implicitly, and that affects its size. For example, let's say we have a structure like this:

struct A {
    char a;
    int  b;
};

Assuming x86_64, the size of int is 32 bits, or 4 bytes. Therefore, it is recommended to always make the address of b a multiple of 4. But because field a is only 1 byte, this won't happen on its own. Therefore, the compiler implicitly adds 3 bytes of padding between a and b:

struct A {
    char a;
    char __pad0[3]; /* This would be added by the compiler, without any
                       field name; __pad0 is for demonstration purposes */
    int  b;
};
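On a typical x86_64 ABI (an assumption; your compiler may lay things out differently), the padding can be observed directly with offsetof and sizeof:

#include <stddef.h>
#include <stdio.h>

struct A {
    char a;
    int  b;
};

int main(void)
{
    /* With 3 bytes of padding after 'a', 'b' starts at offset 4
       and the whole structure occupies 8 bytes. */
    printf("offsetof(struct A, b) = %zu\n", offsetof(struct A, b));
    printf("sizeof(struct A)      = %zu\n", sizeof(struct A));
    return 0;
}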

How the compiler does this depends not only on the compiler and architecture, but also on the compiler settings (flags) you pass to it. The behavior can also be affected using special language constructs. For example, one can ask the compiler not to perform any padding with the packed attribute, like this:

struct A {
    char a;
    int  b;
} __attribute__((packed));
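With the packed attribute (a GCC/Clang extension; other compilers use different mechanisms, such as MSVC's #pragma pack), the padding disappears and sizeof shrinks accordingly, at the price of potentially slower unaligned access to b. A sketch (the name packed_A is just for the example):

#include <stdio.h>

struct packed_A {
    char a;
    int  b;
} __attribute__((packed));

int main(void)
{
    /* No padding: 1 + 4 = 5 bytes, but 'b' is no longer aligned. */
    printf("sizeof(struct packed_A) = %zu\n", sizeof(struct packed_A));
    return 0;
}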

In your case, mingw32-gcc.exe has simply added 7 bytes between c and d to align d on an 8-byte boundary, whereas gcc 4.6.3 on Ubuntu has added only 3 to align d on a 4-byte boundary.

Unless you are performing some optimization, trying to use a special extended instruction set, or have specific requirements for your data structures, I'd recommend you do not depend on specific compiler behavior and always assume that not only might your structure get padded, it might get padded differently between architectures, compilers, and even compiler versions. Otherwise you'd need to semi-manually ensure data alignment and structure sizes using compiler attributes and settings, and make sure it all works across all compilers and platforms you are targeting, using unit tests or maybe even static assertions.
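For instance (a minimal sketch, assuming a C11 compiler; struct header and the expected size of 8 are hypothetical), a static assertion makes the build fail immediately if the compiler lays a structure out differently than the rest of the code expects:

#include <assert.h>   /* static_assert (C11) */

struct header {        /* hypothetical structure with an assumed layout */
    char magic;
    int  length;
};

/* Compilation fails if padding makes the size something other than 8. */
static_assert(sizeof(struct header) == 8,
              "unexpected layout of struct header");

int main(void)
{
    return 0;
}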

For more information, please check out:

  • Data Alignment article on Wikipedia
  • Data Alignment when Migrating to 64-Bit Intel® Architecture
  • GCC Variable Attributes

Hope it helps. Good Luck!

How to minimize padding:

It is always good to have all your struct members properly aligned and at the same time keep your structure size reasonable. Consider these two struct variants with members rearranged (from now on, assume sizeof of char, short, int, long, long long to be 1, 2, 4, 4, 8 respectively):

struct A
{
    char  a;
    short b;
    char  c;
    int   d;
};

struct B
{
    char  a;
    char  c;
    short b;
    int   d;
};

Both structures are supposed to hold the same data, but while sizeof(struct A) will be 12 bytes, sizeof(struct B) will be 8, thanks to a well-thought-out member order which eliminates implicit padding:

struct A
{
    char  a;
    char  __pad0[1]; // implicit compiler padding
    short b;
    char  c;
    char  __pad1[3]; // implicit compiler padding
    int   d;
};

struct B // no implicit padding
{
    char  a;
    char  c;
    short b;
    int   d;
};
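You can verify this on your own toolchain (a sketch; the exact numbers assume the 1/2/4-byte member sizes stated above):

#include <stdio.h>

struct A { char a; short b; char c; int d; };
struct B { char a; char c; short b; int d; };

int main(void)
{
    /* Typically prints 12 and 8 with the member sizes assumed above. */
    printf("sizeof(struct A) = %zu\n", sizeof(struct A));
    printf("sizeof(struct B) = %zu\n", sizeof(struct B));
    return 0;
}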

Rearranging struct members becomes error prone as the member count increases. To make it less error prone, put the longest members at the beginning and the shortest at the end:

struct B // no implicit padding
{
    int   d;
    short b;
    char  a;
    char  c;
};

Implicit padding at the end of a struct:

Depending on the compiler, settings, platform, etc. used, you may notice that the compiler adds padding not only before struct members but also at the end (i.e. after the last member). The structure below:

struct abcd
{
    long long a;
    char      b;
};

may occupy 12 or 16 bytes (the worst compilers will allow it to be 9 bytes). This padding is easily overlooked but is very important if your structure will be an array element: it ensures that the a member of every subsequent array element is properly aligned too.
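The effect is easy to check (a sketch; the numbers assume an 8-byte, 8-byte-aligned long long): compare sizeof of the structure with the sum of its member sizes, and confirm that a in the second array element is still properly aligned:

#include <stdint.h>
#include <stdio.h>

struct abcd
{
    long long a;
    char      b;
};

int main(void)
{
    struct abcd arr[2];

    /* Trailing padding typically makes sizeof 16 rather than 9, so that
       arr[1].a remains 8-byte aligned. */
    printf("sizeof(struct abcd)   = %zu\n", sizeof(struct abcd));
    printf("arr[1].a misalignment = %d\n",
           (int)((uintptr_t)&arr[1].a % 8)); /* prints 0 */
    return 0;
}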

Final and random thoughts:

It will never hurt (and may actually save) you if, when working with structs, you follow this advice:

  • Do not rely on the compiler to interleave your struct members with proper padding.
  • Make sure your struct (if outside an array) is aligned to the boundary required by its longest member.
  • Make sure you arrange your struct members so that the longest are placed first and the shortest last.
  • Make sure you explicitly pad your struct (if needed) so that if you create an array of structs, every structure member has proper alignment.
  • Make sure that arrays of your structs are properly aligned too, as although your struct may require 8-byte alignment, your compiler may align your array at a 4-byte boundary (see the sketch below).
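As an illustration of the last two points (a sketch, assuming a C11 compiler and an 8-byte long long; the explicit pad field and the array name are made up for the example):

#include <stdalign.h>
#include <stdio.h>

struct abcd
{
    long long a;
    char      b;
    char      pad[7];   /* explicit padding: sizeof is 16 by construction */
};

/* Request the alignment the largest member needs, so the whole array
   (and therefore every element's 'a') starts on an 8-byte boundary. */
static alignas(8) struct abcd table[4];

int main(void)
{
    printf("sizeof(struct abcd) = %zu\n", sizeof(struct abcd));
    printf("elements in table   = %zu\n", sizeof table / sizeof table[0]);
    return 0;
}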

Does the size of an int depend on the compiler and/or processor?

The answer to this question depends on how far from practical considerations we are willing to get.

Ultimately, in theory, everything in C and C++ depends on the compiler and only on the compiler. The hardware/OS is of no importance at all. The compiler is free to implement a hardware abstraction layer of any thickness and emulate absolutely anything. There's nothing to prevent a C or C++ implementation from implementing the int type with any size and any representation, as long as it is large enough to meet the minimum requirements specified in the language standard. Practical examples of that level of abstraction are readily available, e.g. programming languages based on a "virtual machine" platform, like Java.

However, C and C++ are intended to be highly efficient languages. In order to achieve maximum efficiency, a C or C++ implementation has to take into account certain considerations derived from the underlying hardware. For that reason it makes a lot of sense to make sure that each basic type is based on some representation directly (or almost directly) supported by the hardware. In that sense, the sizes of basic types do depend on the hardware.

In other words, a specific C or C++ implementation for a 64-bit hardware/OS platform is absolutely free to implement int as a 71-bit 1's-complement signed integral type that occupies 128 bits of memory, using the other 57 bits as padding bits that are always required to store the birthdate of the compiler author's girlfriend. This implementation will even have certain practical value: it can be used to perform run-time tests of the portability of C/C++ programs. But that's where the practical usefulness of that implementation would end. Don't expect to see something like that in a "normal" C/C++ compiler.

C variable type sizes are machine-dependent. Is it really true? (signed & unsigned numbers)

"Machine-dependent" is not quite exact. Actually, it is implementation-defined: it may depend on the compiler, the machine, compiler options, etc.

For example, with Visual C++, long is 32 bits even on 64-bit machines.
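This is the LLP64 (64-bit Windows) versus LP64 (most 64-bit Unix-like systems) data-model difference; a quick check of your own toolchain (a sketch):

#include <stdio.h>

int main(void)
{
    /* LLP64 typically prints 4, 4, 8; LP64 typically prints 4, 8, 8. */
    printf("int: %zu, long: %zu, long long: %zu\n",
           sizeof(int), sizeof(long), sizeof(long long));
    return 0;
}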

In the C++11 standard, why leave the char type implementation dependent?

Some processors prefer signed char, and others prefer unsigned char. For example, POWER can load an 8-bit value from memory with zero extension, but not sign extension. But SuperH-3 can load an 8-bit value from memory with sign extension but not zero extension. C++ derives from C, and C leaves many details of the language implementation-defined so that each implementation can be tailored to be most efficient for its target environment.

Does the sizeof operator work differently in C and C++?

It has nothing to do with the sizeof operator in particular; rather, the size of character literals in C is different from that in C++, because character literals in C are of type int, whereas in C++ they are of type char.

See Why are C character literals ints instead of chars? for more information.
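A quick demonstration (a sketch; compile the same file once as C and once as C++):

#include <stdio.h>

int main(void)
{
    /* In C this prints sizeof(int), typically 4; in C++ it prints 1,
       because 'a' has type int in C but type char in C++. */
    printf("sizeof('a') = %zu\n", sizeof('a'));
    return 0;
}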

What does integer size in C depend on?

It's implementation-dependent. The C standard only requires that:

  • char has at least 8 bits
  • short has at least 16 bits
  • int has at least 16 bits
  • long has at least 32 bits
  • long long has at least 64 bits (added in 1999)
  • sizeof(char) ≤ sizeof(short) ≤ sizeof(int) ≤ sizeof(long) ≤ sizeof(long long)
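These minimums can be checked against <limits.h> at compile time (a sketch, assuming a C11 compiler for static_assert):

#include <assert.h>
#include <limits.h>

/* The standard guarantees these minimum ranges regardless of byte sizes. */
static_assert(CHAR_BIT >= 8,           "char has at least 8 bits");
static_assert(SHRT_MAX >= 32767,       "short covers at least 16 bits");
static_assert(INT_MAX  >= 32767,       "int covers at least 16 bits");
static_assert(LONG_MAX >= 2147483647L, "long covers at least 32 bits");
static_assert(LLONG_MAX >= 9223372036854775807LL,
              "long long covers at least 64 bits");

int main(void)
{
    return 0;
}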

In the 16/32-bit days, the de facto standard was:

  • int was the "native" integer size
  • the other types were the minimum size allowed

However, 64-bit systems generally did not make int 64 bits, which would have created the awkward situation of having three 64-bit types and no 32-bit type. Some compilers expanded long to 64 bits.

What determines the size of integer in C?

Ultimately the compiler does, but in order for compiled code to play nicely with system libraries, most compilers match the behavior of the compiler[s] used to build the target system.

So loosely speaking, the size of int is a property of the target hardware and OS (two different OSs on the same hardware may have a different size of int, and the same OS running on two different machines may have a different size of int; there are reasonably common examples of both).

All of this is also constrained by the rules in the C standard. int must be large enough to represent all values between -32767 and 32767, for example.


