Why Does sizeof(int) Vary Across Different Operating Systems?

Why does sizeof(int) vary across different operating systems?

According to the C++ standard

1.7/1 states:

The fundamental storage unit in the C++ memory model is the byte. A
byte is at least large enough to contain any member of the basic
execution character set ...

then 3.9.1/1 states:

Objects declared as characters (char) shall be large enough to store
any member of the implementation’s basic character set.

So we can infer that a char is exactly one byte. Most importantly, 3.9.1/2 also says:

There are five signed integer types: “signed char”, “short int”, “int”,
“long int”, and “long long int”. In this list, each type provides at
least as much storage as those preceding it in the list. Plain ints
have the natural size suggested by the architecture of the execution
environment; the other signed integer types are provided to meet
special needs.

So in other words, the size of int is (a) guaranteed to be at least as large as a char (and, because of the ranges the standard requires, at least 16 bits), and (b) whatever is "natural" for the target hardware and ABI. In practice that usually means 32 bits today: the common 64-bit data models (LP64 on Linux/macOS, LLP64 on Windows) keep int at 32 bits, while some older and embedded systems use 16 bits.
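
For illustration, here is a minimal sketch in C (the numbers printed are entirely up to your compiler and target data model) showing how the sizes come out differently on ILP32, LP64, and LLP64 platforms:

#include <stdio.h>

int main(void)
{
    /* Typical results:
       ILP32 (32-bit Linux/Windows): 2 4 4 8
       LP64  (64-bit Linux/macOS):   2 4 8 8
       LLP64 (64-bit Windows):       2 4 4 8 */
    printf("sizeof(short)     = %zu\n", sizeof(short));
    printf("sizeof(int)       = %zu\n", sizeof(int));
    printf("sizeof(long)      = %zu\n", sizeof(long));
    printf("sizeof(long long) = %zu\n", sizeof(long long));
    return 0;
}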

Why do the sizes of data types change as the Operating System changes?

That was probably a trick question as far as char is concerned: sizeof(char) is always 1.

If sizeof(char) were to differ, it would be because of a non-conforming compiler, in which case the question should be about that compiler, not about the C or C++ language.

5.3.3 Sizeof [expr.sizeof]

1 The sizeof operator yields the number of bytes in the object representation of its operand. The operand is either an expression, which is not evaluated, or a parenthesized type-id. The sizeof operator shall not be applied to an expression that has function or incomplete type, or to an enumeration type before all its enumerators have been declared, or to the parenthesized name of such types, or to an lvalue that designates a bit-field. sizeof(char), sizeof(signed char) and sizeof(unsigned char) are 1. The result of sizeof applied to any other fundamental type (3.9.1) is implementation-defined.

(emphasis mine)

The sizeof of types other than the ones pointed out is implementation-defined, and it varies for various reasons. An int has a larger range if it is represented in 64 bits instead of 32, but a 32-bit int is more efficient on a 32-bit architecture.
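
Two of the points in that quote are easy to demonstrate: the operand of sizeof is not evaluated, and only the character types have a guaranteed result of 1. A minimal sketch (the side_effect function is just an illustration):

#include <stdio.h>

int side_effect(void)
{
    puts("evaluated!");   /* never printed: sizeof does not evaluate its operand */
    return 42;
}

int main(void)
{
    printf("%zu\n", sizeof side_effect());  /* size of int; the call never happens */
    printf("%zu\n", sizeof(char));          /* always 1 */
    printf("%zu\n", sizeof(unsigned char)); /* always 1 */
    printf("%zu\n", sizeof(int));           /* implementation-defined */
    return 0;
}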

Does the size of an int depend on the compiler and/or processor?

The answer to this question depends on how far from practical considerations we are willing to get.

Ultimately, in theory, everything in C and C++ depends on the compiler and only on the compiler. Hardware/OS is of no importance at all. The compiler is free to implement a hardware abstraction layer of any thickness and emulate absolutely anything. There's nothing to prevent a C or C++ implementation from implementing the int type of any size and with any representation, as long as it is large enough to meet the minimum requirements specified in the language standard. Practical examples of this level of abstraction are readily available, e.g. programming languages based on a "virtual machine" platform, like Java.

However, C and C++ are intended to be highly efficient languages. In order to achieve maximum efficiency, a C or C++ implementation has to take into account certain considerations derived from the underlying hardware. For that reason it makes a lot of sense to make sure that each basic type is based on some representation directly (or almost directly) supported by the hardware. In that sense, the sizes of the basic types do depend on the hardware.

In other words, a specific C or C++ implementation for a 64-bit hardware/OS platform is absolutely free to implement int as a 71-bit 1's-complement signed integral type that occupies 128 bits of memory, using the other 57 bits as padding bits that are always required to store the birthdate of the compiler author's girlfriend. This implementation will even have certain practical value: it can be used to perform run-time tests of the portability of C/C++ programs. But that's where the practical usefulness of that implementation would end. Don't expect to see something like that in a "normal" C/C++ compiler.

Does the size of data types vary depending on the OS or platform?

They can, yes (and they often do). The C standard requires certain minimal ranges for the integer and floating point types, but leaves the platform a lot of freedom to decide the actual sizes of the types.

See e.g. limits.h and float.h.
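
For example, a minimal sketch that prints a few of those limits (the values shown are whatever your platform's headers define):

#include <stdio.h>
#include <limits.h>
#include <float.h>

int main(void)
{
    printf("CHAR_BIT = %d\n", CHAR_BIT);                        /* at least 8 */
    printf("INT_MIN  = %d, INT_MAX = %d\n", INT_MIN, INT_MAX);  /* must cover -32767..32767 */
    printf("LONG_MAX = %ld\n", LONG_MAX);                       /* at least 2147483647 */
    printf("DBL_DIG  = %d, DBL_MAX = %g\n", DBL_DIG, DBL_MAX);  /* at least 10 and 1e37 */
    return 0;
}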

Furthermore, there are relations between the sizes of the integer types that must always hold:

sizeof(char) == 1
sizeof(short) <= sizeof(int) <= sizeof(long) <= sizeof(long long)
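
These relations can also be checked at compile time; a minimal sketch, assuming a C11 compiler where _Static_assert is available:

/* Compile-time checks of the ordering above. */
_Static_assert(sizeof(char) == 1,                  "char is one byte by definition");
_Static_assert(sizeof(short) <= sizeof(int),       "short <= int");
_Static_assert(sizeof(int)   <= sizeof(long),      "int <= long");
_Static_assert(sizeof(long)  <= sizeof(long long), "long <= long long");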

Platforms will usually pick sizes that are natural to the hardware. For example, int will usually have the native word size (e.g. 32 bits on 32-bit hardware).

For floating point types, there is less variation. Most platforms are more or less compliant with IEEE 754.
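
C99 provides a way to ask about this: an implementation that conforms to IEC 60559 (the standard behind IEEE 754) defines the __STDC_IEC_559__ macro. A minimal sketch:

#include <stdio.h>

int main(void)
{
#ifdef __STDC_IEC_559__
    puts("floating point conforms to IEC 60559 (IEEE 754)");
#else
    puts("no IEC 60559 conformance is claimed");
#endif
    /* On IEEE 754 platforms these are typically 4 and 8 bytes. */
    printf("sizeof(float) = %zu, sizeof(double) = %zu\n",
           sizeof(float), sizeof(double));
    return 0;
}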

Does the size of data types in C depend on the OS?

Yes. This is precisely what is meant by "platform-specific definitions" for things like the size of int and the meaning of system calls.

They depend not just on the OS, but also on the target hardware and compiler configuration.

What determines the size of integer in C?

Ultimately the compiler does, but in order for compiled code to play nicely with system libraries, most compilers match the behavior of the compiler[s] used to build the target system.

So loosely speaking, the size of int is a property of the target hardware and OS (two different OSs on the same hardware may have a different size of int, and the same OS running on two different machines may have a different size of int; there are reasonably common examples of both).

All of this is also constrained by the rules in the C standard. int must be large enough to represent all values between -32767 and 32767, for example.
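
That requirement can be turned into a build-time guard against a hypothetical non-conforming toolchain, using nothing newer than the preprocessor and limits.h:

#include <limits.h>

/* The C standard requires int to cover at least -32767..32767. */
#if INT_MAX < 32767 || INT_MIN > -32767
#error "non-conforming implementation: int does not cover -32767..32767"
#endif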

Who decides the sizeof any datatype or structure (depending on 32 bit or 64 bit)?

It's ultimately the compiler. The compiler implementors can decide to emulate whatever integer size they see fit, regardless of what the CPU handles most efficiently. That said, the C (and C++) standard is written such that the compiler implementor is free to choose the fastest and most efficient way. For many compilers, the implementers chose to keep int at 32 bits, even though the CPU natively handles 64-bit integers very efficiently.

I think this was done in part to preserve compatibility with programs written when 32-bit machines were the most common, which expected an int to be 32 bits and no larger. (It could also be, as user3386109 points out, that 32-bit data was preferred because it takes less space and can therefore be accessed faster.)

So if you want to make sure you get a 64-bit integer, use int64_t instead of int to declare your variable. If you know your value will fit inside 32 bits, or you don't care about the exact size, use int and let the compiler pick the most efficient representation.

As for other data types such as structs, they are composed from base types such as int.
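
A minimal sketch of that advice, assuming <stdint.h> and <inttypes.h> are available (they are mandatory in C99, apart from types the platform lacks): fixed-width members keep a struct's field sizes stable across platforms, while plain int stays whatever the compiler considers natural. The struct record type here is purely hypothetical.

#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical on-disk record: exact-width fields, so the field sizes do not
   change when the code is rebuilt for another platform (padding aside). */
struct record {
    uint32_t magic;
    int64_t  offset;
    int32_t  checksum;
};

int main(void)
{
    int64_t big = INT64_C(1) << 40;  /* guaranteed 64 bits, unlike plain int */
    int     n   = 1000;              /* fits in any conforming int; size left to the compiler */

    printf("big = %" PRId64 ", n = %d, sizeof(struct record) = %zu\n",
           big, n, sizeof(struct record));
    return 0;
}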

Is the size of an integer or any other data type in C dependent on the underlying architecture?

See here:
size guarantee for integral/arithmetic types in C and C++

Fundamental C type sizes depend on the implementation (compiler) and architecture, but they have some guaranteed lower bounds. One should therefore never hardcode type sizes, and should instead use sizeof(TYPENAME) to get their length in bytes.
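
A minimal sketch of that guideline (a hypothetical buffer allocation; the point is letting sizeof supply the element size rather than hardcoding 4 or 8):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    size_t n = 16;

    /* Portable: the element size comes from sizeof, not from an assumption. */
    long *values = malloc(n * sizeof *values);
    if (values == NULL)
        return 1;

    memset(values, 0, n * sizeof *values);
    printf("allocated %zu bytes for %zu elements of type long\n",
           n * sizeof *values, n);

    free(values);
    return 0;
}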

What is the historical context for long and int often being the same size?

From the C99 rationale (PDF) on section 6.2.5:

[...] In the 1970s, 16-bit C (for the PDP-11) first represented file information with 16-bit integers, which were rapidly obsoleted by disk progress. People switched to a 32-bit file system, first using int[2] constructs which were not only awkward, but also not efficiently portable to 32-bit hardware.

To solve the problem, the long type was added to the language, even though this required C on the PDP-11 to generate multiple operations to simulate 32-bit arithmetic. Even as 32-bit minicomputers became available alongside 16-bit systems, people still used int for efficiency, reserving long for cases where larger integers were truly needed, since long was noticeably less efficient on 16-bit systems. Both short and long were added to C, making short available for 16 bits, long for 32 bits, and int as convenient for performance. There was no desire to lock the numbers 16 or 32 into the language, as there existed C compilers for at least 24- and 36-bit CPUs, but rather to provide names that could be used for 32 bits as needed.

PDP-11 C might have been re-implemented with int as 32-bits, thus avoiding the need for long; but that would have made people change most uses of int to short or suffer serious performance degradation on PDP-11s. In addition to the potential impact on source code, the impact on existing object code and data files would have been worse, even in 1976. By the 1990s, with an immense installed base of software, and with widespread use of dynamic linked libraries, the impact of changing the size of a common data object in an existing environment is so high that few people would tolerate it, although it might be acceptable when creating a new environment. Hence, many vendors, to avoid namespace conflicts, have added a 64-bit integer to their 32-bit C environments using a new name, of which long long has been the most widely used. [...]


