Size of Primitive Data Types

What determines the size of a primitive data type?

It depends on the compiler. The compiler, in turn, usually takes the target architecture, the processor, and the development environment into account, so you could say the size is a combination of all of these factors.

Why was an integer chosen to have a size of 2 bytes on some systems and 4 bytes on others? Is there any reason it could not simply have stayed at 2 bytes?

The C++ standard does not specify the size of integral types in bytes; instead, it specifies minimum ranges they must be able to hold. You can infer the minimum size in bits from the required range, and the minimum size in bytes from that together with the value of the CHAR_BIT macro, which defines the number of bits in a byte (on all but the most obscure platforms it is 8, and it cannot be less than 8).
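As a side note, if you have Python handy, its ctypes module will report the sizes your platform's C ABI actually chose; this is only an illustration of the platform dependence, since the standard guarantees minimum ranges, not exact sizes:

```python
import ctypes

# Sizes (in bytes) of the C integer types on this platform.
# Only the C standard's minimums are guaranteed (short >= 2,
# long >= 4, long long >= 8); the exact values vary by platform.
for name, ctype in [("char", ctypes.c_char),
                    ("short", ctypes.c_short),
                    ("int", ctypes.c_int),
                    ("long", ctypes.c_long),
                    ("long long", ctypes.c_longlong)]:
    print(f"{name}: {ctypes.sizeof(ctype)} bytes")
```

On a typical 64-bit Linux system this prints 1, 2, 4, 8, 8; on 64-bit Windows, long is 4 bytes instead, which is exactly the kind of variation the question is about.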


Why do Primitive Data Types have a Fixed Size?

As low-level programming languages, the designs of C and C++ closely follow what common hardware is capable of. The primitive building blocks (fundamental types) correspond to entities that common CPUs natively support. CPUs typically can handle bytes and words very efficiently; C called these char and int. (More precisely, C defined int in such a way that a compiler could use the target CPU's word size for it.) There has also been CPU support for double-sized words, which historically corresponded to the long data type in C, later to the long long types of C and C++. Half-words corresponded to short. The basic integer types correspond to things a CPU can handle well, with enough flexibility to accommodate different architectures. (For example, if a CPU did not support half-words, short could be the same size as int.)

If there were hardware support for integers of unbounded size (limited only by available memory), there could be an argument for adding that as a fundamental type in C (and C++). Until that happens, support for big integers (see bigint) in C and C++ has been relegated to libraries.

Some of the newer, higher-level languages do have built-in support for arbitrary-precision arithmetic.
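Python is one such language: its built-in int is arbitrary-precision, and the object simply grows as the value needs more bits. A small sketch:

```python
import sys

# Python ints are arbitrary-precision: the object grows with the value.
big = 2 ** 200            # far beyond any fixed-size machine integer

print(big.bit_length())               # 201 bits
print(sys.getsizeof(1))               # a small, fixed-overhead object
print(sys.getsizeof(big))             # larger: more internal digits allocated
```

There is no overflow at 2, 4, or 8 bytes; the runtime allocates as many internal digits as the value requires, trading speed for unbounded range.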

Getting the Size of Primitive Data Types in Python

Running

sys.getsizeof(float)

does not return the size of any individual float; it returns the size of the float class itself. The class object contains far more data than any single float instance, so the reported size is also much bigger.

If you just want to know the size of a single float, the easiest way is to simply instantiate some arbitrary float. For example:

sys.getsizeof(float())

Note that

float()

simply returns 0.0, so this is actually equivalent to:

sys.getsizeof(0.0)

This returns 24 bytes in your case (and probably for most other people as well). In CPython (the most common Python implementation), every float object contains a reference counter and a pointer to its type (i.e., to the float class), each of which takes 8 bytes on 64-bit CPython or 4 bytes on 32-bit CPython. The remaining bytes (24 - 8 - 8 = 8 in your case, which very likely means 64-bit CPython) hold the actual float value itself.
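One way to see where those 24 bytes go (assuming a 64-bit CPython, as above) is to compare the object size with the size of a raw C double:

```python
import struct
import sys

obj_size = sys.getsizeof(0.0)    # the whole float object (24 on 64-bit CPython)
payload = struct.calcsize("d")   # the raw C double inside it: 8 bytes
header = obj_size - payload      # reference counter + type pointer overhead

print(obj_size, payload, header)
```

On 64-bit CPython this shows 24, 8, 16: the 8-byte double plus 16 bytes of per-object bookkeeping.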

This is not guaranteed to work out the same way for other Python implementations though. The language reference says:

These represent machine-level double precision floating point numbers. You are at the mercy of the underlying machine architecture (and C or Java implementation) for the accepted range and handling of overflow. Python does not support single-precision floating point numbers; the savings in processor and memory usage that are usually the reason for using these are dwarfed by the overhead of using objects in Python, so there is no reason to complicate the language with two kinds of floating point numbers.

and I'm not aware of any runtime method that will tell you exactly how many bytes are used. However, note that the quote above from the language reference says that Python only supports double-precision floats, so in most cases (depending on how critical it is for you to always be 100% right) it should be comparable to a double in C.
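What you can inspect at runtime is the precision and range, via sys.float_info; assuming an IEEE 754 platform (virtually all of them), the values will match a C double (binary64):

```python
import sys

# sys.float_info describes the C double that backs Python's float.
# On an IEEE 754 platform these are the binary64 parameters.
print(sys.float_info.mant_dig)  # 53 significand bits for binary64
print(sys.float_info.dig)       # 15 reliable decimal digits
print(sys.float_info.max)       # largest representable value (~1.8e308)
```

This doesn't give you a byte count, but it does confirm that the value portion of the object is a 64-bit double.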

Java primitive data types size

There's no need for such a thing in Java, since the sizes of the primitives are fixed for all JVMs (unlike in C, where they can vary).

char is 16 bits (and is actually an unsigned quantity), int is 32 bits, and long is 64 bits.

boolean is the ugly sister in all this. Internally it is manipulated as a 32-bit int, but arrays of booleans typically use 1 byte per element.

Primitive data types and portability in Java

In "lower-level" languages, the sizes of primitive data types are often derived from the CPU's ability to handle them.
For example, in C an int is defined as being "at least 16 bits in size", but its size may vary between architectures to ensure that "the type int should be the integer type that the target processor is most efficient working with" (source). This means that if your code makes careless assumptions about an int's size, it may very well break when ported from 32-bit x86 to 64-bit PowerPC.

Java, as noted above, is different. An int, for example, will always be 32 bits, so you don't have to worry about its size changing when you run the same code on a different architecture. The tradeoff, as also mentioned above, is performance: on an architecture that doesn't natively handle 32-bit calculations, each int must be widened to the native size the CPU can handle (a small penalty), or worse, if the CPU can only handle smaller integers, every operation on an int may require several CPU operations.

Does Java define the size of its primitive types anywhere?

Not in a class as such, but you have Integer.SIZE, and so on for Long and the floating-point classes too. You also have *.BYTES.

Therefore Integer.SIZE is 32, Integer.BYTES is 4, Double.SIZE is 64, and Double.BYTES is 8, and so on; all of these are ints, in case you were wondering.

NOTE: *.BYTES are only defined since Java 8 (thanks @Slanec for noticing)

(*.SIZE appeared in Java 5 but you do use at least that, right?)

And yes, this is defined by the JDK, since the JLS itself defines the size of primitive types; you are therefore guaranteed to get the same values for these constants in any Java implementation on any platform.


