Performance of built-in types: char vs. short vs. int vs. float vs. double

Float vs. integer:

Historically, floating-point could be much slower than integer arithmetic. On modern computers, this is no longer really the case (it is somewhat slower on some platforms, but unless you write perfect code and optimize for every cycle, the difference will be swamped by the other inefficiencies in your code).

On somewhat limited processors, like those in high-end cell phones, floating-point may be somewhat slower than integer, but it's generally within an order of magnitude (or better), so long as there is hardware floating-point available. It's worth noting that this gap is closing pretty rapidly as cell phones are called on to run more and more general computing workloads.

On very limited processors (cheap cell phones and your toaster), there is generally no floating-point hardware, so floating-point operations need to be emulated in software. This is slow -- a couple orders of magnitude slower than integer arithmetic.

As I said though, people are expecting their phones and other devices to behave more and more like "real computers", and hardware designers are rapidly beefing up FPUs to meet that demand. Unless you're chasing every last cycle, or you're writing code for very limited CPUs that have little or no floating-point support, the performance distinction doesn't matter to you.

Different size integer types:

Typically, CPUs are fastest at operating on integers of their native word size (with some caveats about 64-bit systems). 32-bit operations are often faster than 8- or 16-bit operations on modern CPUs, but this varies quite a bit between architectures. Also, remember that you can't consider the speed of a CPU in isolation; it's part of a complex system. Even if operating on 16-bit numbers is 2x slower than operating on 32-bit numbers, you can fit twice as much data into the cache hierarchy when you represent it with 16-bit numbers instead of 32-bit ones. If that makes the difference between having all your data come from cache instead of taking frequent cache misses, then the faster memory access will trump the slower operation of the CPU.
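To make the footprint point concrete, here is a minimal sketch (the array names and sizes are made up for illustration): the 16-bit representation of the same element count takes half the bytes, so twice as many values fit in each cache line.

#include <cstdint>
#include <cstdio>

int main() {
    // Hypothetical working set of one million samples.
    static std::int16_t narrow[1'000'000];
    static std::int32_t wide[1'000'000];

    // Same element count, half the bytes: twice as many values per cache line.
    std::printf("16-bit array: %zu bytes\n", sizeof narrow);  // 2,000,000
    std::printf("32-bit array: %zu bytes\n", sizeof wide);    // 4,000,000
}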

Other notes:

Vectorization tips the balance further in favor of narrower types (float and 8- and 16-bit integers) -- you can do more operations in a vector of the same width. However, good vector code is hard to write, so it's not as though you get this benefit without a lot of careful work.
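As a sketch of the kind of loop an auto-vectorizer can handle (whether it actually vectorizes depends on the compiler and flags, e.g. -O3 on GCC or Clang), narrowing the element type to 16 bits lets each vector instruction process twice as many lanes as a 32-bit type would:

#include <cstddef>
#include <cstdint>

// Element-wise add over contiguous arrays; simple enough for auto-vectorization.
void add_arrays(std::int16_t* dst, const std::int16_t* a,
                const std::int16_t* b, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = static_cast<std::int16_t>(a[i] + b[i]);  // promotion to int, then narrowed back
}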

Why are there performance differences?

There are really only two factors that affect whether or not an operation is fast on a CPU: the circuit complexity of the operation, and user demand for the operation to be fast.

(Within reason) any operation can be made fast, if the chip designers are willing to throw enough transistors at the problem. But transistors cost money (or rather, using lots of transistors makes your chip larger, which means you get fewer chips per wafer and lower yields, which costs money), so chip designers have to balance how much complexity to use for which operations, and they do this based on (perceived) user demand. Roughly, you might think of breaking operations into four categories:

                  high demand              low demand
high complexity   FP add, multiply         division
low complexity    integer add,             popcount, hcf
                  boolean ops, shifts

high-demand, low-complexity operations will be fast on nearly any CPU: they're the low-hanging fruit, and confer maximum user benefit per transistor.

high-demand, high-complexity operations will be fast on expensive CPUs (like those used in computers), because users are willing to pay for them. You're probably not willing to pay an extra $3 for your toaster to have a fast FP multiply, however, so cheap CPUs will skimp on these instructions.

low-demand, high-complexity operations will generally be slow on nearly all processors; there just isn't enough benefit to justify the cost.

low-demand, low-complexity operations will be fast if someone bothers to think about them, and non-existent otherwise.

Further reading:

  • Agner Fog maintains a nice website with lots of discussion of low-level performance details (and has a very scientific data-collection methodology to back it up).
  • The Intel® 64 and IA-32 Architectures Optimization Reference Manual (PDF download link is part way down the page) covers a lot of these issues as well, though it is focused on one specific family of architectures.

int or char? Which is faster?

Given the constraints, you should probably be using uint_fast8_t which gives you what is generally the fastest unsigned type that is capable of storing at least uint8_t values (where uint8_t is usually unsigned char, of course). The type is defined in <stdint.h> in C99 and later (and uint_fast8_t is required to be defined, but it is not necessarily the same as uint8_t, and uint8_t need not be defined if the CPU does not support 8-bit bytes).

If you go down this route, you'll probably need to brush up on the correct format specifiers for the printf() and scanf() families of functions. These are defined in <inttypes.h>. Using anything else is fraught with portability issues (at least in theory).
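A minimal sketch of that route, using the matching format macros from <inttypes.h> (<cinttypes> in C++) instead of guessing at a plain %d or %hhu:

#include <cinttypes>   // PRIuFAST8 / SCNuFAST8 format macros
#include <cstdint>     // uint_fast8_t
#include <cstdio>

int main() {
    std::uint_fast8_t value = 0;
    if (std::scanf("%" SCNuFAST8, &value) == 1)       // matching scanf specifier
        std::printf("read %" PRIuFAST8 "\n", value);  // matching printf specifier
}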

Using short and long instead of int and double (C++)

The only advantage of using short is that it takes up less space. If you're programming for a very memory-tight environment, or you have a large array of numbers, or your data will leave the program's address space (e.g., to be saved to disk or transmitted across a network) then this might be important. In other cases, using types other than int unnecessarily might actually slow down your program due to the way processor architectures are designed. See for example:

  • Pros/cons to using char for small integers in C
  • Performance of built-in types : char vs short vs int vs. float vs. double

If you need a type that has exactly the size you think short has, in order to store some bit pattern or something like that, you should be using one of the exact-width types such as std::int16_t, not short. So that's usually not a valid reason for using short.
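For instance, a sketch of a struct meant to match a fixed on-disk layout (the field names here are hypothetical) would use the exact-width types:

#include <cstdint>

// Hypothetical record header with a fixed 8-byte layout.
struct RecordHeader {
    std::uint16_t magic;     // exactly 16 bits
    std::uint16_t version;   // exactly 16 bits
    std::uint32_t length;    // exactly 32 bits
};
static_assert(sizeof(RecordHeader) == 8, "unexpected padding");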

It's possible that your C++ instructor learned to code a long time ago, in an environment where every byte counted, and mistakenly believes that this is still the case nowadays. Sadly, this kind of preconceived notion is very common. (Other symptoms include forbidding exceptions and forbidding the use of standard library containers.) In such cases you should be aware that smart people can often say stupid things. Stack Overflow is a good place to get information about current best practices, as are the books listed here: The Definitive C++ Book Guide and List

Are char and small int slower than int?

On any modern, practical machine, char, int, and long will all be fast (probably equally fast). Whether short is fast or not varies somewhat between CPU architectures and even between different CPU models within a single architecture.

With that said, there's really no good reason to use small types for single variables, regardless of their speed. Their semantics are confusing (due to default promotions to int) and they will not save you significant space (maybe not even any space). The only time I would ever use char, short, int8_t, int16_t, etc. is in arrays or structs that have to match a fixed binary layout, or where you'll have so many of them (e.g. pixels or audio samples) that the size of each one actually matters.
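For example (a sketch with made-up names), this is the kind of bulk data where the narrow types do pay off:

#include <cstdint>
#include <vector>

// 8 bits per channel: four bytes per pixel instead of sixteen with int channels.
struct Pixel {
    std::uint8_t r, g, b, a;
};

int main() {
    std::vector<Pixel> image(1920 * 1080);          // roughly 8 MB rather than 33 MB
    std::vector<std::int16_t> samples(48000 * 60);  // one minute of 16-bit mono audio
    (void)image; (void)samples;                     // silence unused-variable warnings
}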

Explanation for why int should be preferred over long/short in C++?

int is traditionally the most "natural" integral type for the machine on which the program is to run. What is meant by "most natural" is not too clear, but I would expect that it would not be slower than other types. More to the point, perhaps, is that there is an almost universal tradition for using int in preference to other types when there is no strong reason for doing otherwise. Using other integral types will cause an experienced C++ programmer, on reading the code, to ask why.

Why can using unsigned short be slower than using int?

I think the wording "probably slower" is too strong.

A theoretical fact is:

Calculations are done with at least int size, e.g.

short a = 5;
short b = 10;
short c = a + b;

This code snippet contains 3 implicit conversions. a and b are converted to int and added. The result is converted back from int to short. You can't avoid it. Arithmetic in C++ uses at least int size. It's called integer promotion.

In practice, most of these conversions will be avoided by the compiler, and the optimizer will remove many of the rest.
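You can observe the promotion directly: the expression a + b has type int even though both operands are short. A small check (the sizes shown are typical, not guaranteed):

#include <iostream>

int main() {
    short a = 5;
    short b = 10;
    std::cout << sizeof(a) << '\n';      // typically 2
    std::cout << sizeof(a + b) << '\n';  // sizeof(int), typically 4
}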

Float vs Double performance in C# considering cache

What if the locality issues are considered appropriately?

Still much the same, because the locality effects are normally not as large as you might think. If you do anything with float and double beyond just copying them, a significant amount of time is spent actually calculating. Your XOR example is a case where the intuition goes wrong: XOR is a simple, fast operation, so locality dominates there. With floats, in most cases you spend far more time doing the math.

Why use the 'int' data type instead of the 'float' data type?

float is generally considered to be slower than int in arithmetic computations.

You should not rely on floats in equality comparisons because of their limited precision. (Read more here.)
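The classic demonstration (shown here with double, but the same caveat applies to float) of why floating-point equality comparisons are risky:

#include <iostream>

int main() {
    double a = 0.1 + 0.2;
    std::cout << std::boolalpha << (a == 0.3) << '\n';  // prints false on IEEE-754 doubles
}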

This is a good post on the subject as well: the "Float vs. integer" answer quoted in full at the top of this page.

I would suggest reading that whole post to get a clear view of the distinctions.

Objective C float vs int, CGPoint vs custom int based struct performance

It depends on what you are doing with it. For ordinary arithmetic, throughput can be similar. Integer latency is usually a bit less. On some processors, the latency to L1 is better for GPRs than for FPRs. So, for many tests, the results will come out the same or give a small edge to integer computation. The balance will flip the other way for double vs. int64_t computation on 32-bit machines. (If you are writing CPU vector code and can get away with 16-bit computation, it would be much faster to use integer.)

However, in the case of calculating coordinates/addresses for purposes of loading or storing data into/from a register, integer is clearly better on a CPU. The reason is that the load or store instruction can take an integer operand as an index into an array, but not a floating-point one. To use floating-point coordinates, you at minimum have to convert to integer first, then load or store, so it should always be slower. Typically, there will also have to be some rounding mode applied (e.g. a floor() operation) and maybe some non-trivial operation to account for edging modes, such as a CL_ADDRESS_REPEAT addressing mode. Contrast that with a simple AND operation, which may be all that is necessary to achieve the same thing with integers, and it should be clear that integer is a much cheaper format.
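A sketch of that difference on the CPU side (the function and parameter names are illustrative): the floating-point coordinate has to be floored, converted, and wrapped before it can index anything, while a power-of-two integer coordinate may only need a single AND.

#include <cmath>
#include <cstdint>

float sample_float(const float* row, int width, float x) {
    int i = static_cast<int>(std::floor(x));  // convert and choose a rounding
    i = ((i % width) + width) % width;        // CL_ADDRESS_REPEAT-style wrap
    return row[i];
}

float sample_int(const float* row, std::uint32_t mask, std::uint32_t x) {
    return row[x & mask];                     // one AND when the width is a power of two
}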

On GPUs, which emphasize floating-point computation a bit more and may not invest much in integer computation (even though it is easier), the story is quite different. There you can expect texture unit hardware to use the floating point value directly to find the required data. The floating point arithmetic to find the right coordinate is built in to the hardware and therefore "free" (if we ignore energy consumption considerations) and graphics APIs like GL or CL are built around it.

Generally speaking, though ubiquitous in graphics APIs, floating-point itself is a numerically inferior choice for a coordinate system for sampled data. It lumps too much precision in one corner of the image and may cause quantization errors / inconsistencies at the far corners of the image, leading to reduced precision for linear sampling and unexpected rounding effects. For large enough images, some pixels in the image may become unaddressable by the coordinate system, because no floating-point number exists which references that position. It is probably the case that the default rounding mode, round-to-nearest-ties-to-even, is undesirable for coordinate systems, because linear filtering will often place the coordinate halfway between two integer values, resulting in a round up for even pixels and a round down for odd ones. This causes pixel duplication rather than the expected result in the worst case, where the coordinates are all halfway cases and the stride is 1. Floating point is nice in that it is somewhat easier to use.

A fixed-point coordinate system allows for consistent coordinate precision and rounding across the entire surface and will avoid these problems. Modulo overflow feeds nicely into some common edging modes. Precision is predictable.
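A minimal 16.16 fixed-point coordinate sketch (the helper names are hypothetical), showing the uniform precision and the consistent floor-style rounding:

#include <cstdint>
#include <iostream>

using fixed16_16 = std::int32_t;  // 16 integer bits, 16 fractional bits

constexpr fixed16_16 to_fixed(double v) {
    return static_cast<fixed16_16>(v * 65536.0);
}
constexpr std::int32_t to_pixel(fixed16_16 v) {
    return v >> 16;               // same floor behaviour everywhere on the surface
}

int main() {
    std::cout << to_pixel(to_fixed(123.75)) << '\n';  // 123
}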

C++ int vs long long in 64 bit machine

1) Is it best practice to use long long in x64 to achieve maximum performance, even for 1-4 byte data?

No, and it will probably in fact make your performance worse. For example, if you use 64-bit integers where you could have gotten away with 32-bit integers, then you have just doubled the amount of data that must be sent between the processor and memory, and memory is orders of magnitude slower. All of your caches and memory buses will effectively fill up twice as fast.

2) What is the trade-off in using a type smaller than the word size (memory win vs. additional operations)?

Generally, the dominant driver of performance in a modern machine is going to be how much data needs to be stored in order to run a program. You are going to see significant performance cliffs once the working set size of your program exceeds the capacity of your registers, L1 cache, L2 cache, L3 cache, and RAM, in that order.

In addition, using a smaller data type can be a win if your compiler is smart enough to figure out how to use your processor's vector instructions (aka SSE instructions). Modern vector processing units are smart enough to cram eight 16-bit short integers into the same space as two 64-bit long long integers, so you can do four times as many operations at once.
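For example, with SSE2 a single 128-bit register holds eight 16-bit values, so one instruction performs eight additions where the 64-bit layout gets only two (a sketch, x86-specific):

#include <emmintrin.h>  // SSE2 intrinsics

__m128i add_eight_shorts(__m128i a, __m128i b) {
    return _mm_add_epi16(a, b);   // eight 16-bit adds in one instruction
}

__m128i add_two_longlongs(__m128i a, __m128i b) {
    return _mm_add_epi64(a, b);   // only two 64-bit adds in the same register width
}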

3) Does an x64 computer, where the word and int size is 64 bits, have the possibility of processing a short using a 16-bit word size through so-called backward compatibility? Or must it put the 16-bit value into a 64-bit word, with the fact that this can be done being what defines the system as backward compatible?

I'm not sure what you're asking here. In general, 64-bit machines are capable of executing 32-bit and 16-bit executable files because those earlier executable files use a subset of the 64-bit machine's potential.

Hardware instruction sets are generally backwards compatible, meaning that processor designers tend to add capabilities, but rarely if ever remove capabilities.

4) Can we force the compiler to make the int 64 bit?

There are standard headers, available with essentially all modern compilers, that allow you to work with fixed-bit-size data. For example, the header stdint.h (<cstdint> in C++) declares types such as int64_t and uint64_t.
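For example (a sketch), if you need 64-bit integer arithmetic regardless of what int happens to be on the platform:

#include <cstdint>
#include <iostream>

int main() {
    std::int64_t big = 9000000000;   // exactly 64 bits; would overflow a 32-bit int
    std::cout << big << '\n';
}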

5) How can ILP64 be incorporated into a PC that uses LP64?

https://software.intel.com/en-us/node/528682

6) What are the possible problems of using code adapted to the above issues with other compilers, OSes, and architectures (e.g. a 32-bit processor)?

Generally the compilers and systems are smart enough to figure out how to execute your code on any given system. However, 32-bit processors are going to have to do extra work to operate on 64-bit data. In other words, correctness should not be an issue, but performance will be.

But it's generally the case that if performance is really critical to you, then you need to program for a specific architecture and platform anyway.

Clarification request: Thanks a lot! I wanted to clarify question 1. You say that it is bad for memory. Let's take the example of a 32-bit int. When you send it to memory, because it is a 64-bit system, for a desired integer 0xEEEEEEEE, won't it become 0xEEEEEEEE plus 32 other bits when it is sent? How can a processor send 32 bits when the word size is 64 bits? The 32 bits are the desired values, but won't they be combined with 32 unused bits and sent that way? If my assumption is true, then there is no difference for memory.

There are two things to discuss here.

First, the situation you discuss does not occur. A processor does not need to "promote" a 32-bit value into a 64-bit value in order to use it appropriately. This is because modern processors have different accessing modes that are capable of dealing with different size data appropriately.

For example, a 64-bit Intel processor has a 64-bit register named RAX. However, this same register can be used in 32-bit mode by referring to it as EAX, and even in 16-bit and 8-bit modes. I stole a diagram from here:

x86_64 registers rax/eax/ax/al overwriting full register contents

1122334455667788
================  rax (64 bits)
        ========  eax (32 bits)
            ====  ax  (16 bits)
            ==    ah  (8 bits)
              ==  al  (8 bits)

Between the compiler and assembler, the correct code is generated so that a 32-bit value is handled appropriately.

Second, when we're talking about memory overhead and performance we should be more specific. Modern memory systems are composed of a disk, then main memory (RAM), and typically two or three caches (e.g. L3, L2, and L1). The smallest quantity of data transferred to or from the disk is called a page, and page sizes are usually 4096 bytes (though they don't have to be). Then, the smallest quantity of data transferred between main memory and the caches is called a cache line, which is usually much larger than 32 or 64 bits. On my computer the cache line size is 64 bytes. The processor is the only place where data is actually transferred and addressed at the word level and below.

So if you want to change one 64-bit word in a file that resides on disk, then, on my computer, this actually requires that you load 4096 bytes from the disk into memory, and then 64 bytes from memory into the L3, L2, and L1 caches, and then the processor takes a single 64-bit word from the L1 cache.

The result is that the word size means nothing for memory bandwidth. However, you can fit 16 of those 32-bit integers in the same space you can pack 8 of those 64-bit integers. Or you could even fit 32 16-bit values or 64 8-bit values in the same space. If your program uses a lot of different data values you can significantly improve performance by using the smallest data type necessary.
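To make that concrete (assuming the 64-byte cache line described above), the packing works out as follows:

#include <cstddef>
#include <cstdint>

constexpr std::size_t kCacheLine = 64;  // bytes, as on the machine described above

static_assert(kCacheLine / sizeof(std::int64_t) == 8,  "8 x 64-bit values per line");
static_assert(kCacheLine / sizeof(std::int32_t) == 16, "16 x 32-bit values per line");
static_assert(kCacheLine / sizeof(std::int16_t) == 32, "32 x 16-bit values per line");
static_assert(kCacheLine / sizeof(std::int8_t)  == 64, "64 x 8-bit values per line");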


