
Should I use byte or int?

Usually you should use int: a 32-bit integer will perform slightly better because it is already properly sized and aligned for native CPU instructions. You should only use a smaller numeric type when you actually need to store something of that size.

Why should I use int instead of a byte or short in C#

Performance-wise, an int is faster in almost all cases. The CPU is designed to work efficiently with 32-bit values.

Shorter values are complicated to deal with. To read a single byte, say, the CPU has to read the 32-bit block that contains it, and then mask out the upper 24 bits.

To write a byte, it has to read the destination 32-bit block, overwrite the lower 8 bits with the desired byte value, and write the entire 32-bit block back again.
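For illustration, the mask-and-shift work described above can be mimicked in software. This is a Java sketch of my own (the hardware does this internally, but the bit manipulation is the same):

```java
public class ByteMask {
    // Extract the byte at position n (0 = lowest) from a 32-bit word,
    // mimicking the "read the block, mask out the rest" step.
    static int readByte(int word, int n) {
        return (word >>> (n * 8)) & 0xFF;
    }

    // Overwrite the byte at position n, leaving the other 24 bits intact,
    // mimicking the read-modify-write step.
    static int writeByte(int word, int n, int value) {
        int mask = 0xFF << (n * 8);
        return (word & ~mask) | ((value & 0xFF) << (n * 8));
    }

    public static void main(String[] args) {
        int word = 0x11223344;
        System.out.println(Integer.toHexString(readByte(word, 0)));        // 44
        System.out.println(Integer.toHexString(writeByte(word, 1, 0xAB))); // 1122ab44
    }
}
```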

Space-wise, of course, you save a few bytes by using smaller datatypes. So if you're building a table with a few million rows, then shorter datatypes may be worth considering. (The same might be a good reason to use smaller datatypes in your database.)

And correctness-wise, an int doesn't overflow easily. What if you think your value is going to fit within a byte, and then at some point in the future some harmless-looking change to the code means larger values get stored into it?

Those are some of the reasons why int should be your default datatype for all integral data. Only use byte if you actually want to store machine bytes. Only use shorts if you're dealing with a file format or protocol or similar that actually specifies 16-bit integer values. If you're just dealing with integers in general, make them ints.

In Java, is it more efficient to use byte or short instead of int, and float instead of double?

Am I wrong in assuming it should be faster and more efficient? I'd hate to go through and change everything in a massive program to find out I wasted my time.

Short answer

Yes, you are wrong. In most cases, it makes little difference in terms of space used.

It is not worth trying to optimize this ... unless you have clear evidence that optimization is needed. And if you do need to optimize memory usage of object fields in particular, you will probably need to take other (more effective) measures.

Longer answer

The Java Virtual Machine models stacks and object fields using offsets that are (in effect) multiples of a 32-bit primitive cell size. So when you declare a local variable or object field as (say) a byte, the variable / field will be stored in a 32-bit cell, just like an int.

There are two exceptions to this:

  • long and double values require 2 primitive 32-bit cells
  • arrays of primitive types are represented in packed form, so that (for example) an array of bytes holds 4 bytes per 32-bit word.

So it might be worth optimizing use of long and double ... and large arrays of primitives. But in general no.

In theory, a JIT might be able to optimize this, but in practice I've never heard of a JIT that does. One impediment is that the JIT typically cannot run until after instances of the class being compiled have been created. If the JIT optimized the memory layout, you could have two (or more) "flavors" of object of the same class ... and that would present huge difficulties.



Revisitation

Looking at the benchmark results in @meriton's answer, it appears that using short and byte instead of int incurs a performance penalty for multiplication. Indeed, if you consider the operations in isolation, the penalty is significant. (You shouldn't consider them in isolation ... but that's another topic.)

I think the explanation is that the JIT is probably doing the multiplications using 32-bit multiply instructions in each case. But in the byte and short case, it executes extra instructions to convert the intermediate 32-bit value to a byte or short in each loop iteration. (In theory, that conversion could be done once at the end of the loop ... but I doubt that the optimizer would be able to figure that out.)

Anyway, this does point to another problem with switching to short and byte as an optimization. It could make performance worse ... in an algorithm that is arithmetic and compute intensive.
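A small Java sketch of my own (not from the benchmark) showing that extra conversion step: arithmetic on byte or short produces an int, so storing the result back requires a narrowing cast, either written explicitly or hidden by the compiler in a compound assignment.

```java
public class Narrowing {
    public static void main(String[] args) {
        byte a = 10, b = 12;
        // byte c = a * b;        // does not compile: a * b has type int
        byte c = (byte) (a * b);  // explicit narrowing back to byte
        short s = 300;
        s *= s;                   // compound assignment hides a (short) cast
        System.out.println(c);    // 120
        System.out.println(s);    // 300 * 300 = 90000 wraps to 24464 as a short
    }
}
```

The hidden cast in `s *= s` is exactly the kind of per-iteration conversion the paragraph above describes.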



Secondary questions

I know java doesn't have unsigned types but is there anything extra I could do if I knew the number would be positive only?

No. Not in terms of performance anyway. (There are some methods in Integer, Long, etc for dealing with int, long, etc as unsigned. But these don't give any performance advantage. That is not their purpose.)
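For completeness, a quick sketch of those unsigned helper methods (added in Java 8); they reinterpret the bits of an ordinary int, they don't make anything faster:

```java
public class UnsignedDemo {
    public static void main(String[] args) {
        int x = -1;  // all 32 bits set: 0xFFFFFFFF

        // Reinterpret the same bits as an unsigned value.
        System.out.println(Integer.toUnsignedLong(x));     // 4294967295
        System.out.println(Integer.toUnsignedString(x));   // 4294967295

        // Unsigned division and comparison on plain ints.
        System.out.println(Integer.divideUnsigned(x, 2));  // 2147483647
        System.out.println(Integer.compareUnsigned(x, 1) > 0); // true: -1 is the largest unsigned value
    }
}
```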

(I'd assume the garbage collector only deals with Objects and not primitives, but still deletes all the primitives in abandoned objects, right?)

Correct. A field of an object is part of the object. It goes away when the object is garbage collected. Likewise the cells of an array go away when the array is collected. When the field or cell type is a primitive type, then the value is stored in the field / cell ... which is part of the object / array ... and that has been deleted.

If I use byte instead int, will my loop iterate faster?

No, not at all; if anything, it will be slower, because the underlying hardware generally has instructions for working with the native "int" type (32-bit two's complement integer) but not for working with 8-bit signed bytes.

Any reason to use byte/short etc.. in C#?

A single byte compared to a long won't make a huge difference memory-wise, but when you start having large arrays, these 7 extra bytes will make a big difference.

What's more, data types help communicate developers' intent much better: when you encounter a byte length, you know for sure that length's range is that of a byte.

byte/short Vs int as for loop counter variable

It is more likely to be confusing than helpful. Most developers expect to see an int value and you only have 32-bit or 64-bit registers in your CPU so it won't change how your program works or performs.

There are many options which work and are not harmful to your program, but you need to think about the poor developer who has to read and understand it later; this could be you 6 months from now. ;)

It is also not worth making such a change even if the performance were faster unless it was dramatically faster. Consider this change.

for (byte i = 1; i <= 200; i++)

or

for (byte i = 1; i <= x; i++)

You might think this is fine since 200 < 2^8 and it compiles just fine, but it's actually an infinite loop: a byte in Java is signed, so i can never exceed 127; incrementing past that wraps it around to -128, and the condition never becomes false.
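The wrap-around is easy to demonstrate in isolation; this minimal Java sketch shows why the loop condition can never become false:

```java
public class ByteWrap {
    public static void main(String[] args) {
        byte i = Byte.MAX_VALUE;   // 127, the largest value a byte can hold
        i++;                       // overflow wraps around silently
        System.out.println(i);     // -128, which is still <= 200
    }
}
```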

You have to ask the question; How much faster does it have to be, if you increase the risk of introducing a bug later?

Usually the answer is it has to make my whole program significantly faster in a way I have measured (not just the bit you change) AND I have to need it to be significantly faster.

Why does the Java API use int instead of short or byte?

Some of the reasons have already been pointed out. For example, the fact that "...(Almost) All operations on byte, short will promote these primitives to int". However, the obvious next question would be: WHY are these types promoted to int?

So to go one level deeper: The answer may simply be related to the Java Virtual Machine Instruction Set. As summarized in the Table in the Java Virtual Machine Specification, all integral arithmetic operations, like adding, dividing and others, are only available for the type int and the type long, and not for the smaller types.

(An aside: The smaller types (byte and short) are basically only intended for arrays. An array like new byte[1000] will take 1000 bytes, and an array like new int[1000] will take 4000 bytes)

Now, of course, one could say that "...the obvious next question would be: WHY are these instructions only offered for int (and long)?".

One reason is mentioned in the JVM Spec mentioned above:

If each typed instruction supported all of the Java Virtual Machine's run-time data types, there would be more instructions than could be represented in a byte

Additionally, the Java Virtual Machine can be considered as an abstraction of a real processor. And introducing a dedicated Arithmetic Logic Unit for smaller types would not be worth the effort: it would need additional transistors, but it still could only execute one addition in one clock cycle. The dominant architecture when the JVM was designed was 32 bits, just right for a 32-bit int. (The operations that involve a 64-bit long value are implemented as a special case.)

(Note: The last paragraph is a bit oversimplified, considering possible vectorization etc., but should give the basic idea without diving too deep into processor design topics)


EDIT: A short addendum, focusing on the example from the question, but in a more general sense: One could also ask whether it would not be beneficial to store fields using the smaller types. For example, one might think that memory could be saved by storing Calendar.DAY_OF_WEEK as a byte. But here, the Java Class File Format comes into play: All the fields in a class file occupy at least one "slot", which has the size of one int (32 bits). (The "wide" fields, double and long, occupy two slots.) So explicitly declaring a field as short or byte would not save any memory either.

Why would I use byte, double, long, etc. when I could just use int?

Sometimes, when you're building massive applications that could take up 2+ GB of memory, you really want to be restrictive about what primitive type you want to use. Remember:

  • int takes up 32 bits of memory
  • short takes up 16 bits of memory, 1/2 that of int
  • byte is even smaller, 8 bits.

See this java tutorial about primitive types: http://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html

The space taken up by each type really matters if you're handling large data sets. For example, if your program has an array of 1 million ints, then you're taking up 3.81 MB of RAM. Now let's say you know for certain that those 1,000,000 numbers are only going to be in the range of 1-10. Why not, then, use a byte array? 1 million bytes only take up about 976 kilobytes, less than 1 MB.
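The arithmetic behind those figures, as a quick sketch (using 1 MB = 1024 * 1024 bytes):

```java
public class ArraySizes {
    public static void main(String[] args) {
        int n = 1_000_000;
        double intMb  = n * 4 / (1024.0 * 1024.0);  // 4 bytes per int
        double byteKb = n * 1 / 1024.0;             // 1 byte per byte
        System.out.printf("int[]  : %.2f MB%n", intMb);   // 3.81 MB
        System.out.printf("byte[] : %.2f KB%n", byteKb);  // 976.56 KB
    }
}
```

(This counts only the element data; the array object header adds a small constant overhead on top.)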

You always want to use the number type that is just "large" enough to fit, just as you wouldn't put an extra-large T-shirt on a newborn baby.


