Integers Caching in Java

Integers caching in Java

I want to understand the purpose of this optimization. In what cases is performance increased, etc.? A reference to some research on this problem would be great.

The purpose is mainly to save memory, which also leads to faster code due to better cache efficiency.

Basically, the Integer class keeps a cache of Integer instances in the range of -128 to 127, and all autoboxing, literals and uses of Integer.valueOf() will return instances from that cache for the range it covers.

This is based on the assumption that these small values occur much more often than other ints, so it makes sense to avoid the overhead of a separate object for every occurrence (an Integer object typically takes around 16 bytes on a 64-bit HotSpot JVM).

Why Integer class caching values in the range -128 to 127?

Just wondering, why between -128 and 127?

A larger range of integers may be cached, but at least those between -128 and 127 must be cached because it is mandated by the Java Language Specification (emphasis mine):

If the value p being boxed is true, false, a byte, or a char in the range \u0000 to \u007f, or an int or short number between -128 and 127 (inclusive), then let r1 and r2 be the results of any two boxing conversions of p. It is always the case that r1 == r2.

The rationale for this requirement is explained in the same paragraph:

Ideally, boxing a given primitive value p, would always yield an identical reference. In practice, this may not be feasible using existing implementation techniques. The rules above are a pragmatic compromise. The final clause above requires that certain common values always be boxed into indistinguishable objects. [...]

This ensures that in most common cases, the behavior will be the desired one, without imposing an undue performance penalty, especially on small devices. Less memory-limited implementations might, for example, cache all char and short values, as well as int and long values in the range of -32K to +32K.


How can I cache other values outside of this range?

You can use the -XX:AutoBoxCacheMax JVM option, which is not really documented in the list of available HotSpot JVM options. However, it is mentioned in the comments inside the Integer class around line 590:

The size of the cache may be controlled by the -XX:AutoBoxCacheMax=<size> option.

Note that this is implementation specific and may or may not be available on other JVMs.
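A rough way to see the effect of the option is to probe the cache from the outside. The sketch below assumes a JVM where the cached values form one contiguous range starting at -128; CacheProbe and probeHigh are names made up for this illustration and are not part of any API.

```java
class CacheProbe {
    // Walks upward from 127 (the guaranteed upper bound) and returns the
    // largest value for which two valueOf calls still yield the same object.
    // The search stops at 'limit' so it always terminates.
    static int probeHigh(int limit) {
        int v = 127;
        while (v < limit && Integer.valueOf(v + 1) == Integer.valueOf(v + 1)) {
            v++;
        }
        return v;
    }

    public static void main(String[] args) {
        System.out.println("Cache appears to cover values up to "
                + probeHigh(1_000_000));
    }
}
```

With default settings this should report 127. Started as java -XX:AutoBoxCacheMax=1000 CacheProbe, it should report 1000 on a JVM that honours the option.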

Integer caching in Java with new operator

Explanation

When you compare Integer vs int with ==, it needs to convert the Integer to an int. This is called unboxing.

See JLS§5.1.8:

If r is a reference of type Integer, then unboxing conversion converts r into r.intValue()

At that point, you are comparing int vs int. Primitives have no notion of instances; two ints with the same value are simply equal. As such, the result is true.

So the actual code you have is

a.intValue() == c

leading to a comparison of 10 == 10, both int values, no Integer instances anymore.
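As a small sketch of the difference, the two helper methods below (hypothetical names, chosen only for this example) compare the same boxed value once against an int and once against another Integer. The value 5000 is used on purpose, since it lies outside the guaranteed cache range.

```java
class UnboxingCompare {
    // Integer vs int: the compiler unboxes, so this is
    // boxed.intValue() == primitive, a pure value comparison.
    static boolean mixed(Integer boxed, int primitive) {
        return boxed == primitive;
    }

    // Integer vs Integer: no unboxing happens, references are compared.
    static boolean references(Integer left, Integer right) {
        return left == right;
    }

    public static void main(String[] args) {
        Integer a = Integer.valueOf(5000);
        Integer b = Integer.valueOf(5000);
        System.out.println(mixed(a, 5000));   // true: value comparison
        System.out.println(references(a, b)); // false with the default cache range
    }
}
```

Keep in mind that unboxing a null reference throws a NullPointerException, so mixed(null, 5000) would fail at runtime rather than return false.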

You can see that new Integer(...) indeed creates new instances, when you compare Integer vs Integer. You did that in a == b.


Note

The constructor new Integer(...) is deprecated. You should use Integer#valueOf instead; it is potentially faster and also uses an internal cache. From the documentation:

Returns an Integer instance representing the specified int value. If a new Integer instance is not required, this method should generally be used in preference to the constructor Integer(int), as this method is likely to yield significantly better space and time performance by caching frequently requested values. This method will always cache values in the range -128 to 127, inclusive, and may cache other values outside of this range.

The caching is important to note here, since it leads to == being true again (for cached values):

Integer first = Integer.valueOf(10);
Integer second = Integer.valueOf(10);
System.out.println(first == second); // true

The caching is guaranteed for values between -128 and +127, but may also be used for others.

Also note that your b actually comes out of the cache, since

Integer b = 10;
// same as
Integer b = Integer.valueOf(10);
// and not
Integer b = new Integer(10);

So boxing goes through Integer's cache (see JLS§5.1.7).

Caching of boxed values in Java

Because these assignments are made using the 'new' keyword, new instances are created rather than picked up from the pool. Also note that, by default, the integer pool contains only integers from -128 to 127. For higher values, the pool does not come into play.

Integer third = 4;
Integer fourth = 4;
Integer fifth = 130;
Integer sixth = 130;
Integer seventh = 127;
Integer eighth = 127;
System.out.println(third == fourth); // 4 == 4: prints true
System.out.println(fifth == sixth); // 130 == 130: prints false (default cache range)
System.out.println(seventh == eighth); // 127 == 127: prints true

Edit: As rightly mentioned in the comments, from Java 1.6 onward the autobox cache limit can be extended by passing -XX:AutoBoxCacheMax=new_limit on the command line.

Why aren't Integers cached in Java?

It should be very clear that caching has an unacceptable performance hit -- an extra if statement and memory lookup every time you create an Integer. That alone overshadows any other reason and the rest of the agonizing on this thread.

As far as responding "correctly" to ==, the OP is mistaken in his assumption of correctness. Integers DO respond correctly to == by the general Java community's expectation of correctness and of course by the specification's definition of correctness. That is, if two references point to the same object, they are ==. If two references point to different objects, they are not == even if they have the same contents. Thus, it should be no surprise that new Integer(5) == new Integer(5) evaluates to false.

The more interesting question is why new Object() should be required to create a unique instance every time, i.e. why is new Object() not allowed to cache? The answer is the wait(...) and notify(...) calls. Caching new Object()s would incorrectly cause threads to synchronize with each other when they shouldn't.

If it were not for that, then Java implementations could totally cache new Object()s with a singleton.

And that should explain why new Integer(5) done 7 times must be required to create 7 unique Integer objects each containing the value 5 (because Integer extends Object).


Secondary, Less Important Stuff: One problem in this otherwise nice scheme results from the autoboxing and autounboxing feature. Without the feature you could not do comparisons such as new Integer(5) == 5. To enable these, Java unboxes the object (and does not box the primitive). Therefore new Integer(5) == 5 is converted to new Integer(5).intValue() == 5 (and not to new Integer(5) == new Integer(5)).

One last thing to understand is that autoboxing of n is not done by new Integer(n). It is done internally by a call to Integer.valueOf(n).

If you think you understand and want to test yourself, predict the output of the following program:

public class Foo {
    public static void main(String[] args) {
        System.out.println(Integer.valueOf(5000) == Integer.valueOf(5000));
        System.out.println(Integer.valueOf(5000) == new Integer(5000));
        System.out.println(Integer.valueOf(5000) == 5000);
        System.out.println(new Integer(5000) == Integer.valueOf(5000));
        System.out.println(new Integer(5000) == new Integer(5000));
        System.out.println(new Integer(5000) == 5000);
        System.out.println(5000 == Integer.valueOf(5000));
        System.out.println(5000 == new Integer(5000));
        System.out.println(5000 == 5000);
        System.out.println("=====");
        System.out.println(Integer.valueOf(5) == Integer.valueOf(5));
        System.out.println(Integer.valueOf(5) == new Integer(5));
        System.out.println(Integer.valueOf(5) == 5);
        System.out.println(new Integer(5) == Integer.valueOf(5));
        System.out.println(new Integer(5) == new Integer(5));
        System.out.println(new Integer(5) == 5);
        System.out.println(5 == Integer.valueOf(5));
        System.out.println(5 == new Integer(5));
        System.out.println(5 == 5);
        System.out.println("=====");
        test(5000, 5000);
        test(5, 5);
    }

    public static void test(Integer a, Integer b) {
        System.out.println(a == b);
    }
}

For extra credit, also predict the output if all the == are changed to .equals(...)

Update: Thanks to comment from user @sactiw : "default range of cache is -128 to 127 and java 1.6 onward you can reset the upper value >=127 by passing -XX:AutoBoxCacheMax= from command line"

What are the benefits of IntegerCache while using Integer?

Values between -128 and 127 are cached for reuse. This is an example of the flyweight pattern, which minimizes memory usage by reusing immutable objects.
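To make the flyweight idea concrete, here is a minimal sketch of the same technique applied to a made-up immutable class. SmallInt, its of factory and the chosen range are all hypothetical and only mirror the shape of Integer.valueOf; this is not how the JDK class is actually written.

```java
final class SmallInt {
    private static final int LOW = -128;
    private static final int HIGH = 127;

    // One shared, pre-built instance per value in [LOW, HIGH].
    private static final SmallInt[] CACHE = new SmallInt[HIGH - LOW + 1];
    static {
        for (int i = 0; i < CACHE.length; i++) {
            CACHE[i] = new SmallInt(LOW + i);
        }
    }

    private final int value;

    private SmallInt(int value) {
        this.value = value;
    }

    // Factory method: hands out the shared instance when one exists,
    // otherwise allocates. Sharing is safe because instances are immutable.
    static SmallInt of(int value) {
        if (value >= LOW && value <= HIGH) {
            return CACHE[value - LOW];
        }
        return new SmallInt(value);
    }

    int value() {
        return value;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof SmallInt && ((SmallInt) o).value == value;
    }

    @Override
    public int hashCode() {
        return value;
    }

    public static void main(String[] args) {
        System.out.println(SmallInt.of(20) == SmallInt.of(20));        // true: shared instance
        System.out.println(SmallInt.of(500) == SmallInt.of(500));      // false: two allocations
        System.out.println(SmallInt.of(500).equals(SmallInt.of(500))); // true: same value
    }
}
```

The private constructor is the important design choice: callers cannot bypass the factory, so every in-range value is guaranteed to be shared.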


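Beyond being an optimization, this behaviour is part of the JLS for values between -128 and 127, so the a == b result below may be relied upon. The c == d result is what a JVM with the default cache range prints, but it is not guaranteed, because an implementation may cache a larger range: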

Integer a = 1;
Integer b = 1;
Integer c = 999;
Integer d = 999;
System.out.println(a == b); // true
System.out.println(c == d); // false with the default cache range

Integer object caching

You're misunderstanding what "the same object will be returned" means.

So, comparison with == is actually comparing memory locations, and returns true only when the two variables hold the same object (i.e. stored at the same memory location).

Values from -128 to 127 are stored in the Integer cache, which means that every boxed 10 is the same 10 (i.e. the same memory location), every boxed 12 is the same 12, etc. But it's not the case that all 10s are also 12s, which is what your question is unintentionally assuming.

Anyway, once you get outside of that range, each boxing of a primitive int creates a new Integer object at a new memory location outside of the cache.

You can test that with the following code:

public static void main(String[] args) {
    Integer a = 1000;
    Integer b = 1000;
    if (a == b)
        System.out.println("same");
    else
        System.out.println("Not");
}

That will print "Not", because a and b are two different objects stored in different memory locations.

And this is why you should compare boxed values with .equals().
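A short sketch of value-based comparison follows. BoxedEquality and sameValue are illustrative names only; Objects.equals is the standard java.util helper, used here because it also tolerates null.

```java
import java.util.Objects;

class BoxedEquality {
    // Null-safe value comparison of two boxed integers.
    static boolean sameValue(Integer a, Integer b) {
        return Objects.equals(a, b);
    }

    public static void main(String[] args) {
        Integer a = 1000;
        Integer b = 1000;
        System.out.println(a.equals(b));                  // true: compares values
        System.out.println(a.intValue() == b.intValue()); // true: compares primitives
        System.out.println(sameValue(a, b));              // true
        System.out.println(sameValue(a, null));           // false, and no exception
    }
}
```

All three forms give the same answer regardless of whether the values fall inside the cached range.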

How large is the Integer cache?

This is an internal detail of the Java implementation and cannot be configured; the range is from -128 to 127. You can check the Javadocs or simply take a look at the sources:

public static Integer valueOf(int i) {
    final int offset = 128;
    if (i >= -128 && i <= 127) { // must cache
        return IntegerCache.cache[i + offset];
    }
    return new Integer(i);
}

UPD. I was wrong (thx to Marco Topolnik). All of the above applies to older Java implementations. In Java 7 the upper bound can be changed with a system property:

-Djava.lang.Integer.IntegerCache.high=<size>

or JVM setting:

-XX:AutoBoxCacheMax=<size>

UPD 2. java.math.BigInteger has a hardcoded cache for values -16 <= x <= 16. From the sources:

private final static int MAX_CONSTANT = 16;
private static BigInteger posConst[] = new BigInteger[MAX_CONSTANT+1];
private static BigInteger negConst[] = new BigInteger[MAX_CONSTANT+1];
static {
    for (int i = 1; i <= MAX_CONSTANT; i++) {
        int[] magnitude = new int[1];
        magnitude[0] = i;
        posConst[i] = new BigInteger(magnitude, 1);
        negConst[i] = new BigInteger(magnitude, -1);
    }
}

public static BigInteger valueOf(long val) {
    // If -MAX_CONSTANT < val < MAX_CONSTANT, return stashed constant
    if (val == 0)
        return ZERO;
    if (val > 0 && val <= MAX_CONSTANT)
        return posConst[(int) val];
    else if (val < 0 && val >= -MAX_CONSTANT)
        return negConst[(int) -val];
    return new BigInteger(val);
}
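The consequence of the source above can be observed from user code. This is only a sketch: BigIntegerCacheDemo and sameInstance are made-up names, and the identity results depend on the JDK's BigInteger keeping the implementation shown, which is an internal detail rather than a documented guarantee.

```java
import java.math.BigInteger;

class BigIntegerCacheDemo {
    // True if two valueOf calls for v return the very same object.
    static boolean sameInstance(long v) {
        return BigInteger.valueOf(v) == BigInteger.valueOf(v);
    }

    public static void main(String[] args) {
        // Within [-16, 16] the stashed constants are returned.
        System.out.println(sameInstance(16));
        System.out.println(sameInstance(-16));
        // Outside that range a fresh object is created per call.
        System.out.println(sameInstance(17));
        // Value equality works either way and is what code should rely on.
        System.out.println(BigInteger.valueOf(17).equals(BigInteger.valueOf(17)));
    }
}
```

Given the valueOf source quoted above, the first two lines should print true and the third false; the last line prints true on any implementation.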

How/why is Integer caching faster in Java?

The performance benefit has nothing to do with ==. The real reason is that the cache allows valueOf to avoid creating lots of objects when the same "small" integers are boxed repeatedly. Integer objects occupy space (at least 16 bytes). Creating lots of them means that the garbage collector needs to run more frequently.

My understanding is that the Java team did a lot of analysis of real-world applications, and came to the conclusion that the Integer cache was a worthwhile optimization.

(The reason I say that it is nothing to do with == is that you cannot rely on == working for Integer objects. Therefore, any code that does this ... without also ensuring that the integers are in the specified range ... is buggy. And if you do a range check to ensure that the numbers are in-range, you are likely to spend more on that than you save by using ==.)
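The allocation saving can be made visible by counting distinct objects rather than timing anything. The sketch below is an illustration under assumptions: BoxAllocationCount and distinctBoxes are hypothetical names, and an identity-based set is used so that two Integer objects with equal values still count as two.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

class BoxAllocationCount {
    // Boxes every value in [from, from + span) 'rounds' times and returns
    // how many distinct Integer objects (compared by identity) were seen.
    static int distinctBoxes(int from, int span, int rounds) {
        Set<Integer> seen =
                Collections.newSetFromMap(new IdentityHashMap<Integer, Boolean>());
        for (int r = 0; r < rounds; r++) {
            for (int v = from; v < from + span; v++) {
                seen.add(Integer.valueOf(v));
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // Values 0..99 are always cached: 10,000 boxings, 100 objects.
        System.out.println(distinctBoxes(0, 100, 100));
        // Values far outside the default range: one new object per boxing
        // unless the cache has been enlarged to cover them.
        System.out.println(distinctBoxes(1_000_000, 100, 100));
    }
}
```

With default settings the second call should report 10000 objects for the same amount of work, which is the extra garbage the cache avoids for small values.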


"... which also leads to faster code due to better cache efficiency." How does it lead to faster code?

The cache efficiency they are talking about is the hardware-level memory cache that most modern processors use to deal with the mismatch between CPU and memory speeds. If the application has only one Integer object representing the number 1 and that number appears frequently, then the chances that the memory addresses holding the 1 object are in the (memory) caches are increased. Cache hits mean faster code.


Does it have anything to do with the following method found in the IntegerCache class?

Erm ... yes.


