When Should We Use Intern Method of String on String Literals

Java automatically interns String literals. This means that in many cases, the == operator appears to work for Strings in the same way that it does for ints or other primitive values.

Since interning is automatic for String literals, the intern() method is meant for Strings constructed with new String() or otherwise created at run time.

Using your example:

String s1 = "Rakesh";
String s2 = "Rakesh";
String s3 = "Rakesh".intern();
String s4 = new String("Rakesh");
String s5 = new String("Rakesh").intern();

if (s1 == s2) {
    System.out.println("s1 and s2 are same"); // 1.
}

if (s1 == s3) {
    System.out.println("s1 and s3 are same"); // 2.
}

if (s1 == s4) {
    System.out.println("s1 and s4 are same"); // 3.
}

if (s1 == s5) {
    System.out.println("s1 and s5 are same"); // 4.
}

will print:

s1 and s2 are same
s1 and s3 are same
s1 and s5 are same

In every case except s4, whose value was explicitly created with the new operator and never passed through intern(), the reference points to the single immutable instance held in the JVM's string constant pool.

Refer to JavaTechniques "String Equality and Interning" for more information.

Difference between String intern method and normal string creation

The difference lies in how the variable is initialized, which decides where the String object is stored:

  1. If it is initialized with the new keyword, the object is stored on the heap, and every such variable refers to a new object even when the values are the same.
  2. If it is initialized directly from a literal, variables with the same value reference one shared object in the JVM's string pool.

String Interning Oracle reference.

There are two ways to construct a string: implicitly, by assigning a string literal, or explicitly, by creating a String object via the new operator and constructor. For example:

String s1 = "Hello";              // String literal
String s2 = "Hello";              // String literal
String s3 = s1;                   // same reference
String s4 = new String("Hello");  // String object
String s5 = new String("Hello");  // String object

Java has provided a special mechanism for keeping the String literals - in a so-called string common pool. If two string literals have the same contents, they will share the same storage inside the common pool. This approach is adopted to conserve storage for frequently-used strings. On the other hand, String objects created via the new operator and constructor are kept in the heap. Each String object in the heap has its own storage just like any other object.

s1 == s1;         // true, same pointer
s1 == s2;         // true, s1 and s2 share storage in common pool
s1 == s3;         // true, s3 is assigned same pointer as s1
s1.equals(s3);    // true, same contents
s1 == s4;         // false, different pointers
s1.equals(s4);    // true, same contents
s4 == s5;         // false, different pointers in heap
s4.equals(s5);    // true, same contents

Important Notes:

  1. In the above example, I used relational equality operator '==' to compare the references of two String objects. This is done to demonstrate the differences between string literals sharing storage in the common pool and String objects created in the heap. It is a logical error to use (str1 == str2) in your program to compare the contents of two Strings.
  2. String can be created by directly assigning a String literal which is shared in a common pool. It is uncommon and not recommended to use the new operator to construct a String object in the heap.

When to use intern() on String literals

Calling intern() on the literal is a technique to ensure that CONSTANT is not actually a compile-time constant.

When the Java compiler sees a reference to a final static primitive or String, it inserts the actual value of that constant into the class that uses it. If you then change the constant value in the defining class but don't recompile the using class, it will continue to use the old value.

By calling intern() on the "constant" string, it is no longer considered a static constant by the compiler, so the using class will actually access the defining class' member on each use.
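One observable consequence of this can be sketched in a single file. The class and field names below (ConstantDemo, INLINED, NOT_INLINED) are hypothetical and do not come from the original question. A compile-time constant takes part in constant folding, so concatenating it with a literal yields a pooled string, whereas the intern()-initialized field forces the concatenation to happen at run time:

```java
public class ConstantDemo {
    // A compile-time constant: initialized with a constant expression.
    static final String INLINED = "abc";

    // Not a compile-time constant: a method call is never a constant expression.
    static final String NOT_INLINED = "abc".intern();

    // INLINED + "def" is folded by the compiler into the literal "abcdef",
    // so both sides refer to the same pooled instance.
    static boolean foldedAtCompileTime() {
        return (INLINED + "def") == "abcdef";
    }

    // NOT_INLINED + "def" is concatenated at run time, producing a new object.
    static boolean foldedWhenInterned() {
        return (NOT_INLINED + "def") == "abcdef";
    }

    public static void main(String[] args) {
        System.out.println(foldedAtCompileTime()); // true
        System.out.println(foldedWhenInterned());  // false
    }
}
```

The same distinction is what makes a using class read NOT_INLINED from the defining class at run time instead of carrying its own copy of the value.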


JLS citations:

  • definition of a compile-time constant: http://docs.oracle.com/javase/specs/jls/se6/html/expressions.html#5313

  • implication of changes to a compile-time constant (about halfway down the page): http://docs.oracle.com/javase/specs/jls/se6/html/binaryComp.html#45139

What is the purpose of Java's String.intern()?

There are essentially two ways that our String objects can enter the pool:

  • Using a literal in source code like "bbb".
  • Using intern.

intern is for when you have a String that's not otherwise from the pool. For example:

String bb = "bbb".substring(1); // substring creates a new object

System.out.println(bb == "bb"); // false
System.out.println(bb.intern() == "bb"); // true

Or slightly different:

System.out.println(new String("bbb").intern() == "bbb"); // true

new String("bbb") does create two objects...

String fromLiteral = "bbb";                     // in pool
String fromNewString = new String(fromLiteral); // not in pool

...but it's more like a special case. It creates two objects because "bbb" refers to an object:

A string literal is a reference to an instance of class String [...].

Moreover, a string literal always refers to the same instance of class String.

And new String(...) creates a copy of it.

However, there are many ways String objects are created without using a literal, such as:

  • All the String methods that return a modified copy of the string. (substring, split, replace, etc.)
  • Reading a String from some kind of input such as a Scanner or Reader.
  • Concatenation when at least one operand is not a compile-time constant.

intern lets you add them to the pool or retrieve an existing object if there was one. Under most circumstances interning Strings is unnecessary but it can be used as an optimization because:

  • It lets you compare with ==.
  • It can save memory because duplicates can be garbage collected.

Java string intern and literal

They have the same end result, but they are not the same (they'll produce different bytecode; the new String("foo").intern() version actually goes through those steps, producing a new string object, then interning it).

Two relevant quotes from String#intern:

When the intern method is invoked, if the pool already contains a string equal to this String object as determined by the equals(Object) method, then the string from the pool is returned. Otherwise, this String object is added to the pool and a reference to this String object is returned.

All literal strings and string-valued constant expressions are interned.

So the end result is the same: A variable referencing the interned string "foo".

What is Java String interning?

http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#intern()

Basically doing String.intern() on a series of strings will ensure that all strings having same contents share same memory. So if you have list of names where 'john' appears 1000 times, by interning you ensure only one 'john' is actually allocated memory.
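A rough way to see this is to count distinct objects by identity before and after interning. The sketch below is illustrative only; the class and method names (InternDedupDemo, makeNames, countDistinctObjects) are made up for this example:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class InternDedupDemo {
    // Counts distinct String objects by reference identity, not by contents.
    static int countDistinctObjects(List<String> strings) {
        Set<String> identities =
                Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());
        identities.addAll(strings);
        return identities.size();
    }

    // Builds n strings with the contents "john", optionally interning each one.
    static List<String> makeNames(int n, boolean intern) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            String name = new String("john"); // a fresh object on every iteration
            names.add(intern ? name.intern() : name);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(countDistinctObjects(makeNames(1000, false))); // 1000
        System.out.println(countDistinctObjects(makeNames(1000, true)));  // 1
    }
}
```

In the interned case the 1000 temporary objects become unreachable right away, so only the single pooled instance stays alive.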

This can be useful to reduce the memory requirements of your program. But be aware that the cache is maintained by the JVM in the permanent memory pool, which is usually limited in size compared to the heap, so you should not use intern unless you have many duplicate values.


More on memory constraints of using intern()

On one hand, it is true that you can remove String duplicates by
internalizing them. The problem is that the internalized strings go to
the Permanent Generation, which is an area of the JVM that is reserved
for non-user objects, like Classes, Methods and other internal JVM
objects. The size of this area is limited, and is usually much smaller
than the heap. Calling intern() on a String has the effect of moving
it out from the heap into the permanent generation, and you risk
running out of PermGen space.

--
From: http://www.codeinstructions.com/2009/01/busting-javalangstringintern-myths.html


From JDK 7 (I mean in HotSpot), something has changed.

In JDK 7, interned strings are no longer allocated in the permanent generation of the Java heap, but are instead allocated in the main part of the Java heap (known as the young and old generations), along with the other objects created by the application. This change will result in more data residing in the main Java heap, and less data in the permanent generation, and thus may require heap sizes to be adjusted. Most applications will see only relatively small differences in heap usage due to this change, but larger applications that load many classes or make heavy use of the String.intern() method will see more significant differences.

-- From Java SE 7 Features and Enhancements

Update: Interned strings are stored in main heap from Java 7 onwards. http://www.oracle.com/technetwork/java/javase/jdk7-relnotes-418459.html#jdk7changes

Why String intern method?

The basic algorithm for .intern() is the following:

  1. Create a hash set of Strings
  2. Check to see if the String you're dealing with is already in the set
  3. If so, return the one from the set
  4. Otherwise, add this string to the set and return it

So it is basically used to check whether the given string already exists in the pool: if it does, you get that same instance back; otherwise the new String is added to the pool and returned.
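The steps above can be sketched as a small hand-rolled interner. This is only an illustration of the algorithm, not how the JVM implements String.intern(); the class name SimpleInterner is hypothetical, and a HashMap stands in for the hash set so the stored instance can be returned:

```java
import java.util.HashMap;
import java.util.Map;

public class SimpleInterner {
    // Maps each distinct string value to the one canonical instance kept for it.
    private final Map<String, String> pool = new HashMap<>();

    // Returns the canonical instance whose contents equal s, adding s if absent.
    public String intern(String s) {
        String existing = pool.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    public static void main(String[] args) {
        SimpleInterner interner = new SimpleInterner();
        String a = new String("hello");
        String b = new String("hello");

        System.out.println(a == b);                                   // false
        System.out.println(interner.intern(a) == a);                  // true
        System.out.println(interner.intern(b) == interner.intern(a)); // true
    }
}
```

The first string with a given value becomes the canonical one, and every later string with equal contents maps back to it.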

Does String.Intern method just add a reference to a string to the intern pool or it creates a copy of the string?

This is easy enough to test directly (the code below is C#, where the method is string.Intern()):

// Make string from char[] to ensure it's not already interned
string s1 = new string(new[] { 'H', 'e', 'l', 'l', 'o' });
string i1 = string.Intern(s1);
bool result1 = object.ReferenceEquals(s1, i1);

string s2 = new string(new[] { 'H', 'e', 'l', 'l', 'o' });
string i2 = string.Intern(s2);
bool result2 = object.ReferenceEquals(s2, i2);

Note that result1 is set to true, showing that the original string object is not copied. On the other hand, result2 is set to false, showing that the second constructed string object "Hello" was found in the intern pool, and so the string.Intern() method returns that interned instance instead of the newly constructed instance passed in.

The string.Intern() method does not copy strings. It just checks whether a string equal to the one passed is already in the pool, and adds the passed reference to the pool if there is none.

The return of String.intern() explained

s2.intern() would return the instance referenced by s2 only if the String pool didn't contain a String whose value is "java" prior to that call. The JDK classes intern some Strings before your code is executed. "java" must be one of them. Therefore, s2.intern() returns the previously interned instance instead of s2.

On the other hand, the JDK classes did not intern any String whose value is equal to "Cattie & Doggie", so s1.intern() returns s1.

I am not aware of any list of pre-interned Strings. Such a list will most likely be considered an implementation detail, which may vary on different JDK implementations and JDK versions, and should not be relied on.
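The question's code is not reproduced here, so the following is a hypothetical reconstruction that only assumes the two values mentioned above. The strings are built at run time so that the results of build() are not themselves pooled literals; the class and method names are invented for this sketch:

```java
public class InternReturnDemo {
    // Builds the string at run time so the result is not itself a pooled literal.
    static String build(String first, String second) {
        return new StringBuilder(first).append(second).toString();
    }

    // True when intern() handed back the very object it was called on.
    static boolean internReturnsSelf(String s) {
        return s.intern() == s;
    }

    public static void main(String[] args) {
        String s1 = build("Cattie", " & Doggie");
        String s2 = build("ja", "va");

        // The results depend on which strings the JDK interned beforehand,
        // so no particular output is claimed here.
        System.out.println(internReturnsSelf(s1));
        System.out.println(internReturnsSelf(s2));
    }
}
```

Whatever the two lines print on a given JDK, intern() always returns a string with the same contents as the receiver; only the identity of the returned object varies.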

Is it good practice to use java.lang.String.intern()?


When would I use this function in favor of String.equals()?

When you need speed, since you can compare strings by reference (== is faster than equals).

Are there side effects not mentioned in the Javadoc?

The primary disadvantage is that you have to remember to make sure that you actually do intern() all of the strings that you're going to compare. It's easy to forget to intern() all strings and then you can get confusingly incorrect results. Also, for everyone's sake, please be sure to very clearly document that you're relying on the strings being internalized.

The second disadvantage if you decide to internalize strings is that the intern() method is relatively expensive. It has to manage the pool of unique strings so it does a fair bit of work (even if the string has already been internalized). So, be careful in your code design so that you e.g., intern() all appropriate strings on input so you don't have to worry about it anymore.

(from JGuru)

Third disadvantage (Java 6 or earlier only): interned Strings live in PermGen space, which is usually quite small; you may run into an OutOfMemoryError with plenty of free heap space.

(from Michael Borgwardt)


