Is It Good Practice to Use java.lang.String.intern()?

Is it good practice to use java.lang.String.intern()?

When would I use this function in favor of String.equals()?

When you need speed, since interned strings can be compared by reference (== is faster than equals()).

Are there side effects not mentioned in the Javadoc?

The primary disadvantage is that you have to remember to intern() every string that you're going to compare. It's easy to miss one, and then you get confusingly incorrect results. Also, for everyone's sake, please document very clearly that you're relying on the strings being interned.
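To make that failure mode concrete, here is a minimal sketch. The class name InternPitfall, the helper sameByReference, and the sample value "order-42" are made up for illustration; the point is only that a reference comparison silently fails when one side was never interned.

```java
public class InternPitfall {
    // Compares by reference; only meaningful when BOTH arguments were interned.
    static boolean sameByReference(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String interned = "order-42".intern();
        // Built at run time, so this is a distinct object that was never interned.
        String notInterned = new StringBuilder("order-").append(42).toString();

        System.out.println(sameByReference(interned, notInterned));          // false
        System.out.println(sameByReference(interned, notInterned.intern())); // true
        System.out.println(interned.equals(notInterned));                    // true
    }
}
```

Note that equals() gives the right answer in every case, which is why relying on == needs to be documented loudly.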

The second disadvantage, if you decide to intern strings, is that the intern() method is relatively expensive. It has to manage the pool of unique strings, so it does a fair bit of work (even if the string has already been interned). So design your code carefully: for example, intern() all appropriate strings on input so that you don't have to worry about it afterwards.
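One way to follow the "intern on input" advice is sketched below. The class InternOnInput and the helper readTokens are hypothetical, and the comma-separated input is only an assumption for the example; the idea is to pay the intern() cost once at the boundary so that later comparisons can rely on it.

```java
import java.util.ArrayList;
import java.util.List;

public class InternOnInput {
    // Hypothetical helper: interns each token once, at the input boundary.
    static List<String> readTokens(String line) {
        List<String> tokens = new ArrayList<>();
        for (String raw : line.split(",")) {
            tokens.add(raw.trim().intern());
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> first = readTokens("red, green, red");
        List<String> second = readTokens("green,red");

        // Every token came through readTokens, so reference comparison is safe here.
        System.out.println(first.get(0) == first.get(2));  // true
        System.out.println(first.get(1) == second.get(0)); // true
    }
}
```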

(from JGuru)

Third disadvantage (Java 6 and earlier only): interned Strings live in PermGen space, which is usually quite small; you may run into an OutOfMemoryError even with plenty of free heap space. From Java 7 onward, HotSpot keeps interned strings in the main heap.

(from Michael Borgwardt)

Java 6: is calling String.intern() at app startup a good idea?

Interning the empty string (or any other string, for that matter) does not guarantee that subsequent instances will use the same object unless you intern them too.

The contract is that two Strings reference the same object (and thus are ==) if they are equals() and have both been interned.

Besides, interning has a cost: each intern() call implies a lookup in the pool.
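A small sketch of why interning at startup buys nothing by itself. The class InternAtStartup and the helper isPooledInstance are invented for this example; new String("") is used deliberately to force a distinct object.

```java
public class InternAtStartup {
    // True only if s is itself the instance held in the string pool.
    static boolean isPooledInstance(String s) {
        return s == s.intern();
    }

    public static void main(String[] args) {
        "".intern(); // "startup" interning; it has no effect on strings created later

        String later = new String(""); // deliberately creates a distinct empty string
        System.out.println(isPooledInstance(later));          // false
        System.out.println(isPooledInstance(later.intern())); // true
    }
}
```

The later instance only becomes comparable by reference once it has been interned as well, which is exactly the contract stated above.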

How does String.intern() work and how does it affect the String pool?

When I call test.intern(), what will this intern method do?

It will put the "dog" string in the string pool (unless it's already there). But it will not necessarily put the object that test refers to in the pool. This is why you typically do:

test = test.intern();

Note that if you have a "dog" literal in your code, then there will already be a "dog" in the string pool, so test.intern() will return that object.

Perhaps your experiment confuses you, and it was in fact the following experiment you had in mind:

String s1 = "dog";             // "dog" object from string pool
String s2 = new String("dog"); // new "dog" object on heap

System.out.println(s1 == s2); // false

s2 = s2.intern(); // intern() returns the object from string pool

System.out.println(s1 == s2); // true

Java: How exactly do String.intern() and the String pool work?

  1. All string literals are interned automatically (the compiler records them in the class file's constant pool, and the JVM resolves them to pooled instances when the class is used). Passing a string literal to the single-argument constructor that takes a String is a bit of an abuse of that constructor, hence you are likely to get two objects (there may be a special compiler case for this; I can't say for sure). As of Java 8, the implementation of the constructor (for OpenJDK) is this:

public String(String original) {
    this.value = original.value;
    this.hash = original.hash;
}

So there is no special treatment on this side. If you know the literal, don't use this constructor.


  2. I don't think there are any special GC semantics for Strings. A String gets collected once it's unreachable and deemed collection-worthy by the GC, like any other object.

  3. Don't ever use == for comparing strings. The first step in String's equals() method is that same reference check, so if interned strings are your dominant case, you are only paying the overhead of a method call, which is tiny. The potential for future bugs that you add by relying on == is too big a risk for a minuscule gain.
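As a rough illustration of the last point, the sketch below compares by content while still getting the cheap identity check first. The class SafeCompare and the helper sameText are hypothetical names; Objects.equals is the standard java.util.Objects method, which tests a == b before falling back to a.equals(b).

```java
import java.util.Objects;

public class SafeCompare {
    // Hypothetical helper: null-safe comparison by content.
    // Objects.equals checks a == b first and only then calls a.equals(b).
    static boolean sameText(String a, String b) {
        return Objects.equals(a, b);
    }

    public static void main(String[] args) {
        String pooled = "dog";
        String copy = new String("dog"); // distinct object, same characters

        System.out.println(pooled == copy);                  // false
        System.out.println(sameText(pooled, copy));          // true
        System.out.println(sameText(pooled, copy.intern())); // true
    }
}
```

The comparison stays correct whether or not the caller remembered to intern, and when both arguments are the same pooled instance the identity check returns immediately.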

Java String.intern() uses HashTable instead of ConcurrentHashMap

There are a number of things going on here:

  1. Your benchmarks have very large error bars. The repeat counts are probably too small. This makes the results questionable.

  2. It doesn't look like your benchmarks are resetting the "interned string" caches after each run [1]. So that means that the caches are growing, and each repetition will be starting with different conditions. This may explain the error bars ...

  3. Your ConcurrentHashMap is not functionally equivalent to String::intern. The latter uses a native equivalent to Reference objects to ensure that interned strings can be garbage collected. Your ConcurrentHashMap implementation doesn't. Why does this matter?

    • Your ConcurrentHashMap is a massive memory leak.
    • A reference mechanism is expensive ... at GC time. (Though possibly less expensive [2] than a memory leak.)
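For readers who have not seen the question's code, a map-based interner of the kind being discussed might look roughly like this. It is a guess at the general shape, not the asker's actual implementation; the class name MapInterner and its methods are invented here.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class MapInterner {
    // Strong references: nothing stored here can be garbage collected
    // while the interner itself is reachable.
    private final ConcurrentMap<String, String> pool = new ConcurrentHashMap<>();

    // Returns the canonical instance for s, adding s if no equal string is present.
    public String intern(String s) {
        String existing = pool.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    public int size() {
        return pool.size();
    }

    public static void main(String[] args) {
        MapInterner interner = new MapInterner();
        for (int i = 0; i < 100_000; i++) {
            interner.intern("key-" + i); // result discarded, but the map still holds it
        }
        System.out.println(interner.size()); // 100000
    }
}
```

Every string passed in stays reachable through the map even after the caller drops it, which is the leak described above. String::intern avoids that by letting unreferenced entries be collected, at some cost during GC.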


String.intern() is slower than ConcurrentHashMap because String.intern() is a native HashTable implementation.

No. The real reason is that the native implementation is doing things differently:

  • The internal representations are different. The native (intern) string pool uses a custom hash table implemented in native code.
  • It has to handle references, which affects GC performance.
  • There are also behind-the-scenes interactions with string deduping and other things.

Note that these things vary considerably across different Java versions.

This is a very confusing situation. It recommends ConcurrentHashMap, but it uses HashTable despite the performance penalty.

Now you are talking about a different scenario, that is not relevant to what you are doing.

  • Note that String::intern doesn't use either java.util.Hashtable or HashMap; see above.

  • The quote that you found is about how to get good concurrent performance from a hash table. Your benchmark is (AFAIK) single threaded. For a serial use case, HashMap will give better performance than the others.

Does anyone have any idea why a native HashTable implementation is used instead of ConcurrentHashMap?

It doesn't use java.util.Hashtable; see above. There are a number of reasons why it doesn't use Hashtable, HashMap, or ConcurrentHashMap:

  • The native implementation pays more attention to memory utilization. All of the Java hash table implementations are memory hungry, and that makes them unsuitable for general-purpose string interning.
  • The memory and CPU overheads of using Reference classes are significant.
  • Computing a hash of a newly created string of length N is O(N) which will be significant when interning strings that may be hundreds / thousands of characters long.

Finally, be careful that you are not focusing on the wrong problem here. If you are trying to optimize interning because it is a bottleneck in your application, the other strategy is to not intern at all. In practice, interning rarely saves memory (especially compared with G1GC's string de-duping) and rarely improves string handling performance.


In summary:

  • You are comparing apples and oranges. Your map-based implementation is not equivalent to native interning.
  • String::intern is not optimized solely (or even primarily) for speed.
  • By focusing on speed, you are ignoring memory utilization ... and the secondary effect of memory utilization on speed.
  • Consider the potential optimization of not interning at all.

[1] And in the native intern case, I don't think that is possible.

[2] A Java memory leak in the regular heap affects long-term GC performance because the retained objects need to be repeatedly marked and copied by the GC. There may be secondary effects too.

The return of String.intern() explained

s2.intern() would return the instance referenced by s2 only if the String pool didn't contain a String whose value is "java" prior to that call. The JDK classes intern some Strings before your code is executed. "java" must be one of them. Therefore, s2.intern() returns the previously interned instance instead of s2.

On the other hand, the JDK classes did not intern any String whose value is equal to "Cattie & Doggie", so s1.intern() returns s1.

I am not aware of any list of pre-interned Strings. Such a list will most likely be considered an implementation detail, which may vary on different JDK implementations and JDK versions, and should not be relied on.
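The kind of experiment this answer describes can be sketched as follows. The original question's code is not shown here, so the way s1 and s2 are built (with StringBuilder, so that neither full value appears as a literal) is an assumption, and the class PreInternedProbe and the helper internReturnsSelf are invented names.

```java
public class PreInternedProbe {
    // True if intern() handed back the very object it was called on,
    // i.e. no equal string was in the pool before this call.
    static boolean internReturnsSelf(String s) {
        return s.intern() == s;
    }

    public static void main(String[] args) {
        // Built at run time so that neither full value appears as a literal in this class.
        String s1 = new StringBuilder("Cattie").append(" & Doggie").toString();
        String s2 = new StringBuilder("ja").append("va").toString();

        // Results depend on the JDK implementation and version; no output is promised here.
        System.out.println("s1: " + internReturnsSelf(s1));
        System.out.println("s2: " + internReturnsSelf(s2));
    }
}
```

Whatever the program prints on a given JDK, treat it as an observation about that JDK rather than behavior to build on.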

How exactly does String.intern() work?

You have to do:

if (args0.intern() == args1.intern()) {
    System.out.println("Success");
}

