A Confusion About Java String Literal Pool and String's Concatenation

Why concatenation of String object and string literal is created in heap?

There is a bunch of dubious information in the comments, so I will give this a proper answer.

  1. There is actually no such thing as "the constant pool". You won't find this term in the Java Language Specification.

    (Perhaps you are getting your terminology confused with the Constant Pool which is the section of a .class file, and the corresponding per-class Runtime Constant Pool ... which is not visible to application programs. These are "specification artifacts" defined by the JVM spec for the purpose of defining the execution model of bytecodes. The spec does not require that they physically exist, though they typically do; e.g. in an Oracle or OpenJDK implementation.)

  2. There is a data structure in a running JVM called the string pool. The string pool is NOT mentioned by name in the JLS, but its existence is implied by string literal properties as specified by the JLS. The string pool is mentioned in the javadocs, and the JVM specification.

  3. The string pool will contain the String objects that represent the values of any string-valued constant expression used in an application. This includes string literals.

  4. The string pool has always been primarily a de-duping mechanism for strings. Applications are able to use this by calling the String.intern method.

  5. The string values in the Constant Pool (see above) are used to create the String objects that the application sees:

    • A String object is created from the representation.
    • String.intern is called, returning the corresponding de-duped String object from the string pool.
    • That string becomes part of the class's Runtime Constant Pool; i.e. the Runtime Constant Pool for a class will include a reference to the String object in the string pool.
    • This process can happen eagerly or lazily depending on the Java implementation.
  6. The string pool is and has always been stored in the (or a) heap.

    • Prior to Java 7, string objects in the string pool were allocated in a special heap called the PermGen heap. In the earliest versions of Java it wasn't GC'ed. Then it was GC'ed only occasionally.

    • In Java 7 (not 8!) the string pool stopped using the PermGen heap and used the regular heap instead.

    • In Java 8 the PermGen heap was replaced (for some purposes!) by a different storage management mechanism called the Metaspace. Apparently, Metaspace doesn't hold Java objects. Rather, it holds code segments, class descriptors and other JVM internal data structures.

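  7. In recent versions of Java (i.e. Java 8u20 and later) the GC has another mechanism for de-duping strings that survive a given number of GC cycles: G1 string deduplication, enabled with -XX:+UseStringDeduplication. It makes equal String objects share a single backing array rather than merging the objects themselves, so it does not change the result of ==.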

  8. The behavior of strings (i.e. which ones are interned and which ones are not) is determined by the relevant parts of the JLS and the javadocs for the String class.

  9. All of the complexity is irrelevant if you follow one simple rule:

    Never use == to compare strings. Always use equals.
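A minimal sketch of rule 9 (the class and method names are made up for illustration): a string assembled at run time has the same characters as the literal, so equals reports true, but it is a different object, so == reports false.

```java
// Demonstrates why equals, not ==, is the right way to compare strings.
class CompareStrings {
    // Builds "Abc" at run time, so the result is not a compile-time constant.
    static String buildAtRuntime(String prefix) {
        return prefix + "c";
    }

    public static void main(String[] args) {
        String literal = "Abc";               // refers to the pooled String object
        String built = buildAtRuntime("Ab");  // a newly created String object

        System.out.println(literal.equals(built)); // true: same characters
        System.out.println(literal == built);      // false: different objects
    }
}
```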


Now to deal with your example code:

String str1 = "Abc";                // string pool
String str2 = "XYZ";                // string pool
String str3 = str1 + str2;          // not string pool (!!)
String str3a = "Abc" + "XYZ";       // string pool
String str4 = new String("PQR");    // not string pool (but the "PQR" literal is)
String str5 = str1.concat(str4);    // not string pool
String str6 = str1 + str4;          // not string pool
String str7 = str6.intern();        // string pool

Why?

  • The values assigned to str1, str2 and str3a are all values of constant expressions; see below.
  • The value assigned to str3 is not the value of a constant expression according to the JLS.
  • str4 - the JLS says that the new operator always creates a new object, and new strings are not automatically interned.
  • str5 - string operations apart from intern do not create objects in the string pool
  • str6 - ditto - equivalent to a concat call. The JLS also says that + produces a new string (except in the constant expression case).
  • str7 - the exception: see above. The intern call returns an object in the string pool.
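One way to check the comments in the listing is to compare references against string literals, which the JLS guarantees are interned. This is a sketch (PoolCheck and checks are illustrative names); it deliberately does not compare str6 with a literal, because on Java 7 and later str6.intern() may put str6 itself into the pool.

```java
// Verifies the pool claims for str1..str7 by comparing references with literals.
class PoolCheck {
    static boolean[] checks() {
        String str1 = "Abc";
        String str2 = "XYZ";
        String str3 = str1 + str2;          // run-time concatenation: new object
        String str3a = "Abc" + "XYZ";       // constant expression: pooled
        String str4 = new String("PQR");    // new object; the "PQR" literal itself is pooled
        String str5 = str1.concat(str4);    // new object
        String str6 = str1 + str4;          // new object
        String str7 = str6.intern();        // canonical object from the pool

        return new boolean[] {
            str3 == str3a,      // false: str3 is not the pooled object
            str3a == "AbcXYZ",  // true: both are the pooled "AbcXYZ"
            str4 == "PQR",      // false: new String(...) always creates a new object
            str5 == str6,       // false: two separately created objects
            str7 == "AbcPQR",   // true: intern() returns the pooled object
            str3.equals(str3a)  // true: same characters
        };
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(checks()));
        // [false, true, false, false, true, true]
    }
}
```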

Constant expressions include literals, concatenations involving literals, values of static final String constants, and a few other things. See JLS 15.28 for the complete list, but bear in mind that the string pool only holds string values.
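A small sketch of the static final case (the field and method names are hypothetical). A static final String initialized with a literal is a constant variable, so concatenating it with a literal is folded at compile time. The same concatenation with a non-final field is done at run time.

```java
class ConstantVariables {
    static final String PREFIX = "Abc";  // constant variable: final, initialized with a constant
    static String mutablePrefix = "Abc"; // not final, so not a constant variable

    static boolean finalFieldIsConstant() {
        return (PREFIX + "XYZ") == "AbcXYZ";        // folded by javac into the pooled "AbcXYZ"
    }

    static boolean nonFinalFieldIsConstant() {
        return (mutablePrefix + "XYZ") == "AbcXYZ"; // concatenated at run time: new object
    }

    public static void main(String[] args) {
        System.out.println(finalFieldIsConstant());    // true
        System.out.println(nonFinalFieldIsConstant()); // false
    }
}
```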


The precise behavior of intern depends on the Java version. Consider this example:

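java.util.Random rnd = new java.util.Random();
char[] chars = new char[16];  // fill with random lowercase letters
for (int i = 0; i < chars.length; i++) chars[i] = (char) ('a' + rnd.nextInt(26));
String s1 = new String(chars);
String s2 = s1.intern();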

Let us assume that the random characters do not correspond to any previously interned string.

  • For older JVMs where interned strings were allocated in PermGen, the intern call in the example will (must) produce a new String object.

  • For newer JVMs, the intern can add the existing String object to the string pool data structure without having to create a new String object.

In other words, the truth of s1 == s2 depends on the Java version.

Confusion in String concatenation operator + in Java

What trips you up is this part of the specification:

The String object is newly created (§12.5) unless the expression is a constant expression (§15.28).

So when you concatenate a string constant to another string constant, that counts as a constant expression and therefore will be evaluated at compile time and replaced with the string constant "My Computer".

You can verify this by running javap -c on the compiled class.

public class Test {

    public static void main(String[] args) {

        String s1 = "My Computer";
        String s2 = "My" + " Computer";
        String s3 = "My";
        String s4 = s3 + " Computer";

        System.out.println(s1 == s2); // true
        System.out.println(s1 == s4); // false
    }
}

Which compiles to:

public static void main(java.lang.String[]);
  Code:
    // s1 = "My Computer"
    0: ldc #2 // String My Computer
    2: astore_1

    // s2 = "My" + " Computer"
    3: ldc #2 // String My Computer
    5: astore_2

    // s3 = "My"
    6: ldc #3 // String My
    8: astore_3

    // s4 = s3 + " Computer"
    9: new #4 // class java/lang/StringBuilder
    12: dup
    13: invokespecial #5 // Method java/lang/StringBuilder."<init>":()V
    16: aload_3
    17: invokevirtual #6 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
    20: ldc #7 // String Computer
    22: invokevirtual #6 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
    25: invokevirtual #8 // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
    28: astore 4

... the rest of the code omitted

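As you can see, the first two assignments (to s1 and s2) load exactly the same constant (#2) and therefore use the same object. The assignment to s4, by contrast, is not a constant expression (even though a sufficiently clever compiler could work out the result, it is not allowed to treat it as one), so you get the whole "create a StringBuilder, append the strings to it, convert the result to a new string" process. (This bytecode comes from a pre-Java 9 javac; since Java 9, javac compiles such concatenations to an invokedynamic call bootstrapped by StringConcatFactory (JEP 280), but the result is still a newly created String.)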

As an interesting aside, if in the code above you add the final modifier to s3, that makes s3 + " Computer" a constant expression again, and both comparisons will print true.
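The final-modifier aside can be checked with a variant of the Test program; the FinalLocal class name is just for illustration.

```java
class FinalLocal {
    static boolean[] compare() {
        String s1 = "My Computer";
        String s2 = "My" + " Computer";
        final String s3 = "My";        // final + constant initializer = constant variable
        String s4 = s3 + " Computer";  // therefore a constant expression, folded by javac
        return new boolean[] { s1 == s2, s1 == s4 };
    }

    public static void main(String[] args) {
        boolean[] results = compare();
        System.out.println(results[0]); // true
        System.out.println(results[1]); // true
    }
}
```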

And as no doubt you already know, your code's correctness mustn't rely on all of this, but it's a fun thing to know.

String Concat With Same Reference?

Java automatically interns String literals (that is, puts them into the String pool), but not newly created Strings. See also https://stackoverflow.com/a/1855183/1611055.

Remember that Strings are immutable, so the + operator must create a new String; it cannot append to the existing one. Internally (with javac before Java 9), the + operator uses a StringBuilder to concatenate the strings. The final result is retrieved through StringBuilder.toString(), which essentially does return new String(value, 0, count);.
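Here is roughly what str1 + "abcd" turns into with a pre-Java 9 javac, written out by hand. The method name is hypothetical, and newer compilers use invokedynamic instead with the same observable result. The result equals the literal but is a distinct object.

```java
class ConcatDesugared {
    // Hand-written approximation of what a pre-Java 9 javac emits for a + b.
    static String plusViaBuilder(String a, String b) {
        return new StringBuilder().append(a).append(b).toString(); // toString() allocates a new String
    }

    public static void main(String[] args) {
        String built = plusViaBuilder("ab", "cd");
        System.out.println(built.equals("abcd")); // true: same content
        System.out.println(built == "abcd");      // false: not the pooled literal
    }
}
```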

This newly created String is not automatically put into the String pool.

Hence the str1 reference is different from str even though the strings have the same content: str refers to the String object in the string pool, while str1 refers to a separate, newly created String object that is not in the pool.

If you add

str1 = str1.intern();

after str1 = str1 + "abcd"; to explicitly intern the newly created String, the condition in your second if statement evaluates to true.

Alternatively, str1 = (str1 + "abcd").intern(); would have the same effect.
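The question's code isn't reproduced here, so this sketch assumes hypothetical values (str is the literal "xyzabcd" and str1 starts as "xyz") to show the effect of the intern() call.

```java
// Hypothetical reconstruction: str and str1 hold assumed values, not the question's code.
class InternFix {
    static boolean sameWithoutIntern() {
        String str = "xyzabcd";           // literal: automatically pooled
        String str1 = "xyz";
        str1 = str1 + "abcd";             // run-time concatenation: new, non-pooled String
        return str1 == str;
    }

    static boolean sameWithIntern() {
        String str = "xyzabcd";
        String str1 = "xyz";
        str1 = (str1 + "abcd").intern();  // returns the pooled "xyzabcd" object
        return str1 == str;
    }

    public static void main(String[] args) {
        System.out.println(sameWithoutIntern()); // false
        System.out.println(sameWithIntern());    // true
    }
}
```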

Concatenation of two Java strings

From here.

'+' creates a new String object every time it concatenates something, except when the concatenation is done at compile time.

While this is not a reputable source, from what I remember from my textbooks this sounds correct. So basically, applying this to your code:

String ch = "Java " + "is cool";

This would be handled at compile time: you've concatenated two constants, so the result is also a constant and can be computed by the compiler. It would be interesting to compile that code and then decompile it to see how the statement reads; I'd imagine it would read:

String ch = "Java is cool";

As for the other statement:

String ch1 = "Java " ;
String ch2 = "is cool";
String ch3 = ch1 + ch2;

Since ch3 is calculated from ch1 and ch2, it is done at runtime since ch1 and ch2 are variables instead of constants.

As for your first question, I can't find an exact reference, but from what I remember, yes: "" denotes a String, just as '' denotes a char. I'm not exactly sure what you're trying to do with that statement, but you could convert your string into a char array and then copy each char into an int array (a char[] cannot be cast to int[] directly, but each char widens to an int).
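If the goal is the numeric value of each character (an assumption, since the question isn't shown), here is a sketch of two ways to get an int[]: copy each char in a loop, or use String.chars(), available since Java 8. The class and method names are illustrative.

```java
import java.util.Arrays;

class CharsToInts {
    // Copies each char into an int slot; char widens to int implicitly.
    static int[] viaLoop(String s) {
        char[] chars = s.toCharArray();
        int[] codes = new int[chars.length];
        for (int i = 0; i < chars.length; i++) {
            codes[i] = chars[i];
        }
        return codes;
    }

    // Same result using String.chars(), which streams the UTF-16 code units as ints.
    static int[] viaStream(String s) {
        return s.chars().toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(viaLoop("Java")));                  // [74, 97, 118, 97]
        System.out.println(Arrays.equals(viaLoop("Java"), viaStream("Java"))); // true
    }
}
```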

Does string pool store literals or objects?

A string literal is a piece of source code delimited by double quotes ("). For example, in the following line of source code:

String s = "Hello World";

"Hello World" is a string literal.

Objects are a useful abstraction for meaningful bits of memory with data that (when grouped together) represents something, whether it be a Car, a Person, or a String.

The string pool stores String objects rather than String literals, simply because the string pool does not store source code.

You might hear people say "the string pool stores string literals". They (probably) don't mean that the string pool somehow has the source code "Hello World" in it. They (probably) mean that all the Strings represented by string literals in your source code get put into the string pool. In fact, the Strings produced by constant expressions in your source code also get added to the string pool automatically.
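A short sketch of that point (the method and class names are illustrative). The same literal in two different classes and an equivalent constant expression all yield the same pooled String object. A concatenation done at run time does not.

```java
// The pool holds one String object per distinct constant value,
// shared by every literal or constant expression with that value.
class SharedLiteral {
    static String fromThisClass() {
        return "Hello World";       // literal in SharedLiteral's constant pool
    }

    static String fromConstantExpression() {
        return "Hello" + " World";  // folded by javac into "Hello World"
    }

    static String fromRuntime(String suffix) {
        return "Hello" + suffix;    // concatenated at run time: new object
    }

    // A nested class is compiled to its own .class file with its own constant pool.
    static class Other {
        static String greeting() {
            return "Hello World";
        }
    }

    public static void main(String[] args) {
        System.out.println(fromThisClass() == Other.greeting());         // true
        System.out.println(fromThisClass() == fromConstantExpression()); // true
        System.out.println(fromThisClass() == fromRuntime(" World"));    // false
    }
}
```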


