How Does Java Do String Concatenation Using "+"?

String concatenation: concat() vs + operator

No, not quite.

Firstly, there's a slight difference in semantics. If a is null, then a.concat(b) throws a NullPointerException, but a += b treats the original value of a as if it were the string "null". Furthermore, the concat() method only accepts String values, while the + operator silently converts the argument to a String (using the toString() method for objects). So the concat() method is stricter in what it accepts.
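The two semantic differences can be seen in a small sketch. The class and method names (NullConcatDemo, plusAssign, concatMethod) are made up for illustration.

```java
// Sketch: how += and concat() differ for a null left operand and for non-String operands.
class NullConcatDemo {

    // a += b: a null String operand is converted to the text "null" first.
    static String plusAssign(String a, String b) {
        a += b;
        return a;
    }

    // a.concat(b): calls a method on a, so a null a throws NullPointerException.
    static String concatMethod(String a, String b) {
        return a.concat(b);
    }

    public static void main(String[] args) {
        System.out.println(plusAssign(null, "b")); // nullb

        try {
            concatMethod(null, "b");
        } catch (NullPointerException e) {
            System.out.println("concat() threw NullPointerException");
        }

        // + converts any operand to a String; concat() needs a String argument.
        Object n = 42;
        System.out.println("x" + n);                       // x42
        System.out.println("x".concat(String.valueOf(n))); // x42
    }
}
```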

To look under the hood, write a simple class containing a += b:

public class Concat {
String cat(String a, String b) {
a += b;
return a;
}
}

Now disassemble with javap -c (included in the Sun JDK). You should see a listing including:

java.lang.String cat(java.lang.String, java.lang.String);
Code:
0: new #2; //class java/lang/StringBuilder
3: dup
4: invokespecial #3; //Method java/lang/StringBuilder."<init>":()V
7: aload_1
8: invokevirtual #4; //Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
11: aload_2
12: invokevirtual #4; //Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
15: invokevirtual #5; //Method java/lang/StringBuilder.toString:()Ljava/lang/String;
18: astore_1
19: aload_1
20: areturn

So, a += b is the equivalent of

a = new StringBuilder()
.append(a)
.append(b)
.toString();
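A sketch can check that the hand-written StringBuilder chain gives the same result as a += b, including the null case from the first point. This assumes only that both forms follow the same String conversion rules. The bytecode your own javac emits may not match the listing above, but the observable result is the same. The names are illustrative.

```java
// Sketch: the explicit StringBuilder chain that the javap listing above corresponds to.
class ConcatEquivalence {

    static String withPlus(String a, String b) {
        a += b;
        return a;
    }

    // new StringBuilder(), append(a), append(b), toString(), as in the listing.
    static String withBuilder(String a, String b) {
        a = new StringBuilder()
                .append(a)   // append((String) null) appends the text "null"
                .append(b)
                .toString();
        return a;
    }

    public static void main(String[] args) {
        System.out.println(withPlus("foo", "bar"));    // foobar
        System.out.println(withBuilder("foo", "bar")); // foobar
    }
}
```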

For joining just two strings, the concat method should be faster. However, with more strings the StringBuilder approach wins, at least in terms of performance.
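This sketch illustrates the trade-off for many pieces. It assumes Java 9+ for List.of, and the class and method names are hypothetical. It shows the shape of each approach and is not a benchmark.

```java
import java.util.List;

// Sketch: repeated concat() versus a single StringBuilder for many parts.
class ManyStrings {

    // Each concat() with a non-empty argument copies everything so far into a new String.
    static String joinWithConcat(List<String> parts) {
        String result = "";
        for (String p : parts) {
            result = result.concat(p);
        }
        return result;
    }

    // One growable buffer; only the final toString() creates the result String.
    static String joinWithBuilder(List<String> parts) {
        StringBuilder sb = new StringBuilder();
        for (String p : parts) {
            sb.append(p);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> parts = List.of("a", "b", "c", "d");
        System.out.println(joinWithConcat(parts));  // abcd
        System.out.println(joinWithBuilder(parts)); // abcd
    }
}
```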

The source code of String and StringBuilder (and its package-private base class) is available in src.zip of the Sun JDK. You can see that you are building up a char array (resizing as necessary) and then throwing it away when you create the final String. In practice memory allocation is surprisingly fast.

Update: As Pawel Adamski notes, performance has changed in more recent HotSpot. javac still produces exactly the same code, but the JIT compiler cheats. Simple testing fails entirely because the whole body of code is thrown away. Summing System.identityHashCode (not String.hashCode) shows the StringBuffer code has a slight advantage. This is subject to change when the next update is released, or if you use a different JVM. From @lukaseder, a list of HotSpot JVM intrinsics.

How does Java do string concatenation using +?

No. Using StringBuilder is not the same as doing "a" + "b".

In Java, String instances are immutable.

So, if you do:

String c = "a" + "b";

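You are creating a new String every time you concatenate at run time. (Strictly speaking, an all-literal expression like "a" + "b" is folded by the compiler into the constant "ab", but the point holds as soon as variables are involved.)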

On the other hand, StringBuilder is like a buffer that can grow as it needs when appending new Strings.

StringBuilder c = new StringBuilder();
c.append("a");
c.append("b"); // c is created only once; "a" and "b" are appended to it.

Rule of thumb (revised thanks to the comments I got):

If you are going to concatenate a lot (e.g., concatenating inside a loop, or generating a big XML document from many concatenated variables), use StringBuilder. Otherwise, simple concatenation (using the + operator) will be just fine.
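Here is a sketch of the rule of thumb. Plain + is used for a one-off expression, and StringBuilder is used when the string grows inside a loop. toXml and the <items>/<item> element names are hypothetical. The sketch does no XML escaping and assumes Java 9+ for List.of.

```java
import java.util.List;

// Sketch: + for a single expression, StringBuilder for a loop.
class RuleOfThumb {

    // A single expression: + is perfectly fine here.
    static String greeting(String name, int count) {
        return "Hello " + name + ", you have " + count + " messages";
    }

    // Built inside a loop: one StringBuilder instead of str += ... on every iteration.
    // Hypothetical helper; item text is not XML-escaped.
    static String toXml(List<String> items) {
        StringBuilder sb = new StringBuilder("<items>");
        for (String item : items) {
            sb.append("<item>").append(item).append("</item>");
        }
        return sb.append("</items>").toString();
    }

    public static void main(String[] args) {
        System.out.println(greeting("Ann", 3));
        System.out.println(toXml(List.of("x", "y")));
    }
}
```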

Compiler optimizations also play a huge role when compiling this kind of code.

Here's further explanation on the topic.

And more Stack Overflow questions on the issue:

Is it better to reuse a StringBuilder in a loop?

What's the best way to build a string of delimited items in Java?

StringBuilder vs String concatenation in toString() in Java

'==' in case of String concatenation in Java

Four things are going on:

  1. (You clearly know this, but for lurkers) == tests to see if the variables point to the same String object, not equivalent strings. So even if x is "foo" and y is also "foo", x == y may be true or false, depending on whether x and y refer to the same String object or different ones. That's why we use equals, not ==, to compare strings for equivalence. All of the following is just meant to explain why == is sometimes true, it's not a suggestion to use == to compare strings. :-)

  2. Equivalent string constants (strings the compiler knows are constants according to various rules in the JLS) within the same class are made to refer to the same string by the compiler (which also lists them in the class's "constant pool"). That's why a == b is true.

  3. When the class is loaded, each of its string constants is automatically interned — the JVM's string pool is checked for an equivalent string and if one is found, that String object is used (if not, the new String object for the new constant is added to the pool). So even if x is a string constant initialized in class Foo and y is a string constant initialized in class Bar, they'll be == each other.

    Points 2 and 3 above are covered in part by JLS§3.10.5. (The bit about the class constant pool is a bit of an implementation detail, hence the link to the JVM spec earlier; the JLS just speaks of interning.)

  4. The compiler does string concatenation if it's dealing with constant values, so

    String d = "dev" + "ender";

    is compiled to

    String d = "devender";

    and "devender" is a string constant to which the compiler and JVM apply points 2 and 3 above. I.e., no StringBuilder is used; the concatenation happens at compile time, not at run time. This is covered in JLS§15.28 - Constant Expressions. So a == d is true for the same reason a == b is true: they refer to the same constant string, so the compiler ensured they refer to the same string in the class's constant pool.

    The compiler can't do that when any of the operands is not a constant, so it can't do that with:

    String e = c + "ender";

    ...even though code analysis could easily show that the value of c will definitely be "dev" and thus e will definitely be "devender". The specification only has the compiler do the concatenation with constant values, specifically. So since the compiler can't do it, it outputs the StringBuilder code you referred to and that work is done at runtime, creating a new String object. That string isn't automatically interned, so e ends up referring to a different String object than a does, and so a == e is false.

    Note that as Vinod said, if you declared c as final:

    final String c = "dev";

    Then it would be a constant variable (yes, they're really called that) and so §15.28 would apply and the compiler would turn

    String e = c + "ender";

    into

    String e = "devender";

    and a == e would also be true.
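Points 2 to 4 can be reproduced in a short sketch. The variable names mirror the ones above, but the exact declarations of a, b and c are assumptions, since the question's own code isn't reproduced here. As stressed throughout, this only explains == and is not a way to compare strings.

```java
// Sketch of points 2-4; the declarations of a, b, c are assumed, mirroring the names above.
class ConstantFolding {

    static boolean sameConstant() {
        String a = "devender";
        String b = "devender";       // point 2: same constant-pool entry
        return a == b;
    }

    static boolean foldedLiterals() {
        String a = "devender";
        String d = "dev" + "ender";  // point 4: folded at compile time
        return a == d;
    }

    static boolean nonFinalVariable() {
        String a = "devender";
        String c = "dev";            // not a constant variable
        String e = c + "ender";      // built at run time: a new String object
        return a == e;
    }

    static boolean finalVariable() {
        String a = "devender";
        final String c = "dev";      // constant variable, so the expression is constant
        String e = c + "ender";      // folded at compile time again
        return a == e;
    }

    public static void main(String[] args) {
        System.out.println(sameConstant());     // true
        System.out.println(foldedLiterals());   // true
        System.out.println(nonFinalVariable()); // false
        System.out.println(finalVariable());    // true
    }
}
```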

Just to reiterate: None of which means we should use == to compare strings for equivalence. :-) That's what equals is for.

How do I concatenate two strings in Java?

You can concatenate Strings using the + operator:

System.out.println("Your number is " + theNumber + "!");

If theNumber is 42, it is implicitly converted to the String "42".
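This sketch assumes theNumber is an int holding 42. When one operand of + is a String, the int is converted the same way String.valueOf(int) would convert it. The second method, a hypothetical name, writes that conversion out explicitly.

```java
// Sketch, assuming theNumber is an int: implicit vs. explicit String conversion.
class NumberToString {

    // Implicit: the int operand is converted to a String by +.
    static String message(int theNumber) {
        return "Your number is " + theNumber + "!";
    }

    // The same result with the conversion written out explicitly.
    static String messageExplicit(int theNumber) {
        return "Your number is ".concat(String.valueOf(theNumber)).concat("!");
    }

    public static void main(String[] args) {
        System.out.println(message(42));         // Your number is 42!
        System.out.println(messageExplicit(42)); // Your number is 42!
    }
}
```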

When to use String#concat() method in Java?

String#concat and + exist to provide a minimalistic set of operations on the type String.

They are not efficient if used multiple times.

But they have their own right as type operations: for "xxx" + "yyy" you do not want to have to write out a StringBuilder. (Furthermore, in that case the concatenation happens at compile time.)

StringBuffer is a mistake IMHO. It is slower than the newer StringBuilder because it is synchronized, but one would rarely append to the same buffer from two threads (in no particular order).

String::concat can be useful as a method reference, for example in a stream reduction.
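This sketch shows String::concat used as a BinaryOperator<String> in Stream.reduce. Collectors.joining() is included for comparison, since it does not create an intermediate String per element. Java 9+ is assumed for List.of, and the method names are illustrative.

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch: String::concat as a reduction operator, next to Collectors.joining().
class ConcatReduce {

    // String::concat fits BinaryOperator<String>: (x, y) -> x.concat(y).
    static String reduceWithConcat(List<String> words) {
        return words.stream().reduce("", String::concat);
    }

    // For comparison: joining() does not build an intermediate String per element.
    static String joinWithCollector(List<String> words) {
        return words.stream().collect(Collectors.joining());
    }

    public static void main(String[] args) {
        List<String> words = List.of("foo", "bar", "baz");
        System.out.println(reduceWithConcat(words));  // foobarbaz
        System.out.println(joinWithCollector(words)); // foobarbaz
    }
}
```

The reduce form reads nicely for a handful of elements. For long streams, the collector is the usual choice.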

java string concatenation and interning

The first part of your question is simple: Java compiler treats concatenation of multiple string literals as a single string literal, i.e.

"I Love" + " Java"

and

"I Love Java"

are two identical string literals, which get properly interned.

The same interning behavior does not apply to += operation on strings, so b1 and b2 are actually constructed at run-time.

The second part is trickier. Recall that b1.intern() may return b1 or some other String object that is equal to it. When you keep a1 and a2, you get a1 back from the call to b1.intern(). When you comment out a1 and a2, there is no existing copy to be returned, so b1.intern() gives you back b1 itself.
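The intern() behaviour described above can be sketched as follows. The names a1 and b1 come from the question, but their exact declarations there are assumed. The sketch only covers the case where an equal literal already exists in the pool.

```java
// Sketch of intern() when a pooled copy already exists; declarations of a1/b1 are assumed.
class InternDemo {

    // += on a non-constant variable: concatenated at run time, so a new object.
    static String buildAtRuntime() {
        String b1 = "I Love";
        b1 += " Java";
        return b1;
    }

    public static void main(String[] args) {
        String a1 = "I Love" + " Java";  // folded into the literal "I Love Java"
        String b1 = buildAtRuntime();

        System.out.println(b1 == a1);          // false: distinct objects
        System.out.println(b1.equals(a1));     // true: same contents
        System.out.println(b1.intern() == a1); // true: the pooled copy is returned
    }
}
```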

How does the concatenation of a String with characters work in Java?

str.charAt(i) returns a char, and adding two chars is numeric addition: the result is an int equal to the sum of their code points, not a String. When you start with a String (as in str1 + ...), the first concatenation is between a String and a char, which results in a String. The second concatenation is also between a String and a char.

You can fix this a few ways, such as:

str1 += String.valueOf(str.charAt(i)) + str.charAt(i);

or

str1 += "" + str.charAt(i) + str.charAt(i);

or, as you've already discovered, and likely the most readable:

str1 = str1 + str.charAt(i) + str.charAt(i);
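This sketch shows the pitfall next to the three fixes. Here str, str1 and i stand in for the question's variables, with the assumption that str1 accumulates and character i of str is appended twice. The method names are hypothetical.

```java
// Sketch of the char-addition pitfall and the three fixes; str1, str, i are stand-ins.
class CharConcat {

    // Bug: charAt(i) + charAt(i) is int addition, so a number gets appended.
    static String buggy(String str1, String str, int i) {
        str1 += str.charAt(i) + str.charAt(i);
        return str1;
    }

    static String fixValueOf(String str1, String str, int i) {
        str1 += String.valueOf(str.charAt(i)) + str.charAt(i);
        return str1;
    }

    static String fixEmptyString(String str1, String str, int i) {
        str1 += "" + str.charAt(i) + str.charAt(i);
        return str1;
    }

    // Left-to-right: str1 + char is a String, then String + char is a String.
    static String fixLeftToRight(String str1, String str, int i) {
        str1 = str1 + str.charAt(i) + str.charAt(i);
        return str1;
    }

    public static void main(String[] args) {
        System.out.println('a' + 'b');                  // 195 (97 + 98)
        System.out.println(buggy("", "a", 0));          // 194
        System.out.println(fixLeftToRight("", "a", 0)); // aa
    }
}
```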

How Java String pool works when String concatenation?

"When the string is created by concatenation does java make something
different or simple == comparator have another behaviour?"

No, it does not change its behavior. What happens is this:

When concatenating two string literals, "a" + "b", the compiler joins the two values into a single literal. At run time the JVM checks the string pool, finds that the value already exists there, and simply assigns that reference to the String variable. In more detail:

Look at this simple program and its compiled bytecode:

public class Test  {    
public static void main(String... args) {
String a = "hello world!";
String b = "hello" + " world!";
boolean compare = (a == b);
}
}

[Image: compiled bytecode of the simple program]

First the JVM loads the string "hello world!", pushes it to the string pool (in this case), and then loads it onto the stack (ldc = load constant) [see point 1 in Image].

Then it assigns the reference created in the pool to the local variable (astore_1) [see point 2 in Image].

Notice that the constant-pool reference for this literal is #2 [see point 3 in Image].

The next operation is essentially the same, because the compiler has already concatenated "hello" + " world!" into "hello world!". A literal with the same content already exists, so the bytecode uses the same constant (#2) and assigns it to a local variable (astore_2).

Thus (a == b) is true, because both variables refer to the same pooled string, #2, which is "hello world!".

Your example C is different, though, because you're using the += operator. When compiled to bytecode, it uses a StringBuilder to concatenate the strings, and StringBuilder.toString() creates a new String object. The variable therefore points to a different reference: a new object rather than the pooled literal.


