How is String concatenation implemented in Java 9?
The "old" way output a bunch of StringBuilder-oriented operations. Consider this program:
public class Example {
public static void main(String[] args)
{
String result = args[0] + "-" + args[1] + "-" + args[2];
System.out.println(result);
}
}
If we compile that with JDK 8 or earlier and then run javap -c Example to see the bytecode, we see something like this:
public class Example {
public Example();
Code:
0: aload_0
1: invokespecial #1 // Method java/lang/Object."<init>":()V
4: return
public static void main(java.lang.String[]);
Code:
0: new #2 // class java/lang/StringBuilder
3: dup
4: invokespecial #3 // Method java/lang/StringBuilder."<init>":()V
7: aload_0
8: iconst_0
9: aaload
10: invokevirtual #4 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
13: ldc #5 // String -
15: invokevirtual #4 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
18: aload_0
19: iconst_1
20: aaload
21: invokevirtual #4 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
24: ldc #5 // String -
26: invokevirtual #4 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
29: aload_0
30: iconst_2
31: aaload
32: invokevirtual #4 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
35: invokevirtual #6 // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
38: astore_1
39: getstatic #7 // Field java/lang/System.out:Ljava/io/PrintStream;
42: aload_1
43: invokevirtual #8 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
46: return
}
As you can see, it creates a StringBuilder and uses append. This is famously fairly inefficient: the default capacity of the built-in buffer in StringBuilder is only 16 chars, and the compiler has no way of knowing to allocate more in advance, so the buffer ends up being reallocated. It's also a bunch of method calls. (Note that the JVM can sometimes detect and rewrite these patterns of calls to make them more efficient, though.)
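To make the capacity point concrete, here is a minimal sketch of what you can do by hand when you know the final length up front. The class and method names are made up for illustration; the only facts relied on are the documented default capacity of 16 and the StringBuilder(int) constructor.

```java
// Sketch: the default StringBuilder capacity and a presized alternative.
class BuilderCapacityDemo {

    // Capacity of a builder created with the no-arg constructor.
    static int defaultCapacity() {
        return new StringBuilder().capacity();
    }

    // Joins three parts with '-' using a builder sized for the final result,
    // so the internal buffer never has to grow while appending.
    static String joinPresized(String a, String b, String c) {
        int length = a.length() + b.length() + c.length() + 2; // two '-' separators
        StringBuilder sb = new StringBuilder(length);
        sb.append(a).append('-').append(b).append('-').append(c);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(defaultCapacity());                  // prints 16
        System.out.println(joinPresized("alpha", "beta", "gamma"));
    }
}
```

The compiler-generated code cannot do this sizing, because it only sees the expression, not the run-time lengths of its operands.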
Let's look at what Java 9 generates:
public class Example {
public Example();
Code:
0: aload_0
1: invokespecial #1 // Method java/lang/Object."<init>":()V
4: return
public static void main(java.lang.String[]);
Code:
0: aload_0
1: iconst_0
2: aaload
3: aload_0
4: iconst_1
5: aaload
6: aload_0
7: iconst_2
8: aaload
9: invokedynamic #2, 0 // InvokeDynamic #0:makeConcatWithConstants:(Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;)Ljava/lang/String;
14: astore_1
15: getstatic #3 // Field java/lang/System.out:Ljava/io/PrintStream;
18: aload_1
19: invokevirtual #4 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
22: return
}
Oh my, but that's shorter. :-) It makes a single call to makeConcatWithConstants from StringConcatFactory, which says this in its Javadoc:
Methods to facilitate the creation of String concatenation methods, that can be used to efficiently concatenate a known number of arguments of known types, possibly after type adaptation and partial evaluation of arguments. These methods are typically used as bootstrap methods for invokedynamic call sites, to support the string concatenation feature of the Java Programming Language.
When to use String#concat() method in Java?
String#concat and + exist to provide a minimal set of operations on the type String.
They are not efficient when applied repeatedly, for example in a loop.
But they have their place as basic operations on the type: for "xxx" + "yyy" you would not want to spell out a StringBuilder. (Furthermore, with two literals the concatenation is done at compile time.)
StringBuffer is a mistake IMHO. It is slower than the newer StringBuilder because it is synchronized, yet one would rarely append to the same buffer from two threads (the result would be unordered).
String::concat may be useful as a method reference, for stream reduction and the like.
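As a small sketch of that last point (the class and method names are mine, purely for illustration), String::concat fits the BinaryOperator that Stream.reduce expects:

```java
import java.util.stream.Stream;

// Sketch: using String::concat as the accumulator of a stream reduction.
class ConcatReduceDemo {

    // Concatenates all parts in encounter order, starting from the empty string.
    static String joinAll(String... parts) {
        return Stream.of(parts).reduce("", String::concat);
    }

    public static void main(String[] args) {
        System.out.println(joinAll("foo", "bar", "baz")); // prints foobarbaz
    }
}
```

Each reduction step creates a new intermediate String, so for long streams something like Collectors.joining() is probably the better choice; the method reference is mainly convenient for short inputs.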
Can't understand how Java string literals are implemented
15.18.1. String Concatenation Operator +
An implementation may choose to perform conversion and concatenation in one step to avoid creating and then discarding an intermediate String object. To increase the performance of repeated string concatenation, a Java compiler may use the StringBuffer class or a similar technique to reduce the number of intermediate String objects that are created by evaluation of an expression.
In your case,
String s1 = "hello";
String s2 ="bc";
int value = 22;
String r = s1 + s2 + value;
you will get
INVOKESPECIAL java/lang/StringBuilder.<init> ()V
ALOAD 1
INVOKEVIRTUAL java/lang/StringBuilder.append (Ljava/lang/String;)Ljava/lang/StringBuilder;
ALOAD 2
INVOKEVIRTUAL java/lang/StringBuilder.append (Ljava/lang/String;)Ljava/lang/StringBuilder;
ILOAD 3
INVOKEVIRTUAL java/lang/StringBuilder.append (I)Ljava/lang/StringBuilder;
INVOKEVIRTUAL java/lang/StringBuilder.toString ()Ljava/lang/String;
While concatenating constant objects
String r = "hello" + "bc" + 22;
you will get
LDC "hellobc22"
ASTORE 2
When the compiler "meets" this code:
String s = s1+s2+22;
it "changes" it to:
String s = new StringBuilder().append("hello").append("bc").append(22).toString();
No. It may optimise it to
String s = new StringBuilder().append(s1).append(s2).append(value).toString();
but it can't replace s1 with "hello" because there is no guarantee the variable will keep referring to "hello". The variable is not final and thus open to reassignment.
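The effect of final can be observed without a disassembler. The following is only a sketch with made-up names: it relies on the language rule that constant expressions are folded and interned, while any other concatenation yields a newly created String, so a reference comparison against the literal tells the two cases apart.

```java
// Sketch: compile-time folding happens only when all operands are constants.
class ConstantFoldingDemo {

    // All operands are final variables initialised with constants, so the
    // whole expression is a constant expression, folded to "hellobc22".
    static boolean foldedWhenFinal() {
        final String s1 = "hello";
        final String s2 = "bc";
        final int value = 22;
        String r = s1 + s2 + value;
        return r == "hellobc22"; // deliberate reference comparison
    }

    // Same values, but the variables are not final: the concatenation is
    // performed at run time and produces a new String object.
    static boolean foldedWhenNotFinal() {
        String s1 = "hello";
        String s2 = "bc";
        int value = 22;
        String r = s1 + s2 + value;
        return r == "hellobc22"; // deliberate reference comparison
    }

    public static void main(String[] args) {
        System.out.println(foldedWhenFinal());    // prints true
        System.out.println(foldedWhenNotFinal()); // prints false
    }
}
```

The == comparisons are there only to expose the object identity; ordinary code should of course compare strings with equals.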
String concatenation with the + symbol
The rule
“do not concatenate Strings with + !!!“
is wrong, because it is incomplete and therefore misleading.
The rule is
do not concatenate Strings with + in a loop
and that rule still holds. The original rule was never meant to be applied outside of loops!
A simple loop
String s = "";
for (int i = 0; i < 10000; i++) { s += i; }
System.out.println(s);
is still much slower than
StringBuilder sb = new StringBuilder();
for (int i = 0; i < 10000; i++) { sb.append(i); }
System.out.println(sb.toString());
because the Java compiler has to translate the first loop into
String s = "";
for (int i = 0; i < 10000; i++) { s = new StringBuilder().append(s).append(i).toString(); }
System.out.println(s);
Also the claim
Today the JVM compiles the + symbol into a string builder (in most cases).
is at the very least misleading, because this translation was already done in Java 1.0 (OK, not with StringBuilder but with StringBuffer, because StringBuilder was only added in Java 5).
One could also argue that the claim
Today the JVM compiles the + symbol into a string builder (in most cases).
is simply wrong, because the compilation is not done by the JVM. It is done by the Java Compiler.
For the question: when does the Java compiler use StringBuilder.append() and when does it use some other mechanism?
The source code of the Java compiler (version 1.8) contains two places where String concatenation through the + operator is handled.
- the first place is String constant folding (http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8u40-b25/com/sun/tools/javac/comp/ConstFold.java?av=f#314). In this case the compiler can calculate the resulting string at compile time and works with that result.
- the second place is where the compiler creates the code for assignment operations (http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8u40-b25/com/sun/tools/javac/jvm/Gen.java?av=f#2056). In this case the compiler always emits code to create a StringBuilder.
The conclusion is that for the Java compiler from the OpenJDK (which means the compiler distributed by Oracle) the phrase "in most cases" means "always". (Though this could change with Java 9, or another Java compiler, like the one included with Eclipse, could use some other mechanism.)
How can the methods `makeConcat` and `makeConcatWithConstants` in `StringConcatFactory` be used by directly calling the API?
You are not supposed to call this API directly. The class has been designed to provide bootstrap methods for an invokedynamic instruction, so its API is straightforward for that use case, but not for direct invocations.
But the documentation is exhaustive:
Parameters
lookup - Represents a lookup context with the accessibility privileges of the caller. When used with invokedynamic, this is stacked automatically by the VM.
name - The name of the method to implement. This name is arbitrary, and has no meaning for this linkage method. When used with invokedynamic, this is provided by the NameAndType of the InvokeDynamic structure and is stacked automatically by the VM.
concatType - The expected signature of the CallSite. The parameter types represent the types of concatenation arguments; the return type is always assignable from String. When used with invokedynamic, this is provided by the NameAndType of the InvokeDynamic structure and is stacked automatically by the VM.
Emphasis added by me
Note how all parameters are normally provided by the JVM automatically on the basis of the invokedynamic bytecode instruction. In this context, it’s a single instruction consuming some arguments and producing a String, referring to this bootstrap method as the entity knowing how to do the operation.
When you want to invoke it manually, for whatever reasons, you’d have to do something like
String arg1 = "Hello";
char arg2 = ' ';
String arg3 = "StringConcatFactory";
MethodHandle mh = StringConcatFactory.makeConcat(
MethodHandles.lookup(), // normally provided by the JVM
"foobar", // normally provided by javac, but meaningless here
// method type is normally provided by the JVM and matches the invocation
MethodType.methodType(String.class, String.class, char.class, String.class))
.getTarget();
// we can now use the handle to perform a concatenation
// the argument types must match the MethodType specified above
String result = (String)mh.invokeExact(arg1, arg2, arg3);
System.out.println(result);
You could re-use the MethodHandle for multiple string concatenations, but you are bound to the parameter types you’ve specified during the bootstrapping.
For ordinary string concatenation expressions, each expression gets linked during its bootstrapping to a handle matching the fixed number of subexpressions and their compile-time types.
It’s not easy to imagine a scenario where using the API directly could have a benefit over just writing arg1 + arg2 + arg3, etc.
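To illustrate the re-use and the binding to the bootstrapped types, here is a self-contained sketch. The class and helper names are hypothetical, and checked exceptions are wrapped in IllegalStateException only to keep the example short. It creates one handle for (String, char, String), uses it twice, and shows that invokeExact with different argument types is rejected.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.StringConcatException;
import java.lang.invoke.StringConcatFactory;
import java.lang.invoke.WrongMethodTypeException;

// Sketch: one bootstrapped handle, several concatenations, fixed types.
class ConcatHandleReuseDemo {

    // Bootstraps a handle of type (String, char, String) -> String.
    static MethodHandle newHandle() {
        try {
            return StringConcatFactory.makeConcat(
                    MethodHandles.lookup(),
                    "concat", // arbitrary name, has no meaning for the linkage
                    MethodType.methodType(String.class, String.class, char.class, String.class))
                .getTarget();
        } catch (StringConcatException e) {
            throw new IllegalStateException(e);
        }
    }

    // Uses the handle with exactly the types it was bootstrapped for.
    static String join(MethodHandle mh, String left, char separator, String right) {
        try {
            return (String) mh.invokeExact(left, separator, right);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    // Passing a String where the handle expects a char does not match the
    // handle's type, so invokeExact refuses the call.
    static boolean rejectsOtherTypes(MethodHandle mh) {
        try {
            String unused = (String) mh.invokeExact("a", "b", "c");
            return false;
        } catch (WrongMethodTypeException expected) {
            return true;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        MethodHandle mh = newHandle();
        System.out.println(join(mh, "Hello", ' ', "StringConcatFactory"));
        System.out.println(join(mh, "a", '-', "b"));
        System.out.println(rejectsOtherTypes(mh)); // prints true
    }
}
```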
The makeConcatWithConstants bootstrap method allows specifying constant parts in addition to the potentially changing parameters. For example, when we have the code
String time = switch(LocalTime.now().get(ChronoField.HOUR_OF_DAY) / 6) {
case 0 -> "night"; case 1 -> "morning"; case 2 -> "afternoon";
case 3 -> "evening"; default -> throw new AssertionError();
};
System.out.println("Hello "+System.getProperty("user.name")+", good "+time+"!");
we have several constant parts which the compiler can merge into a single string, using the placeholder \1 to denote the places where the dynamic values have to be inserted, so the recipe parameter will be "Hello \1, good \1!". The other parameter, constants, will be unused. Then, the corresponding invokedynamic instruction only needs to provide the two dynamic values on the operand stack.
To make the equivalent manual invocation more interesting, we assume the system property user.name to be invariant, hence we can provide it as a constant in the bootstrap invocation, use the placeholder \2 to reference it, and produce a handle only consuming one dynamic argument, the time string:
MethodHandle mh = StringConcatFactory.makeConcatWithConstants(
MethodHandles.lookup(), // normally provided by the JVM
"foobar", // normally provided by javac, but meaningless here
// method type is normally provided by the JVM and matches the invocation
MethodType.methodType(String.class, String.class),
"Hello \2, good \1!", // recipe, \1 binds a parameter, \2 a constant
System.getProperty("user.name") // the first constant to bind
).getTarget();
// we can now use the handle to perform a concatenation
// the argument types must match the MethodType specified above
String result = (String)mh.invokeExact(time);
System.out.println(result);
Ordinary Java code will rarely make use of the additional constants. The only scenario I know of is the corner case of having \1 or \2 in the original constant strings. To prevent them from being interpreted as placeholders, those substrings will be provided as constants then.
As demonstrated in this online code tester, the code
String time = switch(LocalTime.now().get(ChronoField.HOUR_OF_DAY) / 6) {
case 0 -> "night"; case 1 -> "morning"; case 2 -> "afternoon";
case 3 -> "evening"; default -> throw new AssertionError();
};
System.out.println("Hello "+System.getProperty("user.name")+", good "+time+"!");
String tmp = "prefix \1 "+time+" \2 suffix";
gets compiled to (irrelevant parts omitted):
0: invokestatic #1 // Method java/time/LocalTime.now:()Ljava/time/LocalTime;
3: getstatic #7 // Field java/time/temporal/ChronoField.HOUR_OF_DAY:Ljava/time/temporal/ChronoField;
6: invokevirtual #13 // Method java/time/LocalTime.get:(Ljava/time/temporal/TemporalField;)I
9: bipush 6
11: idiv
12: tableswitch { // 0 to 3
0: 44
1: 49
2: 54
3: 59
default: 64
}
44: ldc #17 // String night
46: goto 72
49: ldc #19 // String morning
51: goto 72
54: ldc #21 // String afternoon
56: goto 72
59: ldc #23 // String evening
61: goto 72
64: new #25 // class java/lang/AssertionError
67: dup
68: invokespecial #27 // Method java/lang/AssertionError."<init>":()V
71: athrow
72: astore_1
73: getstatic #31 // Field java/lang/System.out:Ljava/io/PrintStream;
76: ldc #37 // String user.name
78: invokestatic #39 // Method java/lang/System.getProperty:(Ljava/lang/String;)Ljava/lang/String;
81: aload_1
82: invokedynamic #43, 0 // InvokeDynamic #0:makeConcatWithConstants:(Ljava/lang/String;Ljava/lang/String;)Ljava/lang/String;
87: invokevirtual #47 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
90: aload_1
91: invokedynamic #53, 0 // InvokeDynamic #1:makeConcatWithConstants:(Ljava/lang/String;)Ljava/lang/String;
96: astore_2
BootstrapMethods:
0: #150 REF_invokeStatic java/lang/invoke/StringConcatFactory.makeConcatWithConstants:(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/String;[Ljava/lang/Object;)Ljava/lang/invoke/CallSite;
Method arguments:
#151 Hello \u0001, good \u0001!
1: #150 REF_invokeStatic java/lang/invoke/StringConcatFactory.makeConcatWithConstants:(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/String;[Ljava/lang/Object;)Ljava/lang/invoke/CallSite;
Method arguments:
#153 \u0002\u0001\u0002
#155 prefix \u0001
#157 \u0002 suffix
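For comparison, the second bootstrap entry of the listing can be reproduced by hand. This is a sketch under assumptions: the class and method names are mine, the method name passed to the factory is arbitrary, and the constants are written with the surrounding spaces they have in the source expression (the listing does not make the trailing and leading spaces visible).

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.StringConcatFactory;

// Sketch: manual equivalent of  "prefix \1 " + time + " \2 suffix"
class RecipeConstantsDemo {

    static String concat(String time) {
        try {
            MethodHandle mh = StringConcatFactory.makeConcatWithConstants(
                    MethodHandles.lookup(),
                    "concat", // arbitrary name
                    MethodType.methodType(String.class, String.class),
                    // recipe: constant, dynamic argument, constant
                    "\u0002\u0001\u0002",
                    // the constants themselves contain the tag characters,
                    // which is why they cannot be inlined into the recipe
                    "prefix \u0001 ",
                    " \u0002 suffix")
                .getTarget();
            return (String) mh.invokeExact(time);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        String viaFactory = concat("morning");
        String viaOperator = "prefix \1 " + "morning" + " \2 suffix";
        System.out.println(viaFactory.equals(viaOperator)); // prints true
    }
}
```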
String concatenation: concat() vs + operator
No, not quite.
Firstly, there's a slight difference in semantics. If a is null, then a.concat(b) throws a NullPointerException, but a += b will treat the original value of a as if it were the string "null". Furthermore, the concat() method only accepts String values, while the + operator will silently convert the argument to a String (using the toString() method for objects). So the concat() method is more strict in what it accepts.
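A short sketch of these differences (class and method names are made up for illustration):

```java
// Sketch: semantic differences between += and concat().
class ConcatVsPlusDemo {

    // += converts a null reference to the text "null".
    static String plusWithNull() {
        String a = null;
        a += "b";
        return a;
    }

    // concat() on a null receiver fails.
    static boolean concatOnNullReceiverThrows() {
        String a = null;
        try {
            a.concat("b");
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    // concat() with a null argument fails as well.
    static boolean concatWithNullArgumentThrows() {
        String b = null;
        try {
            "a".concat(b);
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    // += accepts non-String operands and converts them.
    static String plusWithInt() {
        String a = "n=";
        a += 42;
        return a;
    }

    public static void main(String[] args) {
        System.out.println(plusWithNull());                 // prints nullb
        System.out.println(concatOnNullReceiverThrows());   // prints true
        System.out.println(concatWithNullArgumentThrows()); // prints true
        System.out.println(plusWithInt());                  // prints n=42
    }
}
```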
To look under the hood, write a simple class with a += b;
public class Concat {
String cat(String a, String b) {
a += b;
return a;
}
}
Now disassemble with javap -c (included in the Sun JDK). You should see a listing including:
java.lang.String cat(java.lang.String, java.lang.String);
Code:
0: new #2; //class java/lang/StringBuilder
3: dup
4: invokespecial #3; //Method java/lang/StringBuilder."<init>":()V
7: aload_1
8: invokevirtual #4; //Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
11: aload_2
12: invokevirtual #4; //Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
15: invokevirtual #5; //Method java/lang/StringBuilder.toString:()Ljava/lang/String;
18: astore_1
19: aload_1
20: areturn
So, a += b is the equivalent of
a = new StringBuilder()
.append(a)
.append(b)
.toString();
For a single concatenation of two strings, the concat method should be faster. However, with more strings the StringBuilder approach wins, at least in terms of performance.
The source code of String and StringBuilder (and its package-private base class) is available in src.zip of the Sun JDK. You can see that you are building up a char array (resizing as necessary) and then throwing it away when you create the final String. In practice memory allocation is surprisingly fast.
Update: As Pawel Adamski notes, performance has changed in more recent HotSpot. javac still produces exactly the same code, but the bytecode compiler cheats. Simple testing entirely fails because the entire body of code is thrown away. Summing System.identityHashCode (not String.hashCode) shows the StringBuffer code has a slight advantage. Subject to change when the next update is released, or if you use a different JVM. From @lukaseder, a list of HotSpot JVM intrinsics.
How is + implemented in Java?
This is a test I just made:
I created a class with these three statements:
String s1 = "foo";
String s2 = "bar";
String s3 = s1 + s2;
Then I took the generated .class file and decompiled it using the JAD decompiler.
This is how the code shows up in the regenerated source:
String s = "foo";
String s1 = "bar";
String s2 = (new StringBuilder()).append(s).append(s1).toString();
So: this is the difference between + and concat.
I guess concat() is always better than StringBuilder, because it requires fewer objects to be created. You may choose StringBuilder if you want to append strings repeatedly in a loop; in this case concat may create a new String each time, while StringBuilder may just expand its internal buffer. But even if StringBuilder is best in this last scenario, we can still say that concat() is better than + in loops.