How Is String Concatenation Implemented in Java 9

How is String concatenation implemented in Java 9?

The "old" way output a bunch of StringBuilder-oriented operations. Consider this program:

public class Example {
    public static void main(String[] args) {
        String result = args[0] + "-" + args[1] + "-" + args[2];
        System.out.println(result);
    }
}

If we compile that with JDK 8 or earlier and then use javap -c Example to see the bytecode, we see something like this:


public class Example {
public Example();
Code:
0: aload_0
1: invokespecial #1 // Method java/lang/Object."<init>":()V
4: return

public static void main(java.lang.String[]);
Code:
0: new #2 // class java/lang/StringBuilder
3: dup
4: invokespecial #3 // Method java/lang/StringBuilder."<init>":()V
7: aload_0
8: iconst_0
9: aaload
10: invokevirtual #4 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
13: ldc #5 // String -
15: invokevirtual #4 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
18: aload_0
19: iconst_1
20: aaload
21: invokevirtual #4 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
24: ldc #5 // String -
26: invokevirtual #4 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
29: aload_0
30: iconst_2
31: aaload
32: invokevirtual #4 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
35: invokevirtual #6 // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
38: astore_1
39: getstatic #7 // Field java/lang/System.out:Ljava/io/PrintStream;
42: aload_1
43: invokevirtual #8 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
46: return
}

As you can see, it creates a StringBuilder and uses append. This is famously fairly inefficient, as the default capacity of the built-in buffer in StringBuilder is only 16 chars, and there's no way for the compiler to know to allocate more in advance, so it ends up having to reallocate. It's also a bunch of method calls. (Note that the JVM can sometimes detect and rewrite these patterns of calls to make them more efficient, though.)
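To make the reallocation point concrete, here is a minimal sketch. The class name CapacityDemo is made up for illustration. It relies only on the documented default capacity of 16 and the public capacity() method; the exact size after growth is an implementation detail, so the code only reports it.

```java
// Hypothetical demo class, not part of the original answer.
class CapacityDemo {
    // Capacity of a freshly created, empty StringBuilder (documented as 16).
    static int initialCapacity() {
        return new StringBuilder().capacity();
    }

    // Capacity after appending `length` characters one at a time.
    static int capacityAfter(int length) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < length; i++) {
            sb.append('x');
        }
        return sb.capacity();
    }

    public static void main(String[] args) {
        System.out.println("initial capacity: " + initialCapacity());
        // The 17th character no longer fits, so the buffer must be reallocated.
        System.out.println("capacity after 17 chars: " + capacityAfter(17));
    }
}
```

A concatenation whose result is longer than 16 chars therefore pays for at least one buffer copy, plus the final copy in toString().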

Let's look at what Java 9 generates:


public class Example {
public Example();
Code:
0: aload_0
1: invokespecial #1 // Method java/lang/Object."<init>":()V
4: return

public static void main(java.lang.String[]);
Code:
0: aload_0
1: iconst_0
2: aaload
3: aload_0
4: iconst_1
5: aaload
6: aload_0
7: iconst_2
8: aaload
9: invokedynamic #2, 0 // InvokeDynamic #0:makeConcatWithConstants:(Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;)Ljava/lang/String;
14: astore_1
15: getstatic #3 // Field java/lang/System.out:Ljava/io/PrintStream;
18: aload_1
19: invokevirtual #4 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
22: return
}

Oh my but that's shorter. :-) It makes a single call to makeConcatWithConstants from StringConcatFactory, which says this in its Javadoc:

Methods to facilitate the creation of String concatenation methods, that can be used to efficiently concatenate a known number of arguments of known types, possibly after type adaptation and partial evaluation of arguments. These methods are typically used as bootstrap methods for invokedynamic call sites, to support the string concatenation feature of the Java Programming Language.

When to use String#concat() method in Java?

String#concat and + exist to provide a minimalistic set of operations on the type String.

They are not efficient if used multiple times.

But they have their place as basic operations on the type: for an expression like "xxx" + "yyy" you would not want to spell out a StringBuilder. (Furthermore, in that case the concatenation is done at compile time.)

StringBuffer is a mistake IMHO. It is slower than the newer StringBuilder as it is synchronized, but one would rarely add something from two threads (unordered).

String::concat may be a method reference useful for stream reduction or such.
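A minimal sketch of that stream-reduction use, with a made-up class and method name (ConcatReduceDemo, joinAll). It only shows that String::concat fits the BinaryOperator<String> shape that reduce expects; for long streams Collectors.joining() is usually the better choice, because every concat call copies the accumulated string again.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class, not part of the original answer.
class ConcatReduceDemo {
    // Folds all parts into one String, starting from the empty string.
    static String joinAll(List<String> parts) {
        return parts.stream().reduce("", String::concat);
    }

    public static void main(String[] args) {
        System.out.println(joinAll(Arrays.asList("foo", "bar", "baz")));
    }
}
```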

Can't understand how Java string literals are implemented

15.18.1. String Concatenation Operator +


An implementation may choose to perform conversion and concatenation in one step to avoid creating and then discarding an intermediate String object. To increase the performance of repeated string concatenation, a Java compiler may use the StringBuffer class or a similar technique to reduce the number of intermediate String objects that are created by evaluation of an expression.

In your case,

String s1 = "hello";
String s2 ="bc";
int value = 22;

String r = s1 + s2 + value;

you will get

INVOKESPECIAL java/lang/StringBuilder.<init> ()V
ALOAD 1
INVOKEVIRTUAL java/lang/StringBuilder.append (Ljava/lang/String;)Ljava/lang/StringBuilder;
ALOAD 2
INVOKEVIRTUAL java/lang/StringBuilder.append (Ljava/lang/String;)Ljava/lang/StringBuilder;
ILOAD 3
INVOKEVIRTUAL java/lang/StringBuilder.append (I)Ljava/lang/StringBuilder;
INVOKEVIRTUAL java/lang/StringBuilder.toString ()Ljava/lang/String;

While concatenating constant objects

String r = "hello" + "bc" + 22;

you will get

LDC "hellobc22"
ASTORE 2

When the compiler "meets" this code:

String s = s1+s2+22;

it "changes" it to:

String s = new StringBuilder().append("hello").append("bc").append(22).toString();

No. It may optimise it to

String s = new StringBuilder().append(s1).append(s2).append(value).toString();

but it can't replace s1 with "hello" because there is no guarantee the variable will keep referring to "hello". The variable is not final and thus open to reassignment.
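The difference can be observed from plain Java, as in the following sketch (class and method names are made up). It assumes the language rules that a local declared final and initialized with a constant is itself a constant, and that a concatenation that is not a constant expression produces a newly created String. The == comparisons are deliberate: they check object identity, not content.

```java
// Hypothetical demo class, not part of the original answer.
class ConstantFoldingDemo {
    // Non-final locals: the concatenation is evaluated at run time.
    static boolean nonFinalIsSameObject() {
        String s1 = "hello";
        String s2 = "bc";
        int value = 22;
        String r = s1 + s2 + value;
        return r == "hellobc22"; // identity check on purpose
    }

    // Final locals: the whole expression is a compile-time constant.
    static boolean finalIsSameObject() {
        final String s1 = "hello";
        final String s2 = "bc";
        final int value = 22;
        String r = s1 + s2 + value;
        return r == "hellobc22"; // identity check on purpose
    }

    public static void main(String[] args) {
        System.out.println("non-final folded: " + nonFinalIsSameObject());
        System.out.println("final folded:     " + finalIsSameObject());
    }
}
```

Only the second method ends up referring to the same interned literal, which is what the LDC "hellobc22" listing above corresponds to.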

String concatenation with the + symbol

The rule

“do not concatenate Strings with + !!!“

is wrong, because it is incomplete and therefore misleading.

The rule is

do not concatenate Strings with + in a loop

and that rule still holds. The original rule was never meant to be applied outside of loops!

A simple loop

String s = "";
for (int i = 0; i < 10000; i++) { s += i; }
System.out.println(s);

is still much slower than

StringBuilder sb = new StringBuilder();
for (int i = 0; i < 10000; i++) { sb.append(i); }
System.out.println(sb.toString());

because the Java compiler has to translate the first loop into

String s = "";
for (int i = 0; i < 10000; i++) { s = new StringBuilder(s).append(i).toString(); }
System.out.println(s);
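The two loops above can be put side by side as in this sketch (class and method names are made up). The timing in main is a crude System.nanoTime measurement, not a proper benchmark, so treat the printed numbers as a rough indication only; the point is that both variants build the same string while the first copies the growing result on every iteration.

```java
// Hypothetical demo class, not part of the original answer.
class LoopConcatDemo {
    // Concatenation with += inside the loop: a new String per iteration.
    static String withPlus(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += i;
        }
        return s;
    }

    // One StringBuilder reused across all iterations.
    static String withBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int n = 10000;
        long t0 = System.nanoTime();
        String a = withPlus(n);
        long t1 = System.nanoTime();
        String b = withBuilder(n);
        long t2 = System.nanoTime();
        System.out.println("same result: " + a.equals(b));
        System.out.println("+= loop:       " + (t1 - t0) / 1000 + " us");
        System.out.println("StringBuilder: " + (t2 - t1) / 1000 + " us");
    }
}
```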

Also the claim

Today the JVM compiles the + symbol into a string builder (in most cases).

is misleading at the very least, because this translation was already done with Java 1.0 (ok, not with StringBuilder but with StringBuffer, because StringBuilder was only added with Java 5).


One could also argue that the claim

Today the JVM compiles the + symbol into a string builder (in most cases).

is simply wrong, because the compilation is not done by the JVM. It is done by the Java Compiler.


For the question: when does the Java compiler use StringBuilder.append() and when does it use some other mechanism?

The source code of the Java compiler (version 1.8) contains two places where String concatenation through the + operator is handled.

  • the first place is String constant folding (http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8u40-b25/com/sun/tools/javac/comp/ConstFold.java?av=f#314). In this case the compiler can calculate the resulting string and works with the resulting string.
  • the second place is where the compiler creates the code for assignment operations (http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8u40-b25/com/sun/tools/javac/jvm/Gen.java?av=f#2056). In this case the compiler always emits code to create a StringBuilder

The conclusion is that for the Java compiler from the OpenJDK (which means the compiler distributed by Oracle) the phrase in most cases means always. (Though this could change with Java 9, or it could be that another Java compiler like the one that is included within Eclipse uses some other mechanism).

How can the methods `makeConcat` and `makeConcatWithConstants` in `StringConcatFactory` be used by directly calling the API?

You are not supposed to call this API directly. The class has been designed to provide bootstrap methods for an invokedynamic instruction, so its API is straight-forward for that use case, but not for direct invocations.

But the documentation is exhaustive:

Parameters

  • lookup - Represents a lookup context with the accessibility privileges of the caller. When used with invokedynamic, this is stacked automatically by the VM.
  • name - The name of the method to implement. This name is arbitrary, and has no meaning for this linkage method. When used with invokedynamic, this is provided by the NameAndType of the InvokeDynamic structure and is stacked automatically by the VM.
  • concatType - The expected signature of the CallSite. The parameter types represent the types of concatenation arguments; the return type is always assignable from String. When used with invokedynamic, this is provided by the NameAndType of the InvokeDynamic structure and is stacked automatically by the VM.

Emphasis added by me

Note how all parameters are normally provided by the JVM automatically on the basis of the invokedynamic bytecode instruction. In this context, it’s a single instruction consuming some arguments and producing a String, referring to this bootstrap method as the entity knowing how to do the operation.

When you want to invoke it manually, for whatever reasons, you’d have to do something like

String arg1 = "Hello";
char arg2 = ' ';
String arg3 = "StringConcatFactory";

MethodHandle mh = StringConcatFactory.makeConcat(
        MethodHandles.lookup(), // normally provided by the JVM
        "foobar", // normally provided by javac, but meaningless here
        // method type is normally provided by the JVM and matches the invocation
        MethodType.methodType(String.class, String.class, char.class, String.class))
    .getTarget();

// we can now use the handle to perform a concatenation
// the argument types must match the MethodType specified above
String result = (String)mh.invokeExact(arg1, arg2, arg3);

System.out.println(result);

You could re-use the MethodHandle for multiple string concatenations, but you are bound to the parameter types you’ve specified during the bootstrapping.

For ordinary string concatenation expressions, each expression gets linked during its bootstrapping to a handle matching the fixed number of subexpressions and their compile-time types.

It’s not easy to imagine a scenario where using the API directly could have a benefit over just writing arg1 + arg2 + arg3, etc.
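The re-use and the binding to the bootstrapped parameter types can be sketched as a self-contained class. This assumes Java 9 or later; the class name MakeConcatDemo and its helper methods are made up for illustration, and checked exceptions are wrapped only to keep the helpers easy to call.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.StringConcatException;
import java.lang.invoke.StringConcatFactory;
import java.lang.invoke.WrongMethodTypeException;

// Hypothetical demo class, not part of the original answer.
class MakeConcatDemo {
    // Bootstrapped once; its type is fixed to (String, char, String) -> String.
    private static final MethodHandle CONCAT = bootstrap();

    private static MethodHandle bootstrap() {
        try {
            return StringConcatFactory.makeConcat(
                    MethodHandles.lookup(),
                    "foobar", // arbitrary name, as in the snippet above
                    MethodType.methodType(String.class, String.class, char.class, String.class))
                .getTarget();
        } catch (StringConcatException e) {
            throw new IllegalStateException(e);
        }
    }

    // Re-uses the same handle for any pair of strings.
    static String joinWithSpace(String first, String second) {
        try {
            return (String) CONCAT.invokeExact(first, ' ', second);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    // Passing an int where the handle expects a char must be rejected.
    static boolean rejectsIntForChar() {
        try {
            String unused = (String) CONCAT.invokeExact("a", 1, "b");
            return false;
        } catch (WrongMethodTypeException expected) {
            return true;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(joinWithSpace("Hello", "StringConcatFactory"));
        System.out.println(joinWithSpace("Hello", "again"));
        System.out.println("int rejected: " + rejectsIntForChar());
    }
}
```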


The makeConcatWithConstants bootstrap method allows you to specify constant parts in addition to the potentially changing parameters. For example, when we have the code

String time = switch(LocalTime.now().get(ChronoField.HOUR_OF_DAY) / 6) {
case 0 -> "night"; case 1 -> "morning"; case 2 -> "afternoon";
case 3 -> "evening"; default -> throw new AssertionError();
};

System.out.println("Hello "+System.getProperty("user.name")+", good "+time+"!");

we have several constant parts which the compiler can merge to a single string, using the placeholder \1 to denote the places where the dynamic values have to be inserted, so the recipe parameter will be "Hello \1, good \1!". The other parameter, constants, will be unused. Then, the corresponding invokedynamic instruction only needs to provide the two dynamic values on the operand stack.

To make the equivalent manual invocation more interesting we assume the system property user.name to be invariant, hence we can provide it as a constant in the bootstrap invocation, use the placeholder \2 to reference it, and produce a handle only consuming one dynamic argument, the time string:

MethodHandle mh = StringConcatFactory.makeConcatWithConstants(
        MethodHandles.lookup(), // normally provided by the JVM
        "foobar", // normally provided by javac, but meaningless here
        // method type is normally provided by the JVM and matches the invocation
        MethodType.methodType(String.class, String.class),
        "Hello \2, good \1!", // recipe, \1 binds a parameter, \2 a constant
        System.getProperty("user.name") // the first constant to bind
    ).getTarget();

// we can now use the handle to perform a concatenation
// the argument types must match the MethodType specified above
String result = (String)mh.invokeExact(time);

System.out.println(result);
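For experimenting, the manual invocation can be wrapped into a small self-contained class. This is a sketch assuming Java 9 or later; the class and method names (ConcatWithConstantsDemo, greeter, greet) are made up, and the user name is passed in as a parameter instead of being read from the system property so that the output is predictable.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.StringConcatException;
import java.lang.invoke.StringConcatFactory;

// Hypothetical demo class, not part of the original answer.
class ConcatWithConstantsDemo {
    // Builds a handle of type (String) -> String with userName bound as a constant.
    static MethodHandle greeter(String userName) {
        try {
            return StringConcatFactory.makeConcatWithConstants(
                    MethodHandles.lookup(),
                    "foobar", // arbitrary name
                    MethodType.methodType(String.class, String.class),
                    "Hello \2, good \1!", // \1 = dynamic argument, \2 = bound constant
                    userName)
                .getTarget();
        } catch (StringConcatException e) {
            throw new IllegalStateException(e);
        }
    }

    // Supplies the single dynamic argument.
    static String greet(MethodHandle greeter, String time) {
        try {
            return (String) greeter.invokeExact(time);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        MethodHandle mh = greeter(System.getProperty("user.name"));
        System.out.println(greet(mh, "morning"));
        System.out.println(greet(mh, "evening")); // same handle, new dynamic value
    }
}
```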

Ordinary Java code will rarely make use of the additional constants. The only scenario I know of, is the corner case of having \1 or \2 in the original constant strings. To prevent them from being interpreted as placeholders, those substrings will be provided as constants then.

As can be demonstrated with an online code tester, the code

String time = switch(LocalTime.now().get(ChronoField.HOUR_OF_DAY) / 6) {
case 0 -> "night"; case 1 -> "morning"; case 2 -> "afternoon";
case 3 -> "evening"; default -> throw new AssertionError();
};

System.out.println("Hello "+System.getProperty("user.name")+", good "+time+"!");

String tmp = "prefix \1 "+time+" \2 suffix";

gets compiled to (irrelevant parts omitted):

0: invokestatic #1 // Method java/time/LocalTime.now:()Ljava/time/LocalTime;
3: getstatic #7 // Field java/time/temporal/ChronoField.HOUR_OF_DAY:Ljava/time/temporal/ChronoField;
6: invokevirtual #13 // Method java/time/LocalTime.get:(Ljava/time/temporal/TemporalField;)I
9: bipush 6
11: idiv
12: tableswitch { // 0 to 3
0: 44
1: 49
2: 54
3: 59
default: 64
}
44: ldc #17 // String night
46: goto 72
49: ldc #19 // String morning
51: goto 72
54: ldc #21 // String afternoon
56: goto 72
59: ldc #23 // String evening
61: goto 72
64: new #25 // class java/lang/AssertionError
67: dup
68: invokespecial #27 // Method java/lang/AssertionError."<init>":()V
71: athrow
72: astore_1
73: getstatic #31 // Field java/lang/System.out:Ljava/io/PrintStream;
76: ldc #37 // String user.name
78: invokestatic #39 // Method java/lang/System.getProperty:(Ljava/lang/String;)Ljava/lang/String;
81: aload_1
82: invokedynamic #43, 0 // InvokeDynamic #0:makeConcatWithConstants:(Ljava/lang/String;Ljava/lang/String;)Ljava/lang/String;
87: invokevirtual #47 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
90: aload_1
91: invokedynamic #53, 0 // InvokeDynamic #1:makeConcatWithConstants:(Ljava/lang/String;)Ljava/lang/String;
96: astore_2
BootstrapMethods:
0: #150 REF_invokeStatic java/lang/invoke/StringConcatFactory.makeConcatWithConstants:(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/String;[Ljava/lang/Object;)Ljava/lang/invoke/CallSite;
Method arguments:
#151 Hello \u0001, good \u0001!
1: #150 REF_invokeStatic java/lang/invoke/StringConcatFactory.makeConcatWithConstants:(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/String;[Ljava/lang/Object;)Ljava/lang/invoke/CallSite;
Method arguments:
#153 \u0002\u0001\u0002
#155 prefix \u0001
#157 \u0002 suffix

String concatenation: concat() vs + operator

No, not quite.

Firstly, there's a slight difference in semantics. If a is null, then a.concat(b) throws a NullPointerException but a+=b will treat the original value of a as if it were the string "null". Furthermore, the concat() method only accepts String values while the + operator will silently convert the argument to a String (using the toString() method for objects). So the concat() method is more strict in what it accepts.
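A short sketch of these differences, with a made-up class name (NullConcatDemo). With a null left-hand side, a += b yields text starting with "null", whereas a.concat(b) throws.

```java
// Hypothetical demo class, not part of the original answer.
class NullConcatDemo {
    // += with a null left operand: null is converted to the text "null".
    static String plusWithNullLeft(String b) {
        String a = null;
        a += b;
        return a;
    }

    // concat on a null reference: there is no object to call the method on.
    static boolean concatOnNullThrows(String b) {
        String a = null;
        try {
            a.concat(b);
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    // + converts non-String operands; "value: ".concat(42) would not compile.
    static String plusConvertsInt() {
        return "value: " + 42;
    }

    public static void main(String[] args) {
        System.out.println(plusWithNullLeft("b"));
        System.out.println("concat threw: " + concatOnNullThrows("b"));
        System.out.println(plusConvertsInt());
    }
}
```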

To look under the hood, write a simple class with a += b;

public class Concat {
    String cat(String a, String b) {
        a += b;
        return a;
    }
}

Now disassemble with javap -c (included in the Sun JDK). You should see a listing including:

java.lang.String cat(java.lang.String, java.lang.String);
Code:
0: new #2; //class java/lang/StringBuilder
3: dup
4: invokespecial #3; //Method java/lang/StringBuilder."<init>":()V
7: aload_1
8: invokevirtual #4; //Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
11: aload_2
12: invokevirtual #4; //Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
15: invokevirtual #5; //Method java/lang/StringBuilder.toString:()Ljava/lang/String;
18: astore_1
19: aload_1
20: areturn

So, a += b is the equivalent of

a = new StringBuilder()
    .append(a)
    .append(b)
    .toString();

The concat method should be faster. However, with more strings the StringBuilder method wins, at least in terms of performance.

The source code of String and StringBuilder (and its package-private base class) is available in src.zip of the Sun JDK. You can see that you are building up a char array (resizing as necessary) and then throwing it away when you create the final String. In practice memory allocation is surprisingly fast.

Update: As Pawel Adamski notes, performance has changed in more recent HotSpot. javac still produces exactly the same code, but the bytecode compiler cheats. Simple testing entirely fails because the entire body of code is thrown away. Summing System.identityHashCode (not String.hashCode) shows the StringBuffer code has a slight advantage. Subject to change when the next update is released, or if you use a different JVM. From @lukaseder, a list of HotSpot JVM intrinsics.

How is + implemented in Java?

This is a test I just made:

I created a class with those 3 instructions:

    String s1 = "foo";
    String s2 = "bar";
    String s3 = s1 + s2;

Then I took the generated .class file and decompiled it using the JAD decompiler.
This is how the code shows up in the regenerated source:

    String s = "foo";
    String s1 = "bar";
    String s2 = (new StringBuilder()).append(s).append(s1).toString();

So: this is the difference between + and concat.

I guess concat() is always better than StringBuilder, because it requires fewer objects to be created. You may choose StringBuilder if you want to append strings repeatedly in a loop; in this case concat creates a new String each time, while StringBuilder may just expand its internal buffer. But even if StringBuilder is best in this last scenario, we can still say that concat() is better than + in loops.


