Java: String concat vs StringBuilder - optimised, so what should I do?

I think the use of StringBuilder vs + really depends on the context you are using it in.

Generally, with JDK 1.6 and above, the compiler automatically joins strings together using StringBuilder.

String one = "abc";
String two = "xyz";
String three = one + two;

This will compile String three as:

String three = new StringBuilder().append(one).append(two).toString();

This is quite helpful and saves us some effort. However, this translation is not always optimal. Take for example:

String out = "";
for( int i = 0; i < 10000 ; i++ ) {
out = out + i;
}
return out;

If we compile to bytecode and then decompile the bytecode generated we get something like:

String out = "";
for( int i = 0; i < 10000; i++ ) {
out = new StringBuilder().append(out).append(i).toString();
}
return out;

The compiler has optimised the inner loop but certainly has not made the best possible optimisations. To improve our code we could use:

StringBuilder out = new StringBuilder();
for( int i = 0 ; i < 10000; i++ ) {
out.append(i);
}
return out.toString();

This is more efficient than the compiler-generated code, because a single builder is reused instead of a new one being created (and the whole accumulated string copied) on every iteration. So there is definitely a need to write code using the StringBuilder/StringBuffer classes where efficiency matters. Current compilers are not great at dealing with string concatenation in loops; however, this could change in the future.
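To make the comparison concrete, here is a minimal sketch that puts both loops side by side. The class and method names (LoopConcat, withPlus, withBuilder) are made up for this example, and the single-run timing in main is only a rough indication, not a proper benchmark (no JIT warm-up, no repetition).

```java
class LoopConcat {
    // Naive version: every iteration builds a new string and copies
    // everything accumulated so far.
    static String withPlus(int n) {
        String out = "";
        for (int i = 0; i < n; i++) {
            out = out + i;
        }
        return out;
    }

    // Manual version: one builder is reused for the whole loop.
    static String withBuilder(int n) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < n; i++) {
            out.append(i);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        int n = 10000;
        long t0 = System.nanoTime();
        String a = withPlus(n);
        long t1 = System.nanoTime();
        String b = withBuilder(n);
        long t2 = System.nanoTime();
        // Both approaches must produce the same text.
        System.out.println("same result: " + a.equals(b));
        System.out.println("+ operator:    " + (t1 - t0) / 1000 + " us");
        System.out.println("StringBuilder: " + (t2 - t1) / 1000 + " us");
    }
}
```

Both methods return the same string; only the amount of allocation and copying differs.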

You need to look carefully to see where you need to apply StringBuilder manually, and try to use it only where it will not reduce the readability of your code.

Note: I compiled the code using JDK 1.6 and disassembled it using the javap program, which prints the bytecode. It is fairly easy to interpret and is often a useful reference when trying to optimise code. The compiler does change your code behind the scenes, so it is always interesting to see what it does!

StringBuilder vs String concatenation in toString() in Java

Version 1 is preferable because it is shorter and the compiler will in fact turn it into version 2 - no performance difference whatsoever.

"More importantly given we have only 3 properties it might not make a difference, but at what point do you switch from concat to builder?"

At the point where you're concatenating in a loop - that's usually when the compiler can't substitute StringBuilder by itself.
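The two versions being compared are not shown here, so the following is only a sketch of what they presumably look like. The Person class and its three fields are invented for illustration; "version 1" uses + and "version 2" spells out the StringBuilder calls by hand.

```java
class Person {
    private final String name;
    private final int age;
    private final String city;

    Person(String name, int age, String city) {
        this.name = name;
        this.age = age;
        this.city = city;
    }

    // "Version 1": a single concatenation expression, short and readable.
    @Override
    public String toString() {
        return "Person{name=" + name + ", age=" + age + ", city=" + city + "}";
    }

    // "Version 2": the same thing written out with an explicit builder.
    String toStringWithBuilder() {
        return new StringBuilder()
                .append("Person{name=").append(name)
                .append(", age=").append(age)
                .append(", city=").append(city)
                .append("}")
                .toString();
    }
}
```

Since there is no loop involved, both methods produce the same text with comparable work, which is why the shorter form is the reasonable default.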

Best practices/performance: mixing StringBuilder.append with String.concat

+ operator

String s = s1 + s2

Behind the scenes this is translated to:

String s = new StringBuilder().append(s1).append(s2).toString();

Imagine how much extra work it adds if you have s1 + s2 here:

stringBuilder.append(s1 + s2)

instead of:

stringBuilder.append(s1).append(s2)

Multiple strings with +

It is worth noting that:

String s = s1 + s2 + s3 + ... + sN

is translated to:

String s = new StringBuilder().append(s1).append(s2).append(s3)...append(sN).toString();

concat()

String s = s1.concat(s2);

String creates a char[] array that can fit both s1 and s2, then copies the contents of s1 and s2 into this new array. This actually requires less work than the + operator.

StringBuilder.append()

Maintains an internal char[] array that grows when needed. No extra char[] is created if the internal one is sufficiently big.

stringBuilder.append(s1.concat(s2))

also performs poorly, because s1.concat(s2) creates an extra char[] array and copies s1 and s2 into it, only for the contents of that new array to be copied again into the internal StringBuilder char[].

That being said you should use append() all the time and append raw strings (your first code snippet is correct).
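As a small sketch of the two styles compared above (the class and method names are made up for the example), both methods below return the same string; the first just builds a throwaway intermediate string before appending it.

```java
class AppendStyles {
    // Concatenates first, then appends: s1 + s2 creates a temporary
    // String whose contents are then copied into the builder.
    static String appendConcatenated(String s1, String s2) {
        StringBuilder sb = new StringBuilder();
        sb.append(s1 + s2);
        return sb.toString();
    }

    // Appends each piece directly into the builder's internal buffer.
    static String appendSeparately(String s1, String s2) {
        StringBuilder sb = new StringBuilder();
        sb.append(s1).append(s2);
        return sb.toString();
    }
}
```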

Should I rewrite string concat using stringBuilder if I have enough memory?

Do nothing. The compiler will make use of a StringBuilder for you behind the scenes.

How much does Java optimize string concatenation with +?

As far as I know, no compiler generates code that reuses StringBuilder instances; most notably, javac and ECJ don’t.

It’s important to emphasize that it is reasonable not to do such re-use. It’s not safe to assume that code retrieving an instance from a ThreadLocal variable is faster than a plain allocation from a TLAB. Even by trying to add the potential costs of a local gc cycle for reclaiming that instance, as far as we can identify its fraction on the costs, we can’t conclude that.

So the code trying to reuse the builder would be more complicated, wasting memory, as it keeps the builder alive without knowing whether it ever will be actually reused, without a clear performance benefit.

This is especially true when we consider, in addition to the statement above, that

  • JVMs like HotSpot have Escape Analysis, which can elide pure local allocations like these altogether and also may elide the copying costs of array resize operations
  • Such sophisticated JVMs usually also have optimizations dedicated specifically to StringBuilder based concatenation, which work best when the compiled code follows the common pattern
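For illustration only, builder-reusing code of the kind discussed above might look roughly like the following sketch. The class and method names are hypothetical, and nothing here claims that any compiler emits such code; it merely shows the extra machinery that reuse would require compared with a plain allocation.

```java
class ReusedBuilder {
    // One builder per thread, kept alive whether or not it is used again.
    private static final ThreadLocal<StringBuilder> BUILDER =
            ThreadLocal.withInitial(StringBuilder::new);

    // Reusing variant: fetch the thread's builder and reset it first.
    static String joinReusing(String a, String b) {
        StringBuilder sb = BUILDER.get();
        sb.setLength(0); // drop old content, keep the allocated capacity
        return sb.append(a).append(b).toString();
    }

    // Plain variant: the common pattern, a fresh short-lived builder.
    static String joinPlain(String a, String b) {
        return new StringBuilder().append(a).append(b).toString();
    }
}
```

The reusing variant needs a thread-local lookup and a reset on every call, and it retains the buffer indefinitely, which is the complexity and memory cost the answer refers to.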

With Java 9, the picture is going to change again. Then, string concatenation will get compiled to an invokedynamic instruction which will get linked to a JRE-provided factory at runtime (see StringConcatFactory). The JRE will then decide what the code will look like, which allows tailoring it to the specific JVM, including buffer re-use, if it has a benefit on that particular JVM. This will also reduce the code size, as it requires only a single instruction rather than the sequence of an allocation and multiple calls into the StringBuilder.
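To get a feel for that factory, the sketch below calls StringConcatFactory.makeConcatWithConstants by hand, which requires a Java 9 or later JDK. Normally the JVM performs this linkage itself when it first executes the invokedynamic instruction; calling it directly is only a way to see the pieces. The class name, method name, and recipe text are invented for the example. In the recipe string, each \u0001 marks the position of one dynamic argument.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.StringConcatFactory;

class IndyConcatSketch {
    // Roughly what a call site for: name + " has " + count + " items"
    // is linked to at runtime.
    static String describe(String name, int count) {
        try {
            CallSite site = StringConcatFactory.makeConcatWithConstants(
                    MethodHandles.lookup(),
                    "concat", // call site name, chosen arbitrarily here
                    MethodType.methodType(String.class, String.class, int.class),
                    "\u0001 has \u0001 items"); // recipe: argument, text, argument, text
            return (String) site.getTarget().invoke(name, count);
        } catch (Throwable t) {
            // Wrap checked linkage/invocation failures for this sketch.
            throw new IllegalStateException(t);
        }
    }
}
```

How the returned method handle builds the string is left entirely to the runtime, which is the point made above.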

When to use StringBuilder in Java

If you use String concatenation in a loop, something like this,

String s = "";
for (int i = 0; i < 100; i++) {
s += ", " + i;
}

then you should use a StringBuilder (not StringBuffer) instead of a String, because it is much faster and consumes less memory.
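A minimal sketch of the same loop rewritten with a StringBuilder follows; the class and method names are made up for the example, and the loop bound is turned into a parameter.

```java
class CommaLoop {
    // Same output as the "s += ..." loop, but with a single builder.
    static String build(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            // Append the pieces separately instead of concatenating them first.
            sb.append(", ").append(i);
        }
        return sb.toString();
    }
}
```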

If you have a single statement,

String s = "1, " + "2, " + "3, " + "4, " ...;

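then you can use Strings, because the compiler handles it for you: an expression made only of literals like this one is folded into a single constant at compile time, and when variables are involved the compiler uses a StringBuilder automatically.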
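One way to observe what the compiler does with a literal-only expression is to compare references, as in this small sketch (class and method names are hypothetical). Compile-time constant strings are interned, so a folded expression is the very same object as the equivalent literal, whereas a concatenation involving a non-final variable produces a new String at runtime.

```java
class ConstantFolding {
    // All operands are literals, so the whole expression is one
    // compile-time constant and no concatenation happens at runtime.
    static boolean literalsAreFolded() {
        String s = "1, " + "2, " + "3, " + "4";
        return s == "1, 2, 3, 4"; // reference comparison on purpose
    }

    // A non-final local variable is not a constant, so this
    // concatenation is performed at runtime and yields a new object.
    static boolean variableIsFolded() {
        String part = "1, ";
        String s = part + "2";
        return s == "1, 2"; // reference comparison on purpose
    }
}
```

Reference comparison is used here purely as a probe; ordinary code should compare strings with equals().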

String concatenation: concat() vs + operator

No, not quite.

Firstly, there's a slight difference in semantics. If a is null, then a.concat(b) throws a NullPointerException, but a += b will treat the original value of a as if it were the string "null". Furthermore, the concat() method only accepts String values, while the + operator will silently convert the argument to a String (using the toString() method for objects). So the concat() method is stricter in what it accepts.
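The difference in null handling can be checked with a small sketch like this one (class and method names are invented for the example):

```java
class NullSemantics {
    // With +=, a null operand is rendered as the text "null".
    static String plusEquals(String a, String b) {
        a += b;
        return a;
    }

    // Reports whether a.concat(b) fails with a NullPointerException.
    static boolean concatThrows(String a, String b) {
        try {
            a.concat(b);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }
}
```

For instance, plusEquals(null, "x") yields "nullx", while concatThrows(null, "x") reports that concat() failed.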

To look under the hood, write a simple class with a += b;

public class Concat {
String cat(String a, String b) {
a += b;
return a;
}
}

Now disassemble with javap -c (included in the Sun JDK). You should see a listing including:

java.lang.String cat(java.lang.String, java.lang.String);
Code:
0: new #2; //class java/lang/StringBuilder
3: dup
4: invokespecial #3; //Method java/lang/StringBuilder."<init>":()V
7: aload_1
8: invokevirtual #4; //Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
11: aload_2
12: invokevirtual #4; //Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
15: invokevirtual #5; //Method java/lang/StringBuilder.toString:()Ljava/lang/String;
18: astore_1
19: aload_1
20: areturn

So, a += b is the equivalent of

a = new StringBuilder()
.append(a)
.append(b)
.toString();

The concat method should be faster. However, with more strings the StringBuilder method wins, at least in terms of performance.

The source code of String and StringBuilder (and its package-private base class) is available in src.zip of the Sun JDK. You can see that you are building up a char array (resizing as necessary) and then throwing it away when you create the final String. In practice memory allocation is surprisingly fast.

Update: As Pawel Adamski notes, performance has changed in more recent HotSpot. javac still produces exactly the same code, but the bytecode compiler cheats. Simple testing entirely fails because the entire body of code is thrown away. Summing System.identityHashCode (not String.hashCode) shows the StringBuffer code has a slight advantage. Subject to change when the next update is released, or if you use a different JVM. From @lukaseder, a list of HotSpot JVM intrinsics.

String concatenation vs String Builder. Performance

The usual answer is that plain string concatenation is more efficient up to somewhere between 4 and 8 strings. It depends on whose blog you read.

Don't write a test to decide on which method to use. If you are unsure of whether it will go over the magic limit, then just use StringBuilder.

Run this code to see the results for yourself (note that it is C#, not Java, taken from the Microsoft article referenced below):

using System;

class ConcatTiming
{
    static void Main()
    {
        const int sLen = 30, Loops = 5000;
        DateTime sTime, eTime;
        int i;
        string sSource = new String('X', sLen);
        string sDest = "";
        //
        // Time string concatenation.
        //
        sTime = DateTime.Now;
        for (i = 0; i < Loops; i++) sDest += sSource;
        eTime = DateTime.Now;
        Console.WriteLine("Concatenation took " + (eTime - sTime).TotalSeconds + " seconds.");
        //
        // Time StringBuilder (capacity preallocated with 10% headroom).
        //
        sTime = DateTime.Now;
        System.Text.StringBuilder sb = new System.Text.StringBuilder((int)(sLen * Loops * 1.1));
        for (i = 0; i < Loops; i++) sb.Append(sSource);
        sDest = sb.ToString();
        eTime = DateTime.Now;
        Console.WriteLine("String Builder took " + (eTime - sTime).TotalSeconds + " seconds.");
        //
        // Make the console window stay open
        // so that you can see the results when running from the IDE.
        //
        Console.WriteLine();
        Console.Write("Press Enter to finish ... ");
        Console.Read();
    }
}

Ref. http://support.microsoft.com/kb/306822


