Does C# Optimize the Concatenation of String Literals

Does C# optimize the concatenation of string literals?

Yes. This is guaranteed by the C# specification. It's in section 7.18 (of the C# 3.0 spec):

"Whenever an expression fulfills the requirements listed above, the expression is evaluated at compile-time. This is true even if the expression is a sub-expression of a larger expression that contains non-constant constructs."

(The "requirements listed above" include the + operator applied to two constant expressions.)
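One way to see that such a concatenation really is a compile-time constant is to use it where the language only accepts constants, such as a const declaration or a case label. The following is a minimal sketch; the names ConstantFoldingDemo, Prefix, Suffix, Combined, and Describe are made up for illustration and do not come from the spec.

```csharp
using System;

public static class ConstantFoldingDemo
{
    // Hypothetical constants, for illustration only.
    private const string Prefix = "foo";
    private const string Suffix = "bar";

    // This compiles only because Prefix + Suffix is itself a constant expression.
    private const string Combined = Prefix + Suffix;

    public static string Describe(string input)
    {
        switch (input)
        {
            // A case label must be a constant, so "foo" + "bar" is evaluated by the compiler.
            case "foo" + "bar":
                return "matched " + Combined;
            default:
                return "no match";
        }
    }

    public static void Main()
    {
        Console.WriteLine(Describe("foobar")); // prints "matched foobar"
    }
}
```

If the + expressions were not constant expressions, neither the const declaration nor the case label would compile.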

See also this question.

Does VB.NET optimize the concatenation of string literals?

Yes, it does. I only tested VS 2008, but I strongly suspect previous versions did as well.

VB.NET

Public Class Class1

Dim s As String = "test " + "this " + "function"

Public Function test() As String
Return s
End Function

End Class

IL (notice the single string "test this function"):

{
.maxstack 8
L_0000: ldarg.0
L_0001: call instance void [mscorlib]System.Object::.ctor()
L_0006: nop
L_0007: ldarg.0
L_0008: ldstr "test this function"
L_000d: stfld string ClassLibrary1.Class1::s
L_0012: nop
L_0013: ret
}

Which string concatenation operation is faster - + or string.Concat

This comparison is wrong (assuming you are not interested in concatenation of string constants exclusively).

In your first snippet, the concatenation has already been performed by the C# compiler:

.method private hidebysig static void  Main(string[] args) cil managed
{
.entrypoint
// Code size 8 (0x8)
.maxstack 1
.locals init ([0] string s1)
IL_0000: nop
IL_0001: ldstr "12" // The strings are already concatenated in the IL.
IL_0006: stloc.0
IL_0007: ret
}

In your second snippet, the call to string.Concat remains:

.method private hidebysig static void  Main(string[] args) cil managed
{
.entrypoint
// Code size 18 (0x12)
.maxstack 2
.locals init ([0] string s1)
IL_0000: nop
IL_0001: ldstr "1"
IL_0006: ldstr "2"
IL_000b: call string [mscorlib]System.String::Concat(string, string)
IL_0010: stloc.0
IL_0011: ret
}

Therefore, it's meaningless to try to discern the performance of your two snippets using constants, because you'll get non-representative results.

In the general case, the C# compiler will compile a chain of + operators on strings as a single call to string.Concat. You can verify this by performing pretty much the same test that you did, with variables instead of constants.

As a demonstration, consider these two C# methods. One uses + to concatenate strings:

static string Plus(string a, string b, string c)
{
return a + b + c;
}

The other calls string.Concat:

static string Concat(string a, string b, string c)
{
return string.Concat(a, b, c);
}

Now look at their respective IL, using a Debug configuration:

.method private hidebysig static string Plus (
string a,
string b,
string c
) cil managed
{
.locals init (
[0] string V_0
)

IL_0000: nop
IL_0001: ldarg.0
IL_0002: ldarg.1
IL_0003: ldarg.2
IL_0004: call string [mscorlib]System.String::Concat(string, string, string)
IL_0009: stloc.0
IL_000a: br.s IL_000c

IL_000c: ldloc.0
IL_000d: ret
}

And:

.method private hidebysig static string Concat (
string a,
string b,
string c
) cil managed
{
.locals init (
[0] string V_0
)

IL_0000: nop
IL_0001: ldarg.0
IL_0002: ldarg.1
IL_0003: ldarg.2
IL_0004: call string [mscorlib]System.String::Concat(string, string, string)
IL_0009: stloc.0
IL_000a: br.s IL_000c

IL_000c: ldloc.0
IL_000d: ret
}

They are identical (except for their names). If we build using a Release configuration, we get shorter IL - but still identical for both methods.

In conclusion, in this special case, we can safely assume that we won't observe any performance difference between the two ways of expressing the same thing.

In the general case (where the IL isn't identical or near identical), we can't make any assumptions about performance based on our mental model of the CLR. Even if we do have a completely accurate mental model of the CLR, we must take into consideration that the bytecode ultimately gets compiled to machine code, which is different from IL (for example, x86 code has registers but IL doesn't).

To reason about performance we use profilers instead, as they can give us practical, detailed metrics.

Difference in String concatenation performance

In the case you present, it's actually better to use the concatenation operator on the string class. This is because it can pre-compute the lengths of the strings and allocate the buffer once and do a fast copy of the memory into the new string buffer.

And this is the general rule for concatenating strings. When you have a set number of items that you want to concatenate together (be it 2, or 2000, etc) it's better to just concatenate them all with the concatenation operator like so:

string result = s1 + s2 + ... + sn;

It should be noted in your specific case for s1:

string s1 = "foo" + "bar";

The compiler sees that it can optimize the concatenation of string literals here and transforms the above into this:

string s1 = "foobar";

Note that this applies only where string literals sit next to each other in the concatenation. So if you were to do this:

string s2 = foo + "a" + bar;

Then it does nothing special (but it still makes a call to Concat and precomputes the length). However, in this case:

string s2 = foo + "a" + "nother" + bar;

The compiler will translate that into:

string s2 = foo + "another" + bar;

If the number of strings that you are concatenating is variable (as in a loop where you don't know beforehand how many elements there are), then StringBuilder is the most efficient way of concatenating those strings, as the buffer will have to be reallocated anyway to account for the new string entries being added (of which you don't know how many are left).
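As a rough sketch of the variable-count case, the loop below appends an unknown number of strings to a StringBuilder and creates the final string once at the end. The class name BuilderDemo, the method ConcatAll, and the sample input are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class BuilderDemo
{
    // Concatenates however many strings 'items' happens to contain.
    public static string ConcatAll(IEnumerable<string> items)
    {
        var sb = new StringBuilder();
        foreach (string item in items)
        {
            sb.Append(item); // grows the builder's internal buffer as needed
        }
        return sb.ToString(); // the final string is created here
    }

    public static void Main()
    {
        Console.WriteLine(ConcatAll(new[] { "a", "b", "c" })); // prints "abc"
    }
}
```

Using + inside the same loop would instead create a new intermediate string on every iteration.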

Does concatenation of constants get optimized to the concatenated form at compile time in C#?

Yes, the concatenated string is interned, as this example shows:

using System;

public class Program
{
public static void Main()
{
string abcdef = MyConstants.ABC + MyConstants.DEF;
if(string.IsInterned(abcdef) != null)
{
Console.Write("abcdef is interned");
}
}
}

public static class MyConstants {
public const string ABC = "ABC";
public const string DEF = "DEF";
}

I think Jon answers it also here: https://stackoverflow.com/a/288802/284240

C# Compile-Time Concatenation For String Constants

Yes, it does. You can verify this by using ildasm or Reflector to inspect the code.

static void Main(string[] args) {
string s = "A" + "B";
Console.WriteLine(s);
}

is translated to

.method private hidebysig static void  Main(string[] args) cil managed {
.entrypoint
// Code size 17 (0x11)
.maxstack 1
.locals init ([0] string s)
IL_0000: nop
IL_0001: ldstr "AB" // note that "A" + "B" is concatenated to "AB"
IL_0006: stloc.0
IL_0007: ldloc.0
IL_0008: call void [mscorlib]System.Console::WriteLine(string)
IL_000d: nop
IL_000e: br.s IL_0010
IL_0010: ret
} // end of method Program::Main

There is something even more interesting but related that happens. If you have a string literal in an assembly, the CLR will only create one object for all instances of that same literal in the assembly.

Thus:

static void Main(string[] args) {
string s = "A" + "B";
string t = "A" + "B";
Console.WriteLine(Object.ReferenceEquals(s, t)); // prints True
}

will print "True" on the console! This optimization is called string interning.

Most efficient way to concatenate strings?

The StringBuilder.Append() method is much better than using the + operator. But I've found that, when executing 1000 concatenations or fewer, String.Join() is even more efficient than StringBuilder.

StringBuilder sb = new StringBuilder();
sb.Append(someString);

The only problem with String.Join is that you have to concatenate the strings with a common delimiter.

Edit: as @ryanversaw pointed out, you can make the delimiter string.Empty.

string key = String.Join("_", new String[] 
{ "Customers_Contacts", customerID, database, SessionID });
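To illustrate the point about string.Empty, here is a small sketch of String.Join used as a plain concatenation. JoinDemo, JoinWithoutDelimiter, and the sample values are hypothetical names chosen for this example.

```csharp
using System;

public static class JoinDemo
{
    public static string JoinWithoutDelimiter(params string[] parts)
    {
        // With string.Empty as the separator, Join simply concatenates the parts.
        return string.Join(string.Empty, parts);
    }

    public static void Main()
    {
        // prints "CustomersContacts42"
        Console.WriteLine(JoinWithoutDelimiter("Customers", "Contacts", "42"));
    }
}
```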

Do string literals get optimised by the compiler?

EDIT: While I strongly suspect the statement below is true for all C# compiler implementations, I'm not sure it's actually guaranteed in the spec. Section 2.4.4.5 of the spec talks about literals referring to the same string instance, but it doesn't mention other constant string expressions. I suspect this is an oversight in the spec - I'll email Mads and Eric about it.


It's not just string literals. It's any string constant. So for example, consider:

public const string X = "X";
public const string Y = "Y";
public const string XY = "XY";

void Foo()
{
string z = X + Y;
}

The compiler realises that the concatenation here (for z) is between two constant strings, and so the result is also a constant string. Therefore the initial value of z will be the same reference as the value of XY, because they're compile-time constants with the same value.

EDIT: The reply from Mads and Eric suggested that in the Microsoft C# compiler string constants and string literals are usually treated the same way - but that other implementations may differ.
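The behaviour can be checked with a reference comparison. This is only a sketch of what one would expect from Microsoft's compiler and runtime; as noted above, it is not clearly guaranteed by the spec, and the names ConstantReferenceDemo and SameReference are invented for the example.

```csharp
using System;

public static class ConstantReferenceDemo
{
    public const string X = "X";
    public const string Y = "Y";
    public const string XY = "XY";

    public static bool SameReference()
    {
        string z = X + Y; // folded to the constant "XY" by the compiler
        return object.ReferenceEquals(z, XY);
    }

    public static void Main()
    {
        // Expected to print True with Microsoft's compiler; other implementations may differ.
        Console.WriteLine(SameReference());
    }
}
```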

string concatenation and reference equality

When you concatenate with the identity (the empty string), the reference remains intact. Is this a compile time optimization or is the overloaded assignment operator making the decision to not realloc at runtime?

It is both a compile time optimization and also an optimization performed in the implementation of the overloaded concatenation operator. If you concat two compile time literals, or concat a string known to be null or empty at compile time, the concatenation is done at compile time, and then potentially interned, and will therefore be reference equal to any other compile time literal string that has the same value.

Additionally, String.Concat is implemented such that if you concat a string with either null or an empty string, it just returns the other string (unless the other string was null, in which case it returns an empty string). The test you already have demonstrates this, as you're concatting a non-compile time literal string with an empty string and it's staying reference equal.

Of course if you don't believe your own test, you can look at the source to see that if one of the arguments is null or empty then it simply returns the other.

if (IsNullOrEmpty(str0)) {
if (IsNullOrEmpty(str1)) {
return String.Empty;
}
return str1;
}

if (IsNullOrEmpty(str1)) {
return str0;
}
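The runtime half of this can be observed with a string that is built at run time and an empty string held in a variable, so that the explicit call to string.Concat is what does the work. This is a sketch that relies on the implementation detail shown above; ConcatIdentityDemo and its method name are hypothetical.

```csharp
using System;

public static class ConcatIdentityDemo
{
    public static bool ConcatWithEmptyKeepsReference()
    {
        // Built at run time, so this is not a compile-time literal.
        string runtimeString = new string('x', 3);
        string empty = string.Empty;

        // An explicit method call, so the work happens inside String.Concat at run time.
        string result = string.Concat(runtimeString, empty);

        return object.ReferenceEquals(runtimeString, result);
    }

    public static void Main()
    {
        // Expected to print True given the Concat implementation quoted above.
        Console.WriteLine(ConcatWithEmptyKeepsReference());
    }
}
```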

