How Does StringBuilder Work Internally in C#

How does StringBuilder work internally in C#?

When you use the + operator to build up a string:

string s = "01";
s += "02";
s += "03";
s += "04";

then on the first concatenation we make a new string of length four and copy "01" and "02" into it -- four characters are copied. On the second concatenation we make a new string of length six and copy "0102" and "03" into it -- six characters are copied. On the third concat, we make a string of length eight and copy "010203" and "04" into it -- eight characters are copied. So far a total of 4 + 6 + 8 = 18 characters have been copied for this eight-character string. Keep going.

...
s += "99";

On the 98th concatenation we make a string of length 198 and copy "010203...98" and "99" into it. That gives a total of 4 + 6 + 8 + ... + 198 = 9,898 characters copied in order to make this 198-character string.
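To make the arithmetic concrete, here is a small sketch (C# 9+ top-level statements) that repeats the += pattern from "02" up to "99" and adds up the length of every newly allocated string. It assumes, as the paragraph does, that each concatenation writes exactly the characters of its result.

```csharp
using System;

// Rebuild "0102...99" with the naive += pattern and count the characters
// written into each freshly allocated string.
string s = "01";
long copied = 0;
for (int n = 2; n <= 99; n++)
{
    s += n.ToString("D2"); // allocates a new string of length s.Length + 2
    copied += s.Length;    // every character of that new string gets written
}

Console.WriteLine(s.Length); // 198
Console.WriteLine(copied);   // 9898
```

The total grows quadratically with the number of pieces, which is why the pattern gets slow quickly.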

A string builder doesn't do all that copying. Rather, it maintains a mutable array that is hoped to be larger than the final string, and stuffs new things into the array as necessary.
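A minimal sketch of the "one mutable array with spare room" idea, roughly the doubling strategy described next. Every name here is hypothetical and this is not the real StringBuilder source; the starting size of 16 is borrowed from the DefaultCapacity = 0x10 constant in the field listings.

```csharp
using System;

// Hypothetical names throughout; this only illustrates the idea.
char[] buffer = new char[16]; // start small, like DefaultCapacity (0x10)
int length = 0;               // characters actually in use
int reallocations = 0;

void Append(string piece)
{
    if (length + piece.Length > buffer.Length)
    {
        // Out of room: allocate a bigger array (double, or more if needed)
        // and copy the existing characters across once.
        int newCapacity = Math.Max(buffer.Length * 2, length + piece.Length);
        char[] bigger = new char[newCapacity];
        Array.Copy(buffer, bigger, length);
        buffer = bigger;
        reallocations++;
    }
    // Common case: write the new characters into the spare room in place.
    piece.CopyTo(0, buffer, length, piece.Length);
    length += piece.Length;
}

for (int n = 1; n <= 99; n++)
{
    Append(n.ToString("D2"));
}

string result = new string(buffer, 0, length);
Console.WriteLine($"{result.Length} chars, {reallocations} reallocations"); // 198 chars, 4 reallocations
```

Building the same 198-character string now costs four reallocations (16 + 32 + 64 + 128 = 240 characters copied) instead of 9,898.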

What happens when the guess is wrong and the array gets full? There are two strategies. Before .NET 4, the string builder reallocated the array when it got full, doubling its size and copying the existing contents over. Since .NET 4, the string builder maintains a linked list of relatively small arrays and appends a new array onto the end of the list when the old one gets full.
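The chunked strategy can be sketched as follows. This is an illustration under simplifying assumptions, not the real implementation: it keeps the chunks in a List<char[]> with a fixed, deliberately tiny chunk size, whereas the real class links its chunks through StringBuilder nodes (the m_ChunkPrevious field in the 4.0 listing).

```csharp
using System;
using System.Collections.Generic;

const int ChunkSize = 8;                          // tiny on purpose, for illustration
var chunks = new List<char[]> { new char[ChunkSize] };
int lastChunkLength = 0;                          // chars used in the newest chunk

void Append(string piece)
{
    foreach (char c in piece)
    {
        if (lastChunkLength == ChunkSize)
        {
            chunks.Add(new char[ChunkSize]);      // full: add a new chunk, copy nothing
            lastChunkLength = 0;
        }
        chunks[chunks.Count - 1][lastChunkLength++] = c;
    }
}

string Build()
{
    // Only at the end is everything copied, once, into the final string.
    int total = (chunks.Count - 1) * ChunkSize + lastChunkLength;
    char[] all = new char[total];
    for (int i = 0; i < chunks.Count; i++)
    {
        int used = i == chunks.Count - 1 ? lastChunkLength : ChunkSize;
        Array.Copy(chunks[i], 0, all, i * ChunkSize, used);
    }
    return new string(all);
}

for (int n = 1; n <= 99; n++)
{
    Append(n.ToString("D2"));
}

string text = Build();
Console.WriteLine($"{text.Length} chars in {chunks.Count} chunks"); // 198 chars in 25 chunks
```

Growing never copies old data; the trade-off is that reading the result means walking all the chunks.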

Also, as you have conjectured, the string builder can do tricks with "unsafe" code to improve its performance. For example, the code which writes the new data into the array can already have checked that the array write is going to be within bounds. By turning off the safety system it can avoid the per-write check that the jitter might otherwise insert to verify that every write to the array is safe. The string builder does a number of these sorts of tricks to do things like ensuring that buffers are reused rather than reallocated, ensuring that unnecessary safety checks are avoided, and so on. I recommend against these sorts of shenanigans unless you are really good at writing unsafe code correctly, and really do need to eke out every last bit of performance.

How is the StringBuilder class implemented? Does it internally create new string objects each time we append?

In .NET 2.0 it uses the String class internally. String is only immutable from outside mscorlib, the assembly that contains both String and StringBuilder, so StringBuilder can modify its internal string in place.

In .NET 4.0, StringBuilder was changed to use char[] instead.

In 2.0, StringBuilder looked like this:

public sealed class StringBuilder : ISerializable
{
    // Fields
    private const string CapacityField = "Capacity";
    internal const int DefaultCapacity = 0x10;
    internal IntPtr m_currentThread;
    internal int m_MaxCapacity;
    internal volatile string m_StringValue; // HERE ----------------------
    private const string MaxCapacityField = "m_MaxCapacity";
    private const string StringValueField = "m_StringValue";
    private const string ThreadIDField = "m_currentThread";
    // ... (other members omitted)
}

But in 4.0 it looks like this:

public sealed class StringBuilder : ISerializable
{
    // Fields
    private const string CapacityField = "Capacity";
    internal const int DefaultCapacity = 0x10;
    internal char[] m_ChunkChars; // HERE --------------------------------
    internal int m_ChunkLength;
    internal int m_ChunkOffset;
    internal StringBuilder m_ChunkPrevious;
    internal int m_MaxCapacity;
    private const string MaxCapacityField = "m_MaxCapacity";
    internal const int MaxChunkSize = 0x1f40;
    private const string StringValueField = "m_StringValue";
    private const string ThreadIDField = "m_currentThread";
    // ... (other members omitted)
}

So evidently it was changed from using a string to using a char[].

EDIT: Updated answer to reflect changes in .NET 4 (that I only just discovered).

C#: How is StringBuilder mutable?

Mutable doesn't mean that it can't create new stuff. Mutable just means that its state can change after the constructor returns.

For example, this is mutable, even though string is immutable:

class Foo {
    public string Bar { get; set; }

    public void FooMethod() {
        Bar = new string('!', 10);
    }
}

Because we can change the state of it by setting Bar or calling FooMethod:

 someFoo.FooMethod();

Yes, I am creating a new string here in the FooMethod, but that does not matter. What does matter is that Bar now has a new value! The state of someFoo changed.

We say StringBuilder is mutable because its state can change without you having to create a new StringBuilder. As you have looked up, StringBuilder stores a char array. Each time you append something, the contents of that array change (or, in .NET 4 and later, another chunk is added), but the StringBuilder instance you are holding remains the same object. That is exactly what makes StringBuilder mutable.
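A quick way to see this for yourself: Append returns the same instance it was called on (that is what makes call chaining work), and the indexer can overwrite a character in place.

```csharp
using System;
using System.Text;

var sb = new StringBuilder("abc");
StringBuilder returned = sb.Append("def"); // Append hands back the same builder
sb[0] = 'X';                               // overwrite one char in place
bool sameObject = object.ReferenceEquals(sb, returned);
Console.WriteLine($"{sb} {sameObject}");   // Xbcdef True
```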

How to use StringBuilder wisely?

Modifying immutable structures like strings must be done by copying the structure, which consumes more memory and slows the application down (it also increases GC time, etc.).

StringBuilder solves this problem by using the same mutable object for all the manipulations.

However:

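When the pieces are known at compile time and combined in a single expression, as in the following:

string myString = a + b + c;

the compiler actually turns it into a single call:

string myString = string.Concat(a, b, c);

(If every operand is a literal, as in "123" + "234" + "345", the compiler folds the whole expression into the constant "123234345". Separate statements such as myString += "234"; myString += "345"; are not merged; each one compiles to its own string.Concat call.)

This is faster than working with a StringBuilder because the number of strings passed to the function is known, so the result can be allocated once at the right size.

So for concatenations whose pieces are known at compile time, you should prefer a single + expression or string.Concat().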
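A small sketch of the single-expression case. The operands are ordinary (non-const) locals so the compiler cannot fold them into a literal; the comment about what the compiler emits reflects how the C# compiler lowers a multi-operand + expression, and the program itself only checks that both forms produce the same string.

```csharp
using System;

string a = "123", b = "234", c = "345"; // plain locals, so no constant folding
string viaOperator = a + b + c;          // one expression: compiled to a single string.Concat call
string viaConcat = string.Concat(a, b, c);
Console.WriteLine(viaOperator == viaConcat); // True
```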

As for an unknown number of strings, as in the following case:

string myString = "123";
if (Console.ReadLine() == "a")
{
    myString += "234";
}
myString += "345";

Now the compiler can't combine everything into a single string.Concat() call. However, StringBuilder appears to be more efficient in time and memory consumption only when the concatenation involves 6-7 or more strings.

Bad practice usage:

StringBuilder myString = new StringBuilder("123");
myString.Append("234");
myString.Append("345");

Fine practice usage (note the if):

StringBuilder myString = new StringBuilder("123");
if (Console.ReadLine() == "a")
{
    myString.Append("234");
}
myString.Append("345");

Best practice usage (note the while loop):

StringBuilder myString = new StringBuilder("123");
while (Console.ReadLine() == "a")
{
    myString.Append("234"); // on average the loop runs ~4 or more times
}
myString.Append("345");
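For a version you can run without typing at a console, the sketch below swaps Console.ReadLine() for a hypothetical inputs array standing in for what a user might enter. The loop body is the same; the final string is produced once, by ToString().

```csharp
using System;
using System.Text;

// Hypothetical stand-in for the lines a user might type.
string[] inputs = { "a", "a", "a", "a", "b" };

var myString = new StringBuilder("123");
int i = 0;
while (i < inputs.Length && inputs[i] == "a")
{
    myString.Append("234"); // each pass writes into the same builder
    i++;
}
myString.Append("345");
string result = myString.ToString(); // the string is materialized once, here
Console.WriteLine(result);           // 123234234234234345
```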

Does string concatenation use StringBuilder internally?

No, that is not correct. String concatenation creates a new string, whereas StringBuilder uses a variable-size buffer to build the string and only creates a string object when ToString() is called.
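A short illustration of "only creating a string object when ToString() is called": each call takes a snapshot as a new string, and later appends do not affect strings already returned.

```csharp
using System;
using System.Text;

var sb = new StringBuilder();
sb.Append("Hello");
string first = sb.ToString();  // a string object is created here...
sb.Append(", world");          // ...the builder keeps changing in place...
string second = sb.ToString(); // ...and a second, separate string is created here
Console.WriteLine(first);      // Hello
Console.WriteLine(second);     // Hello, world
```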

There are many discussions of string concatenation techniques all over the Internet if you would like to read further on the subject. Most focus on the efficiency of the different methods when used in loops. In that scenario, StringBuilder is faster than concatenation with the string operators once you are joining 10 or more strings, which indicates that it must be using a different method than the operators do.

That said, if you're concatenating constant string values, the string operators will be better because the compiler will fold them away, and if you're performing non-looped concatenation in a single expression, the operators will be better as they should result in a single call to string.Concat.

How does StringBuilder's capacity change?

It depends on which version of .NET you're talking about. Prior to .NET 4, StringBuilder used the standard .NET strategy of doubling the capacity of the internal buffer every time it needed to be enlarged.

StringBuilder was completely rewritten for .NET 4 and now uses ropes (a linked list of character chunks). Extending the allocation is now done by adding another piece of rope of up to 8000 chars. This is not quite as efficient as the earlier strategy, but it avoids trouble with big buffers clogging up the Large Object Heap. The source code is available from the Reference Source if you want to take a closer look.
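You can watch the growth from the outside through the public Capacity property. The sketch below appends one character at a time and prints each time Capacity changes; the exact numbers depend on the runtime version, so it only relies on capacity staying at least as large as the length.

```csharp
using System;
using System.Text;

var sb = new StringBuilder();
int previousCapacity = sb.Capacity;
int growthSteps = 0;

for (int i = 0; i < 20_000; i++)
{
    sb.Append('x');
    if (sb.Capacity != previousCapacity)
    {
        // Report every time the builder had to grow.
        growthSteps++;
        Console.WriteLine($"at length {sb.Length}: capacity {previousCapacity} -> {sb.Capacity}");
        previousCapacity = sb.Capacity;
    }
}
```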

String vs. StringBuilder

Yes, the performance difference is significant. See the KB article "How to improve string concatenation performance in Visual C#".

I have always tried to code for clarity first, and then optimize for performance later. That's much easier than doing it the other way around! However, having seen the enormous performance difference in my applications between the two, I now think about it a little more carefully.

Luckily, it's relatively straightforward to run performance analysis on your code to see where you're spending the time, and then to modify it to use StringBuilder where needed.

Why is the StringBuilder class sealed?

This is a bit difficult to answer. The upvoted answer has a problem: StringBuilder doesn't have any virtual methods, so there's nothing you could override to break the class or do anything "extra" unsafe.

I think the likely reason is that the CLR has special knowledge of the class. For StringBuilder that knowledge is fairly mundane compared to other .NET types the CLR is intimate with: the P/Invoke marshaller knows what the class looks like. You use StringBuilder when you need to pass a string to unmanaged code and allow it to write the string content. That is necessary because doing so is not legal for String, which is immutable. The P/Invoke marshaller knows how to set the internal members of StringBuilder correctly after the call, but it wouldn't know how to do that for your derived class. That slicing risk is not worth the benefit of leaving the class unsealed, particularly since it doesn't have virtual methods, so you couldn't override its behavior anyway.

An extension method is otherwise a very reasonable workaround.

At what point does using a StringBuilder become insignificant or an overhead?

The rule that I follow is:

Use a StringBuilder when the number of concatenations is unknown at compile time.

So, in your case each StringBuilder was only appending a few times and then being discarded. That isn't really the same as something like

string s = String.Empty;
for (int i = 0; i < 10000; ++i)
{
    s += "A";
}

Where using a StringBuilder would drastically improve performance because you would otherwise be constantly allocating new memory.
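A rough side-by-side you can run yourself, using Stopwatch for the timing. Treat the printed timings as indicative only: they vary by machine and runtime, and a single run is not a proper benchmark. The program only guarantees that both approaches produce the same string.

```csharp
using System;
using System.Diagnostics;
using System.Text;

const int Count = 10_000;

var timer = Stopwatch.StartNew();
string s = string.Empty;
for (int i = 0; i < Count; ++i)
{
    s += "A"; // allocates a new string and copies everything so far
}
TimeSpan concatTime = timer.Elapsed;

timer.Restart();
var sb = new StringBuilder();
for (int i = 0; i < Count; ++i)
{
    sb.Append("A"); // writes into the builder's existing buffer
}
string built = sb.ToString();
TimeSpan builderTime = timer.Elapsed;

Console.WriteLine($"+=: {concatTime.TotalMilliseconds:F2} ms, StringBuilder: {builderTime.TotalMilliseconds:F2} ms");
```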

Does StringBuilder use more memory than String concatenation?

Short answer: StringBuilder is appropriate in cases where you are concatenating an arbitrary number of strings, which you don't know at compile time.

If you do know what strings you're combining at compile time, StringBuilder is basically pointless as you don't need its dynamic resizing capabilities.

Example 1: You want to combine "cat", "dog", and "mouse". This is exactly 11 characters. You could simply allocate a char[] array of length 11 and fill it with the characters from these strings. This is essentially what string.Concat does.

Example 2: You want to join an unspecified number of user-supplied strings into a single string. Since the amount of data to concatenate is unknown in advance, using a StringBuilder is appropriate in this case.
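Both examples in code. The userSupplied array is a hypothetical stand-in for strings that would really arrive at run time; the ", " separator is likewise just an illustrative choice.

```csharp
using System;
using System.Text;

// Example 1: the pieces are known up front, so one Concat call suffices.
string known = string.Concat("cat", "dog", "mouse");
Console.WriteLine(known.Length); // 11

// Example 2: an unknown number of pieces (this array stands in for user input).
string[] userSupplied = { "alpha", "beta", "gamma" };
var sb = new StringBuilder();
foreach (string piece in userSupplied)
{
    if (sb.Length > 0) sb.Append(", ");
    sb.Append(piece);
}
string joined = sb.ToString();
Console.WriteLine(joined); // alpha, beta, gamma
```

For this particular join-with-separator case, string.Join(", ", userSupplied) gives the same result in one call.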


