How Is the StringBuilder Class Implemented? Does It Internally Create New String Objects Each Time We Append?

How is the StringBuilder class implemented? Does it internally create new string objects each time we append?

In .NET 2.0 it uses the String class internally. String is only immutable outside of the System namespace, so StringBuilder can do that.

In .NET 4.0 StringBuilder was changed to use char[].

In 2.0 StringBuilder looked like this:

public sealed class StringBuilder : ISerializable
{
    // Fields
    private const string CapacityField = "Capacity";
    internal const int DefaultCapacity = 0x10;
    internal IntPtr m_currentThread;
    internal int m_MaxCapacity;
    internal volatile string m_StringValue; // HERE ----------------------
    private const string MaxCapacityField = "m_MaxCapacity";
    private const string StringValueField = "m_StringValue";
    private const string ThreadIDField = "m_currentThread";
    // ...
}

But in 4.0 it looks like this:

public sealed class StringBuilder : ISerializable
{
    // Fields
    private const string CapacityField = "Capacity";
    internal const int DefaultCapacity = 0x10;
    internal char[] m_ChunkChars; // HERE --------------------------------
    internal int m_ChunkLength;
    internal int m_ChunkOffset;
    internal StringBuilder m_ChunkPrevious;
    internal int m_MaxCapacity;
    private const string MaxCapacityField = "m_MaxCapacity";
    internal const int MaxChunkSize = 0x1f40;
    private const string StringValueField = "m_StringValue";
    private const string ThreadIDField = "m_currentThread";
    // ...
}

So evidently it was changed from using a string to using a char[].

EDIT: Updated answer to reflect changes in .NET 4 (that I only just discovered).

Does StringBuilder create a new String in every operation?

StringBuilder will only create a new String when toString() is called on it. Until then, it keeps a char[] array of all the elements added to it.

Any operation you perform, like insert or reverse, is performed on that array.
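A small Java sketch of the point above, assuming a standard JDK. The class name InPlaceDemo and its helper are made up for illustration; append, insert, and reverse are the real java.lang.StringBuilder methods, each documented to return a reference to the same object.

```java
public class InPlaceDemo {
    // Hypothetical helper: runs a few edits and reports whether every call
    // returned the very same StringBuilder instance it was invoked on.
    public static boolean editsReturnSameBuilder(StringBuilder sb) {
        StringBuilder afterAppend = sb.append("apple");
        StringBuilder afterInsert = sb.insert(0, "an ");
        StringBuilder afterReverse = sb.reverse();
        return afterAppend == sb && afterInsert == sb && afterReverse == sb;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("orange");
        // All three edits act on the one builder; no new builder appears.
        System.out.println(editsReturnSameBuilder(sb));
        // Undo the reversal, then materialize a String from the contents.
        String snapshot = sb.reverse().toString();
        System.out.println(snapshot); // an orangeapple
    }
}
```

The identity checks only show that no new builder is created; whether the internal array had to grow along the way is a separate matter.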

How is StringBuffer implementing append function without creating two objects?

First there is a problem with your question:

String s = "orange";
s.append("apple");

here two objects are created

Correct, two objects are created: the String "orange" and the String "apple". Inside the StringBuffer/StringBuilder no objects are created as long as we don't overflow the buffer. So those lines of code create 2 or 3 objects.

StringBuilder s = new StringBuilder("Orange");
s.append("apple");

Now here only one object is created

I don't know where you get that. Here you create one StringBuilder object, one "Orange" String, and one "apple" String, for a total of 3 objects, or 4 if we overflow the StringBuilder buffer. (I count the array creation as object creation.)


I read your question as: how can StringBuilder do the append without creating a new object (when the buffer does not overflow)?

You should look at StringBuilder, since it's the non-thread-safe implementation. The code is interesting and easy to read. I've added the inline comments.

The internal storage is a char array, not a String. It is initially built with length 16 and is enlarged every time the capacity is exceeded. If the Strings to append fit within the char array, there is no need to create new objects.

StringBuilder extends AbstractStringBuilder, where you'll find the following code:

/**
* The value is used for character storage.
*/
char value[];

Since not all the array will be used at a given time, another important variable is the length:

/**  
* The count is the number of characters used.
*/
int count;

There are many overloads of append, but the most interesting one is the following:

public AbstractStringBuilder append(String str) {
    if (str == null) str = "null"; //will literally append "null" in case of null
    int len = str.length(); //get the string length
    if (len == 0) return this; //if it's zero, I'm done
    int newCount = count + len; //tentative new length
    if (newCount > value.length) //would the new length fit?
        expandCapacity(newCount); //oops, no, resize my array
    str.getChars(0, len, value, count); //now it will fit, copy the chars
    count = newCount; //update the count
    return this; //return a reference to myself to allow chaining
}

String.getChars(int srcBegin, int srcEnd, char[] dst, int dstBegin): copies characters from this string into the destination character array.

So, the append method is quite simple, the only magic left to discover is the expandCapacity, here it is:

void expandCapacity(int minimumCapacity) {
    //get the current length add one and double it
    int newCapacity = (value.length + 1) * 2;
    if (newCapacity < 0) { //if we had an integer overflow
        newCapacity = Integer.MAX_VALUE; //just use the max positive integer
    } else if (minimumCapacity > newCapacity) { //is it enough?
        //if doubling wasn't enough, use the actual length computed
        newCapacity = minimumCapacity;
    }
    //copy the old value in the new array
    value = Arrays.copyOf(value, newCapacity);
}

Arrays.copyOf(char[] original, int newLength): copies the specified array, truncating or padding with null characters (if necessary) so the copy has the specified length.

In our case, padding, since we're expanding the length.
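The effect of this resizing can be observed from the outside through capacity(). The following is only a sketch: CapacityDemo and capacityAfter are hypothetical names, and the exact size of the enlarged array depends on the JDK version, so the code only relies on the documented initial capacity of 16.

```java
public class CapacityDemo {
    // Hypothetical helper: appends n copies of 'x' to a fresh default builder
    // and returns the capacity it reports afterwards.
    public static int capacityAfter(int n) {
        StringBuilder sb = new StringBuilder(); // documented initial capacity: 16
        for (int i = 0; i < n; i++) {
            sb.append('x');
        }
        return sb.capacity();
    }

    public static void main(String[] args) {
        System.out.println(capacityAfter(0));  // 16: nothing appended yet
        System.out.println(capacityAfter(16)); // 16: everything still fits
        System.out.println(capacityAfter(17)); // larger: the array was replaced
    }
}
```

As long as the appended characters fit, the capacity does not move, which is the "no new objects" case described above.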

How does StringBuilder work internally in C#?

When you use the + operator to build up a string:

string s = "01";
s += "02";
s += "03";
s += "04";

then on the first concatenation we make a new string of length four and copy "01" and "02" into it -- four characters are copied. On the second concatenation we make a new string of length six and copy "0102" and "03" into it -- six characters are copied. On the third concat, we make a string of length eight and copy "010203" and "04" into it -- eight characters are copied. So far a total of 4 + 6 + 8 = 18 characters have been copied for this eight-character string. Keep going.

...
s += "99";

On the 98th concat we make a string of length 198 and copy "010203...98" and "99" into it. That gives us a total of 4 + 6 + 8 + ... + 198 = a lot, in order to make this 198 character string.
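To put a number on "a lot", here is a small Java sketch that adds up the copies under the same model (every concatenation copies both operands into a brand-new string). The class and method names are invented for this illustration.

```java
public class ConcatCopyCount {
    // Counts the characters copied when `pieces` strings of `pieceLen`
    // characters each are joined one at a time with +=.
    public static long copiedByRepeatedConcat(int pieces, int pieceLen) {
        long copied = 0;
        int currentLen = pieceLen;      // the initial string, e.g. "01"
        for (int i = 1; i < pieces; i++) {
            currentLen += pieceLen;     // length of the newly built string
            copied += currentLen;       // every character of it gets copied
        }
        return copied;
    }

    public static void main(String[] args) {
        System.out.println(copiedByRepeatedConcat(4, 2));  // 4 + 6 + 8 = 18
        System.out.println(copiedByRepeatedConcat(99, 2)); // 4 + 6 + ... + 198 = 9898
    }
}
```

So building the 198-character string this way copies 9898 characters, and the total grows quadratically with the number of pieces.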

A string builder doesn't do all that copying. Rather, it maintains a mutable array that is hoped to be larger than the final string, and stuffs new things into the array as necessary.

What happens when the guess is wrong and the array gets full? There are two strategies. In the previous version of the framework, the string builder reallocated and copied the array when it got full, and doubled its size. In the new implementation, the string builder maintains a linked list of relatively small arrays, and appends a new array onto the end of the list when the old one gets full.
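A rough Java sketch of the second strategy, to make the idea concrete. This is not the .NET implementation: ChunkedBuilder is a hypothetical class, it uses a fixed chunk size chosen by the caller, and it keeps the full chunks in an ArrayList for brevity instead of linking them together.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkedBuilderSketch {
    /** Hypothetical, simplified chunk list; the real layout differs. */
    public static final class ChunkedBuilder {
        private final int chunkSize;
        private final List<char[]> fullChunks = new ArrayList<>();
        private char[] current;
        private int used; // characters used in the current chunk

        public ChunkedBuilder(int chunkSize) {
            this.chunkSize = chunkSize;
            this.current = new char[chunkSize];
        }

        public ChunkedBuilder append(String s) {
            int pos = 0;
            while (pos < s.length()) {
                if (used == chunkSize) {
                    // Current chunk is full: keep it as-is and start a new
                    // one. Nothing already written is copied.
                    fullChunks.add(current);
                    current = new char[chunkSize];
                    used = 0;
                }
                int n = Math.min(chunkSize - used, s.length() - pos);
                s.getChars(pos, pos + n, current, used);
                used += n;
                pos += n;
            }
            return this;
        }

        public int chunkCount() {
            return fullChunks.size() + 1;
        }

        @Override
        public String toString() {
            // The only full copy happens here, when the result is requested.
            char[] out = new char[fullChunks.size() * chunkSize + used];
            int at = 0;
            for (char[] chunk : fullChunks) {
                System.arraycopy(chunk, 0, out, at, chunkSize);
                at += chunkSize;
            }
            System.arraycopy(current, 0, out, at, used);
            return new String(out);
        }
    }

    public static void main(String[] args) {
        ChunkedBuilder b = new ChunkedBuilder(4);
        b.append("0102").append("03").append("0405");
        System.out.println(b + " in " + b.chunkCount() + " chunks");
    }
}
```

The trade-off compared with doubling: appends never recopy old data, but reading a character by index has to locate the right chunk first.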

Also, as you have conjectured, the string builder can do tricks with "unsafe" code to improve its performance. For example, the code which writes the new data into the array can already have checked that the array write is going to be within bounds. By turning off the safety system it can avoid the per-write check that the jitter might otherwise insert to verify that every write to the array is safe. The string builder does a number of these sorts of tricks to do things like ensuring that buffers are reused rather than reallocated, ensuring that unnecessary safety checks are avoided, and so on. I recommend against these sorts of shenanigans unless you are really good at writing unsafe code correctly, and really do need to eke out every last bit of performance.

C# How StringBuilder is mutable?

Mutable doesn't mean that it can't create new stuff. Mutable just means that its state can change after the constructor returns.

For example, this is mutable, even though string is immutable:

class Foo {
    public string Bar { get; set; }

    public void FooMethod() {
        Bar = new string('!', 10);
    }
}

Because we can change the state of it by setting Bar or calling FooMethod:

 someFoo.FooMethod();

Yes, I am creating a new string here in the FooMethod, but that does not matter. What does matter is that Bar now has a new value! The state of someFoo changed.

We say StringBuilder is mutable because its state can change, without creating a new StringBuilder. As you have looked up, StringBuilder stores a char array. Each time you append something, that char array changes to something else, but no new StringBuilders are created. This is solid proof that StringBuilder is mutable.
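The same distinction can be shown in Java terms, where String and StringBuilder behave analogously. This is only an illustration; MutabilityDemo and appendViaAlias are made-up names.

```java
public class MutabilityDemo {
    // Hypothetical helper: appends through one reference and reads the
    // result through another reference to the same builder.
    public static String appendViaAlias(StringBuilder original, String suffix) {
        StringBuilder alias = original; // no copy, just a second reference
        alias.append(suffix);
        return original.toString();     // the change is visible here too
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("Hello");
        System.out.println(appendViaAlias(sb, " there")); // Hello there

        String s = "Hello";
        String t = s.concat(" there"); // returns a new String
        System.out.println(s);         // Hello (unchanged)
        System.out.println(t);         // Hello there
    }
}
```

The builder's state changed while it stayed the same object; the String could only be "changed" by producing a different object.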

Difference between String and StringBuilder and their internal organization

I dunno. Let's go see:

  • http://www.docjar.com/html/api/java/lang/StringBuilder.java.html

public final class StringBuilder
    extends AbstractStringBuilder

  • http://www.docjar.com/html/api/java/lang/AbstractStringBuilder.java.html

/**
 * The value is used for character storage.
 */
char[] value;

/**
 * The count is the number of characters used.
 */
int count;

===

Is StringBuilder an array of characters too?

Apparently, in this particular implementation.

So, I have a StringBuilder MY_OBJ = "Hello". Now if I try to append characters to the end of MY_OBJ, does it not mean that you are actually creating a new array object and copying all these chars into a new one?

Not necessarily. The array isn't necessarily full (count < value.length), so a new array may not need to be allocated. Ideally, you initialize the StringBuilder with a capacity so that a large enough array is allocated from the start.

StringBuilder sb = new StringBuilder(20);
sb.append("Hello");
...
sb.append(" there");

And another question I have in mind is, how does one mark the end of a StringBuilder? Like in C, where we use '\0'.

You don't need to care: String and StringBuilder keep track of their own length (the count field above), so no terminator character is needed.

Why StringBuilder when there is String?

String does not allow appending. Each method that appears to modify a String actually creates a new object and returns it. This is because String is immutable: it cannot change its internal state.

On the other hand StringBuilder is mutable. When you call append(..) it alters the internal char array, rather than creating a new string object.

Thus it is more efficient to have:

StringBuilder sb = new StringBuilder();
for (int i = 0; i < 500; i++) {
    sb.append(i);
}

rather than str += i, which would create 500 new string objects.
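For comparison, here are both variants side by side as a runnable sketch (the class and method names are illustrative only). They produce the same text; the difference lies in how many intermediate String objects are created along the way.

```java
public class LoopConcatDemo {
    // Appends into one builder and creates a String only at the end.
    public static String withBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(i);
        }
        return sb.toString();
    }

    // Creates a fresh String on every iteration.
    public static String withPlusEquals(int n) {
        String str = "";
        for (int i = 0; i < n; i++) {
            str += i;
        }
        return str;
    }

    public static void main(String[] args) {
        System.out.println(withBuilder(12)); // 01234567891011
        System.out.println(withBuilder(500).equals(withPlusEquals(500))); // true
    }
}
```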

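Note that in the example I use a loop. As helios notes in the comments, for a single expression like String d = a + b + c the compiler automatically generates something like the following (javac did this kind of rewrite before Java 9; newer versions emit an invokedynamic call instead, with the same result):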

String d = new StringBuilder(a).append(b).append(c).toString();

Note also that there is StringBuffer in addition to StringBuilder. The difference is that the former has synchronized methods. If you use it as a local variable, use StringBuilder. If it may be accessed by multiple threads, use StringBuffer (that's rarer).

Concatenating List<List<string>> takes a long time

I'd suggest using StringBuilder instead of concatenating in a loop, like this:

StringBuilder builder = new StringBuilder();
foreach (List<string> val in L1)
{
    // Append the separator separately so no temporary string is built with +
    builder.Append(string.Join(",", val)).Append(" // ");
}
string result = builder.ToString();

When concatenating in a loop, the whole string has to be copied every time to a new, larger block of memory. StringBuilder avoids that.

You can also refer to:

  • How to use StringBuilder wisely
  • How does StringBuilder work?
  • How the StringBuilder class is implemented? Does it internally create new string objects each time we append?

java: use StringBuilder to insert at the beginning

StringBuilder sb = new StringBuilder();
for (int i = 0; i < 100; i++) {
    sb.insert(0, Integer.toString(i));
}

Warning: It defeats the purpose of StringBuilder, but it does what you asked.


Better technique (although still not ideal):

  1. Reverse each string you want to insert.
  2. Append each string to a StringBuilder.
  3. Reverse the entire StringBuilder when you're done.

This will turn an O(n²) solution into O(n).
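The three steps above can be sketched as follows. The class and method names are invented for this example, and it assumes ordinary text such as the digits in the question; both methods should return the same string.

```java
import java.util.Arrays;
import java.util.List;

public class PrependByReversing {
    // Straightforward version: every insert(0, ...) shifts the existing text.
    public static String prependWithInsert(List<String> parts) {
        StringBuilder sb = new StringBuilder();
        for (String p : parts) {
            sb.insert(0, p);
        }
        return sb.toString();
    }

    // Reverse each part, append it, then reverse everything once at the end.
    public static String prependWithReverse(List<String> parts) {
        StringBuilder sb = new StringBuilder();
        for (String p : parts) {
            sb.append(new StringBuilder(p).reverse());
        }
        return sb.reverse().toString();
    }

    public static void main(String[] args) {
        List<String> parts = Arrays.asList("ab", "cd", "ef");
        System.out.println(prependWithInsert(parts));  // efcdab
        System.out.println(prependWithReverse(parts)); // efcdab
    }
}
```

Each part is reversed twice, so it ends up readable again, while the order of the parts is reversed only once, which is exactly the effect of prepending.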


