C# String Reference Type

In C#, why is String a reference type that behaves like a value type?

Strings aren't value types because they can be huge and need to be stored on the heap. Value types are (in all implementations of the CLR so far) stored on the stack when they are local variables. Stack-allocating strings would break all sorts of things: the stack is only 1MB for 32-bit and 4MB for 64-bit by default, you'd have to box each string (incurring a copy penalty), you couldn't intern strings, memory usage would balloon, and so on.

(Edit: Added clarification about value type storage being an implementation detail, which leads to this situation where we have a type with value semantics not inheriting from System.ValueType. Thanks Ben.)

string is a reference type, so why does it work like a value type when its assignment is updated?

This is a common misunderstanding of the use of references.

s1 is a reference type but its content is a value. You could think all variables are value types, but the way the compiler handles them varies based on value or reference type.

    string s1 = "abc"; 

s1 holds the address where "abc" is stored, let's say 0x0000AAAA.

    string s2 = s1; 

s2 points to the same address as s1, so its value is the same as s1's. Both hold 0x0000AAAA.

    s2 = "123"; 

Strings are immutable, meaning you cannot modify a string. Any time you assign a new value or make a modification, you create a new string somewhere else in memory, while the previous one becomes eligible for GC if nothing else references it (not in our case, since s1 still does). At that point, s1 still has the value 0x0000AAAA while s2 has a new one, 0x0000BBBB.

    Debug.Log(s1 + ":" + s2);

Since they point at different contents, they print different results.
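For anyone who wants to check the steps above outside Unity, here is a minimal console sketch. It assumes Console.WriteLine in place of Debug.Log, and the class and method names (StringAssignmentDemo, Run) are made up for illustration. It uses object.ReferenceEquals to show when the two variables share one object.

```csharp
using System;

static class StringAssignmentDemo
{
    // Hypothetical helper: replays the walk-through and reports what happened.
    public static (bool sharedBefore, bool sharedAfter, string printed) Run()
    {
        string s1 = "abc";
        string s2 = s1;                                       // copies the reference, not the characters
        bool sharedBefore = object.ReferenceEquals(s1, s2);   // one object, two variables

        s2 = "123";                                           // s2 now refers to a different string
        bool sharedAfter = object.ReferenceEquals(s1, s2);    // s1 was never touched

        return (sharedBefore, sharedAfter, s1 + ":" + s2);
    }

    static void Main()
    {
        var (before, after, printed) = Run();
        Console.WriteLine(before);   // True
        Console.WriteLine(after);    // False
        Console.WriteLine(printed);  // abc:123
    }
}
```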

It is called a reference type only because the value held in the variable is not meant to be used as is; it points to the location in memory where the actual object is stored.

The exception is out/ref (similar to & in C++): there it is the variable itself, by its address, that is handed over, typically as a method parameter.

Note that this behaviour is the same with any object, not only string.

    Dog dogA = new Dog();
    Dog dogB = dogA;
    dogA.name = "Wolfie"; // Affects both since we are dereferencing
    dogA = new Dog(); // dogA is a new object, dogB is still Wolfie
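The snippet above does not define Dog, so here is a self-contained sketch of the same idea. The Dog class with a public name field and the default value "unnamed" are assumptions made only for this example.

```csharp
using System;

class Dog
{
    public string name = "unnamed";   // default chosen for this sketch only
}

static class DogDemo
{
    // Hypothetical helper: returns the names seen through dogA and dogB at the end.
    public static (string a, string b) Run()
    {
        Dog dogA = new Dog();
        Dog dogB = dogA;           // both variables refer to the same Dog
        dogA.name = "Wolfie";      // mutates the shared object, visible through dogB too
        dogA = new Dog();          // rebinds dogA only; dogB still refers to Wolfie
        return (dogA.name, dogB.name);
    }

    static void Main()
    {
        var (a, b) = Run();
        Console.WriteLine(a);  // unnamed
        Console.WriteLine(b);  // Wolfie
    }
}
```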

EDIT: OP required explanation on ref/out.

When you want to change an object, you might think of the following:

    void ChangeObject(GameObject objParam)
    {
        objParam = new GameObject("Eve");
    }

    void Start()
    {
        GameObject obj = new GameObject("Adam");
        ChangeObject(obj);
        Debug.Log(obj.name); // Adam...hold on should be Eve (??!!)
    }

ChangeObject takes a GameObject as a parameter. The compiler copies the value contained in obj (0x0000AAAA) into objParam, so both now hold the same value.

Within the method, objParam is given a new value and is no longer related to the obj outside the method. objParam is local to the method and is discarded on completion (the new game object is still in the scene, but the reference to it is lost).

If you wish to change obj within the method:

    void ChangeObject(ref GameObject objParam)
    {
        objParam = new GameObject("Eve");
    }

    void Start()
    {
        GameObject obj = new GameObject("Adam");
        ChangeObject(ref obj);
        Debug.Log(obj.name); // Yeah it is Eve !!!
    }

This time, it is not the value of obj that is passed but the address of obj. So obj may contain 0x0000AAAA while its own address is 0x0000AABB; objParam's value is then 0x0000AABB, and changing objParam means changing the value stored at 0x0000AABB.

out and ref work the same way, except that out requires a value to be assigned within the method, while ref can leave the method without affecting the given argument.
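Since GameObject only exists in Unity, here is a rough console-only sketch of the three cases (by value, ref, out). Creature is a hypothetical stand-in class, and the method names and the name "Seth" are invented for the example.

```csharp
using System;

// Hypothetical stand-in for GameObject so the sketch runs without Unity.
class Creature
{
    public string Name { get; }
    public Creature(string name) { Name = name; }
}

static class RefOutDemo
{
    // By value: the parameter is a copy of the caller's reference.
    public static void ReplaceByValue(Creature c) { c = new Creature("Eve"); }

    // ref: the parameter is an alias for the caller's variable.
    public static void ReplaceByRef(ref Creature c) { c = new Creature("Eve"); }

    // out: like ref, but the method must assign before returning,
    // and the caller does not need to initialize the variable first.
    public static void Create(out Creature c) { c = new Creature("Seth"); }

    static void Main()
    {
        Creature obj = new Creature("Adam");
        ReplaceByValue(obj);
        Console.WriteLine(obj.Name);   // Adam

        ReplaceByRef(ref obj);
        Console.WriteLine(obj.Name);   // Eve

        Create(out Creature made);
        Console.WriteLine(made.Name);  // Seth
    }
}
```

Note that a variable passed with ref has to be assigned before the call, whereas one passed with out does not.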

Is string a value type or reference type?

Here's what's going on:

    string string1 = "abc";

string1 is a reference to the string "abc" which lives on the heap somewhere.

    string string2 = string1;

string2 is another reference, that just happens to point to the same place in memory as string1. That reference was copied over to string2 when you used the assignment operator. You now have two variables that point to the same place in memory.

    string1 = "xyz";

string1 now points to a newly created string that contains "xyz". string2 still points over to "abc", which is still reachable and won't be garbage collected.

In any case, yes; String is a reference type.

Why is string a reference type?

Yikes, this answer got accepted and then I changed it. I should probably include the original answer at the bottom since that's what was accepted by the OP.

New Answer

Update: Here's the thing. string absolutely needs to behave like a reference type. The reasons for this have been touched on by all answers so far: the string type does not have a constant size, it makes no sense to copy the entire contents of a string from one method to another, string[] arrays would otherwise have to resize themselves -- just to name a few.

But you could still define string as a struct that internally points to a char[] array or even a char* pointer and an int for its length, make it immutable, and voila!, you'd have a type that behaves like a reference type but is technically a value type.
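To make that concrete, here is a rough sketch of what such a struct might look like. StructString and its members are entirely hypothetical; this is not how System.String is implemented. The point is only that copying the struct copies a single reference.

```csharp
using System;

// Hypothetical type for illustration; not how System.String is implemented.
readonly struct StructString
{
    private readonly char[] chars;   // the only field: a reference to heap data

    public StructString(string text) { chars = text.ToCharArray(); }

    public int Length => chars.Length;
    public char this[int index] => chars[index];
    public override string ToString() => new string(chars);

    // True when both values share one backing array, i.e. the copy was shallow.
    public bool SharesStorageWith(StructString other) =>
        object.ReferenceEquals(chars, other.chars);
}

static class StructStringDemo
{
    static void Main()
    {
        var a = new StructString("hello");
        var b = a;                                   // copies one reference, not five chars
        Console.WriteLine(b.ToString());             // hello
        Console.WriteLine(a.SharesStorageWith(b));   // True
    }
}
```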

This would seem quite silly, honestly. As Eric Lippert has pointed out in a few of the comments to other answers, defining a value type like this is basically the same as defining a reference type. In nearly every sense, it would be indistinguishable from a reference type defined the same way.

So the answer to the question "Why is string a reference type?" is, basically: "To make it a value type would just be silly." But if that's the only reason, then really, the logical conclusion is that string could actually have been defined as a struct as described above and there would be no particularly good argument against that choice.

However, there are reasons that it's better to make string a class than a struct that are more than purely intellectual. Here are a couple I was able to think of:

To prevent boxing

If string were a value type, then every time you passed it to some method expecting an object it would have to be boxed, which would create a new object, which would bloat the heap and cause pointless GC pressure. Since strings are basically everywhere, having them cause boxing all the time would be a big problem.

For intuitive equality comparison

Yes, string could override Equals regardless of whether it's a reference type or value type. But if it were a value type, then ReferenceEquals("a", "a") would return false! This is because both arguments would get boxed, and boxed arguments never have equal references (as far as I know).

So, even though it's true that you could define a value type to act just like a reference type by having it consist of a single reference type field, it would still not be exactly the same. So I maintain this as the more complete reason why string is a reference type: you could make it a value type, but this would only burden it with unnecessary weaknesses.
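The boxing and ReferenceEquals points can be seen with an ordinary value type today. The sketch below uses int as a stand-in for a hypothetical value-type string; the class and method names are made up. It relies on the fact that identical string literals in one assembly refer to the same instance.

```csharp
using System;

static class BoxingDemo
{
    public static bool BoxedIntsShareReference()
    {
        int n = 42;
        object first = n;    // boxing allocates a new object
        object second = n;   // boxing again allocates another one
        return object.ReferenceEquals(first, second);
    }

    public static bool LiteralsShareReference()
    {
        // Identical string literals are interned, so both arguments are the same object.
        return object.ReferenceEquals("a", "a");
    }

    static void Main()
    {
        Console.WriteLine(BoxedIntsShareReference());  // False
        Console.WriteLine(LiteralsShareReference());   // True
        Console.WriteLine(object.Equals(42, 42));      // True: value equality still works
    }
}
```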


Original Answer

It's a reference type because only references to it are passed around.

If it were a value type then every time you passed a string from one method to another the entire string would be copied*.

Since it is a reference type, instead of string values like "Hello world!" being passed around -- "Hello world!" is 12 characters, by the way, which means it requires (at least) 24 bytes of storage -- only references to those strings are passed around. Passing around a reference is much cheaper than passing every single character in a string.
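The arithmetic above is easy to check. This is a small sketch (the helper name Utf16ByteCount is made up) that counts only the character data, ignoring any per-object overhead the runtime adds, and compares it with the size of a reference on the current platform.

```csharp
using System;
using System.Text;

static class StringSizeDemo
{
    // Hypothetical helper: bytes taken by the characters alone (2 bytes per char).
    public static int Utf16ByteCount(string s) => s.Length * sizeof(char);

    static void Main()
    {
        string greeting = "Hello world!";
        Console.WriteLine(greeting.Length);                          // 12
        Console.WriteLine(Utf16ByteCount(greeting));                 // 24
        Console.WriteLine(Encoding.Unicode.GetByteCount(greeting));  // 24
        Console.WriteLine(IntPtr.Size);                              // size of a reference in bytes on this platform
    }
}
```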

Also, it's really not a normal primitive data type. Who told you that?

*Actually, this isn't strictly true. If the string internally held a char[] array, then as long as the array type is a reference type, the contents of the string would actually not be passed by value -- only the reference to the array would be. I still think this is basically the right answer, though.

Is string a value type or a reference type?


    Console.WriteLine(typeof(string).IsClass); // true

It's a reference type.

It can't be a value-type, as value-types need a known size for the stack etc. As a reference-type, the size of the reference is known in advance, even if the size of the string isn't.

It behaves like you expect a value-type to behave because it is immutable; i.e. it doesn't* change once created. But there are lots of other immutable reference-types. Delegate instances, for example.

*=except for inside StringBuilder, but you never see it while it is doing this...

String reference in C#

You are right that, initially, both x and y reference the same object:

          +-----------+
    y --> |   hello   |
          +-----------+
                ^
    x ----------+

Now have a look at this line:

    x = x.Replace('h','j');

The following happens:

  1. x.Replace creates a new string (with h replaced by j) and returns a reference to this new string.

              +-----------+    +-----------+
        y --> |   hello   |    |   jello   |
              +-----------+    +-----------+
                    ^
        x ----------+

  2. With x = ..., you assign this new reference to x. y still references the old string.

              +-----------+    +-----------+
        y --> |   hello   |    |   jello   |
              +-----------+    +-----------+
                                     ^
        x ---------------------------+

So how do you modify a string in-place? You don't. C# does not support modifying strings in-place. Strings are deliberately designed as an immutable data structure. For a mutable string-like data structure, use StringBuilder:

    var x = new System.Text.StringBuilder("hello");
    var y = x;

    // note that we did *not* write x = ..., we modified the value in-place
    x.Replace('h','j');

    // both print "jello"
    Console.WriteLine(x);
    Console.WriteLine(y);

Why do reference types inside structs behave like value types?


"strings are reference types that have pointers stored on the stack while their actual contents are stored on the heap"

No no no. First off, stop thinking about stack and heap. This is almost always the wrong way to think in C#. C# manages storage lifetime for you.

Second, though references may be implemented as pointers, references are not logically pointers. References are references. C# has both references and pointers. Don't mix them up. There is no pointer to string in C#, ever. There are references to string.

Third, a reference to a string could be stored on the stack but it could also be stored on the heap. When you have an array of references to string, the array contents are on the heap.

Now let's come to your actual question.

    Person person_1 = new Person();
    person_1.name = "Person 1";
    Person person_2 = person_1; // This is the interesting line
    person_2.name = "Person 2";

Let's illustrate what the code does logically. Your Person struct is nothing more than a string reference, so your program is the same as:

    string person_1_name = null; // That's what new does on a struct
    person_1_name = "Person 1";
    string person_2_name = person_1_name; // Now they refer to the same string
    person_2_name = "Person 2"; // And now they refer to different strings

When you say person_2 = person_1, that does not mean that the variable person_1 is now an alias for the variable person_2. (There is a way to do that in C#, but this is not it.) It means "copy the contents of person_1 to person_2". The reference to the string is the value that is copied.

If that's not clear try drawing boxes for variables and arrows for references; when the struct is copied, a copy of the arrow is made, not a copy of the box.
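Here is a runnable version of the scenario, as a sketch. The Person struct with a single public name field is assumed from the question's code, and PersonDemo and Run are names invented for the example.

```csharp
using System;

// Assumed shape of the question's struct: one string field.
struct Person
{
    public string name;
}

static class PersonDemo
{
    // Hypothetical helper: returns both names after the copy and the reassignment.
    public static (string first, string second) Run()
    {
        Person person_1 = new Person();
        person_1.name = "Person 1";
        Person person_2 = person_1;    // copies the struct, i.e. copies the string reference
        person_2.name = "Person 2";    // rebinds only person_2's field
        return (person_1.name, person_2.name);
    }

    static void Main()
    {
        var (first, second) = Run();
        Console.WriteLine(first);   // Person 1
        Console.WriteLine(second);  // Person 2
    }
}
```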

C# String reference type passed as copy?

It has nothing to do with mutability.

    string s = "lana del rey";
    string d = s;

Here 2 variables s and d refer to the same object in memory.

    s = "elvis presley";

Here, on the right-hand side of the statement, a new object is allocated, initialized with "elvis presley", and assigned to s. So now s refers to another object. Since we haven't changed the reference held by d, it continues to refer to "lana del rey" as it originally did.

Now the real life analogy:

There are 2 people (A and B), each pointing a finger at a building far away. They are independent of each other and don't even see what the other is pointing at. Then A decides to start pointing at another building. Since they aren't connected to each other, A now points at the other building while B continues pointing at the original one (since no one asked B to stop).

PS: what you probably are confusing is the concept behind a pointer and a reference. Not sure if it makes sense to explain it here since you might be confused even more. But now at least you might google for the corresponding keywords.


