When is a C# value/object copied and when is its reference copied?
It's hard to answer this sort of question precisely without spending an awful lot of time picking your words carefully.
I've done so in a couple of articles which you may find useful:
- Parameter passing in C# / .NET
- Reference types and value types in C# / .NET
That's not to say that the articles are perfect, of course - far from it - but I've tried to be as clear as I can.
I think one important thing is to separate the two concepts (parameter passing and reference vs value types) out in your head.
To look at your specific examples:
SomeForm myForm = new SomeForm();
SomeObject myObject = new SomeObject();
myForm.formObject = myObject;
This means that myForm.formObject and myObject refer to the same instance of SomeObject. It is like two people holding separate pieces of paper, each with the same address written on it. If you go to the address on one piece of paper and paint the house red, then go to the address on the second piece of paper, you'll see a red house.
It's not clear what you mean by "and then modify the object in the form" because the type you have provided is immutable. There's no way of modifying the object itself. You can change myForm.formObject to refer to a different instance of SomeObject, but that's like scribbling out the address on one piece of paper and writing a different address on it instead. That won't change what's written on the other piece of paper.
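The paper-and-address analogy can be checked with a small console program. This is a minimal sketch: since the SomeObject in the question is immutable, it assumes a hypothetical mutable House class with a Color property so that "painting the house" is possible.

```csharp
using System;

// Hypothetical stand-in for SomeObject, made mutable so the "house" can be painted.
public class House
{
    public string Color { get; set; } = "white";
}

public static class Program
{
    public static (string viaSecondPaper, string firstAfterScribble, string secondAfterScribble) Demo()
    {
        var firstPaper = new House();  // one House object on the heap
        var secondPaper = firstPaper;  // copies the reference (the "address"), not the house

        firstPaper.Color = "red";      // paint the house via the first piece of paper
        string viaSecondPaper = secondPaper.Color; // "red": both papers lead to the same house

        firstPaper = new House();      // scribble out the first address and write a new one
        return (viaSecondPaper, firstPaper.Color, secondPaper.Color);
    }

    public static void Main()
    {
        var (viaSecond, firstAfter, secondAfter) = Demo();
        Console.WriteLine($"{viaSecond} {firstAfter} {secondAfter}"); // red white red
    }
}
```

Painting through one variable is visible through the other, while re-pointing one variable leaves the other untouched.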
If you could provide a short but complete program whose behaviour you don't understand (ideally a console application, just to keep things shorter and simpler) it would be easier to talk about things in concrete terms.
Why is the original object changed after a copy, without using ref arguments?
It is always interesting to explain how this works. Of course my explanation can't match the brilliance of Jon Skeet's or Joseph Albahari's, but I'll try nevertheless.
In the old days of C programming, grasping the concept of pointers was fundamental to working with that language. Many years have passed and now we call them references, but they are still... glorified pointers, and if you understand how they work, you are halfway to becoming a programmer (just kidding).
What is a reference? In short: it is a number stored in a variable, and this number represents an address in memory where your data lies.
Why do we need references? Because it is very simple to handle a single number with which we can reach the memory area holding our data, instead of moving a whole object with all its fields around along with our code.
So, what happens when we write
var myclass = new MyClass();
We all know that this is a call to the constructor of the class MyClass, but for the framework it is also a request to provide a memory area where the values of the instance (properties, fields and other internal housekeeping information) live at a specific point in time. Suppose that MyClass needs 100 bytes to store everything it needs. The framework searches the computer's memory in some way, and let's suppose that it finds a place identified by the address 4200. This value (4200) is the value that is assigned to the variable myclass. It is a pointer to the memory (oops, I mean a reference to the object instance).
Now what happens when you write:
var copy = myclass;
Nothing special. The copy variable gets the same value as myclass (4200). The two variables reference the same memory area, so using one or the other makes no difference. The memory area (the instance of MyClass) is still located at our fictional memory address 4200.
myclass.Mystring = "jadajadajada";
This uses the reference value as a base to find the area of memory occupied by the property, and stores there a reference to the string literal (literal strings are kept in the intern pool). To make an analogy with pointers, it is as if you take the base address (4200) and add an offset to find the point where the reference representing the property MyString is kept inside the 100 bytes occupied by our object instance. Let's say that the MyString reference is 42 bytes past the beginning of the memory area. Adding 42 to 4200 yields 4242, and this is the point at which the reference to the literal "jadajadajada" will be stored.
Dal.DoSomeThing(copy);
Here is the crux (well, the point where you have the problem). When you pass the copy variable, don't think that the framework repeats the search for a memory area and copies everything from the original area into a new one. No, that would be practically impossible (think about what happens if MyClass contains a property that is an instance of another class, and so on... it could never stop). So the value passed to the DoSomeThing method is again the reference value 4200. This value is automatically assigned to the local variable daClass declared as the input parameter of DoSomeThing (just as you did explicitly before with var copy = myclass;).
At this point it is clear that any operation using daClass acts on the same memory area occupied by the original instance, and you see the results when the code returns to your starting point.
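The walkthrough above can be reproduced in a few lines. This is a sketch only: MyClass, its MyString property and Dal.DoSomeThing are hypothetical reconstructions of the types mentioned in the question, and the value written by DoSomeThing is made up for illustration.

```csharp
using System;

// Hypothetical reconstruction of the question's MyClass.
public class MyClass
{
    public string MyString { get; set; } = "";
}

// Hypothetical data-access helper.
public static class Dal
{
    // daClass receives a copy of the caller's reference (the "4200"), not a copy of the object.
    public static void DoSomeThing(MyClass daClass)
    {
        daClass.MyString = "changed by Dal";
    }
}

public static class Program
{
    public static string Demo()
    {
        var myclass = new MyClass();
        var copy = myclass;                 // copies the reference only
        myclass.MyString = "jadajadajada";

        Dal.DoSomeThing(copy);              // the reference is passed by value
        return myclass.MyString;            // sees the change made through daClass
    }

    public static void Main()
    {
        Console.WriteLine(Demo()); // changed by Dal
    }
}
```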
I beg the pardon of the more technically expert users here, particularly for my casual and imprecise use of the term 'memory address'.
Pass by copy or reference?
In C# / .NET, types are classified as either value types or reference types [1]. Value types are any types which derive from System.ValueType and are defined in C# with the struct (or enum) keyword. These are passed by copy / value.
Reference types are types which do not derive from System.ValueType and are defined in C# with the class keyword (interfaces, delegates and arrays are reference types too). A variable of a reference type holds a reference (similar to a pointer) to an instance. References are also passed by value by default, but only the reference is passed, not the whole object.
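The difference can be seen by passing two otherwise identical types, one struct and one class, to a method that sets a field. A minimal sketch; PointStruct, PointClass and SetX are hypothetical names used only for illustration.

```csharp
using System;

// Two hypothetical types with the same shape: one value type, one reference type.
public struct PointStruct { public int X; }
public class PointClass { public int X; }

public static class Program
{
    static void SetX(PointStruct p) => p.X = 42; // modifies the method's own copy of the struct
    static void SetX(PointClass p) => p.X = 42;  // modifies the object both references point to

    public static (int structX, int classX) Demo()
    {
        var s = new PointStruct { X = 1 };
        var c = new PointClass { X = 1 };
        SetX(s); // the whole struct is copied
        SetX(c); // only the reference is copied
        return (s.X, c.X);
    }

    public static void Main()
    {
        var (structX, classX) = Demo();
        Console.WriteLine($"struct: {structX}, class: {classX}"); // struct: 1, class: 42
    }
}
```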
Your question also mentioned that string instances are passed by copy. String in .NET is a reference type (it derives directly from System.Object) and hence is not passed by full copy.
[1] Pointers may merit their own class here but I'm ignoring them for this discussion.
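A quick way to convince yourself that strings are not copied when passed is to check the type's metadata and compare references after a round trip through a method. This is a sketch; PassThrough and Reassign are hypothetical helpers.

```csharp
using System;

public static class Program
{
    // Returns its argument unchanged. If strings were copied on each call,
    // the returned reference would differ from the one passed in.
    public static string PassThrough(string s) => s;

    // Reassigns only the local copy of the reference; the caller's variable is unaffected.
    // Strings are immutable, so the shared object itself cannot be modified either.
    public static void Reassign(string s) { s = s + "!"; }

    public static void Main()
    {
        string text = new string('a', 3); // "aaa", allocated at run time

        Console.WriteLine(typeof(string).IsValueType);                  // False
        Console.WriteLine(typeof(string).BaseType == typeof(object));   // True
        Console.WriteLine(ReferenceEquals(PassThrough(text), text));    // True

        Reassign(text);
        Console.WriteLine(text); // aaa
    }
}
```

Because string is immutable, sharing one instance behaves, from the caller's point of view, much like a copy would, which is probably where the "passed by copy" impression comes from.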
C# pass by value vs. pass by reference
Re: OP's Assertion
It is universally acknowledged (in C# at least) that when you pass by reference, the method contains a reference to the object being manipulated, whereas when you pass by value, the method copies the value being manipulated ...
TL;DR
There's more to it than that. Unless you pass variables with the ref or out keywords, C# passes variables to methods by value, irrespective of whether the variable is a value type or a reference type.
If passed by reference, then the called function may change the caller's variable itself at the call site (i.e. re-assign the original calling function's variable).
If a variable is passed by value:
- if the called function re-assigns the variable, this change is local to the called function only, and will not affect the original variable in the calling function
- however, if the called function changes the variable's fields or properties, whether the calling function observes those changes depends on whether the variable is a value type or a reference type.
Since this is all rather complicated, I would recommend avoiding passing by reference if possible (instead, if you need to return multiple values from a function, use a composite class, struct, or Tuples as a return type instead of using the ref or out keywords on parameters).
Also, when passing reference types around, a lot of bugs can be avoided by not changing (mutating) fields and properties of an object passed into a method (for example, use get-only or init-only properties to prevent changes to properties, and strive to assign properties only once, during construction).
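As a sketch of the tuple alternative recommended above: instead of a hypothetical signature like void MinMax(int[] values, out int min, out int max), the method can return both results as a named tuple. MinMax is an illustrative name, not from the original answer.

```csharp
using System;
using System.Linq;

public static class Program
{
    // Returns two results as a named tuple instead of using two out parameters.
    public static (int Min, int Max) MinMax(int[] values) => (values.Min(), values.Max());

    public static void Main()
    {
        var (min, max) = MinMax(new[] { 4, 8, 15, 16, 23, 42 });
        Console.WriteLine($"{min} {max}"); // 4 42
    }
}
```

The caller deconstructs the result and never hands the method a variable it could reassign.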
In Detail
The problem is that there are two distinct concepts:
- Value Types (e.g. int) vs Reference Types (e.g. string, or custom classes)
- Passing by Value (default behaviour) vs Passing by Reference (ref, out)
Unless you explicitly pass (any) variable by reference, by using the out or ref keywords, parameters are passed by value in C#, irrespective of whether the variable is a value type or reference type.
When passing value types (such as int, float or structs like DateTime) by value (i.e. without out or ref), the called function gets a copy of the entire value (typically passed on the stack).
Any reassignment of the parameter, and any changes to properties / fields of the copy, will be lost when the called function exits.
However, when passing reference types (e.g. custom classes like your MyPoint class) by value, it is the reference to the same, shared object instance which is copied and passed on the stack.
This means that:
- If the passed object has mutable (settable) fields and properties, any changes to those fields or properties of the shared object are permanent (i.e. any changes to x or y are seen by anyone observing the object).
- However, during method calls, the reference itself is still copied (passed by value), so if the parameter variable is reassigned, this change is made only to the local copy of the reference, and the caller will not see it. This is why your code doesn't work as expected.
What happens here:
void Replace<T>(T a, T b) // Both a and b are passed by value
{
a = b; // reassignment is localized to method `Replace`
}
for reference types T, means that the local (stack) variable a, which initially refers to the caller's first object, is reassigned to refer to the object referenced by the local variable b. This reassignment is local to this function only: as soon as execution leaves this function, the reassignment is lost.
If you really want to replace the caller's references, you'll need to change the signature like so:
void Replace<T>(ref T a, T b) // a is passed by reference
{
a = b; // a is reassigned, and is also visible to the calling function
}
This changes the call to call by reference - in effect we are passing the address of the caller's variable to the function, which then allows the called method to alter the calling method's variable.
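Both versions of Replace can live side by side, because C# allows overloading on ref. The following sketch assumes a MyPoint class with public x and y fields, shaped like the one in the question, and shows that only the ref overload changes the caller's variable.

```csharp
using System;

// Assumed shape of the question's MyPoint class.
public class MyPoint
{
    public int x;
    public int y;
}

public static class Program
{
    // a and b are copies of the caller's references.
    static void Replace<T>(T a, T b)
    {
        a = b; // only Replace's local copy of a changes
    }

    // a is an alias for the caller's variable.
    static void Replace<T>(ref T a, T b)
    {
        a = b; // the caller's variable now refers to b's object
    }

    public static (int byValue, int byRef) Demo()
    {
        MyPoint a = new MyPoint { x = 1, y = 2 }; // point1
        MyPoint b = new MyPoint { x = 3, y = 4 }; // point2

        Replace(a, b);
        int byValue = a.x;   // 1: a still refers to point1

        Replace(ref a, b);
        int byRef = a.x;     // 3: a now refers to point2
        return (byValue, byRef);
    }

    public static void Main()
    {
        var (byValue, byRef) = Demo();
        Console.WriteLine($"{byValue} {byRef}"); // 1 3
    }
}
```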
However, nowadays:
- Passing by reference is generally regarded as a bad idea. Instead, we should return data in the return value, and if there is more than one variable to be returned, use a Tuple or a custom class or struct which contains all such return variables.
- Changing ('mutating') a shared value (and even reference) variable in a called method is frowned upon, especially by the Functional Programming community, as this can lead to tricky bugs, especially when using multiple threads. Instead, give preference to immutable variables, or if mutation is required, then consider changing a (potentially deep) copy of the variable. You might find topics around 'pure functions' and 'const correctness' interesting further reading.
Edit
These two diagrams may help with the explanation.
Pass by value (reference types):
In your first instance (Replace<T>(T a, T b)), a and b are passed by value. For reference types, this means the references are copied onto the stack and passed to the called function.
- Your initial code (I've called this main) allocates two MyPoint objects on the managed heap (I've called these point1 and point2), and then assigns two local variable references, a and b, to reference the points, respectively (the light blue arrows):
MyPoint a = new MyPoint { x = 1, y = 2 }; // point1
MyPoint b = new MyPoint { x = 3, y = 4 }; // point2
- The call to Replace<MyPoint>(a, b) then pushes a copy of the two references onto the stack (the red arrows). Method Replace sees these as the two parameters also named a and b, which still point to point1 and point2, respectively (the orange arrows).
- The assignment a = b; then changes the Replace method's local variable a such that a now points to the same object as referenced by b (i.e. point2). However, note that this change is only to Replace's local (stack) variables, and this change will only affect subsequent code in Replace (the dark blue line). It does NOT affect the calling function's variable references in any way, NOR does this change the point1 and point2 objects on the heap at all.
Pass by reference:
If however we change the method to Replace<T>(ref T a, T b) and then change main to pass a by reference, i.e. Replace(ref a, b):
- As before, two point objects are allocated on the heap.
- Now, when Replace(ref a, b) is called, main's reference b (pointing to point2) is still copied during the call, but a is now passed by reference, meaning that the "address" of main's a variable is passed to Replace.
- Now when the assignment a = b is made, it is the calling function main's variable a which is updated to reference point2. The change made by the re-assignment to a is now seen by both main and Replace. There are now no references to point1.
Changes to (heap allocated) object instances are seen by all code referencing the object
In both scenarios above, no changes were actually made to the heap objects point1 and point2; only local variable references were passed and re-assigned.
However, if any changes were actually made to the heap objects point1 and point2, then all variable references to these objects would see these changes.
So, for example:
void main()
{
MyPoint a = new MyPoint { x = 1, y = 2 }; // point1
MyPoint b = new MyPoint { x = 3, y = 4 }; // point2
// Passed by value, but the properties x and y are being changed
DoSomething(a, b);
// a and b have been changed!
Assert.AreEqual(53, a.x);
Assert.AreEqual(21, b.y);
}
public void DoSomething(MyPoint a, MyPoint b)
{
a.x = 53;
b.y = 21;
}
Now, when execution returns to main, all references to point1 and point2, including main's variables a and b, will 'see' the changes when they next read the values of x and y of the points. You will also note that the variables a and b were still passed by value to DoSomething.
Changes to value types affect the local copy only
Value types (primitives like System.Int32, System.Double) and structs (like System.DateTime, or your own structs) used as local variables and parameters are typically stored on the stack rather than on the heap (a struct that is a field of a class lives inside that object on the heap), and they are copied verbatim when passed into a call. This leads to a major difference in behaviour: changes made by the called function to a value type's field or property are only observed locally by the called function, because it is only mutating its own copy of the value.
For example, consider the following code with an instance of the mutable struct System.Drawing.Rectangle:
public void SomeFunc(System.Drawing.Rectangle aRectangle)
{
// Only the local SomeFunc copy of aRectangle is changed:
aRectangle.X = 99;
// Passes - the changes last for the scope of the copied variable
Assert.AreEqual(99, aRectangle.X);
} // The copy aRectangle will be lost when the stack is popped.
// Which when called:
var myRectangle = new System.Drawing.Rectangle(10, 10, 20, 20);
// A copy of `myRectangle` is passed on the stack
SomeFunc(myRectangle);
// Test passes - the caller's struct has NOT been modified
Assert.AreEqual(10, myRectangle.X);
The above can be quite confusing and highlights why it is good practice to create your own custom structs as immutable.
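One way to follow this advice is a readonly struct whose "modifying" operations return a new value. A minimal sketch; the Money type and its Add method are hypothetical.

```csharp
using System;

// Hypothetical immutable struct: readonly struct forces every field to be readonly,
// so a "change" always produces a new value and copy semantics can't surprise anyone.
public readonly struct Money
{
    public decimal Amount { get; }
    public string Currency { get; }

    public Money(decimal amount, string currency)
    {
        Amount = amount;
        Currency = currency;
    }

    // Returns a new value instead of mutating this one.
    public Money Add(decimal delta) => new Money(Amount + delta, Currency);
}

public static class Program
{
    public static void Main()
    {
        var price = new Money(10m, "EUR");
        var raised = price.Add(5m);
        // price.Amount = 99m; // would not compile: Amount is get-only
        Console.WriteLine(price.Amount);  // 10
        Console.WriteLine(raised.Amount); // 15
    }
}
```

With no setters, the SomeFunc pitfall above cannot occur: a method that wants a different value has to return it.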
The ref keyword works similarly for value type variables: the 'address' of the caller's value type variable is passed, so the called method can directly assign to (or mutate) the caller's variable.
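The contrast between passing a struct by value and by ref can be sketched with a hypothetical MutablePoint struct standing in for System.Drawing.Rectangle.

```csharp
using System;

// Hypothetical mutable struct standing in for System.Drawing.Rectangle.
public struct MutablePoint
{
    public int X;
}

public static class Program
{
    // Receives a copy: mutating it has no effect on the caller.
    static void SetByValue(MutablePoint p) { p.X = 99; }

    // Receives the caller's variable itself: assigning to it replaces the caller's value.
    static void SetByRef(ref MutablePoint p) { p = new MutablePoint { X = 99 }; }

    public static (int afterByValue, int afterByRef) Demo()
    {
        var point = new MutablePoint { X = 10 };
        SetByValue(point);
        int afterByValue = point.X;   // still 10
        SetByRef(ref point);
        int afterByRef = point.X;     // now 99
        return (afterByValue, afterByRef);
    }

    public static void Main()
    {
        var (byValue, byRef) = Demo();
        Console.WriteLine($"{byValue} {byRef}"); // 10 99
    }
}
```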
How do I copy an object by value instead of by reference in C#
MemberwiseClone will do a shallow copy, which means you'll get a new instance whose fields have the same references as the initial object.
You want to do a deep copy instead. Here's a copy with one more level:
public object Clone()
{
var copy = new Solution();
foreach (var pair in solution)
copy.solution.Add(pair.Key, pair.Value);
return copy;
}
This will copy the dictionary, but its keys and values will still point to the same instances. So you may perform a deeper copy with something like this:
public object Clone()
{
var copy = new Solution();
foreach (var pair in solution)
copy.solution.Add(new Room(pair.Key), pair.Value.ToList());
return copy;
}
Or:
public object Clone()
{
var copy = new Solution();
foreach (var pair in solution)
copy.solution.Add(new Room(pair.Key), pair.Value.Select(i => new Patient(i)).ToList());
return copy;
}
You need to write some copy constructors for Room and Patient for that. Hopefully you get the idea.
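Putting the pieces together, here is a self-contained sketch contrasting MemberwiseClone with the copy-constructor approach. The fields of Room (Number) and Patient (Name) are assumptions, since the question's classes aren't shown; only the copy constructors matter. Note that Room doesn't override Equals/GetHashCode, so the deep copy's new Room keys won't match the original Room instance as dictionary keys, which is why the demo reads the copied list through Values.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical Room and Patient types with copy constructors.
public class Room
{
    public int Number;
    public Room(int number) { Number = number; }
    public Room(Room other) { Number = other.Number; } // copy constructor
}

public class Patient
{
    public string Name;
    public Patient(string name) { Name = name; }
    public Patient(Patient other) { Name = other.Name; } // copy constructor
}

public class Solution
{
    public Dictionary<Room, List<Patient>> solution = new Dictionary<Room, List<Patient>>();

    // Shallow: a new Solution object that shares the same dictionary instance.
    public Solution ShallowClone() => (Solution)MemberwiseClone();

    // Deep: new dictionary, new Room keys, new lists and new Patient objects.
    public Solution DeepClone()
    {
        var copy = new Solution();
        foreach (var pair in solution)
            copy.solution.Add(new Room(pair.Key), pair.Value.Select(p => new Patient(p)).ToList());
        return copy;
    }
}

public static class Program
{
    public static (bool shallowSharesDictionary, string shallowName, string deepName) Demo()
    {
        var original = new Solution();
        var room = new Room(1);
        original.solution.Add(room, new List<Patient> { new Patient("Ann") });

        var shallow = original.ShallowClone();
        var deep = original.DeepClone();

        // Rename the patient through the original object graph.
        original.solution[room][0].Name = "Bob";

        return (ReferenceEquals(shallow.solution, original.solution),
                shallow.solution[room][0].Name,          // same dictionary, same Patient
                deep.solution.Values.First()[0].Name);   // independent Patient copy
    }

    public static void Main()
    {
        var (shares, shallowName, deepName) = Demo();
        Console.WriteLine($"{shares} {shallowName} {deepName}"); // True Bob Ann
    }
}
```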
Copy Reference Type from one to another C#
You are not creating a copy; you are merely assigning the reference of an object (obj) to another variable (newobj). Accessing either of them points to the same location in memory.
To create a copy of an object, you have to clone it. See https://stackoverflow.com/a/78612/1028323 and https://stackoverflow.com/a/129395/1028323 for examples.
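using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

// T, and every object it references, must be marked [Serializable].
public static T DeepClone<T>(T obj)
{
    using (var ms = new MemoryStream())
    {
        var formatter = new BinaryFormatter();
        formatter.Serialize(ms, obj);          // write the whole object graph to memory
        ms.Position = 0;                       // rewind before reading it back
        return (T) formatter.Deserialize(ms);  // rebuild an independent copy
    }
}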
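BinaryFormatter is marked obsolete in modern .NET (warning SYSLIB0011) and is no longer functional in recent versions, so on current runtimes a serialization round trip through System.Text.Json is one possible substitute. This is a sketch with limitations: by default only public properties are copied, the type needs a usable constructor for deserialization, and reference cycles are not handled. The Person class and DeepCloneViaJson name are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class Cloning
{
    // Serializes the object graph to JSON and reads it back as a new, independent graph.
    public static T DeepCloneViaJson<T>(T obj)
    {
        string json = JsonSerializer.Serialize(obj);
        return JsonSerializer.Deserialize<T>(json);
    }
}

// Hypothetical type used to show that nested objects are copied too.
public class Person
{
    public string Name { get; set; } = "";
    public List<string> Tags { get; set; } = new List<string>();
}

public static class Program
{
    public static void Main()
    {
        var original = new Person { Name = "Ann", Tags = new List<string> { "a" } };
        var copy = Cloning.DeepCloneViaJson(original);

        copy.Tags.Add("b"); // only the copy's list changes
        Console.WriteLine(original.Tags.Count);              // 1
        Console.WriteLine(copy.Tags.Count);                  // 2
        Console.WriteLine(ReferenceEquals(original, copy));  // False
    }
}
```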