Does Swift Copy on Write For All Structs

Does swift copy on write for all structs?

Array is implemented with copy-on-write behaviour – you'll get it regardless of any compiler optimisations (although of course, optimisations can decrease the number of cases where a copy needs to happen).

At a basic level, Array is just a structure that holds a reference to a heap-allocated buffer containing the elements – therefore multiple Array instances can reference the same buffer. When you come to mutate a given array instance, the implementation will check if the buffer is uniquely referenced, and if so, mutate it directly. Otherwise, the array will perform a copy of the underlying buffer in order to preserve value semantics.

However, with your Point structure – you're not implementing copy-on-write at a language level. Of course, as @Alexander says, this doesn't stop the compiler from performing all sorts of optimisations to minimise the cost of copying whole structures about. These optimisations needn't follow the exact behaviour of copy-on-write though – the compiler is simply free to do whatever it wishes, as long as the program runs according to the language specification.

In your specific example, both p1 and p2 are global, therefore the compiler needs to make them distinct instances, as other .swift files in the same module have access to them (although this could potentially be optimised away with whole-module optimisation). However, the compiler still doesn't need to copy the instances – it can just evaluate the floating-point addition at compile-time and initialise one of the globals with 0.0, and the other with 1.0.

And if they were local variables in a function, for example:

struct Point {
var x: Float = 0
}

func foo() {
var p1 = Point()
var p2 = p1
p2.x += 1
print(p2.x)
}

foo()

The compiler doesn't even have to create two Point instances to begin with – it can just create a single floating-point local variable initialised to 1.0, and print that.

Regarding passing value types as function arguments, for large enough types and (in the case of structures) functions that utilise enough of their properties, the compiler can pass them by reference rather than copying. The callee can then make a copy of them only if needed, such as when needing to work with a mutable copy.

In other cases where structures are passed by value, it's also possible for the compiler to specialise functions in order to only copy across the properties that the function needs.

For the following code:

struct Point {
var x: Float = 0
var y: Float = 1
}

func foo(p: Point) {
print(p.x)
}

var p1 = Point()
foo(p: p1)

Assuming foo(p:) isn't inlined by the compiler (it will in this example, but once its implementation reaches a certain size, the compiler won't think it worth it) – the compiler can specialise the function as:

func foo(px: Float) {
print(px)
}

foo(px: 0)

It only passes the value of Point's x property into the function, thereby saving the cost of copying the y property.

So the compiler will do whatever it can in order to reduce the copying of value types. But with so many various optimisations in different circumstances, you cannot simply boil the optimised behaviour of arbitrary value types down to just copy-on-write.

Which value types in Swift supports copy-on-write?

Copy-on write is supported for String and all collection types - Array, Dictionary and Set.

Besides that, compiler is free to optimize any struct access and effectively give you copy-on-write semantics, but it is not guaranteed.

How does an array in swift deep copy itself when copied or assigned

Assignment of any struct (such as Array) causes a shallow copy of the structure contents. There's no special behavior for Array. The buffer that stores the Array's elements is not actually part of the structure. A pointer to that buffer, stored on the heap, is part of the Array structure, meaning that upon assignment, the buffer pointer is copied, but it still points to the same buffer.

All mutating operations on Array do a check to see if the buffer is uniquely referenced. If so, then the algorithm proceeds. Otherwise, a copy of the buffer is made, and the pointer to the new buffer is saved to that Array instance, then the algorithm proceeds as previously. This is called Copy on Write (CoW). Notice that it's not an automatic feature of all value types. It is merely a manually implemented feature of a few standard library types (like Array, Set, Dictionary, String, and others). You could even implement it yourself for your own types.

When CoW occurs, it does not do any deep copying. It will copy values, which means:

  • In the case of value types (struct, enum, tuples), the values are the struct/enum/tuples themselves. In this case, a deep and shallow copy are the same thing.
  • In the case of reference types (class), the value being copied is the reference. The referenced object is not copied. The same object is pointed to by both the old and copied reference. Thus, it's a shallow copy.

Class vs. Struct in Swift (copying)

Why in the class declaration does it change the firstMessage object. Are they the same objects?

The example you gave is a really nice one because it succinctly illustrates the difference between class and struct, and you came about this close -> <- to answering your own question, even if you didn't realize it. As the other answers have explained, class creates a reference type, which means that when you assign an instance of a class to a variable, that variable gets a reference to the object, not a copy of it. You said so yourself:

//if I assign, its a reference to the original instance
var secondMessage = firstMessage

In your example, firstMessage and secondMessage are really references to the one object that you created. This kind of thing is done all the time in object oriented languages because it's often important to know that you're dealing with a specific object and not a copy, especially if you might want to make changes to that object. But that also brings danger: if your code can get a reference to an object and change it, so can some other code in the program. Shared objects that can be changed create all kinds of headaches when you start writing multithreaded code. When you added text to secondMessage, firstMessage also changed because both variables refer to the same object.

Changing the declaration of Message to struct makes it a value type, where assignment (for example) creates a new copy of the object in question instead of a new reference to the same object. When you added text to secondMessage after changing Message to a struct, the assignment secondMessage = firstMessage created a copy of firstMessage, and you only changed that copy.

Is this a rule that if I assign a new object from the old object?

Whether your assignment creates a copy of the object or a reference to it depends, as you've shown, on whether the thing being assigned has reference semantics (class) or value semantics (struct). So you need to be aware of the difference, but most of the time you don't need to think too hard about it. If you're dealing with an object where you don't care about the object's identity and are mainly concerned with its contents (like a number, string, or array), expect that to be a struct. If you care about which object you're dealing with, like the front window or the current document, that'll be a class.

Then I would have to declare secondMessage = Message() to make it a new instance.

Right -- if Message is a class, assigning one to a new variable or passing it into a method won't create a new one. So again, are you more likely to care about which message you're dealing with, or what is in the message?

When does the copying take place for swift value types

TL;DR:

So does it mean that the copying actually only takes placed when the passed value type is modified?

Yes!

Is there a way to demonstrate that this is actually the underlying behavior?

See the first example in the section on the copy-on-write optimization.

Should I just use NSArrray in this case or would the Swift Array work fine
as long as I do not try to manipulate the passed in Array?

If you pass your array as inout, then you'll have a pass-by-reference semantics,
hence obviously avoiding unnecessary copies.
If you pass your array as a normal parameter,
then the copy-on-write optimization will kick in and you shouldn't notice any performance drop
while still benefiting from more type safety that what you'd get with a NSArray.

Now as long as I do not explicitly make the variables in the function editable
by using var or inout, then the function can not modify the array anyway.
So does it still make a copy?

You will get a "copy", in the abstract sense.
In reality, the underlying storage will be shared, thanks to the copy-on-write mechanism,
hence avoiding unnecessary copies.

If the original array is immutable and the function is not using var or inout,
there is no point in Swift creating a copy. Right?

Exactly, hence the copy-on-write mechanism.

So what does Apple mean by the phrase above?

Essentially, Apple means that you shouldn't worry about the "cost" of copying value types,
as Swift optimizes it for you behind the scene.

Instead, you should just think about the semantics of value types,
which is that get a copy as soon as you assign or use them as parameters.
What's actually generated by Swift's compiler is the Swift's compiler business.

Value types semantics

Swift does indeed treat arrays as value types (as opposed to reference types),
along with structures, enumerations and most other built-in types
(i.e. those that are part of the standard library and not Foundation).
At the memory level, these types are actually immutable plain old data objects (POD),
which enables interesting optimizations.
Indeed, they are typically allocated on the stack rather than the heap [1],
(https://en.wikipedia.org/wiki/Stack-based_memory_allocation).
This allows the CPU to very efficiently manage them,
and to automatically deallocate their memory as soon as the function exits [2],
without the need for any garbage collection strategy.

Values are copied whenever assigned or passed as a function.
This semantics has various advantages,
such as avoiding the creation of unintended aliases,
but also as making it easier for the compiler to guarantee the lifetime of values
stored in a another object or captured by a closure.
We can think about how hard it can be to manage good old C pointers to understand why.

One may think it's an ill-conceived strategy,
as it involves copying every single time a variable is assigned or a function is called.
But as counterintuitive it may be,
copying small types is usually quite cheap if not cheaper than passing a reference.
After all, a pointer is usually the same size as an integer...

Concerns are however legitimate for large collections (i.e. arrays, sets and dictionaries),
and very large structures to a lesser extent [3].
But the compiler has has a trick to handle these, namely copy-on-write (see later).

What about mutating

Structures can define mutating methods,
which are allowed to mutate the fields of the structure.
This doesn't contradict the fact that value types are nothing more than immutable PODs,
as in fact calling a mutating method is merely a huge syntactic sugar
for reassigning a variable to a brand new value that's identical to the previous ones,
except for the fields that were mutated.
The following example illustrates this semantical equivalence:

struct S {
var foo: Int
var bar: Int
mutating func modify() {
foo = bar
}
}

var s1 = S(foo: 0, bar: 10)
s1.modify()

// The two lines above do the same as the two lines below:
var s2 = S(foo: 0, bar: 10)
s2 = S(foo: s2.bar, bar: s2.bar)

Reference types semantics

Unlike value types, reference types are essentially pointers to the heap at the memory level.
Their semantics is closer to what we would get in reference-based languages,
such as Java, Python or Javascript.
This means they do not get copied when assigned or passed to a function, their address is.
Because the CPU is no longer able to manage the memory of these objects automatically,
Swift uses a reference counter to handle garbage collection behind the scenes
(https://en.wikipedia.org/wiki/Reference_counting).

Such semantics has the obvious advantage to avoid copies,
as everything is assigned or passed by reference.
The drawback is the danger of unintended aliases,
as in almost any other reference-based language.

What about inout

An inout parameter is nothing more than a read-write pointer to the expected type.
In the case of value types, it means the function won't get a copy of the value,
but a pointer to such values,
so mutations inside the function will affect the value parameter (hence the inout keyword).
In other terms, this gives value types parameters a reference semantics in the context of the function:

func f(x: inout [Int]) {
x.append(12)
}

var a = [0]
f(x: &a)

// Prints '[0, 12]'
print(a)

In the case of reference types, it will make the reference itself mutable,
pretty much as if the passed argument was a the address of the address of the object:

func f(x: inout NSArray) {
x = [12]
}

var a: NSArray = [0]
f(x: &a)

// Prints '(12)'
print(a)

Copy-on-write

Copy-on-write (https://en.wikipedia.org/wiki/Copy-on-write) is an optimization technique that
can avoid unnecessary copies of mutable variables,
which is implemented on all Swift's built-in collections (i.e. array, sets and dictionaries).
When you assign an array (or pass it to a function),
Swift doesn't make a copy of the said array and actually uses a reference instead.
The copy will take place as soon as the your second array is mutated.
This behavior can be demonstrated with the following snippet (Swift 4.1):

let array1 = [1, 2, 3]
var array2 = array1

// Will print the same address twice.
array1.withUnsafeBytes { print($0.baseAddress!) }
array2.withUnsafeBytes { print($0.baseAddress!) }

array2[0] = 1

// Will print a different address.
array2.withUnsafeBytes { print($0.baseAddress!) }

Indeed, array2 doesn't get a copy of array1 immediately,
as shown by the fact it points to the same address.
Instead, the copy is triggered by the mutation of array2.

This optimization also happens deeper in the structure,
meaning that if for instance your collection is made of other collections,
the latter will also benefit from the copy-on-write mechanism,
as demonstrated by the following snippet (Swift 4.1):

var array1 = [[1, 2], [3, 4]]
var array2 = array1

// Will print the same address twice.
array1[1].withUnsafeBytes { print($0.baseAddress!) }
array2[1].withUnsafeBytes { print($0.baseAddress!) }

array2[0] = []

// Will print the same address as before.
array2[1].withUnsafeBytes { print($0.baseAddress!) }

Replicating copy-on-write

It is in fact rather easy to implement the copy-on-write mechanism in Swift,
as some of the its reference counter API is exposed to the user.
The trick consists of wrapping a reference (e.g. a class instance) within a structure,
and to check whether that reference is uniquely referenced before mutating it.
When that's the case, the wrapped value can be safely mutated,
otherwise it should be copied:

final class Wrapped<T> {
init(value: T) { self.value = value }
var value: T
}

struct CopyOnWrite<T> {
init(value: T) { self.wrapped = Wrapped(value: value) }
var wrapped: Wrapped<T>
var value: T {
get { return wrapped.value }
set {
if isKnownUniquelyReferenced(&wrapped) {
wrapped.value = newValue
} else {
wrapped = Wrapped(value: newValue)
}
}
}
}

var a = CopyOnWrite(value: SomeLargeObject())

// This line doesn't copy anything.
var b = a

However, there is an import caveat here!
Reading the documentation for isKnownUniquelyReferenced we get this warning:

If the instance passed as object is being accessed by multiple threads simultaneously,
this function may still return true.
Therefore, you must only call this function from mutating methods
with appropriate thread synchronization.

This means the implementation presented above isn't thread safe,
as we may encounter situations where it'd wrongly assumes the wrapped object can be safely mutated,
while in fact such mutation would break invariant in another thread.
Yet this doesn't mean Swift's copy-on-write is inherently flawed in multithreaded programs.
The key is to understand what "accessed by multiple threads simultaneously" really means.
In our example, this would happen if the same instance of CopyOnWrite was shared across multiple threads,
for instance as part of a shared global variable.
The wrapped object would then have a thread safe copy-on-write semantics,
but the instance holding it would be subject to data race.
The reason is that Swift must establish unique ownership
to properly evaluate isKnownUniquelyReferenced [4],
which it can't do if the owner of the instance is itself shared across multiple threads.

Value types and multithreading

It is Swift's intention to alleviate the burden of the programmer
when dealing with multithreaded environments, as stated on Apple's blog
(https://developer.apple.com/swift/blog/?id=10):

One of the primary reasons to choose value types over reference types
is the ability to more easily reason about your code.
If you always get a unique, copied instance,
you can trust that no other part of your app is changing the data under the covers.
This is especially helpful in multi-threaded environments
where a different thread could alter your data out from under you.
This can create nasty bugs that are extremely hard to debug.

Ultimately, the copy-on-write mechanism is a resource management optimization that,
like any other optimization technique,
one shouldn't think about when writing code [5].
Instead, one should think in more abstract terms
and consider values to be effectively copied when assigned or passed as arguments.


[1]
This holds only for values used as local variables.
Values used as fields of a reference type (e.g. a class) are also stored in the heap.

[2]
One could get confirmation of that by checking the LLVM byte code that's produced
when dealing with value types rather than reference types,
but the Swift compiler being very eager to perform constant propagation,
building a minimal example is a bit tricky.

[3]
Swift doesn't allow structures to reference themselves,
as the compiler would be unable to compute the size of such type statically.
Therefore, it is not very realistic to think of a structure that is so large
that copying it would become a legitimate concern.

[4]
This is, by the way, the reason why isKnownUniquelyReferenced accepts an inout parameter,
as it's currently Swift's way to establish ownership.

[5]
Although passing copies of value-type instances should be safe,
there's a open issue that suggests some problems with the current implementation
(https://bugs.swift.org/browse/SR-6543).

How Swift implement Array's copy-on-write behavior?

Okay, I did many experiment and finally figured out.

  1. We can's use & to get array address because once we do that , Swift will copy the array to better interact with C, use & get object's address that adjacent to array and do the math instead. Or use lldb instruction frame variable -L
  2. The whole Array is copied once any of it's value element changed.
  3. Actual value element of Array is allocated at heap.
  4. Swift also did a lot of optimization for Array whose element is class.
  5. Swift is awesome.

I actually write my first blog for this.



Related Topics



Leave a reply



Submit