When Does the Copying Take Place for Swift Value Types

TL;DR:

So does it mean that the copying actually only takes place when the passed value type is modified?

Yes!

Is there a way to demonstrate that this is actually the underlying behavior?

See the first example in the section on the copy-on-write optimization.

Should I just use NSArray in this case or would the Swift Array work fine
as long as I do not try to manipulate the passed in Array?

If you pass your array as inout, you get pass-by-reference semantics,
hence avoiding unnecessary copies.
If you pass your array as a normal parameter,
the copy-on-write optimization kicks in and you shouldn't notice any performance drop,
while still benefiting from more type safety than you'd get with an NSArray.

Now as long as I do not explicitly make the variables in the function editable
by using var or inout, then the function can not modify the array anyway.
So does it still make a copy?

You will get a "copy", in the abstract sense.
In reality, the underlying storage will be shared, thanks to the copy-on-write mechanism,
hence avoiding unnecessary copies.

If the original array is immutable and the function is not using var or inout,
there is no point in Swift creating a copy. Right?

Exactly, hence the copy-on-write mechanism.

So what does Apple mean by the phrase above?

Essentially, Apple means that you shouldn't worry about the "cost" of copying value types,
as Swift optimizes it for you behind the scenes.

Instead, you should just think about the semantics of value types,
which is that you get a copy as soon as you assign them or use them as parameters.
What's actually generated by Swift's compiler is the compiler's business.

Value types semantics

Swift does indeed treat arrays as value types (as opposed to reference types),
along with structures, enumerations and most other built-in types
(i.e. those that are part of the standard library and not Foundation).
At the memory level, these types are actually immutable plain old data objects (POD),
which enables interesting optimizations.
Indeed, they are typically allocated on the stack rather than the heap [1]
(https://en.wikipedia.org/wiki/Stack-based_memory_allocation).
This allows the CPU to very efficiently manage them,
and to automatically deallocate their memory as soon as the function exits [2],
without the need for any garbage collection strategy.

Values are copied whenever they are assigned or passed to a function.
This semantics has various advantages,
such as avoiding the creation of unintended aliases,
but also making it easier for the compiler to guarantee the lifetime of values
stored in another object or captured by a closure.
We can think about how hard it can be to manage good old C pointers to understand why.

One may think it's an ill-conceived strategy,
as it involves copying every single time a variable is assigned or a function is called.
But as counterintuitive as it may be,
copying small types is usually quite cheap if not cheaper than passing a reference.
After all, a pointer is usually the same size as an integer...

Concerns are however legitimate for large collections (i.e. arrays, sets and dictionaries),
and very large structures to a lesser extent [3].
But the compiler has a trick to handle these, namely copy-on-write (see later).

What about mutating

Structures can define mutating methods,
which are allowed to mutate the fields of the structure.
This doesn't contradict the fact that value types are nothing more than immutable PODs,
as in fact calling a mutating method is merely syntactic sugar
for reassigning a variable to a brand new value that's identical to the previous one,
except for the fields that were mutated.
The following example illustrates this semantic equivalence:

struct S {
    var foo: Int
    var bar: Int
    mutating func modify() {
        foo = bar
    }
}

var s1 = S(foo: 0, bar: 10)
s1.modify()

// The two lines above do the same as the two lines below:
var s2 = S(foo: 0, bar: 10)
s2 = S(foo: s2.bar, bar: s2.bar)

Reference types semantics

Unlike value types, reference types are essentially pointers to the heap at the memory level.
Their semantics is closer to what we would get in reference-based languages,
such as Java, Python or JavaScript.
This means they do not get copied when assigned or passed to a function; only their address is.
Because the CPU is no longer able to manage the memory of these objects automatically,
Swift uses a reference counter to handle garbage collection behind the scenes
(https://en.wikipedia.org/wiki/Reference_counting).

Such semantics has the obvious advantage of avoiding copies,
as everything is assigned or passed by reference.
The drawback is the danger of unintended aliases,
as in almost any other reference-based language.
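The difference between the two semantics can be sketched with a small comparison. The PointValue and PointReference types below are hypothetical and exist only for this illustration:

```swift
// Hypothetical types used only for illustration.
struct PointValue {
    var x: Int
}

final class PointReference {
    var x: Int
    init(x: Int) { self.x = x }
}

// Value type: the assignment yields an independent value.
let value1 = PointValue(x: 1)
var value2 = value1
value2.x = 42

// Reference type: the assignment copies the reference, so both
// names designate the same object (an alias).
let reference1 = PointReference(x: 1)
let reference2 = reference1
reference2.x = 42

print(value1.x)      // 1
print(reference1.x)  // 42
```

The write through reference2 is visible through reference1, which is exactly the kind of unintended alias that value semantics rules out.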

What about inout

An inout parameter is nothing more than a read-write pointer to the expected type.
In the case of value types, it means the function won't get a copy of the value,
but a pointer to that value,
so mutations inside the function will affect the caller's value (hence the inout keyword).
In other terms, this gives value-type parameters reference semantics in the context of the function:

func f(x: inout [Int]) {
    x.append(12)
}

var a = [0]
f(x: &a)

// Prints '[0, 12]'
print(a)

In the case of reference types, it will make the reference itself mutable,
pretty much as if the passed argument were the address of the address of the object:

import Foundation

func f(x: inout NSArray) {
    x = [12]
}

var a: NSArray = [0]
f(x: &a)

// Prints the description of an NSArray containing the single element 12
print(a)

Copy-on-write

Copy-on-write (https://en.wikipedia.org/wiki/Copy-on-write) is an optimization technique that
can avoid unnecessary copies of mutable variables,
which is implemented on all of Swift's built-in collections (i.e. arrays, sets and dictionaries).
When you assign an array (or pass it to a function),
Swift doesn't make a copy of that array and actually uses a reference instead.
The copy will take place as soon as your second array is mutated.
This behavior can be demonstrated with the following snippet (Swift 4.1):

let array1 = [1, 2, 3]
var array2 = array1

// Will print the same address twice.
array1.withUnsafeBytes { print($0.baseAddress!) }
array2.withUnsafeBytes { print($0.baseAddress!) }

array2[0] = 1

// Will print a different address.
array2.withUnsafeBytes { print($0.baseAddress!) }

Indeed, array2 doesn't get a copy of array1 immediately,
as shown by the fact it points to the same address.
Instead, the copy is triggered by the mutation of array2.

This optimization also happens deeper in the structure,
meaning that if for instance your collection is made of other collections,
the latter will also benefit from the copy-on-write mechanism,
as demonstrated by the following snippet (Swift 4.1):

var array1 = [[1, 2], [3, 4]]
var array2 = array1

// Will print the same address twice.
array1[1].withUnsafeBytes { print($0.baseAddress!) }
array2[1].withUnsafeBytes { print($0.baseAddress!) }

array2[0] = []

// Will print the same address as before.
array2[1].withUnsafeBytes { print($0.baseAddress!) }

Replicating copy-on-write

It is in fact rather easy to implement the copy-on-write mechanism in Swift,
as some of its reference counting API is exposed to the user.
The trick consists of wrapping a reference (e.g. a class instance) within a structure,
and to check whether that reference is uniquely referenced before mutating it.
When that's the case, the wrapped value can be safely mutated,
otherwise it should be copied:

final class Wrapped<T> {
    init(value: T) { self.value = value }
    var value: T
}

struct CopyOnWrite<T> {
    init(value: T) { self.wrapped = Wrapped(value: value) }
    var wrapped: Wrapped<T>
    var value: T {
        get { return wrapped.value }
        set {
            if isKnownUniquelyReferenced(&wrapped) {
                wrapped.value = newValue
            } else {
                wrapped = Wrapped(value: newValue)
            }
        }
    }
}

var a = CopyOnWrite(value: SomeLargeObject())

// This line doesn't copy anything.
var b = a

However, there is an important caveat here!
Reading the documentation for isKnownUniquelyReferenced we get this warning:

If the instance passed as object is being accessed by multiple threads simultaneously,
this function may still return true.
Therefore, you must only call this function from mutating methods
with appropriate thread synchronization.

This means the implementation presented above isn't thread safe,
as we may encounter situations where it would wrongly assume the wrapped object can be safely mutated,
while in fact such a mutation would break an invariant in another thread.
Yet this doesn't mean Swift's copy-on-write is inherently flawed in multithreaded programs.
The key is to understand what "accessed by multiple threads simultaneously" really means.
In our example, this would happen if the same instance of CopyOnWrite was shared across multiple threads,
for instance as part of a shared global variable.
The wrapped object would then have a thread safe copy-on-write semantics,
but the instance holding it would be subject to data races.
The reason is that Swift must establish unique ownership
to properly evaluate isKnownUniquelyReferenced [4],
which it can't do if the owner of the instance is itself shared across multiple threads.

Value types and multithreading

It is Swift's intention to alleviate the burden of the programmer
when dealing with multithreaded environments, as stated on Apple's blog
(https://developer.apple.com/swift/blog/?id=10):

One of the primary reasons to choose value types over reference types
is the ability to more easily reason about your code.
If you always get a unique, copied instance,
you can trust that no other part of your app is changing the data under the covers.
This is especially helpful in multi-threaded environments
where a different thread could alter your data out from under you.
This can create nasty bugs that are extremely hard to debug.

Ultimately, the copy-on-write mechanism is a resource management optimization that,
like any other optimization technique,
one shouldn't think about when writing code [5].
Instead, one should think in more abstract terms
and consider values to be effectively copied when assigned or passed as arguments.


[1]
This holds only for values used as local variables.
Values used as fields of a reference type (e.g. a class) are also stored in the heap.

[2]
One could get confirmation of that by checking the LLVM IR that's produced
when dealing with value types rather than reference types,
but the Swift compiler being very eager to perform constant propagation,
building a minimal example is a bit tricky.

[3]
Swift doesn't allow structures to reference themselves,
as the compiler would be unable to compute the size of such type statically.
Therefore, it is not very realistic to think of a structure that is so large
that copying it would become a legitimate concern.

[4]
This is, by the way, the reason why isKnownUniquelyReferenced accepts an inout parameter,
as it's currently Swift's way to establish ownership.

[5]
Although passing copies of value-type instances should be safe,
there's an open issue that suggests some problems with the current implementation
(https://bugs.swift.org/browse/SR-6543).

Which value types in Swift support copy-on-write?

Copy-on-write is supported for String and all collection types: Array, Dictionary and Set.

Besides that, the compiler is free to optimize any struct access and effectively give you copy-on-write semantics, but it is not guaranteed.

How does an array in swift deep copy itself when copied or assigned

Assignment of any struct (such as Array) causes a shallow copy of the structure contents. There's no special behavior for Array. The buffer that stores the Array's elements is not actually part of the structure; it lives on the heap. Only a pointer to that buffer is part of the Array structure, meaning that upon assignment, the buffer pointer is copied, but it still points to the same buffer.
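One way to make this layout visible is to compare the size of the Array structure with the size of a pointer. This is only a sketch: it relies on the current standard library implementation, which is an assumption rather than a documented guarantee.

```swift
// Assumption: this reflects the current standard library implementation
// and is not a documented guarantee.
let pointerSize = MemoryLayout<UnsafeRawPointer>.size
let arraySize = MemoryLayout<[Int]>.size

let small = [1, 2, 3]
let large = Array(repeating: 0, count: 100_000)

// MemoryLayout.size(ofValue:) measures the structure itself,
// not the heap buffer it points to.
let smallSize = MemoryLayout.size(ofValue: small)
let largeSize = MemoryLayout.size(ofValue: large)

print(pointerSize, arraySize, smallSize, largeSize)
```

Both arrays report the same structure size regardless of how many elements they hold, which is why assigning an Array is cheap.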

All mutating operations on Array do a check to see if the buffer is uniquely referenced. If so, then the algorithm proceeds. Otherwise, a copy of the buffer is made, and the pointer to the new buffer is saved to that Array instance, then the algorithm proceeds as previously. This is called Copy on Write (CoW). Notice that it's not an automatic feature of all value types. It is merely a manually implemented feature of a few standard library types (like Array, Set, Dictionary, String, and others). You could even implement it yourself for your own types.

When CoW occurs, it does not do any deep copying. It will copy values, which means:

  • In the case of value types (struct, enum, tuples), the values are the struct/enum/tuples themselves. In this case, a deep and shallow copy are the same thing.
  • In the case of reference types (class), the value being copied is the reference. The referenced object is not copied. The same object is pointed to by both the old and copied reference. Thus, it's a shallow copy.
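The reference-type case can be sketched as follows. Counter is a hypothetical class introduced only for this illustration:

```swift
// Hypothetical class used only for illustration.
final class Counter {
    var count = 0
}

let shared = Counter()
let original = [shared]
var copy = original

// Mutating the array itself only affects the copy...
copy.append(Counter())

// ...but both arrays still refer to the same Counter instance.
copy[0].count += 1

print(original.count)     // 1
print(copy.count)         // 2
print(original[0].count)  // 1
```

The array buffers are distinct after the append, yet the first element of each is the same object, so the increment is visible through both arrays.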

How is a value-typed variable copied when it is passed to a function, and what holds this copy?

When passing value type to a function, think of it like assigning it to a local variable whose scope is that function, so the copy behavior is analogous to just assigning a new local variable.

Regarding where it is copied, we should recognize that the copy behavior is actually more complicated than it sounds. As they point out in Building Better Apps with Value Types in Swift (WWDC 2015, Session 414), "Copies are Cheap":

Copying a low-level, fundamental type is constant time

  • Int, Double, etc.

Copying a struct, enum, or tuple of value types is constant time

  • CGPoint, etc.

Extensible data structures use copy-on-write

  • Copying involves a fixed number of reference-counting operations

  • String, Array, Set, Dictionary, etc.

Regarding that last point, behind the scenes Swift uses some sleight of hand to avoid copying extensible value types every time they are assigned or passed. A new value initially just points to the original storage, and Swift keeps track of how many references to that storage exist. It only makes an actual copy (a) upon write, and (b) when there is more than one reference. This behavior is discussed in more detail in that video.
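The reference tracking mentioned above can be observed with the standard library function isKnownUniquelyReferenced. The Buffer class below is a hypothetical stand-in for the heap storage behind a collection, not the actual standard library type:

```swift
// Hypothetical stand-in for the heap buffer behind a collection.
final class Buffer {
    var elements: [Int] = []
}

var first = Buffer()
let uniqueWhileAlone = isKnownUniquelyReferenced(&first)

let second = first
// withExtendedLifetime keeps `second` alive during the check.
let uniqueWhileShared = withExtendedLifetime(second) {
    isKnownUniquelyReferenced(&first)
}

print(uniqueWhileAlone)   // true
print(uniqueWhileShared)  // false
```

A copy-on-write type would mutate its storage in place in the first situation and copy it before writing in the second.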

Swift value type did not copy on first assignment?

It won't cause a copy operation. Since this is a struct, there is no need for a malloc (or its Swift equivalent, swift_allocObject). It will just reserve space for an Int on the stack by adjusting the stack pointer.

See the godbolt here

If arrays are value types and therefore get copied, then how are they not thread safe?

The fundamental issue is the interpretation of "every thread gets its own copy".

Yes, we often use value types to ensure thread safety by providing every thread its own copy of an object (such as an array). But that is not the same thing as claiming that value types guarantee every thread will get its own copy.

Specifically, using closures, multiple threads can attempt to mutate the same value-type object. Here is an example of code that shows some non-thread-safe code interacting with a Swift Array value type:

let queue = DispatchQueue.global()

var employees = ["Bill", "Bob", "Joe"]

queue.async {
    let count = employees.count
    for index in 0 ..< count {
        print("\(employees[index])")
        Thread.sleep(forTimeInterval: 1)
    }
}

queue.async {
    Thread.sleep(forTimeInterval: 0.5)
    employees.remove(at: 0)
}

(You generally wouldn't add sleep calls; I only added them to manifest race conditions that are otherwise hard to reproduce. You also shouldn't mutate an object from multiple threads like this without some synchronization, but I'm doing this to illustrate the problem.)

In these async calls, you're still referring to the same employees array defined earlier. So, in this particular example, we'll see it output "Bill", it will skip "Bob" (even though it was "Bill" that was removed), it will output "Joe" (now the second item), and then it will crash trying to access the third item in an array that now only has two items left.

Now, all that I illustrate above is that a single value type can be mutated by one thread while being used by another, thereby violating thread-safety. There are actually a whole series of more fundamental problems that can manifest themselves when writing code that is not thread-safe, but the above is merely one slightly contrived example.

But, you can ensure that this separate thread gets its own copy of the employees array by adding a "capture list" to that first async call to indicate that you want to work with a copy of the original employees array:

queue.async { [employees] in
    ...
}

Or, you'll automatically get this behavior if you pass this value type as a parameter to another method:

doSomethingAsynchronous(with: employees) { result in
    ...
}

In either of these two cases, you'll be enjoying value semantics and see a copy (or copy-on-write) of the original array, although the original array may have been mutated elsewhere.
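The effect of the capture list can be seen even without threads. The following single-threaded sketch (the closure names are hypothetical) contrasts a closure that refers to the variable with one that captures its value:

```swift
var employees = ["Bill", "Bob", "Joe"]

// Without a capture list, the closure refers to the variable itself.
let countLive = { employees.count }

// With a capture list, the closure holds the value the array had
// when the closure was created.
let countSnapshot = { [employees] in employees.count }

employees.remove(at: 0)

print(countLive())      // 2
print(countSnapshot())  // 3
```

The second closure keeps seeing three employees because it works on its own copy, which is what makes the capture list useful when the closure runs on another thread.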

Bottom line, my point is merely that value types do not guarantee that every thread has its own copy. The Array type is not thread-safe (nor are many other mutable value types). But, as with all value types, Swift offers simple mechanisms (some of them completely automatic and transparent) that will provide each thread its own copy, making it much easier to write thread-safe code.


Here's another example, with a different value type, that makes the problem more obvious. In this case, a failure to write thread-safe code yields a semantically invalid object:

let queue = DispatchQueue.global()

struct Person {
    var firstName: String
    var lastName: String
}

var person = Person(firstName: "Rob", lastName: "Ryan")

queue.async {
    Thread.sleep(forTimeInterval: 0.5)
    print("1: \(person)")
}

queue.async {
    person.firstName = "Rachel"
    Thread.sleep(forTimeInterval: 1)
    person.lastName = "Moore"
    print("2: \(person)")
}

In this example, the first print statement will say, effectively "Rachel Ryan", which is neither "Rob Ryan" nor "Rachel Moore". In short, we're examining our Person while it is in an internally inconsistent state.

But, again, we can use a capture list to enjoy value semantics:

queue.async { [person] in
    Thread.sleep(forTimeInterval: 0.5)
    print("1: \(person)")
}

And in this case, it will say "Rob Ryan", oblivious to the fact that the original Person may be in the process of being mutated by another thread. (Clearly, the real problem is not fixed just by using value semantics in the first async call, but synchronizing the second async call and/or using value semantics there, too.)
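As for synchronizing the mutation itself, one possible approach (among several) is to funnel every access through a serial dispatch queue. The sketch below is an assumption-laden illustration, not the answer's own code: PersonStore, snapshot() and update(_:) are hypothetical names, and it is written with the Swift 5 language mode in mind, without strict concurrency checking.

```swift
import Dispatch

struct Person {
    var firstName: String
    var lastName: String
}

// Hypothetical wrapper: every access to the stored Person goes through
// one serial queue, so reads never overlap a multi-step update.
final class PersonStore {
    private var person: Person
    private let queue = DispatchQueue(label: "example.person-store")

    init(_ person: Person) {
        self.person = person
    }

    // Returns a copy of the current value.
    func snapshot() -> Person {
        return queue.sync { person }
    }

    // Applies all changes as a single step, as seen by other threads.
    func update(_ change: (inout Person) -> Void) {
        queue.sync { change(&person) }
    }
}

let store = PersonStore(Person(firstName: "Rob", lastName: "Ryan"))

DispatchQueue.global().async {
    store.update { person in
        person.firstName = "Rachel"
        person.lastName = "Moore"
    }
}

let seen = store.snapshot()
print("\(seen.firstName) \(seen.lastName)")
```

Depending on timing, this prints either the old or the new name in full, but never a mix such as "Rachel Ryan", because the read cannot run in the middle of the update.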

If var seems to deep copy arrays in Swift. Does if let?

Relevant: https://developer.apple.com/swift/blog/?id=10.

In Swift, Array, String, and Dictionary are all value types.

So, if you assign an existing value type via var or let then a copy occurs. If you assign an existing reference type (such as a class) via var or let then you'll be assigning a reference.

Does swift copy on write for all structs?

Array is implemented with copy-on-write behaviour – you'll get it regardless of any compiler optimisations (although of course, optimisations can decrease the number of cases where a copy needs to happen).

At a basic level, Array is just a structure that holds a reference to a heap-allocated buffer containing the elements – therefore multiple Array instances can reference the same buffer. When you come to mutate a given array instance, the implementation will check if the buffer is uniquely referenced, and if so, mutate it directly. Otherwise, the array will perform a copy of the underlying buffer in order to preserve value semantics.

However, with your Point structure – you're not implementing copy-on-write at a language level. Of course, as @Alexander says, this doesn't stop the compiler from performing all sorts of optimisations to minimise the cost of copying whole structures about. These optimisations needn't follow the exact behaviour of copy-on-write though – the compiler is simply free to do whatever it wishes, as long as the program runs according to the language specification.

In your specific example, both p1 and p2 are global, therefore the compiler needs to make them distinct instances, as other .swift files in the same module have access to them (although this could potentially be optimised away with whole-module optimisation). However, the compiler still doesn't need to copy the instances – it can just evaluate the floating-point addition at compile-time and initialise one of the globals with 0.0, and the other with 1.0.

And if they were local variables in a function, for example:

struct Point {
    var x: Float = 0
}

func foo() {
    var p1 = Point()
    var p2 = p1
    p2.x += 1
    print(p2.x)
}

foo()

The compiler doesn't even have to create two Point instances to begin with – it can just create a single floating-point local variable initialised to 1.0, and print that.

Regarding passing value types as function arguments, for large enough types and (in the case of structures) functions that utilise enough of their properties, the compiler can pass them by reference rather than copying. The callee can then make a copy of them only if needed, such as when needing to work with a mutable copy.

In other cases where structures are passed by value, it's also possible for the compiler to specialise functions in order to only copy across the properties that the function needs.

For the following code:

struct Point {
    var x: Float = 0
    var y: Float = 1
}

func foo(p: Point) {
    print(p.x)
}

var p1 = Point()
foo(p: p1)

Assuming foo(p:) isn't inlined by the compiler (it will in this example, but once its implementation reaches a certain size, the compiler won't think it worth it) – the compiler can specialise the function as:

func foo(px: Float) {
    print(px)
}

foo(px: 0)

It only passes the value of Point's x property into the function, thereby saving the cost of copying the y property.

So the compiler will do whatever it can in order to reduce the copying of value types. But with so many various optimisations in different circumstances, you cannot simply boil the optimised behaviour of arbitrary value types down to just copy-on-write.

Does assignment always make a copy in Swift

An array is a struct, and structs are value types, so they are copied by value and not by reference.
The same happens for dictionaries: a copy is created if you assign one to another variable.

Classes instead are reference types, and assignment copies the reference to the instance.

You can read more about that in Structures and Enumerations Are Value Types

Sidenote: a struct passed to a function is immutable - you cannot modify it within the function, unless you pass it by reference using the inout attribute
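The two options mentioned in the sidenote can be sketched as follows. The function names are hypothetical and chosen only for this illustration:

```swift
// Hypothetical helpers for illustration.

// Parameters are constants; to change the value locally,
// copy it into a variable. The caller's array is unaffected.
func appendedLocally(_ numbers: [Int]) -> [Int] {
    var local = numbers
    local.append(12)
    return local
}

// An inout parameter lets the function modify the caller's variable.
func appendInPlace(_ numbers: inout [Int]) {
    numbers.append(12)
}

var a = [0]
let b = appendedLocally(a)
print(a)  // [0]
print(b)  // [0, 12]

appendInPlace(&a)
print(a)  // [0, 12]
```

Writing numbers.append(12) directly inside appendedLocally would be rejected by the compiler, since numbers is a constant there.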


