Apple's Description of Reference and Value Types with Multiple Threads

As others have pointed out, reference types always pass a pointer to the object, which is ideal where you want a "shared, mutable state" (as that document you referenced said). Clearly, though, if you're mutating/accessing a reference type across multiple threads, make sure to synchronize your access to it (via a dedicated serial queue, the reader-writer pattern, locks, etc.).
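
As a minimal sketch of the lock-based option mentioned above, the following guards a shared reference type with an NSLock. The `Counter` class and its members are hypothetical names chosen for illustration, not something from the original discussion.

```swift
import Foundation

// Hypothetical example type: a reference type whose state is guarded by a lock.
final class Counter {
    private let lock = NSLock()
    private var value = 0

    func increment() {
        lock.lock()
        defer { lock.unlock() }
        value += 1
    }

    var current: Int {
        lock.lock()
        defer { lock.unlock() }
        return value
    }
}

let counter = Counter()

// Run 1,000 increments across the available cores; this call returns
// only after every iteration has finished.
DispatchQueue.concurrentPerform(iterations: 1_000) { _ in
    counter.increment()
}

print(counter.current) // 1000
```

Without the lock, the `value += 1` calls could interleave and some increments could be lost.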

Value types are a little more complicated, though. Yes, as the others have pointed out, if you pass a value type as a parameter to a method that then does something on another thread, you're essentially working with a copy of that value type (Josh's note regarding the copy-on-write, notwithstanding). This ensures the integrity of that object passed to the method. That's fine (and has been sufficiently covered by the other answers here).

But it gets more complicated when you are dealing with closures. Consider, for example, the following:

struct Person {
    var firstName: String
    var lastName: String
}

var person = Person(firstName: "Rob", lastName: "Ryan")

DispatchQueue.global().async {
    Thread.sleep(forTimeInterval: 1)
    print("1: \(person)")
}

person.firstName = "Rachel"
Thread.sleep(forTimeInterval: 2)
person.lastName = "Moore"
print("2: \(person)")

Obviously, you wouldn't generally sleep, but I'm doing this to illustrate the point: Namely, even though we're dealing with a value type and multiple threads, the person you reference in the closure is the same instance as you're dealing with on the main thread (or whatever thread this was running on), not a copy of it. If you're dealing with a mutable object, that's not thread-safe.

I've contrived this example to illustrate this point: the print statement inside the closure above will report "Rachel Ryan", effectively showing the Person value in an internally inconsistent state.

With closures using value types, if you want to enjoy value semantics, you have to change that async call to use a separate variable:

let separatePerson = person
DispatchQueue.global().async {
    Thread.sleep(forTimeInterval: 1)
    print("1: \(separatePerson)")
}

Or, even easier, use a "capture list", which indicates what value type variables should be captured by the closure:

DispatchQueue.global().async { [person] in
    Thread.sleep(forTimeInterval: 1)
    print("1: \(person)")
}

With either of these examples, you're now enjoying value semantics, copying the object, and the print statement will correctly report "Rob Ryan" even though the original person object is being mutated on another thread.

So, if you are dealing with value types and closures, a value can end up shared across threads unless you explicitly use a capture list (or something equivalent) in order to enjoy value semantics (i.e. copying the object as needed).
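
The difference between capturing the variable and capturing a copy can also be seen without any threads at all. This is a small sketch; the closure names `readsVariable` and `readsCopy` are made up for illustration.

```swift
struct Person {
    var firstName: String
    var lastName: String
}

var person = Person(firstName: "Rob", lastName: "Ryan")

// No capture list: the closure refers to the variable itself.
let readsVariable = { "\(person.firstName) \(person.lastName)" }

// Capture list: the closure stores a copy made at the moment it is created.
let readsCopy = { [person] in "\(person.firstName) \(person.lastName)" }

person.firstName = "Rachel"
person.lastName = "Moore"

print(readsVariable()) // Rachel Moore
print(readsCopy())     // Rob Ryan
```

The same rule applies when the closure is handed to another thread, which is why the capture list restores value semantics there.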

Is value type really safe in multiple threads?

A couple of observations:

  1. That blog entry says:

    Importantly, you can safely pass copies of values across threads without synchronization.

    The operative word here is “copies”.

    But in your example, you’re not passing copies of a value-type object to the different threads. You’re sharing a single instance of a reference-type object, a class, between the threads. Sure, your reference type has a value-type property, but that doesn’t alter the fact that you’re sharing a single instance of that reference type across threads. You will have to manually synchronize your interaction with that object and its properties in order to enjoy thread-safety.

  2. There’s an argument to be made that many discussions mislead readers into thinking that Swift value-types always enjoy copy (or copy-on-write) semantics, and therefore always enjoy this thread safety feature. But you have to be careful, because there are several examples where you don’t get copy semantics. Your example of having a value-type property within a reference-type object is one example.

    Another example is when you fail to use closure “capture lists”. For example, the following is not thread-safe as it is using the same value-type instance across multiple threads:

    var object = StructA(a: 42, b: "foo")
    DispatchQueue.global().async {
        print(object)
    }
    object.b = "bar"

    But by adding the capture list, the global queue will have its own copy of the object, which restores thread-safety because each thread now works with its own copy of the object in question:

    var object = StructA(a: 42, b: "foo")
    DispatchQueue.global().async { [object] in
        print(object)
    }
    object.b = "bar"

  3. Yes, you can write thread-safe code if you (a) use value types; and (b) pass copies of these value types around. But this has nothing to do with atomicity. Bottom line, Swift variables are not atomic.
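
For the case in point 1 (a value-type property living inside a shared class instance), one possible way to synchronize is the reader-writer pattern. The sketch below is only an illustration: `Settings`, `SettingsStore`, and the queue label are hypothetical names, not taken from the question.

```swift
import Foundation

struct Settings {
    var volume: Int
    var theme: String
}

// Hypothetical wrapper: one shared instance, with all access funneled through a queue.
final class SettingsStore {
    private var settings = Settings(volume: 5, theme: "light")
    private let queue = DispatchQueue(label: "SettingsStore.queue", attributes: .concurrent)

    // Reads may overlap with each other; each one returns a copy of the struct.
    var snapshot: Settings {
        return queue.sync { settings }
    }

    // Writes use a barrier, so they never overlap with reads or other writes.
    func update(_ change: @escaping (inout Settings) -> Void) {
        queue.async(flags: .barrier) {
            change(&self.settings)
        }
    }
}

let store = SettingsStore()
store.update { $0.volume = 11; $0.theme = "dark" }

// This read was submitted after the barrier block, so it waits for the write.
let copy = store.snapshot
print(copy.volume, copy.theme) // 11 dark
```

Callers only ever receive copies of `Settings`, so the single shared instance is the `SettingsStore` itself, and that is the object being synchronized.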

If arrays are value types and therefore get copied, then how are they not thread safe?

The fundamental issue is the interpretation of "every thread gets its own copy".

Yes, we often use value types to ensure thread safety by providing every thread its own copy of an object (such as an array). But that is not the same thing as claiming that value types guarantee every thread will get its own copy.

Specifically, using closures, multiple threads can attempt to mutate the same value-type object. Here is an example of code that shows some non-thread-safe code interacting with a Swift Array value type:

let queue = DispatchQueue.global()

var employees = ["Bill", "Bob", "Joe"]

queue.async {
    let count = employees.count
    for index in 0 ..< count {
        print("\(employees[index])")
        Thread.sleep(forTimeInterval: 1)
    }
}

queue.async {
    Thread.sleep(forTimeInterval: 0.5)
    employees.remove(at: 0)
}

(You generally wouldn't add sleep calls; I only added them to manifest race conditions that are otherwise hard to reproduce. You also shouldn't mutate an object from multiple threads like this without some synchronization, but I'm doing this to illustrate the problem.)

In these async calls, you're still referring to the same employees array defined earlier. So, in this particular example, we'll see it output "Bill", it will skip "Bob" (even though it was "Bill" that was removed), it will output "Joe" (now the second item), and then it will crash trying to access the third item in an array that now only has two items left.

Now, all that I illustrate above is that a single value type can be mutated by one thread while being used by another, thereby violating thread-safety. There are actually a whole series of more fundamental problems that can manifest themselves when writing code that is not thread-safe, but the above is merely one slightly contrived example.

But, you can ensure that this separate thread gets its own copy of the employees array by adding a "capture list" to that first async call to indicate that you want to work with a copy of the original employees array:

queue.async { [employees] in
    ...
}

Or, you'll automatically get this behavior if you pass this value type as a parameter to another method:

doSomethingAsynchronous(with: employees) { result in
    ...
}

In either of these two cases, you'll be enjoying value semantics and see a copy (or copy-on-write) of the original array, although the original array may have been mutated elsewhere.
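
To make the parameter-passing case concrete, here is one way such a method might look. The body of `doSomethingAsynchronous(with:completion:)` is entirely hypothetical (the original only shows the call site), and the sketch assumes the Swift 5 language mode, where capturing a mutable variable in a dispatched closure is still accepted.

```swift
import Foundation

// Hypothetical helper; only its name mirrors the call shown above.
func doSomethingAsynchronous(with names: [String],
                             completion: @escaping ([String]) -> Void) {
    DispatchQueue.global().async {
        // `names` is this call's own value; later changes to the
        // caller's array are not visible here.
        Thread.sleep(forTimeInterval: 0.1)
        completion(names.map { $0.uppercased() })
    }
}

var employees = ["Bill", "Bob", "Joe"]
let done = DispatchSemaphore(value: 0)
var result: [String] = []

doSomethingAsynchronous(with: employees) { value in
    result = value
    done.signal()
}

// Mutate the original while the background work is still sleeping.
employees.remove(at: 0)

done.wait()
print(result)    // ["BILL", "BOB", "JOE"]
print(employees) // ["Bob", "Joe"]
```

The background work still sees all three names, because it operates on the value it was handed rather than on the caller's variable.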

Bottom line, my point is merely that value types do not guarantee that every thread has its own copy. The Array type is not thread-safe (nor are many other mutable value types). But, as with all value types, Swift offers simple mechanisms (some of them completely automatic and transparent) that will provide each thread its own copy, making it much easier to write thread-safe code.


Here's another example, with a different value type, that makes the problem more obvious. In this one, a failure to write thread-safe code yields a semantically invalid object:

let queue = DispatchQueue.global()

struct Person {
    var firstName: String
    var lastName: String
}

var person = Person(firstName: "Rob", lastName: "Ryan")

queue.async {
    Thread.sleep(forTimeInterval: 0.5)
    print("1: \(person)")
}

queue.async {
    person.firstName = "Rachel"
    Thread.sleep(forTimeInterval: 1)
    person.lastName = "Moore"
    print("2: \(person)")
}

In this example, the first print statement will say, effectively "Rachel Ryan", which is neither "Rob Ryan" nor "Rachel Moore". In short, we're examining our Person while it is in an internally inconsistent state.

But, again, we can use a capture list to enjoy value semantics:

queue.async { [person] in
    Thread.sleep(forTimeInterval: 0.5)
    print("1: \(person)")
}

And in this case, it will say "Rob Ryan", oblivious to the fact that the original Person may be in the process of being mutated by another thread. (Clearly, the real problem is not fixed just by using value semantics in the first async call; it also requires synchronizing the second async call and/or using value semantics there, too.)
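
As a sketch of what synchronizing the writer could look like, the following confines every read and write of the Person to one serial queue, so the two-step rename can no longer be observed halfway through. `PersonStore` and its members are hypothetical names, and the sketch assumes the Swift 5 language mode.

```swift
import Foundation

struct Person {
    var firstName: String
    var lastName: String
}

// Hypothetical wrapper that confines every read and write to one serial queue.
final class PersonStore {
    private var person = Person(firstName: "Rob", lastName: "Ryan")
    private let queue = DispatchQueue(label: "PersonStore.serial")

    var current: Person {
        return queue.sync { person }
    }

    func rename(first: String, last: String) {
        queue.sync {
            person.firstName = first
            Thread.sleep(forTimeInterval: 0.2) // widen the window, as in the example above
            person.lastName = last
        }
    }
}

let store = PersonStore()
let group = DispatchGroup()
var observed: Person?

DispatchQueue.global().async(group: group) {
    store.rename(first: "Rachel", last: "Moore")
}
DispatchQueue.global().async(group: group) {
    Thread.sleep(forTimeInterval: 0.1)
    observed = store.current
}
group.wait()

let name = observed.map { "\($0.firstName) \($0.lastName)" } ?? "none"
print(name) // "Rob Ryan" or "Rachel Moore", but never "Rachel Ryan"
```

Which of the two valid names is printed depends on scheduling; the point is that the half-updated state is no longer reachable.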

Problems with running multiple threads

When you call the .run() method on a Runnable directly, no new thread is created: run() simply executes on the calling thread's stack (i.e. everything still runs as a single thread).

You should instead wrap the Runnable in a new thread and use .start() to execute the thread.

Apple apple = new Apple(ingredient);
Thread t = new Thread(apple);
t.start();


Strawberry strawberry = new Strawberry(ingredient);
Thread t2 = new Thread(strawberry);
t2.start();

You're still calling the run() method directly. Instead, you have to call the start() method, which calls run() indirectly in a new thread.

When does the copying take place for swift value types

TL;DR:

So does it mean that the copying actually only takes place when the passed value type is modified?

Yes!

Is there a way to demonstrate that this is actually the underlying behavior?

See the first example in the section on the copy-on-write optimization.

Should I just use NSArray in this case or would the Swift Array work fine
as long as I do not try to manipulate the passed in Array?

If you pass your array as inout, then you'll have pass-by-reference semantics,
hence obviously avoiding unnecessary copies.
If you pass your array as a normal parameter,
then the copy-on-write optimization will kick in and you shouldn't notice any performance drop
while still benefiting from more type safety than what you'd get with an NSArray.

Now as long as I do not explicitly make the variables in the function editable
by using var or inout, then the function can not modify the array anyway.
So does it still make a copy?

You will get a "copy", in the abstract sense.
In reality, the underlying storage will be shared, thanks to the copy-on-write mechanism,
hence avoiding unnecessary copies.
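
One way to observe the shared storage is to compare the address of the element buffer seen by the caller with the one seen inside a function. This is only a sketch; `storageAddress(of:)` is a hypothetical helper written for this illustration.

```swift
// Hypothetical helper: reports where the parameter's element storage lives.
// The pointer is only compared, never dereferenced, after the closure returns.
func storageAddress(of numbers: [Int]) -> UnsafeRawPointer? {
    return numbers.withUnsafeBytes { $0.baseAddress }
}

let original = Array(1...1_000)
let addressOutside = original.withUnsafeBytes { $0.baseAddress }
let addressInside = storageAddress(of: original)

// The function received its "copy" without the 1,000 elements being duplicated.
print(addressOutside == addressInside) // true
```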

If the original array is immutable and the function is not using var or inout,
there is no point in Swift creating a copy. Right?

Exactly, hence the copy-on-write mechanism.

So what does Apple mean by the phrase above?

Essentially, Apple means that you shouldn't worry about the "cost" of copying value types,
as Swift optimizes it for you behind the scene.

Instead, you should just think about the semantics of value types,
which is that you get a copy as soon as you assign them or use them as parameters.
What the Swift compiler actually generates is the compiler's business.

Value types semantics

Swift does indeed treat arrays as value types (as opposed to reference types),
along with structures, enumerations and most other built-in types
(i.e. those that are part of the standard library and not Foundation).
At the memory level, these types are actually immutable plain old data objects (POD),
which enables interesting optimizations.
Indeed, they are typically allocated on the stack rather than the heap [1],
(https://en.wikipedia.org/wiki/Stack-based_memory_allocation).
This allows the CPU to very efficiently manage them,
and to automatically deallocate their memory as soon as the function exits [2],
without the need for any garbage collection strategy.

Values are copied whenever they are assigned or passed to a function.
These semantics have various advantages,
such as avoiding the creation of unintended aliases,
but also making it easier for the compiler to guarantee the lifetime of values
stored in another object or captured by a closure.
We can think about how hard it can be to manage good old C pointers to understand why.

One may think it's an ill-conceived strategy,
as it involves copying every single time a variable is assigned or a function is called.
But as counterintuitive as it may be,
copying small types is usually quite cheap if not cheaper than passing a reference.
After all, a pointer is usually the same size as an integer...

Concerns are however legitimate for large collections (i.e. arrays, sets and dictionaries),
and very large structures to a lesser extent [3].
But the compiler has a trick to handle these, namely copy-on-write (see later).
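
The sizes involved can be checked with MemoryLayout. In this small sketch, `Point` is a hypothetical struct, and the figures in the comments assume a 64-bit platform.

```swift
struct Point {
    var x: Double
    var y: Double
}

// A pointer is the same size as an Int, so passing a small value
// costs about as much as passing a reference to it.
print(MemoryLayout<Int>.size)              // 8 on a 64-bit platform
print(MemoryLayout<UnsafeRawPointer>.size) // 8 on a 64-bit platform
print(MemoryLayout<Point>.size)            // 16

// The Array value itself is a single pointer-sized word that refers to
// its element storage, whatever the number of elements.
print(MemoryLayout<[Int]>.size)            // 8 on a 64-bit platform
```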

What about mutating

Structures can define mutating methods,
which are allowed to mutate the fields of the structure.
This doesn't contradict the fact that value types are nothing more than immutable PODs,
as calling a mutating method is in fact merely syntactic sugar
for reassigning a variable to a brand new value that's identical to the previous one,
except for the fields that were mutated.
The following example illustrates this semantic equivalence:

struct S {
    var foo: Int
    var bar: Int
    mutating func modify() {
        foo = bar
    }
}

var s1 = S(foo: 0, bar: 10)
s1.modify()

// The two lines above do the same as the two lines below:
var s2 = S(foo: 0, bar: 10)
s2 = S(foo: s2.bar, bar: s2.bar)
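
A property observer gives another angle on this equivalence: a didSet observer on a struct-typed property fires for a mutating method call just as it does for a plain reassignment. The `Holder` class below is a hypothetical name used only for this sketch.

```swift
struct S {
    var foo: Int
    var bar: Int
    mutating func modify() {
        foo = bar
    }
}

final class Holder {
    var assignments = 0
    var s = S(foo: 0, bar: 10) {
        didSet { assignments += 1 }
    }
}

let holder = Holder()
holder.s.modify()            // mutating method call: observer fires
holder.s = S(foo: 1, bar: 2) // plain reassignment: observer fires again

print(holder.assignments) // 2
```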

Reference types semantics

Unlike value types, reference types are essentially pointers to the heap at the memory level.
Their semantics is closer to what we would get in reference-based languages,
such as Java, Python or Javascript.
This means they do not get copied when assigned or passed to a function, their address is.
Because the CPU is no longer able to manage the memory of these objects automatically,
Swift uses a reference counter to handle garbage collection behind the scenes
(https://en.wikipedia.org/wiki/Reference_counting).

Such semantics have the obvious advantage of avoiding copies,
as everything is assigned or passed by reference.
The drawback is the danger of unintended aliases,
as in almost any other reference-based language.

What about inout

An inout parameter can be thought of as a read-write pointer to the expected type.
In the case of value types, it means the function won't get a copy of the value,
but a pointer to it,
so mutations inside the function will affect the caller's variable (hence the inout keyword).
In other words, this gives value-type parameters reference semantics in the context of the function:

func f(x: inout [Int]) {
    x.append(12)
}

var a = [0]
f(x: &a)

// Prints '[0, 12]'
print(a)

In the case of reference types, it will make the reference itself mutable,
pretty much as if the passed argument was the address of the address of the object:

import Foundation

func f(x: inout NSArray) {
    x = [12]
}

var a: NSArray = [0]
f(x: &a)

// Prints '(12)'
print(a)

Copy-on-write

Copy-on-write (https://en.wikipedia.org/wiki/Copy-on-write) is an optimization technique that
can avoid unnecessary copies of mutable variables,
and it is implemented on all of Swift's built-in collections (i.e. arrays, sets and dictionaries).
When you assign an array (or pass it to a function),
Swift doesn't make a copy of that array right away and actually uses a reference instead.
The copy takes place as soon as your second array is mutated.
This behavior can be demonstrated with the following snippet (Swift 4.1):

let array1 = [1, 2, 3]
var array2 = array1

// Will print the same address twice.
array1.withUnsafeBytes { print($0.baseAddress!) }
array2.withUnsafeBytes { print($0.baseAddress!) }

array2[0] = 1

// Will print a different address.
array2.withUnsafeBytes { print($0.baseAddress!) }

Indeed, array2 doesn't get a copy of array1 immediately,
as shown by the fact it points to the same address.
Instead, the copy is triggered by the mutation of array2.

This optimization also happens deeper in the structure,
meaning that if for instance your collection is made of other collections,
the latter will also benefit from the copy-on-write mechanism,
as demonstrated by the following snippet (Swift 4.1):

var array1 = [[1, 2], [3, 4]]
var array2 = array1

// Will print the same address twice.
array1[1].withUnsafeBytes { print($0.baseAddress!) }
array2[1].withUnsafeBytes { print($0.baseAddress!) }

array2[0] = []

// Will print the same address as before.
array2[1].withUnsafeBytes { print($0.baseAddress!) }

Replicating copy-on-write

It is in fact rather easy to implement the copy-on-write mechanism in Swift,
as some of its reference-counting API is exposed to the user.
The trick consists of wrapping a reference (e.g. a class instance) within a structure,
and to check whether that reference is uniquely referenced before mutating it.
When that's the case, the wrapped value can be safely mutated,
otherwise it should be copied:

final class Wrapped<T> {
    init(value: T) { self.value = value }
    var value: T
}

struct CopyOnWrite<T> {
    init(value: T) { self.wrapped = Wrapped(value: value) }
    var wrapped: Wrapped<T>
    var value: T {
        get { return wrapped.value }
        set {
            if isKnownUniquelyReferenced(&wrapped) {
                wrapped.value = newValue
            } else {
                wrapped = Wrapped(value: newValue)
            }
        }
    }
}

var a = CopyOnWrite(value: SomeLargeObject())

// This line doesn't copy anything.
var b = a
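
To see both branches of the setter at work, the same two types can be exercised with a plain array as the wrapped value. The sketch repeats the definitions so that it is self-contained; using `[Int]` in place of a large object and comparing identities with `===` and ObjectIdentifier are choices made for this illustration.

```swift
final class Wrapped<T> {
    init(value: T) { self.value = value }
    var value: T
}

struct CopyOnWrite<T> {
    init(value: T) { self.wrapped = Wrapped(value: value) }
    var wrapped: Wrapped<T>
    var value: T {
        get { return wrapped.value }
        set {
            if isKnownUniquelyReferenced(&wrapped) {
                wrapped.value = newValue
            } else {
                wrapped = Wrapped(value: newValue)
            }
        }
    }
}

let a = CopyOnWrite(value: [1, 2, 3])
var b = a
print(a.wrapped === b.wrapped) // true: the assignment shared the box

// The box is shared, so the setter allocates a new one for b.
b.value = [4, 5, 6]
print(a.wrapped === b.wrapped) // false
print(a.value)                 // [1, 2, 3]

// b's box is now uniquely referenced, so the setter mutates it in place.
let boxBefore = ObjectIdentifier(b.wrapped)
b.value = [7, 8, 9]
print(boxBefore == ObjectIdentifier(b.wrapped)) // true
```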

However, there is an important caveat here!
Reading the documentation for isKnownUniquelyReferenced we get this warning:

If the instance passed as object is being accessed by multiple threads simultaneously,
this function may still return true.
Therefore, you must only call this function from mutating methods
with appropriate thread synchronization.

This means the implementation presented above isn't thread safe,
as we may encounter situations where it wrongly assumes the wrapped object can be safely mutated,
while in fact such a mutation would break an invariant in another thread.
Yet this doesn't mean Swift's copy-on-write is inherently flawed in multithreaded programs.
The key is to understand what "accessed by multiple threads simultaneously" really means.
In our example, this would happen if the same instance of CopyOnWrite was shared across multiple threads,
for instance as part of a shared global variable.
The wrapped object would then have thread-safe copy-on-write semantics,
but the instance holding it would be subject to a data race.
The reason is that Swift must establish unique ownership
to properly evaluate isKnownUniquelyReferenced [4],
which it can't do if the owner of the instance is itself shared across multiple threads.

Value types and multithreading

It is Swift's intention to alleviate the burden of the programmer
when dealing with multithreaded environments, as stated on Apple's blog
(https://developer.apple.com/swift/blog/?id=10):

One of the primary reasons to choose value types over reference types
is the ability to more easily reason about your code.
If you always get a unique, copied instance,
you can trust that no other part of your app is changing the data under the covers.
This is especially helpful in multi-threaded environments
where a different thread could alter your data out from under you.
This can create nasty bugs that are extremely hard to debug.

Ultimately, the copy-on-write mechanism is a resource management optimization that,
like any other optimization technique,
one shouldn't think about when writing code [5].
Instead, one should think in more abstract terms
and consider values to be effectively copied when assigned or passed as arguments.


[1]
This holds only for values used as local variables.
Values used as fields of a reference type (e.g. a class) are also stored in the heap.

[2]
One could get confirmation of that by checking the LLVM IR that's produced
when dealing with value types rather than reference types,
but the Swift compiler being very eager to perform constant propagation,
building a minimal example is a bit tricky.

[3]
Swift doesn't allow structures to reference themselves,
as the compiler would be unable to compute the size of such type statically.
Therefore, it is not very realistic to think of a structure that is so large
that copying it would become a legitimate concern.

[4]
This is, by the way, the reason why isKnownUniquelyReferenced accepts an inout parameter,
as it's currently Swift's way to establish ownership.

[5]
Although passing copies of value-type instances should be safe,
there's an open issue that suggests some problems with the current implementation
(https://bugs.swift.org/browse/SR-6543).

Question with Apple's Cocoa Documentation

What exactly does it mean when it shows +monthArray? I know that the + means that it is a class method, but I don't entirely understand what implications that has. How exactly does it make it different than a normal method?

Class methods are functions that can be called without providing an instance of the class. In Objective-C, you normally send messages to an instance as follows: [someObject someMethod]; With a class method, you send the message to the class itself: [SomeClass someClassMethod];. Class methods are usually used to do things that are specific to all objects of that particular class, such as creating a new instance (i.e. a factory method) or maintaining/updating a global class variable.

What is the purpose in this code for the whole sharedMonthArray ivar? I mean, the objectAtIndex: method is pulling from the months' strings, so what's the point?

The purpose of sharedMonthArray is to hold a single, common instance of the class. It would be possible to return a new instance every time +monthArray is called, but that could be a waste of memory, since there really only needs to be one. Most Cocoa code uses NSArray objects, and not simple C-style arrays, so this single instance acts as an NSArray wrapper that hides the static C array from the rest of the program by providing an NSArray instance instead. Internally, the C-style array is used because there is no way to create a static constant NSArray object at compile-time.

What is the "self" in the [self count] method? I mean, I understand that it is supposed to be the months, but where do I see that in the program? What makes [self count] count the months and not count the number of MonthArrays?

self is an automatic variable that the Objective-C runtime uses to hold the object which receives a certain message. If you send the message [someObject someMethod];, this will jump to the -someMethod method, and the value of self will be equal to someObject. In this example, self will be whichever instance of MonthArray was sent the -objectAtIndex: message. The call to [self count] will result in the method -count being called, and will return a value of 12, according to the code shown.
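
For readers more at home in Swift, the same three ideas (a method called on the type, a single shared instance, and self referring to the receiver) could be sketched roughly as follows. This is a hypothetical Swift counterpart written for illustration, not a translation of the code in the documentation.

```swift
final class MonthArray {
    // Type-level property: created lazily, once, the first time it is read.
    static let shared = MonthArray()

    private static let names = [
        "January", "February", "March", "April", "May", "June",
        "July", "August", "September", "October", "November", "December"
    ]

    private init() {}

    var count: Int {
        return MonthArray.names.count
    }

    func object(at index: Int) -> String {
        // `self` is whichever instance received this call.
        precondition(index >= 0 && index < self.count, "index out of range")
        return MonthArray.names[index]
    }
}

// Called on the type itself, with no instance needed.
let months = MonthArray.shared

print(months.count)         // 12
print(months.object(at: 0)) // January
```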

I would like to add that the code shown here is fairly unusual in the Cocoa world. The only time you would really write something like this is if you were creating your own framework. The majority of Cocoa source code will not use C-style arrays. What was the purpose of this particular example? It is definitely not the kind of code a Cocoa developer will write on a daily basis.

What does Apple mean when they say that a NSManagedObjectContext is owned by the thread or queue that created it?

The NSManagedObjectContext and any managed objects associated with it should be pinned to a single actor (thread, serialized queue, NSOperationQueue with max concurrency = 1).

This pattern is called thread confinement or isolation. There isn't a great phrase for (thread || serialized queue || NSOperationQueue with max concurrency = 1) so the documentation goes on to say "we'll just use 'thread' for the remainder of the Core Data doc when we mean any of those 3 ways of getting a serialized control flow"

If you create a MOC on one thread, and then use it on another, you have violated thread confinement by exposing the MOC object reference to two threads. Simple. Don't do it. Don't cross the streams.

We call out NSOperation explicitly because unlike threads & GCD, it has this odd issue where -init runs on the thread creating the NSOperation but -main runs on the thread running the NSOperation. It makes sense if you squint at it right, but it is not intuitive. If you create your MOC in -[NSOperation init], then NSOperation will helpfully violate thread confinement before your -main method even runs and you're hosed.

We actively discourage (and have de facto deprecated) using MOCs and threads in any other way. While theoretically possible to do what bbum mentions, no one ever got that right. Everybody tripped up, forgot a necessary call to -lock in 1 place, "init runs where ?", or otherwise out-clevered themselves. With autorelease pools and the application event loop and the undo manager and cocoa bindings and KVO there are just so many ways for one thread to hold on to a reference to a MOC after you've tried to pass it elsewhere. It is far more difficult than even advanced Cocoa developers imagine until they start debugging. So that's not a very useful API.

The documentation changed to clarify and emphasize the thread confinement pattern as the only sane way to go. You should consider trying to be extra fancy using -lock and -unlock on NSManagedObjectContext to be (a) impossible and (b) de facto deprecated. It's not literally deprecated because the code works as well as it ever did. But your code using it is wrong.

Some people created MOCs on 1 thread, and passed them to another without calling -lock. That was never legal. The thread that created the MOC has always been the default owner of the MOC. This became a more frequent issue for MOCs created on the main thread. Main thread MOCs interact with the application's main event loop for undo, memory management, and some other reasons. On 10.6 and iOS 3, MOCs take more aggressive advantage of being owned by the main thread.

Although queues are not bound to specific threads, if you create a MOC within the context of a queue the right things will happen. Your obligation is to follow the public API.

If the queue is serialized, you may share the MOC with succeeding blocks that run on that queue.

So do not expose an NSManagedObjectContext* to more than one thread (actor, etc) under any circumstance. There is one ambiguity. You may pass the NSNotification* from the didSave notification to another thread's MOC's -mergeChangesFromContextDidSaveNotification: method.

  • Ben

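Which is thread safe, atomic or non-atomic?

As is mentioned in several answers to the posted question, atomic is the thread-safe option, in the sense that a getter or setter running on one thread finishes before any other thread can run the getter or setter of the same property. Note that this covers individual accessor calls only; a sequence of reads and writes still needs its own synchronization.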


