How to Prove "Copy-On-Write" on String Type in Swift


The compiler creates only a single storage for both
"Hello World" and "Hello" + " World".

You can verify that for example by examining the assembly code
obtained from


swiftc -emit-assembly cow.swift

which defines only a single string literal


.section __TEXT,__cstring,cstring_literals
L___unnamed_1:
.asciz "Hello World"

As soon as the string is mutated, the address of the string storage
buffer (the first member of that "magic" tuple, actually _baseAddress
of struct _StringCore, defined in StringCore.swift) changes:

var xString = "Hello World"
var yString = "Hello" + " World"

print(_rawIdentifier(s: xString)) // (4300325536, 0)
print(_rawIdentifier(s: yString)) // (4300325536, 0)

yString.append("!")
print(_rawIdentifier(s: yString)) // (4322384560, 4322384528)

And why does your

func address(of object: UnsafeRawPointer) -> String

function show the same values for xArray and yArray, but
not for xString and yString?

Passing an array to a function taking an unsafe pointer passes the
address of the first array element, which is the same for both
arrays if they share the storage.

Passing a string to a function taking an unsafe pointer passes a
pointer to a temporary UTF-8 representation of the string.
That address can be different in each call, even for the same string.

This behavior is documented in the "Using Swift with Cocoa and
Objective-C" reference for UnsafePointer<T> arguments, but apparently
works the same for UnsafeRawPointer arguments.
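
The two implicit conversions described above can be seen in a minimal sketch. The helper names firstElement(_:) and roundTrip(_:) are hypothetical and chosen only for illustration. Each pointer should be treated as valid only for the duration of the call that receives it.

```swift
// Hypothetical helpers; each pointer is only valid during the call.

// An array argument is converted to a pointer to its first element.
func firstElement(_ p: UnsafePointer<Int>) -> Int {
    return p.pointee
}

// A string argument is converted to a pointer to a null-terminated
// UTF-8 representation of its contents.
func roundTrip(_ p: UnsafePointer<CChar>) -> String {
    return String(cString: p)
}

let numbers = [10, 20, 30]
let greeting = "Hello World"

let first = firstElement(numbers)
let copy = roundTrip(greeting)

print(first) // 10
print(copy)  // Hello World
```

The sketch reads through the pointers instead of printing them, because the addresses themselves are not something to rely on for strings.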

Which value types in Swift support copy-on-write?

Copy-on-write is supported for String and all collection types: Array, Dictionary and Set.

Besides that, the compiler is free to optimize any struct access and effectively give you copy-on-write semantics, but this is not guaranteed.
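
As a rough illustration, the sketch below shows the observable side of this for all four types: a copy is unaffected when the original is mutated afterwards. It only demonstrates value semantics. Whether storage was shared before the mutation is not visible through these public APIs. All variable names are made up for the example.

```swift
var numbers = [1, 2, 3]
var ages = ["a": 1]
var numberSet: Set<Int> = [1, 2]
var greeting = "Hello"

// Copies may share storage with the originals until a mutation happens.
let numbersCopy = numbers
let agesCopy = ages
let numberSetCopy = numberSet
let greetingCopy = greeting

// Mutating the originals must not affect the copies.
numbers.append(4)
ages["b"] = 2
numberSet.insert(3)
greeting.append("!")

print(numbersCopy)            // [1, 2, 3]
print(agesCopy)               // ["a": 1]
print(numberSetCopy.sorted()) // [1, 2]
print(greetingCopy)           // Hello
```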

Test Copy-on-write in Swift

Your code is printing the addresses of the array buffers (Array is a special case when passing a value to a pointer parameter). However, in Swift 3, the compiler assumed that the presence of the & operator meant that the buffer was being passed as mutable memory, so (unnecessarily) made it unique (by copying) before passing its pointer value, despite that pointer value being passed as an UnsafeRawPointer. That's why you see different addresses.

If you remove the & operator and pass the arrays directly:

func address(_ p: UnsafeRawPointer) {
    print(p)
}

var originArray = [1, 2, 3]
var firstArray = originArray

address(originArray) // 0x00000000016e71c0
address(firstArray) // 0x00000000016e71c0

You'll now get the same addresses, as the compiler now assumes that address(_:) will not modify the memory of the buffers passed, as they're being passed to an UnsafeRawPointer parameter.

In Swift 4, this inconsistency is fixed, and the compiler no longer makes the buffer unique before passing its pointer values to an UnsafeRawPointer parameter, even when using the & operator, so your code exhibits expected behaviour.

Although, it's worth noting that the above method isn't guaranteed to produce stable pointer values when passing the same array to multiple pointer parameters.

From the Swift blog post "Interacting with C Pointers":

Even if you pass the same variable, array, or string as multiple pointer arguments, you could receive a different pointer each time.

I believe stable pointer values cannot be guaranteed for arrays in two cases (there may be more):

  1. If the array is viewing elements in non-contiguous storage

    Swift's Array can view elements in non-contiguous storage, for example when it is wrapping an NSArray. In such a case, when passing it to a pointer parameter, a new contiguous buffer will have to be created, therefore giving you a different pointer value.

  2. If the buffer is non-uniquely referenced when passed as mutable memory

    As mentioned earlier, when passing an array to a mutable pointer parameter, its buffer will first be made unique in order to preserve value semantics, as it's assumed the function will perform a mutation of the buffer.

    Therefore, if the buffer was copied, you'll get a different pointer value than if you had passed the array to an immutable pointer parameter.

Although neither of these two points is applicable in the example you give, it's worth bearing in mind that the compiler still doesn't guarantee you stable pointer values to the array's buffer when passing to pointer parameters.

For results that are guaranteed to be reliable, you should use the withUnsafeBytes(_:) method on a ContiguousArray:

var originArray: ContiguousArray = [1, 2, 3]
var firstArray = originArray

originArray.withUnsafeBytes { print($0.baseAddress!) } // 0x0000000102829550
firstArray.withUnsafeBytes { print($0.baseAddress!) } // 0x0000000102829550

This is because withUnsafeBytes(_:) is documented as accepting:

A closure with an UnsafeRawBufferPointer parameter that points to the contiguous storage for the array. If no such storage exists, it is created.

And ContiguousArray guarantees that:

[it] always stores its elements in a contiguous region of memory

And just like Array, ContiguousArray uses copy-on-write in order to have value semantics, so you can still use it to check when the array's buffer is copied upon a mutation taking place:

var originArray: ContiguousArray = [1, 2, 3]
var firstArray = originArray

originArray.withUnsafeBytes { print($0.baseAddress!) } // 0x0000000103103eb0
firstArray.withUnsafeBytes { print($0.baseAddress!) } // 0x0000000103103eb0

firstArray[0] = 4

originArray.withUnsafeBytes { print($0.baseAddress!) } // 0x0000000103103eb0
firstArray.withUnsafeBytes { print($0.baseAddress!) } // 0x0000000100e764d0
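
The comparison can also be wrapped in a small helper so that the result is a Bool instead of two addresses to compare by eye. This is only a sketch: sharesStorage(_:_:) is a hypothetical name, not a standard library function, and it is meant for non-empty arrays.

```swift
// Hypothetical helper: true when both arrays currently use the same buffer.
// The base addresses are only compared, never dereferenced.
func sharesStorage<Element>(_ lhs: ContiguousArray<Element>,
                            _ rhs: ContiguousArray<Element>) -> Bool {
    let left = lhs.withUnsafeBytes { $0.baseAddress }
    let right = rhs.withUnsafeBytes { $0.baseAddress }
    return left == right
}

let original: ContiguousArray<Int> = [1, 2, 3]
var copy = original

let sharedBefore = sharesStorage(original, copy)
copy[0] = 4 // first mutation of a shared buffer triggers the copy
let sharedAfter = sharesStorage(original, copy)

print(sharedBefore, sharedAfter) // true false
```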

Does Swift copy on write for all structs?

Array is implemented with copy-on-write behaviour – you'll get it regardless of any compiler optimisations (although of course, optimisations can decrease the number of cases where a copy needs to happen).

At a basic level, Array is just a structure that holds a reference to a heap-allocated buffer containing the elements – therefore multiple Array instances can reference the same buffer. When you come to mutate a given array instance, the implementation will check if the buffer is uniquely referenced, and if so, mutate it directly. Otherwise, the array will perform a copy of the underlying buffer in order to preserve value semantics.
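
Both branches of that check can be observed with a short sketch. storageAddress(_:) is a hypothetical helper, and the array is deliberately built at run time on the assumption that this gives an ordinary, uniquely referenced heap buffer.

```swift
// Hypothetical helper: the current address of the array's element storage.
func storageAddress(_ array: ContiguousArray<Int>) -> UnsafeRawPointer? {
    return array.withUnsafeBytes { $0.baseAddress }
}

// Built at run time (assumption: yields a uniquely referenced heap buffer).
var numbers = ContiguousArray(repeating: 0, count: 3)
let initial = storageAddress(numbers)

// Uniquely referenced: the element is changed in place.
numbers[0] = 42
let afterUniqueMutation = storageAddress(numbers)

// A second array value now references the same buffer...
let snapshot = numbers
// ...so this mutation copies the buffer first.
numbers[1] = 7
let afterSharedMutation = storageAddress(numbers)

print(initial == afterUniqueMutation)             // true
print(afterUniqueMutation == afterSharedMutation) // false
print(snapshot[1], numbers[1])                    // 0 7
```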

However, with your Point structure – you're not implementing copy-on-write at a language level. Of course, as @Alexander says, this doesn't stop the compiler from performing all sorts of optimisations to minimise the cost of copying whole structures about. These optimisations needn't follow the exact behaviour of copy-on-write though – the compiler is simply free to do whatever it wishes, as long as the program runs according to the language specification.
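
The difference in layout can be checked with MemoryLayout. This is a sketch that relies on current implementation details: an Array value is assumed to be a single reference to its buffer, while a plain struct stores its fields inline, so there is no shared buffer for copy-on-write to act on.

```swift
struct Point {
    var x: Float = 0
    var y: Float = 1
}

// An Array value is one reference wide, however many elements it holds.
let arraySize = MemoryLayout<[Float]>.size
let pointerSize = MemoryLayout<UnsafeRawPointer>.size

// A Point stores both Float fields inline.
let pointSize = MemoryLayout<Point>.size

print(arraySize == pointerSize) // true
print(pointSize)                // 8

// Copying a Point still gives an independent value.
let p1 = Point()
var p2 = p1
p2.x += 1
print(p1.x, p2.x) // 0.0 1.0
```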

In your specific example, both p1 and p2 are global, therefore the compiler needs to make them distinct instances, as other .swift files in the same module have access to them (although this could potentially be optimised away with whole-module optimisation). However, the compiler still doesn't need to copy the instances – it can just evaluate the floating-point addition at compile-time and initialise one of the globals with 0.0, and the other with 1.0.

And if they were local variables in a function, for example:

struct Point {
    var x: Float = 0
}

func foo() {
    let p1 = Point()
    var p2 = p1
    p2.x += 1
    print(p2.x)
}

foo()

The compiler doesn't even have to create two Point instances to begin with – it can just create a single floating-point local variable initialised to 1.0, and print that.

Regarding passing value types as function arguments, for large enough types and (in the case of structures) functions that utilise enough of their properties, the compiler can pass them by reference rather than copying. The callee can then make a copy of them only if needed, such as when needing to work with a mutable copy.

In other cases where structures are passed by value, it's also possible for the compiler to specialise functions in order to only copy across the properties that the function needs.

For the following code:

struct Point {
    var x: Float = 0
    var y: Float = 1
}

func foo(p: Point) {
    print(p.x)
}

let p1 = Point()
foo(p: p1)

Assuming foo(p:) isn't inlined by the compiler (it will be in this example, but once its implementation reaches a certain size, the compiler won't think it worth it) – the compiler can specialise the function as:

func foo(px: Float) {
    print(px)
}

foo(px: 0)

It only passes the value of Point's x property into the function, thereby saving the cost of copying the y property.

So the compiler will do whatever it can in order to reduce the copying of value types. But with so many various optimisations in different circumstances, you cannot simply boil the optimised behaviour of arbitrary value types down to just copy-on-write.

Anomaly in CoW (Copy on Write) of Swift Array with Reference type items

It looks like UnsafePointer(&value) returns the wrong value (perhaps the address of the head of the array, or something similar). I changed someFunc a little:

func someFunc(array: [TestClass]) {
    var anotherArray = array
    anotherArray.append(TestClass(name: "mnop"))

    // Print the address of each object referenced by the original array.
    for value in array {
        debugPrint(Unmanaged.passUnretained(value).toOpaque())
    }

    // Print the address of each object referenced by the mutated copy.
    for value in anotherArray {
        debugPrint(Unmanaged.passUnretained(value).toOpaque())
    }

    anotherArray[0].name = "Sandeep"
    debugPrint(array[0].name)
}

And the output is the following:

0x0000600003f29360
0x0000600003f29380
0x0000600003f29400

0x0000600003f29360
0x0000600003f29380
0x0000600003f29400
0x0000600003f29340

As you can see, both arrays contain the same objects, and this is the expected behavior. Array stores references to TestClass objects (not values) and copies these references during CoW, but the objects remain the same.
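
The same point can be checked without reading addresses by comparing object identity with ===. This is a minimal sketch: the TestClass definition and the names "abcd" and "efgh" are assumptions, since the original class is not shown here.

```swift
// Assumed shape of TestClass; the original definition is not shown.
final class TestClass {
    var name: String
    init(name: String) { self.name = name }
}

let array = [TestClass(name: "abcd"), TestClass(name: "efgh")]
var anotherArray = array
anotherArray.append(TestClass(name: "mnop")) // copies the buffer of references

// The buffers differ now, but both still reference the same objects.
let sameObject = array[0] === anotherArray[0]

// Mutating the object through one array is visible through the other.
anotherArray[0].name = "Sandeep"

print(sameObject)                      // true
print(array[0].name)                   // Sandeep
print(array.count, anotherArray.count) // 2 3
```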

Copy-on-Write in Multi-threaded environment in Swift

You're not looking at the addresses of the arrays. You're looking at the addresses of the internal backing storage of the arrays, which is shared and heap-allocated.

If you want to look at the addresses of the stack-allocated array container (the part that points to the backing storage), then you meant this:

var arr1 = [1, 2, 3, 4]
var arr2 = arr1
withUnsafePointer(to: &arr1) { print("arr1:", $0) }
withUnsafePointer(to: &arr2) { print("arr2:", $0) }

DispatchQueue.global(qos: .default).async {
    let arr3 = arr1
    withUnsafePointer(to: arr3) { print("arr3:", $0) }
}

// =>
arr1: 0x0000000122d671e0 // local stack
arr2: 0x0000000122d671e8 // local stack (next address)
arr3: 0x0000700000e48d10 // heap

I believe this is the kind of result you were expecting.

How can I make a container with copy-on-write semantics? (Swift)

A copy-on-write type is usually a struct wrapper over some backing object.

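// A heap-allocated reference box holding the wrapped value.
// (Swift 2 required a NonObjectiveCBase superclass here; Swift 3 removed it.)
public final class MutableHeapStore<T>
{
    public typealias Storage = T

    // fileprivate(set) lets COW, declared in the same file, mutate the box in place.
    public fileprivate(set) var storage: Storage

    public init(storage: Storage)
    {
        self.storage = storage
    }
}

public struct COW<T>
{
    public typealias Storage = MutableHeapStore<T>
    public typealias Value = T

    public var storage: Storage

    public init(storage: Storage)
    {
        self.storage = storage
    }

    public var value: Value
    {
        get
        {
            return storage.storage
        }

        set
        {
            // isKnownUniquelyReferenced replaced isUniquelyReferenced in Swift 3.
            if isKnownUniquelyReferenced(&storage)
            {
                // Sole owner: mutate the existing box in place.
                storage.storage = newValue
            }
            else
            {
                // Shared: allocate a fresh box so other copies are unaffected.
                storage = Storage(storage: newValue)
            }
        }
    }

    public init(_ value: Value)
    {
        self.init(storage: Storage(storage: value))
    }
}

extension COW: CustomStringConvertible
{
    public var description: String
    {
        return String(describing: value)
    }
}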

The trick lies in checking isKnownUniquelyReferenced (named isUniquelyReferenced before Swift 3) every time the boxed value is mutated. If the underlying storage object is singly referenced, it can be mutated in place. However, if another reference exists, one must create a new storage object.

Is this code thread-safe? It is exactly as safe as any other value type, e.g. Int or Bool.
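
To watch the copy happen, a compact variant of the same idea can expose whether two values currently share a box. This is a sketch written for current Swift, not the container above: Ref, CopyOnWriteBox and sharesStorage(with:) are hypothetical names.

```swift
// Hypothetical reference box holding the wrapped value.
final class Ref<T> {
    var value: T
    init(_ value: T) { self.value = value }
}

struct CopyOnWriteBox<T> {
    private var ref: Ref<T>

    init(_ value: T) {
        ref = Ref(value)
    }

    var value: T {
        get { return ref.value }
        set {
            if isKnownUniquelyReferenced(&ref) {
                ref.value = newValue   // sole owner: mutate in place
            } else {
                ref = Ref(newValue)    // shared: switch to a fresh box
            }
        }
    }

    // True while both values still point at the same box.
    func sharesStorage(with other: CopyOnWriteBox<T>) -> Bool {
        return ref === other.ref
    }
}

let a = CopyOnWriteBox(1)
var b = a
let sharedBefore = a.sharesStorage(with: b)

b.value = 2 // first write to a shared box allocates a new one
let sharedAfter = a.sharesStorage(with: b)

print(sharedBefore, sharedAfter) // true false
print(a.value, b.value)          // 1 2
```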

Copy string value to another string?

String is a value type in Swift. So no, if you change one, the other won't change.

When you do

self.customUDID = self.temporaryUDID

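customUDID receives its own value. Internally the two strings may share storage until one of them is mutated, but a change to one is never visible through the other.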

Example:

var s1 = "Foo"
var s2 = s1

s1 = "Bar"

print(s1, s2) // Prints "Bar Foo"

