How Safe Are Swift Collections When Used with Invalidated Iterators/Indices

How safe are swift collections when used with invalidated iterators / indices?

As you say, #1 is not an issue. You do not have a pointer to the object in Swift. You either have its value or a reference to it. If you have its value, then it's a copy. If you have a reference, then it's protected. So there's no issue here.

But let's consider the second and experiment, be surprised, and then stop being surprised.

var xs = [1,2,3,4]

for x in xs { // (1)
    if x == 2 {
        xs.removeAll() // (2)
    }
    print(x) // Prints "1\n2\n3\n\4\n"
}

xs // [] (3)

Wait, how does it print all the values when we blow away the values at (2). We are very surprised now.

But we shouldn't be. Swift arrays are values. The xs at (1) is a value. Nothing can ever change it. It's not "a pointer to memory that includes an array structure that contains 4 elements." It's the value [1,2,3,4]. At (2), we don't "remove all elements from the thing xs pointed to." We take the thing xs is, create an array that results if you remove all the elements (that would be [] in all cases), and then assign that new array to xs. Nothing bad happens.

So what does the documentation mean by "invalidates all indices?" It means exactly that. If we generated indices, they're no good anymore. Let's see:

var xs = [1,2,3,4]

for i in xs.indices {
    if i == 2 {
        xs.removeAll()
    }
    print(xs[i]) // Prints "1\n2\n" and then CRASH!!!
}

Once xs.removeAll() is called, there's no promise that the old result of xs.indices means anything anymore. You are not permitted to use those indices safely against the collection they came from.

"Invalidates indices" in Swift is not the same as C++'s "invalidates iterators." I'd call that pretty safe, except the fact that using collection indices is always a bit dangerous and so you should avoid indexing collections when you can help it; iterate them instead. Even if you need the indexes for some reason, use enumerate to get them without creating any of the danger of indexing.

(Side note, dict["key"] is not indexing into dict. Dictionaries are a little confusing because their key is not their index. Accessing dictionaries by their DictionaryIndex index is just as dangerous as accessing arrays by their Int index.)

Note also that the above doesn't apply to NSArray. If you modify NSArray while iterating it, you'll get a "mutated collection while iterating" error. I'm only discussing Swift data types.

EDIT: for-in is very explicit in how it works:

The generate() method is called on the collection expression to obtain a value of a generator type—that is, a type that conforms to the GeneratorType protocol. The program begins executing a loop by calling the next() method on the stream. If the value returned is not None, it is assigned to the item pattern, the program executes the statements, and then continues execution at the beginning of the loop. Otherwise, the program does not perform assignment or execute the statements, and it is finished executing the for-in statement.

The returned Generator is a struct and contains a collection value. You would not expect any changes to some other value to modify its behavior. Remember: [1,2,3] is no different than 4. They're both values. When you assign them, they make copies. So when you create a Generator over a collection value, you're going to snapshot that value, just like if I created a Generator over the number 4. (This raises an interesting problem, because Generators aren't really values, and so really shouldn't be structs. They should be classes. Swift stdlib has been fixing that. See the new AnyGenerator for instance. But they still contain an array value, and you would never expect changes to some other array value to impact them.)

See also "Structures and Enumerations Are Value Types" which goes into more detail on the importance of value types in Swift. Arrays are just structs.

Yes, that means there's logically copying. Swift has many optimizations to minimize actual copying when it's not needed. In your case, when you mutate the dictionary while it's being iterated, that will force a copy to happen. Mutation is cheap if you're the only consumer of a particular value's backing storage. But it's O(n) if you're not. (This is determined by the Swift builtin isUniquelyReferenced().) Long story short: Swift Collections are Copy-on-Write, and simply passing an array does not cause real memory to be allocated or copied.

You don't get COW for free. Your own structs are not COW. It's something that Swift does in stdlib. (See Mike Ash's great discussion of how you would recreate it.) Passing your own custom structs causes real copies to happen. That said, the majority of the memory in most structs is stored in collections, and those collections are COW, so the cost of copying structs is usually pretty small.

The book doesn't spend a lot of time drilling into value types in Swift (it explains it all; it just doesn't keep saying "hey, and this is what that implies"). On the other hand, it was the constant topic at WWDC. You may be interested particularly in Building Better Apps with Value Types in Swift which is all about this topic. I believe Swift in Practice also discussed it.

EDIT2:

@KarlP raises an interesting point in the comments below, and it's worth addressing. None of the value-safety promises we're discussing are related to for-in. They're based on Array. for-in makes no promises at all about what would happen if you mutated a collection while it is being iterated. That wouldn't even be meaningful. for-in doesn't "iterate over collections," it calls next() on Generators. So if your Generator becomes undefined if the collection is changed, then for-in will blow up because the Generator blew up.

That means that the following might be unsafe, depending on how strictly you read the spec:

func nukeFromOrbit<C: RangeReplaceableCollectionType>(var xs: C) {
    var hijack = true
    for x in xs {
        if hijack {
            xs.removeAll()
            hijack = false
        }
        print(x)
    }
}

And the compiler won't help you here. It'll work fine for all of the Swift collections. But if calling next() after mutation for your collection is undefined behavior, then this is undefined behavior.

My opinion is that it would be poor Swift to make a collection that allows its Generator to become undefined in this case. You could even argue that you've broken the Generator spec if you do (it offers no UB "out" unless the generator has been copied or has returned nil). So you could argue that the above code is totally within spec and your generator is broken. Those arguments tend to be a bit messy with a "spec" like Swift's which doesn't dive into all the corner cases.

Does this mean you can write unsafe code in Swift without getting a clear warning? Absolutely. But in the many cases that commonly cause real-world bugs, Swift's built-in behavior does the right thing. And in that, it is safer than some other options.

How does ConcurrentSkipListSet has Iterators that are weakly consistent? Understanding meaning of 'weakly consistent'

The "weakly consistent" term is defined in the java.util.concurrent package description:

Most concurrent Collection implementations (including most Queues)
also differ from the usual java.util conventions in that their
Iterators and Spliterators provide weakly consistent rather than
fast-fail traversal:
they may proceed concurrently with other operations
they will never throw ConcurrentModificationException
they are guaranteed to traverse elements as they existed upon construction exactly once, and may (but are not guaranteed to) reflect
any modifications subsequent to construction.

In this case with ConcurrentSkipListSet, the iterator does not have a "fast-fail" property, instead it reflects the modification of 4 having been removed from the set.

Internally, ConcurrentSkipListSet is implemented with ConcurrentSkipListMap, and its iterators are implemented by keeping track of which skip list node should be traversed next. This naturally gives you the "weakly consistent" property: If the next item is deleted, the iterator will still return it. If items beyond the next are deleted, the iterator will reflect those changes.

Is it safe to remove values from a dictionary while iterating in swift?

The iteration does not work like you're thinking it does. Your iterated etaDictionary is a copy of the original. Dictionaries are value types in Swift.

Clear example:

var dictionary = [1: 1, 2: 2, 3: 3, 4: 4, 5: 5]

for kvp in dictionary {
  dictionary = [:]
  print(kvp)
  print(dictionary.count)
}

outputs:

(key: 5, value: 5)
0
(key: 3, value: 3)
0
(key: 4, value: 4)
0
(key: 1, value: 1)
0
(key: 2, value: 2)
0

There's a name for your for loop, though; it is called filter. Use it instead.

func cleanupETADictionary(trips: [Int64]){
  etaDictionary = etaDictionary.filter { trips.contains($0.key) }
}

When to use enumerate for arrays in Swift?

To answer the question from the title, you use enumerate() when you need value's index in addition to the value itself:

If you need the integer index of each item as well as its value, use the enumerate() method to iterate over the array instead.
for (index, value) in shoppingList.enumerate() {
    print("Item \(index + 1): \(value)")
}

enumerate() provides a safe pattern for modifying array elements while iterating over them:

for (index, (var value)) in shoppingList.enumerate() {
    if value == "something" {
        shoppingList[index] = "something-else"
    }
}

Unexpected behavior when copying iterators using tee

tee creates two iterators from one iterator. Each of those two created iterators can be used independently, i.e. consuming one of them does not consume th other one.

However, the original iterator has to be consumed. For example:

a = iter(range(5))
b, c = tee(a)
for value in b:
    pass

At this point, b is consumed. By consuming b, a has been consumed as well, but c has not been consumed. See:

>>> list(a)
[]
>>> list(b)
[]
>>> list(c)
[0, 1, 2, 3, 4]

Now, in the original code, the same variable name is use for two things:

ita, itb = tee(ita)

There are 3 iterators here, but two of them are using variable name ita. That is the cause of the confusion. The new ita iterator does not get consumed together with itb, but the old one does.

Note that the original ita is used in for a in ita:, and not the new one. That is because it ita variable was read only the first time the for line was executed and then assigning something else to ita does not affect the loop.

You can see that the new ita is not consumed by adding this one line:

ita = iter(range(5))
for a in ita:
    print(a)
    if a == 2:
        ita, itb = tee(ita)
        for b in itb:
            pass
        print(list(ita))  # dded line

Prints:

0
1
2
[3, 4]

Java-like Iterator in Swift

A SequenceType (which CollectionType, and thus all Swift collections including array, conform to) is pretty simple. It requires you provide a generate() function that returns a type conforming to GeneratorType.

GeneratorType in turn only needs to provide one method: a next() that returns each element until the elements are exhausted. It returns an optional, returning nil after the last element is returned. This makes them pretty similar to Java iterators, only with next and hasNext combined into one via use of optionals.

Swift’s for…in is really syntactic sugar for a combination of getting a generator and then repeatedly calling next on it:

let a = [1, 2, 3]
for i in a { print(i) }
// is equivalent to:
var g = a.generate()
// the generator, being stateful, must be declared with var
while let i = g.next() { print(i) }

If using generators like this, take note of the comment above the definition of GeneratorType in the std lib doc:

Encapsulates iteration state and interface for iteration over a
sequence.

Note: While it is safe to copy a generator, advancing one
copy may invalidate the others.

Since writing a generator for a collection often involves a lot of boiler plate, there is a helper type, IndexingGenerator, that can be used. This implements a generator that starts at startIndex, and returns the value at that index and increments the index each time. A generate() that returns an IndexingGenerator is provided as the default implementation for CollectionType, which means if this is good enough for your purposes, you don’t need to implement generate when implementing a collection:

struct Bitfield: CollectionType {
    let data: UInt
    var startIndex: UInt { return 0 }
    var endIndex: UInt { return UInt(sizeofValue(data)*8) }
    subscript(idx: UInt) -> Bit {
        return (data >> idx) & 1 == 0
        ? Bit.Zero : Bit.One
    }
    // no need to implement generate()
}

This default was added in Swift 2.0. Prior to that, you had to provide a minimal generator that just returned an IndexingGenerator(self).

How Safe Are Swift Collections When Used with Invalidated Iterators/Indices