Explain Swift Iterators

Explain Swift Iterators

Yes, the next() method of AnyIterator calls the given closure.
And in your code, that closure returns the same first element on each call, because it does not remember what elements have been returned already.

If Swift had a yield statement like Python or C# then things would be easier: you could yield segment or yield i and are done.

But – unfortunately? – Swift has no yield statement, which means
that the closure must explicitly manage some state in order to resume the iteration
with the next element on each call.

One possibility would be to maintain two indices, one for
the current segment and one for the current element within
a segment if that is an array:

func chain(_ segments: Any...) -> AnyIterator<Int> {
var currentSegment = 0 // index of current segment
var currentElement = 0 // index of current element within current segment
return AnyIterator<Int> {
while currentSegment < segments.count {
let next = segments[currentSegment]
switch next {
case let value as Int:
currentSegment += 1
return value
case let segment as [Int]:
if currentElement < segment.count {
let val = segment[currentElement]
currentElement += 1
return val
}
currentSegment += 1
currentElement = 0
default:
return nil
}
}
return nil
}
}

This can be generalized to arbitrarily nested arrays:

func chain(_ segments: Any...) -> AnyIterator<Int> {
var stack: [(Any, Int)] = [(segments, 0)]
return AnyIterator<Int> {
while let (next, idx) = stack.popLast() {
switch next {
case let value as Int:
return value
case let segments as [Any]:
if idx < segments.count {
stack.append((segments, idx + 1))
stack.append((segments[idx], 0))
}
default:
return nil
}
}
return nil
}
}

The still-to-be-processed arrays are on the stack together with
their current index. The arrays themselves are not modified,
so that the copy is cheap.

Example:

let G = chain([1, 2, [3]], 4, [5, 6, [], 7])
while let g = G.next() {
print(g)
}
// 1 2 3 4 5 6 7

See also Implementing recursive generator for simple tree structure in Swift for more
approaches to recursively enumerate a tree-like structure.

What is iterator.element in swift?

Iterator.Element is the easiest to understand here. The generic parameter S must be a type that conforms to Sequence, as you've specified here:

func computeScoreIncrement<S : Sequence>(_ pastTurnsReversed: S) -> Int
// ^^^^^^^^^^^^^^

Therefore, S.Iterator.Element refers to the type of sequence that S is. If, say, S is inferred to be [Int], then S.Iterator.Element is Int - [Int] is a sequence of Int.

Now onto the where Turn == S.Iterator.Element part.

As mentioned above, S must be a type that conforms to Sequence, but that's not all of the constraints! S.Iterator.Element must also be the same type as Turn. You didn't show how Turn is defined. It may be a generic parameter of the enclosing class, a class, struct or enum.

Thus, I can pass a [Turn] to this method, to instances of some other type that is a sequence of Turns.

Iterator Int in Swift

The equivalent of Iterator is Sequence.

func put<S: Sequence>(path: S) where S.Element == Int

Of course you don't need the where clause if you don't care about the type of the elements.

Typically a trie would itself be generic over its own Element, so it'd be something like

struct Trie<Element: Comparable> {
func put<S: Sequence>(path: S) where S.Element == Element { ... }
}

If you really want to accept iterators directly, you can, but this is pretty rare. It would be along these lines:

func put<I: IteratorProtocol>(path: inout I) where I.Element == Int

To directly make use of an iterator, you need to modify it, and that means you need it to be inout (or you could return a copy of it, but can be dangerous since advancing multiple copies of an iterator is undefined). This is really unusual, though. You likely mean Sequence.

Why are Swift iterators slower than array building?

You asked "why are its [Swift] generators so slow, even slower than Python’s in some cases?"

My answer is that I do not think that they are nearly as slow as your results may indicate. In particular, I will try to demonstrate that looping through an iterator should be faster than constructing an array for all your test cases.

In earlier work (see a related blog post at http://lemire.me/blog/2016/09/22/swift-versus-java-the-bitset-performance-test/), I found that Swift iterators were about half as fast as the equivalent in Java when working over a bitset class. That's not great but Java is very efficient in this respect. Meanwhile, Go does worse. I submit to you that Swift iterators are probably not ideally efficient, but they are probably within a factor of two of what is possible with raw C code. And the performance gap probably has to do with insufficient function inlining in Swift.

I see that you are using an AnyIterator. I suggest deriving a struct from IteratorProtocol instead which has the benefit of insuring that there does not have to be any dynamic dispatch. Here is a relatively efficient possibility:

public struct FastFlattenIterator: IteratorProtocol {
let segments: [Any]
var i = 0 // top-level index
var j = 0 // second-level index
var jmax = 0 // essentially, this is currentarray.count, but we buffer it
var currentarray : [Int]! // quick reference to an int array to be flatten

init(_ segments: [Any]) {
self.segments = segments
}

public mutating func next() -> Int? {
if j > 0 { // we handle the case where we iterate within an array separately
let val = currentarray[j]
j += 1
if j == jmax {
j = 0
i += 1
}
return val
}
while i < segments.count {
switch segments[i] {
case let e as Int: // found an integer value
i += 1
return e
case let E as [Int]: // first encounter with an array
jmax = E.count
currentarray = E
if jmax > 0 {
j = 1
return E[0]
}
i += 1
default:
return nil
}
}
return nil
}
}

With this class, I am getting the following numbers. For each test case, the first four approaches are taken from your code sample whereas the last two (fast iterator) are build using the new struct. Notice that "Looping through a fast iterator" is always fastest.

test case 1 (small mixed input)
Filling an array : 0.0073099999999999997 ms
Filling an array, and looping through it : 0.0069870000000000002 ms
Looping through a generator : 0.18385799999999999 ms
Materializing the generator to an array : 0.18745700000000001 ms
Looping through a fast iterator : 0.005372 ms
Materializing the fast iterator : 0.015883999999999999 ms

test case 2 (large input, [Int] s)
Filling an array : 2.125931 ms
Filling an array, and looping through it : 2.1169820000000001 ms
Looping through a generator : 15.064767 ms
Materializing the generator to an array : 15.45152 ms
Looping through a fast iterator : 1.572919 ms
Materializing the fast iterator : 1.964912 ms

test case 3 (large input, Int s)
Filling an array : 2.9140269999999999 ms
Filling an array, and looping through it : 2.9064290000000002 ms
Looping through a generator : 9.8297640000000008 ms
Materializing the generator to an array : 9.8297640000000008 ms
Looping through a fast iterator : 1.978038 ms
Materializing the fast iterator : 2.2565339999999998 ms

You will find my complete code sample on GitHub: https://github.com/lemire/Code-used-on-Daniel-Lemire-s-blog/tree/master/extra/swift/iterators

Difference between Enumerator and Iterator in Swift or Objective C

The distinction between the two in programming is not always large and you'll find cases where the words are used interchangeably, however they do describe something different. You wrote:

As per my understating iterator performs iteration/looping over collection. e.g forEach, for loop, while, do while etc. But I do not what is enumerator.

The key point of iteration is just "iteration/looping" – simply repeating something, e.g. the Newton-Raphson method is an iterative algorithm for approximating a square-root and is typically implemented in a programming language using a loop.

Enumeration is listing things out, that's where your "over collection" comes from. So the for/in loop (an iterative construct) enumerates a collection. The standard collections all provide enumerators which return NSEnumerator objects, and you typically, but not always, would use these in an iterative construct to process the collection.

Enumerated types, enum's, are so called because they list out all the values, i.e. enumerate them.

So your interviewer might have been expecting you to state that iteration is supported in Swift/Objective-C by the iterative statements of the languages (for, while etc.), while enumeration is provided by objects/methods which either return each item in turn (e.g. NSArray's objectEnumerator) or which process each element in turn (e.g. NSArray's enumerateObjectsUsingBlock:). The for/in combines these two, it is an iterative construct which directly uses an object enumerator rather than requiring the programmer to use the enumerator directly.

HTH

Java-like Iterator in Swift

A SequenceType (which CollectionType, and thus all Swift collections including array, conform to) is pretty simple. It requires you provide a generate() function that returns a type conforming to GeneratorType.

GeneratorType in turn only needs to provide one method: a next() that returns each element until the elements are exhausted. It returns an optional, returning nil after the last element is returned. This makes them pretty similar to Java iterators, only with next and hasNext combined into one via use of optionals.

Swift’s for…in is really syntactic sugar for a combination of getting a generator and then repeatedly calling next on it:

let a = [1, 2, 3]
for i in a { print(i) }
// is equivalent to:
var g = a.generate()
// the generator, being stateful, must be declared with var
while let i = g.next() { print(i) }

If using generators like this, take note of the comment above the definition of GeneratorType in the std lib doc:

Encapsulates iteration state and interface for iteration over a
sequence.

  • Note: While it is safe to copy a generator, advancing one
    copy may invalidate the others.

Since writing a generator for a collection often involves a lot of boiler plate, there is a helper type, IndexingGenerator, that can be used. This implements a generator that starts at startIndex, and returns the value at that index and increments the index each time. A generate() that returns an IndexingGenerator is provided as the default implementation for CollectionType, which means if this is good enough for your purposes, you don’t need to implement generate when implementing a collection:

struct Bitfield: CollectionType {
let data: UInt
var startIndex: UInt { return 0 }
var endIndex: UInt { return UInt(sizeofValue(data)*8) }
subscript(idx: UInt) -> Bit {
return (data >> idx) & 1 == 0
? Bit.Zero : Bit.One
}
// no need to implement generate()
}

This default was added in Swift 2.0. Prior to that, you had to provide a minimal generator that just returned an IndexingGenerator(self).

Iterating with for .. in on a changing collection

The documentation for IteratorProtocol says "whenever you use a for-in loop with an array, set, or any other collection or sequence, you’re using that type’s iterator." So, we are guaranteed that a for in loop is going to be using .makeIterator() and .next() which is defined most generally on Sequence and IteratorProtocol respectively.

The documentation for Sequence says that "the Sequence protocol makes no requirement on conforming types regarding whether they will be destructively consumed by iteration." As a consequence, this means that an iterator for a Sequence is not required to make a copy, and so I do not think that modifying a sequence while iterating over it is, in general, safe.

This same caveat does not occur in the documentation for Collection, but I also don't think there is any guarantee that the iterator makes a copy, and so I do not think that modifying a collection while iterating over it is, in general, safe.

But, most collection types in Swift are structs with value semantics or copy-on-write semantics. I'm not really sure where the documentation for this is, but this link does say that "in Swift, Array, String, and Dictionary are all value types... You don’t need to do anything special — such as making an explicit copy — to prevent other code from modifying that data behind your back." In particular, this means that for Array, .makeIterator() cannot hold a reference to your array because the iterator for Array does not have to "do anything special" to prevent other code (i.e. your code) from modifying the data it holds.

We can explore this in more detail. The Iterator type of Array is defined as type IndexingIterator<Array<Element>>. The documentation IndexingIterator says that it is the default implementation of the iterator for collections, so we can assume that most collections will use this. We can see in the source code for IndexingIterator that it holds a copy of its collection

@frozen
public struct IndexingIterator<Elements: Collection> {
@usableFromInline
internal let _elements: Elements
@usableFromInline
internal var _position: Elements.Index

@inlinable
@inline(__always)
/// Creates an iterator over the given collection.
public /// @testable
init(_elements: Elements) {
self._elements = _elements
self._position = _elements.startIndex
}
...
}

and that the default .makeIterator() simply creates this copy.

extension Collection where Iterator == IndexingIterator<Self> {
/// Returns an iterator over the elements of the collection.
@inlinable // trivial-implementation
@inline(__always)
public __consuming func makeIterator() -> IndexingIterator<Self> {
return IndexingIterator(_elements: self)
}
}

Although you might not want to trust this source code, the documentation for library evolution claims that "the @inlinable attribute is a promise from the library developer that the current definition of a function will remain correct when used with future versions of the library" and the @frozen also means that the members of IndexingIterator cannot change.

Altogether, this means that any collection type with value semantics and an IndexingIterator as its Iterator must make a copy when using using for in loops (at least until the next ABI break, which should be a long-way off). Even then, I don't think Apple is likely to change this behavior.

In Conclusion

I don't know of any place that it is explicitly spelled out in the docs "you can modify an array while you iterate over it, and the iteration will proceed as if you made a copy" but that's also the kind of language that probably shouldn't be written down as writing such code could definitely confuse a beginner.

However, there is enough documentation lying around which says that a for in loop just calls .makeIterator() and that for any collection with value semantics and the default iterator type (for example, Array), .makeIterator() makes a copy and so cannot be influenced by code inside the loop. Further, because Array and some other types like Set and Dictionary are copy-on-write, modifying these collections inside a loop will have a one-time copy penalty as the body of the loop will not have a unique reference to its storage (because the iterator will). This is the exact same penalty that modifying the collection outside the loop with have if you don’t have a unique reference to the storage.

Without these assumptions, you aren't guaranteed safety, but you might have it anyway in some circumstances.

Edit:

I just realized we can create some cases where this is unsafe for sequences.

import Foundation

/// This is clearly fine and works as expected.
print("Test normal")
for _ in 0...10 {
let x: NSMutableArray = [0,1,2,3]
for i in x {
print(i)
}
}

/// This is also okay. Reassigning `x` does not mutate the reference that the iterator holds.
print("Test reassignment")
for _ in 0...10 {
var x: NSMutableArray = [0,1,2,3]
for i in x {
x = []
print(i)
}
}

/// This crashes. The iterator assumes that the last index it used is still valid, but after removing the objects, there are no valid indices.
print("Test removal")
for _ in 0...10 {
let x: NSMutableArray = [0,1,2,3]
for i in x {
x.removeAllObjects()
print(i)
}
}

/// This also crashes. `.enumerated()` gets a reference to `x` which it expects will not be modified behind its back.
print("Test removal enumerated")
for _ in 0...10 {
let x: NSMutableArray = [0,1,2,3]
for i in x.enumerated() {
x.removeAllObjects()
print(i)
}
}

The fact that this is an NSMutableArray is important because this type has reference semantics. Since NSMutableArray conforms to Sequence, we know that mutating a sequence while iterating over it is not safe, even when using .enumerated().

How safe are swift collections when used with invalidated iterators / indices?

As you say, #1 is not an issue. You do not have a pointer to the object in Swift. You either have its value or a reference to it. If you have its value, then it's a copy. If you have a reference, then it's protected. So there's no issue here.

But let's consider the second and experiment, be surprised, and then stop being surprised.

var xs = [1,2,3,4]

for x in xs { // (1)
if x == 2 {
xs.removeAll() // (2)
}
print(x) // Prints "1\n2\n3\n\4\n"
}

xs // [] (3)

Wait, how does it print all the values when we blow away the values at (2). We are very surprised now.

But we shouldn't be. Swift arrays are values. The xs at (1) is a value. Nothing can ever change it. It's not "a pointer to memory that includes an array structure that contains 4 elements." It's the value [1,2,3,4]. At (2), we don't "remove all elements from the thing xs pointed to." We take the thing xs is, create an array that results if you remove all the elements (that would be [] in all cases), and then assign that new array to xs. Nothing bad happens.

So what does the documentation mean by "invalidates all indices?" It means exactly that. If we generated indices, they're no good anymore. Let's see:

var xs = [1,2,3,4]

for i in xs.indices {
if i == 2 {
xs.removeAll()
}
print(xs[i]) // Prints "1\n2\n" and then CRASH!!!
}

Once xs.removeAll() is called, there's no promise that the old result of xs.indices means anything anymore. You are not permitted to use those indices safely against the collection they came from.

"Invalidates indices" in Swift is not the same as C++'s "invalidates iterators." I'd call that pretty safe, except the fact that using collection indices is always a bit dangerous and so you should avoid indexing collections when you can help it; iterate them instead. Even if you need the indexes for some reason, use enumerate to get them without creating any of the danger of indexing.

(Side note, dict["key"] is not indexing into dict. Dictionaries are a little confusing because their key is not their index. Accessing dictionaries by their DictionaryIndex index is just as dangerous as accessing arrays by their Int index.)

Note also that the above doesn't apply to NSArray. If you modify NSArray while iterating it, you'll get a "mutated collection while iterating" error. I'm only discussing Swift data types.


EDIT: for-in is very explicit in how it works:

The generate() method is called on the collection expression to obtain a value of a generator type—that is, a type that conforms to the GeneratorType protocol. The program begins executing a loop by calling the next() method on the stream. If the value returned is not None, it is assigned to the item pattern, the program executes the statements, and then continues execution at the beginning of the loop. Otherwise, the program does not perform assignment or execute the statements, and it is finished executing the for-in statement.

The returned Generator is a struct and contains a collection value. You would not expect any changes to some other value to modify its behavior. Remember: [1,2,3] is no different than 4. They're both values. When you assign them, they make copies. So when you create a Generator over a collection value, you're going to snapshot that value, just like if I created a Generator over the number 4. (This raises an interesting problem, because Generators aren't really values, and so really shouldn't be structs. They should be classes. Swift stdlib has been fixing that. See the new AnyGenerator for instance. But they still contain an array value, and you would never expect changes to some other array value to impact them.)

See also "Structures and Enumerations Are Value Types" which goes into more detail on the importance of value types in Swift. Arrays are just structs.

Yes, that means there's logically copying. Swift has many optimizations to minimize actual copying when it's not needed. In your case, when you mutate the dictionary while it's being iterated, that will force a copy to happen. Mutation is cheap if you're the only consumer of a particular value's backing storage. But it's O(n) if you're not. (This is determined by the Swift builtin isUniquelyReferenced().) Long story short: Swift Collections are Copy-on-Write, and simply passing an array does not cause real memory to be allocated or copied.

You don't get COW for free. Your own structs are not COW. It's something that Swift does in stdlib. (See Mike Ash's great discussion of how you would recreate it.) Passing your own custom structs causes real copies to happen. That said, the majority of the memory in most structs is stored in collections, and those collections are COW, so the cost of copying structs is usually pretty small.

The book doesn't spend a lot of time drilling into value types in Swift (it explains it all; it just doesn't keep saying "hey, and this is what that implies"). On the other hand, it was the constant topic at WWDC. You may be interested particularly in Building Better Apps with Value Types in Swift which is all about this topic. I believe Swift in Practice also discussed it.


EDIT2:

@KarlP raises an interesting point in the comments below, and it's worth addressing. None of the value-safety promises we're discussing are related to for-in. They're based on Array. for-in makes no promises at all about what would happen if you mutated a collection while it is being iterated. That wouldn't even be meaningful. for-in doesn't "iterate over collections," it calls next() on Generators. So if your Generator becomes undefined if the collection is changed, then for-in will blow up because the Generator blew up.

That means that the following might be unsafe, depending on how strictly you read the spec:

func nukeFromOrbit<C: RangeReplaceableCollectionType>(var xs: C) {
var hijack = true
for x in xs {
if hijack {
xs.removeAll()
hijack = false
}
print(x)
}
}

And the compiler won't help you here. It'll work fine for all of the Swift collections. But if calling next() after mutation for your collection is undefined behavior, then this is undefined behavior.

My opinion is that it would be poor Swift to make a collection that allows its Generator to become undefined in this case. You could even argue that you've broken the Generator spec if you do (it offers no UB "out" unless the generator has been copied or has returned nil). So you could argue that the above code is totally within spec and your generator is broken. Those arguments tend to be a bit messy with a "spec" like Swift's which doesn't dive into all the corner cases.

Does this mean you can write unsafe code in Swift without getting a clear warning? Absolutely. But in the many cases that commonly cause real-world bugs, Swift's built-in behavior does the right thing. And in that, it is safer than some other options.

Trying to get generic code from working Iterator pattern

Solution found!

struct Whatevers<T> {
let whatevers: [T]
}

extension Whatevers: Sequence {

func makeIterator() -> WhateversIterator<T> {
return WhateversIterator(sequence: whatevers, current: 0)
}
}

struct WhateversIterator<T>: IteratorProtocol {
let sequence: [T]
var current = 0

mutating func next() -> T? {
defer { current += 1 }
return sequence.count > current ? sequence[current] : nil
}
}

All the mistakes were about returning types from functions makeIterator and next.

Hope somebody will find it helpful!



Related Topics



Leave a reply



Submit