Using String.CharacterView.Index.successor() in for statements

You can use 'stepping' by filtering the indices with a where clause:

let str = "abcdefgh"
for i in str.characters.indices where str.startIndex.distanceTo(i) % 2 == 0 {
    print(i, str.characters[i])
}

prints

0 a
2 c
4 e
6 g

UPDATE, based on Sulthan's notes

for (i,v) in str.characters.enumerate() where i % 2 == 0 {
    print(i, v)
}
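On current Swift (5.x), String.characters and distanceTo() no longer exist. A minimal sketch of the same every-other-character loop, assuming Swift 5 and using enumerated() as in the update above:

```swift
let str = "abcdefgh"

// enumerated() pairs each Character with its 0-based position,
// so the where clause keeps positions 0, 2, 4, ...
var everyOther: [Character] = []
for (offset, ch) in str.enumerated() where offset % 2 == 0 {
    everyOther.append(ch)
}
print(everyOther) // ["a", "c", "e", "g"]
```

This walks the string once, whereas computing a distance from startIndex for every index re-walks the string on each pass.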

Conforming String.CharacterView.Index to Strideable: fatal error when using stride(to:by:): cannot increment endIndex

Simply declaring the protocol conformance

extension String.CharacterView.Index : Strideable { }

compiles because String.CharacterView.Index conforms to BidirectionalIndexType, and ForwardIndexType/BidirectionalIndexType have default method implementations for advancedBy() and distanceTo(), as required by Strideable.

Strideable has the default protocol method implementation for stride():

extension Strideable {
    // ...
    public func stride(to end: Self, by stride: Self.Stride) -> StrideTo<Self>
}

So the only methods which are "directly" implemented for String.CharacterView.Index are, as far as I can see, the successor() and predecessor() methods from BidirectionalIndexType.

As you already figured out, the default method implementation of stride() does not work well with String.CharacterView.Index: it advances the current index by the full stride without a limit, so a step that would go past the end calls successor() on endIndex, which traps with "cannot increment endIndex".

But it is always possible to define dedicated methods for a concrete type. For the problems of making String.CharacterView.Index conform to Strideable, see Vatsal Manot's answer below and the discussion in the comments; it took me a while to get what he meant :)

Here is a possible implementation of a stride(to:by:) method for String.CharacterView.Index:

extension String.CharacterView.Index {
    typealias Index = String.CharacterView.Index

    func stride(to end: Index, by stride: Int) -> AnySequence<Index> {

        precondition(stride != 0, "stride size must not be zero")

        return AnySequence { () -> AnyGenerator<Index> in
            var current = self
            return AnyGenerator {
                if stride > 0 ? current >= end : current <= end {
                    return nil
                }
                defer {
                    current = current.advancedBy(stride, limit: end)
                }
                return current
            }
        }
    }
}

This seems to work as expected:

let str = "01234"
str.startIndex.stride(to: str.endIndex, by: 2).forEach {
    print($0, str.characters[$0])
}

Output

0 0
2 2
4 4
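In Swift 3 and later, String.Index is still not Strideable, but the collection method index(_:offsetBy:limitedBy:) provides the bounded advance that advancedBy(_:limit:) gave above. A minimal Swift 5 sketch; the strideIndices(by:) name is hypothetical and only positive steps are handled:

```swift
extension String {
    /// Hypothetical helper: every `step`-th index from startIndex, stopping before endIndex.
    func strideIndices(by step: Int) -> [String.Index] {
        precondition(step > 0, "step must be positive")
        var result: [String.Index] = []
        var current = startIndex
        while current < endIndex {
            result.append(current)
            // Returns nil instead of trapping if the step would pass endIndex.
            guard let next = index(current, offsetBy: step, limitedBy: endIndex) else { break }
            current = next
        }
        return result
    }
}

let str = "01234"
let picked = str.strideIndices(by: 2).map { str[$0] }
print(picked) // ["0", "2", "4"]
```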

How to process every two-character substrings of a String in Swift 2.2+?

Use stride:

for index in 0.stride(to: trimmedString.characters.count, by: 2) {
    // ...
}

To create the Range from the index, use the startIndex of the string and advance it. Example:

trimmedString.startIndex.advancedBy(index)
trimmedString.startIndex.advancedBy(index).successor().successor()

etc. Just check to make sure you aren't going out of bounds with successor().
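Instead of re-advancing from startIndex for every chunk (which makes the loop quadratic), the index can be carried forward. A minimal Swift 5 sketch, assuming a trailing single character should be kept as a short chunk; the sample string is hypothetical:

```swift
let trimmedString = "abcdefg"

var chunks: [Substring] = []
var start = trimmedString.startIndex
while start < trimmedString.endIndex {
    // Advance up to 2 Characters; limitedBy: returns nil instead of trapping,
    // and we fall back to endIndex for the final short chunk.
    let end = trimmedString.index(start, offsetBy: 2, limitedBy: trimmedString.endIndex)
        ?? trimmedString.endIndex
    chunks.append(trimmedString[start..<end])
    start = end
}
print(chunks) // ["ab", "cd", "ef", "g"]
```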

Startindex swift 3 issue

In Swift 3 you can no longer use advancedBy(_:), successor() or predecessor() on an index; instead you need to use the string's index methods:

let indexAfter = someString.index(after: someIndex)
let indexBefore = someString.index(before: someIndex)
let anyOtherIndex = someString.index(someIndex, offsetBy: distance)

So your code should look like this:

let index = (hasOverflow) ?
    text.index(text.startIndex, offsetBy: expectedInputLength) :
    text.index(text.startIndex, offsetBy: text.characters.count)

As a side note,

text.index(text.startIndex, offsetBy: text.characters.count)

is actually the same as

text.endIndex

so you can use this instead:

let index = (hasOverflow) ?
    text.index(text.startIndex, offsetBy: expectedInputLength) :
    text.endIndex
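The overflow check can also be folded into the index call itself: index(_:offsetBy:limitedBy:) returns nil when the offset would pass the limit. A minimal Swift 5 sketch; text and expectedInputLength are stand-ins for the question's variables:

```swift
let text = "Hello, world"
let expectedInputLength = 5

// nil means the string is shorter than expectedInputLength, so fall back to endIndex.
let index = text.index(text.startIndex, offsetBy: expectedInputLength, limitedBy: text.endIndex)
    ?? text.endIndex
let truncated = text[..<index]
print(truncated) // Hello

// A string shorter than the limit is kept whole.
let short = "Hi"
let shortIndex = short.index(short.startIndex, offsetBy: expectedInputLength, limitedBy: short.endIndex)
    ?? short.endIndex
print(short[..<shortIndex]) // Hi
```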

`endIndex` and `count` of a String are different

The raw value of String.CharacterView.Index is irrelevant and should not be used. Its raw value only has meaning from within String and CharacterView.

In your case, some Unicode characters are merely combining characters that modify adjacent characters to form a single grapheme. For example, U+0300, Combining Grave Accent:

let str = "i\u{0300}o\u{0300}e\u{0300}"

print("String:", str)
print("count:", str.characters.count)
print("endIndex:", str.characters.endIndex)

var i = str.characters.startIndex
while i < str.characters.endIndex
{
    print("\(i):\(str.characters[i])")
    i = i.successor()
}

results in

String: ìòè
count: 3
endIndex: 6
0:ì
2:ò
4:è
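In Swift 5 the same string can be inspected through its different views, which makes the difference between the Character count and code-unit offsets explicit. A short sketch; utf16Offset(in:) (Swift 5) is used here only to show where each index sits in the UTF-16 view:

```swift
let str = "i\u{0300}o\u{0300}e\u{0300}"

print(str.count)                 // 3 Characters (grapheme clusters)
print(str.unicodeScalars.count)  // 6 scalars: each letter plus U+0300
print(str.utf16.count)           // 6 UTF-16 code units
print(str.utf8.count)            // 9 bytes: U+0300 takes 2 bytes in UTF-8

var characterOffsets: [Int] = []
var utf16Offsets: [Int] = []
for i in str.indices {
    characterOffsets.append(str.distance(from: str.startIndex, to: i))
    utf16Offsets.append(i.utf16Offset(in: str))
}
print(characterOffsets) // [0, 1, 2]
print(utf16Offsets)     // [0, 2, 4]
```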

A correct and idiomatic way to iterate through a collection with index in Swift?

For any collection the indices property returns a range of the valid
indices. To iterate over the indices and the corresponding elements
in parallel you can use zip():

for (idx, el) in zip(collection.indices, collection) {
    print(idx, el)
}

Example for an array slice:

let a = ["a", "b", "c", "d", "e", "f"]
let slice = a[2 ..< 5]

for (idx, el) in zip(slice.indices, slice) {
    print("element at \(idx) is \(el)")
}

Output:


element at 2 is c
element at 3 is d
element at 4 is e

You can define a custom extension method for that purpose
(taken from How to enumerate a slice using the original indices?):

// Swift 2:
extension CollectionType {
    func indexEnumerate() -> AnySequence<(index: Index, element: Generator.Element)> {
        return AnySequence(zip(indices, self))
    }
}

// Swift 3:
extension Collection {
    func indexEnumerate() -> AnySequence<(Indices.Iterator.Element, Iterator.Element)> {
        return AnySequence(zip(indices, self))
    }
}

Example for a character view:

let chars = "az".characters
for (idx, el) in chars.indexEnumerate() {
    print("element at \(idx) is \(el)")
}

Output:

element at 0 is a
element at 1 is …
element at 3 is …
element at 7 is z
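In Swift 4 and later the AnySequence wrapper is not needed; the concrete Zip2Sequence can be returned directly. A sketch assuming Swift 5, contrasted with enumerated(), which always counts from 0:

```swift
extension Collection {
    // Pairs each element with its real index in this collection.
    func indexEnumerate() -> Zip2Sequence<Indices, Self> {
        return zip(indices, self)
    }
}

let a = ["a", "b", "c", "d", "e", "f"]
let slice = a[2..<5]

var lines: [String] = []
for (idx, el) in slice.indexEnumerate() {
    lines.append("element at \(idx) is \(el)")
}
print(lines) // ["element at 2 is c", "element at 3 is d", "element at 4 is e"]

// enumerated() yields offsets from 0, not the slice's indices.
let offsets = slice.enumerated().map { $0.offset }
print(offsets) // [0, 1, 2]
```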

Good behavior for subscript

If you're implementing a subscript on String, you might want to first think about why the standard library chooses not to.

When you call self.startIndex.advancedBy(index), you're effectively writing something like this:

var i = self.startIndex
for _ in 0..<index { i = i.successor() }

This occurs because String.CharacterView.Index is not a random-access index type. See docs on advancedBy. String indices aren't random-access because each Character in a string may be any number of bytes in the string's underlying storage — you can't just get character n by jumping n * characterSize into the storage like you can with a C string.

So, if one were to use your subscript operator to iterate through the characters in a string:

for i in 0..<string.characters.count {
    doSomethingWith(string[i])
}

... you'd have a loop that looks like it runs in linear time, because it looks just like an array iteration: each pass through the loop should take the same amount of time, because each one just increments i and uses a constant-time access to get string[i], right? Nope. The advancedBy call in the first pass through the loop calls successor zero times, the next calls it once, and so on... if your string has n characters, the last pass through the loop calls successor n-1 times (even though the previous pass already made n-2 of those same calls). In other words, you've just made an O(n²) operation that looks like an O(n) operation, leaving a performance-cost bomb for whoever else uses your code.

This is the price of a fully Unicode-aware string library.
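The same trap can be reproduced in Swift 5 with a hypothetical integer subscript (labeled offset: here so it does not clash with String's own subscripts). Both loops below produce the same Characters, but the first re-walks the string from startIndex on every pass:

```swift
extension String {
    // Hypothetical convenience subscript: O(n) per call, because
    // index(_:offsetBy:) steps from startIndex one Character at a time.
    subscript(offset offset: Int) -> Character {
        return self[index(startIndex, offsetBy: offset)]
    }
}

let s = "héllo wörld"

// Quadratic overall: pass i performs i index steps.
var viaOffsets: [Character] = []
for i in 0..<s.count {
    viaOffsets.append(s[offset: i])
}

// Linear overall: iterate the Characters directly.
var direct: [Character] = []
for ch in s {
    direct.append(ch)
}

print(viaOffsets == direct) // true
```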


Anyhow, to answer your actual question — there are two schools of thought for subscripts and domain checking:

  • Have an optional return type: subscript(index: Index) -> Element?

    This makes sense when there's no sensible way for a client to check whether an index is valid without performing the same work as a lookup — e.g. for a dictionary, finding out if there's a value for a given key is the same as finding out what the value for a key is.

  • Require that the index be valid, and make a fatal error otherwise.

    The usual case for this is situations where a client of your API can and should check for validity before accessing the subscript. This is what Swift arrays do, because arrays know their count and you don't need to look into an array to see if an index is valid.

    The canonical test for this is precondition: e.g.

    subscript(index: Index) -> Element {
        precondition(isValid(index), "index must be valid")
        // ... do lookup ...
    }

    (Here, isValid is some operation specific to your class for validating an index, e.g. making sure it's >= 0 and < count.)

In just about any use case, it's not idiomatic Swift to return a "real" value in the case of a bad index, nor is it appropriate to return a sentinel value — separating in-band values from sentinels is the reason Swift has Optionals.

Which of these is more appropriate for your use case is... well, since your use case is problematic to begin with, it's sort of a wash. If you precondition that index < count, you still incur an O(n) cost just to check that (because a String has to examine its contents to figure out which sequences of bytes constitute each character before it knows how many characters it has). If you make your return type optional, and return nil after calling advancedBy or count, you've still incurred that O(n) cost.
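As a sketch of the first school of thought, here is an optional-returning lookup in Swift 5, using a hypothetical safeOffset: label. It still costs O(n) as described above; it only changes how a bad index is reported:

```swift
extension String {
    // Hypothetical optional-returning subscript: nil instead of a trap
    // for negative or too-large offsets. Still O(n) per lookup.
    subscript(safeOffset offset: Int) -> Character? {
        guard offset >= 0,
              let i = index(startIndex, offsetBy: offset, limitedBy: endIndex),
              i < endIndex
        else { return nil }
        return self[i]
    }
}

let word = "abc"
print(word[safeOffset: 1] as Any) // Optional("b")
print(word[safeOffset: 3] as Any) // nil
```

The offset >= 0 check comes first because index(_:offsetBy:limitedBy:) does not guard against stepping before startIndex when the limit is endIndex.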

Finding index of character in Swift String

You are not the only one who couldn't find the solution.

String.Index doesn't implement RandomAccessIndexType, probably because strings allow characters with different byte lengths. That's why we have to use string.characters.count (count or countElements in Swift 1.x) to get the number of characters. That also applies to positions. The _position is probably an index into the raw array of bytes, and they don't want to expose that. The String.Index is meant to protect us from accessing bytes in the middle of characters.

That means that any index you get must be created from String.startIndex or String.endIndex (String.Index implements BidirectionalIndexType). Any other indices can be created using successor or predecessor methods.

Now to help us with indices, there is a set of methods (functions in Swift 1.x):

Swift 4.x

let text = "abc"
let index2 = text.index(text.startIndex, offsetBy: 2) //will call succ 2 times
let lastChar: Character = text[index2] //now we can index!

let characterIndex2 = text.index(text.startIndex, offsetBy: 2)
let lastChar2 = text[characterIndex2] //will do the same as above

let range: Range<String.Index> = text.range(of: "b")!
let index: Int = text.distance(from: text.startIndex, to: range.lowerBound)

Swift 3.0

let text = "abc"
let index2 = text.index(text.startIndex, offsetBy: 2) //will call succ 2 times
let lastChar: Character = text[index2] //now we can index!

let characterIndex2 = text.characters.index(text.characters.startIndex, offsetBy: 2)
let lastChar2 = text.characters[characterIndex2] //will do the same as above

let range: Range<String.Index> = text.range(of: "b")!
let index: Int = text.distance(from: text.startIndex, to: range.lowerBound)

Swift 2.x

let text = "abc"
let index2 = text.startIndex.advancedBy(2) //will call succ 2 times
let lastChar: Character = text[index2] //now we can index!
let lastChar2 = text.characters[index2] //will do the same as above

let range: Range<String.Index> = text.rangeOfString("b")!
let index: Int = text.startIndex.distanceTo(range.startIndex) //will call successor/predecessor several times until the indices match

Swift 1.x

let text = "abc"
let index2 = advance(text.startIndex, 2) //will call succ 2 times
let lastChar: Character = text[index2] //now we can index!

let range = text.rangeOfString("b")!
let index: Int = distance(text.startIndex, range.startIndex) //will call succ/pred several times
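Since Swift 4.2, firstIndex(of:) finds a Character directly and returns an optional String.Index; converting it to an Int offset is a separate O(n) step that is best done only when really needed. A short sketch:

```swift
let text = "abc"

// firstIndex(of:) returns String.Index? (nil if the character is absent).
let idx = text.firstIndex(of: "b")

// Convert to an integer offset only when an Int is actually required.
let offset = idx.map { text.distance(from: text.startIndex, to: $0) }
if let offset = offset {
    print(offset) // 1
}
```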

Working with String.Index is cumbersome but using a wrapper to index by integers (see https://stackoverflow.com/a/25152652/669586) is dangerous because it hides the inefficiency of real indexing.

Note that with Swift's indexing implementation, indices and ranges created for one string cannot reliably be used with a different string. For example:

Swift 2.x

let text: String = "abc"
let text2: String = "…"
let range = text.rangeOfString("b")!

//can randomly return a bad substring or throw an exception
let substring: String = text2[range]

//the correct solution
let intIndex: Int = text.startIndex.distanceTo(range.startIndex)
let startIndex2 = text2.startIndex.advancedBy(intIndex)
let range2 = startIndex2...startIndex2

let substring2: String = text2[range2]

Swift 1.x

let text: String = "abc"
let text2: String = "…"
let range = text.rangeOfString("b")!

//can randomly return nil or a bad substring
let substring: String = text2[range]

//the correct solution
let intIndex: Int = distance(text.startIndex, range.startIndex)
let startIndex2 = advance(text2.startIndex, intIndex)
let range2 = startIndex2...startIndex2

let substring2: String = text2[range2]
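The same offset-based transfer in Swift 5, as a minimal sketch; text2 is a hypothetical string of multi-byte emoji, and the range is rebuilt with distance(from:to:) and index(_:offsetBy:) against the string it will actually be used with:

```swift
import Foundation // for range(of:)

let text = "abc"
let text2 = "🎾🏇🏈" // hypothetical second string with multi-byte Characters

let range = text.range(of: "b")!

// Express the range as Character offsets relative to `text`...
let lower = text.distance(from: text.startIndex, to: range.lowerBound)
let length = text.distance(from: range.lowerBound, to: range.upperBound)

// ...then build fresh indices that belong to `text2`.
let start2 = text2.index(text2.startIndex, offsetBy: lower)
let end2 = text2.index(start2, offsetBy: length)
let substring2 = text2[start2..<end2]
print(substring2) // 🏇
```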

Swift for-in loop with enumerate on custom Array2D class?

It might suffice to define your own enumerate(), taking advantage of the private array you already have:

func enumerate() -> AnyGenerator<((Int, Int), T?)> {
    var index = 0
    var g = array.generate()
    return AnyGenerator {
        if let item = g.next() {
            let column = index % self.columns
            let row = index / self.columns
            index += 1
            return ((column, row), item)
        }
        return nil
    }
}

Note that in this case you could avoid conforming to SequenceType, since I use generate() from the private array. Still, it could be more consistent to do so.

Here is how you could then use it:

var a2d = Array2D<Int>(columns: 2, rows: 4)
a2d[0,1] = 4

for ((column, row), item) in a2d.enumerate() {
    print("[\(column) : \(row)] = \(item)")
}

Hope this helps
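For Swift 3 and later, where generators became iterators, a self-contained sketch of the same idea. The Array2D type below is a hypothetical minimal stand-in for the question's class (row-major storage of optionals), and the method is named enumerated() to follow current naming:

```swift
// Hypothetical minimal Array2D: row-major storage of optional elements.
struct Array2D<T> {
    let columns: Int
    let rows: Int
    private var array: [T?]

    init(columns: Int, rows: Int) {
        self.columns = columns
        self.rows = rows
        array = Array(repeating: nil, count: columns * rows)
    }

    subscript(column: Int, row: Int) -> T? {
        get { return array[row * columns + column] }
        set { array[row * columns + column] = newValue }
    }

    // Yields ((column, row), item) for every slot, in storage order.
    func enumerated() -> AnyIterator<((Int, Int), T?)> {
        var index = 0
        var iterator = array.makeIterator()
        let columns = self.columns
        return AnyIterator {
            guard let item = iterator.next() else { return nil }
            defer { index += 1 }
            return ((index % columns, index / columns), item)
        }
    }
}

var a2d = Array2D<Int>(columns: 2, rows: 4)
a2d[0, 1] = 4

var filled: [(Int, Int, Int)] = []
for ((column, row), item) in a2d.enumerated() {
    if let value = item {
        filled.append((column, row, value))
    }
}
print(filled.count) // 1
```

AnyIterator is itself a Sequence, which is why it can be used directly in the for-in loop without Array2D conforming to Sequence.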

Swift 2.0 String behavior

The index of a String is no longer related to the number of characters (count) in Swift 2.0. It is an “opaque” struct (defined as CharacterView.Index) used only to iterate through the characters of a string. So even if it is printed as an integer, it should not be considered or used as an integer, for instance by adding 2 to it to get the character two positions after the current one. All you can do is apply the two methods predecessor() and successor() to get the previous or next index in the String. So, for instance, to get the character two positions after the one at index idx in mixedString you can do:

mixedString[idx.successor().successor()]

Of course you can use more comfortable ways of reading the characters of a string, for instance the for-in statement or the indices property of the character view.
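In Swift 3 and later, successor() is gone and the string itself advances an index. A small sketch with a hypothetical mixedString, also showing that the resulting index is not a character count in disguise (utf16Offset(in:) requires Swift 5):

```swift
let mixedString = "a😀b"
let idx = mixedString.startIndex

// Swift 3+ replacement for idx.successor().successor():
let twoAfter = mixedString.index(idx, offsetBy: 2)
// Equivalent, one step at a time:
let stepwise = mixedString.index(after: mixedString.index(after: idx))

print(mixedString[twoAfter])                 // b
print(twoAfter == stepwise)                  // true
print(twoAfter.utf16Offset(in: mixedString)) // 3: the emoji occupies two UTF-16 units
```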

Consider that the main benefit of this approach is not handling multi-byte characters in Unicode strings, such as emoji, but rather treating in a uniform way strings that are identical (for us humans!) yet can have multiple representations in Unicode, as different sets of “scalars”, or characters. An example is café, which can be represented either with four Unicode “scalars” (using the precomposed é, U+00E9) or with five Unicode scalars (a plain e followed by U+0301 COMBINING ACUTE ACCENT). And note that this is a completely different thing from Unicode encodings like UTF-8, UTF-16, etc., which are ways of mapping Unicode scalars into memory bytes.
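The café example can be checked directly in Swift 5. A short sketch: String comparison uses canonical equivalence, so the two spellings compare equal and have the same Character count, while their scalar and byte counts differ:

```swift
let precomposed = "caf\u{00E9}"  // é as a single scalar (U+00E9)
let decomposed = "cafe\u{0301}"  // e followed by U+0301 COMBINING ACUTE ACCENT

print(precomposed == decomposed)                                         // true
print(precomposed.count, decomposed.count)                               // 4 4
print(precomposed.unicodeScalars.count, decomposed.unicodeScalars.count) // 4 5
print(precomposed.utf8.count, decomposed.utf8.count)                     // 5 6
```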


