String Comparison in Swift Is Not Transitive

It looks like this is not supposed to happen:

Q: Is transitive consistency maintained by the [Unicode Collation Algorithm]?

A: Yes, for any strings A, B, and C, if A < B and B < C, then A < C. However, implementers must be careful to produce implementations that accurately reproduce the results of the Unicode Collation Algorithm as they optimize their own algorithms. It is easy to perform careless optimizations — especially with Incremental Comparison algorithms — that fail this test. Other items to check are the proper distinction between the bases of accents. For example, the sequence <u-macron, u-diaeresis-macron> should compare as less than <u-macron-diaeresis, u-macron>; this is a secondary distinction, based on the weighting of the accents, which must be correctly associated with the primary weights of their respective base letters.

(Source: Unicode Collation FAQ)

The UnicodeNormalization.cpp file calls ucol_strcoll and ucol_strcollIter, which are part of the ICU project, so this may be a bug in either the Swift standard library or ICU.
I reported this issue to the Swift Bug Tracker.
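
A brute-force check along the following lines can test whether a comparison is transitive over a small set of values. This is only a sketch: firstTransitivityViolation and the rock-paper-scissors relation are hypothetical names introduced here for illustration, not part of the Swift standard library or ICU.

```swift
// Hypothetical helper: returns the first triple (a, b, c) with
// a < b and b < c but not a < c, or nil if no such triple exists.
func firstTransitivityViolation<T>(
    in values: [T],
    by areInIncreasingOrder: (T, T) -> Bool
) -> (T, T, T)? {
    for a in values {
        for b in values where areInIncreasingOrder(a, b) {
            for c in values where areInIncreasingOrder(b, c) {
                if !areInIncreasingOrder(a, c) {
                    return (a, b, c)
                }
            }
        }
    }
    return nil
}

// A deliberately non-transitive relation, used only as a contrast.
let cyclicPrecedes: (String, String) -> Bool = { lhs, rhs in
    switch (lhs, rhs) {
    case ("rock", "paper"), ("paper", "scissors"), ("scissors", "rock"):
        return true
    default:
        return false
    }
}

let asciiViolation = firstTransitivityViolation(in: ["apple", "banana", "cherry"], by: <)
let cyclicViolation = firstTransitivityViolation(in: ["rock", "paper", "scissors"], by: cyclicPrecedes)
```

The check is cubic in the number of values, so it is only practical for small candidate sets such as the handful of strings involved in a suspected bug.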

What does it mean that string and character comparisons in Swift are not locale-sensitive?

(All code examples updated for Swift 3 now.)

Comparing Swift strings with < does a lexicographical comparison
based on the so-called "Unicode Normalization Form D" (which can be computed with
decomposedStringWithCanonicalMapping).

For example, the decomposition of

"ä" = U+00E4 = LATIN SMALL LETTER A WITH DIAERESIS

is the sequence of two Unicode code points

U+0061,U+0308 = LATIN SMALL LETTER A + COMBINING DIAERESIS
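
The decomposition can be inspected directly. The following is a small sketch that assumes Foundation is available for decomposedStringWithCanonicalMapping; the constant names are chosen here for illustration.

```swift
import Foundation

let precomposed = "\u{E4}"     // "ä" as a single code point
let decomposed = "a\u{308}"    // "a" followed by COMBINING DIAERESIS

// Swift's == treats canonically equivalent strings as equal ...
let areEqual = (precomposed == decomposed)

// ... even though the underlying Unicode scalars differ.
let precomposedScalars = precomposed.unicodeScalars.map { $0.value }
let decomposedScalars = decomposed.unicodeScalars.map { $0.value }

// Normalizing to form D turns the precomposed form into two scalars.
let normalizedScalars = precomposed
    .decomposedStringWithCanonicalMapping
    .unicodeScalars
    .map { $0.value }
```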

For demonstration purposes, I have written a small String extension which dumps the
contents of the String as an array of Unicode code points:

import Foundation // needed for String(format:)

extension String {
    var unicodeData : String {
        return self.unicodeScalars.map {
            String(format: "%04X", $0.value)
        }.joined(separator: ",")
    }
}

Now let's take some strings, sort them with <:

let someStrings = ["ǟ", "ä", "ã", "a", "ă", "b"].sorted()
print(someStrings)
// ["a", "ã", "ă", "ä", "ǟ", "b"]

and dump the Unicode code points of each string (in original and decomposed
form) in the sorted array:

for str in someStrings {
    print("\(str) \(str.unicodeData) \(str.decomposedStringWithCanonicalMapping.unicodeData)")
}

The output

a 0061 0061
ã 00E3 0061,0303
ă 0103 0061,0306
ä 00E4 0061,0308
ǟ 01DF 0061,0308,0304
b 0062 0062

nicely shows that the comparison is done by a lexicographic ordering of the Unicode
code points in the decomposed form.

This is also true for strings of more than one character, as the following example
shows. With

let someStrings = ["ǟψ", "äψ", "ǟx", "äx"].sorted()

the output of the above loop is

äx  00E4,0078  0061,0308,0078
ǟx 01DF,0078 0061,0308,0304,0078
ǟψ 01DF,03C8 0061,0308,0304,03C8
äψ 00E4,03C8 0061,0308,03C8

which means that

"äx" < "ǟx", but "äψ" > "ǟψ"

(which was at least unexpected for me).
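
The ordering described above can be modelled explicitly, independent of how a particular Swift version implements <. This is a sketch only: nfdScalars and nfdPrecedes are hypothetical helpers written for this illustration, and the strings are spelled with escapes so that the code points are unambiguous.

```swift
import Foundation

// Hypothetical helper: the Unicode scalar values of the NFD form of a string.
func nfdScalars(_ s: String) -> [UInt32] {
    return s.decomposedStringWithCanonicalMapping.unicodeScalars.map { $0.value }
}

// Hypothetical helper: lexicographic comparison of the decomposed scalars.
func nfdPrecedes(_ lhs: String, _ rhs: String) -> Bool {
    return nfdScalars(lhs).lexicographicallyPrecedes(nfdScalars(rhs))
}

let aDiaeresisX = "\u{E4}x"             // "äx"
let aDiaeresisMacronX = "\u{1DF}x"      // "ǟx"
let aDiaeresisPsi = "\u{E4}\u{3C8}"     // "äψ"
let aDiaeresisMacronPsi = "\u{1DF}\u{3C8}" // "ǟψ"

// 0061,0308,0078 vs 0061,0308,0304,0078: decided at 0078 < 0304.
let xCase = nfdPrecedes(aDiaeresisX, aDiaeresisMacronX)

// 0061,0308,0304,03C8 vs 0061,0308,03C8: decided at 0304 < 03C8.
let psiCase = nfdPrecedes(aDiaeresisMacronPsi, aDiaeresisPsi)
```

Both results are true, which is the flip in relative order between "ä" and "ǟ" shown in the output above: the character that follows the accented letter is compared against the combining macron.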

Finally, let's compare this with a locale-sensitive ordering, for example Swedish:

let locale = Locale(identifier: "sv") // svenska
var someStrings = ["ǟ", "ä", "ã", "a", "ă", "b"]
someStrings.sort {
    $0.compare($1, locale: locale) == .orderedAscending
}

print(someStrings)
// ["a", "ă", "ã", "b", "ä", "ǟ"]

As you see, the result is different from the Swift < sorting.

How does the Swift string more than operator work

I believe JavaScript uses exactly the same string comparison approach and the same syntax. In JavaScript you could also use localeCompare(), and in Swift you could alternatively use localizedCompare(_:) (or one of the other string comparison functions). These are all different ways, with different options, to compare strings alphabetically.
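
As a rough illustration of how the operator differs from one of the Foundation comparison functions, the sketch below contrasts > and < with compare(_:options:). The example strings are arbitrary; locale-dependent functions such as localizedCompare(_:) are left out because their result depends on the user's settings.

```swift
import Foundation

// The operators compare by Unicode value, so every uppercase ASCII letter
// sorts before every lowercase one: "Z" (U+005A) < "a" (U+0061).
let zebraBeforeApple = "Zebra" < "apple"
let appleAfterZebra = "apple" > "Zebra"

// compare(_:options:) with .caseInsensitive ignores case,
// so "apple" orders before "Zebra" here.
let caseInsensitiveResult = "apple".compare("Zebra", options: .caseInsensitive)
```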

Sorting Struct - Binary operator '<' cannot be applied to two 'String?' operands

You cannot compare optional strings, as written in the error message. Unwrap the strings and then try to compare.

self.hosts.sorted { lhs, rhs in
    guard let lhsName = lhs.hostName, let rhsName = rhs.hostName else { return false }
    return lhsName < rhsName
}

EDIT - The above solution is incorrect: returning false whenever either name is nil means the predicate is no longer a transitive ordering, which sorted() requires, so the result can be wrong on larger data sets. Thanks to @Alexander - Reinstate Monica for pointing this out.

A proper solution is either to force-unwrap the values, which is not recommended, or to provide a fallback value with the nil-coalescing operator, like so:

let sortedArray = self.hosts.sorted { lhs, rhs in
    lhs.hostName ?? "" < rhs.hostName ?? ""
}
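
Note that coalescing to "" sorts missing names first. If they should go last instead, one option is an explicit predicate that stays a consistent ordering. The following is a sketch under assumptions: Host is a hypothetical stand-in for the struct in the question, and nilLastPrecedes is a name made up for this example.

```swift
// Hypothetical stand-in for the element type in the question.
struct Host {
    let hostName: String?
}

// Orders non-nil names ascending and places nil names last.
func nilLastPrecedes(_ lhs: String?, _ rhs: String?) -> Bool {
    switch (lhs, rhs) {
    case let (l?, r?):
        return l < r
    case (_?, nil):
        return true      // any name precedes a missing name
    default:
        return false     // (nil, name) and (nil, nil)
    }
}

let hosts = [Host(hostName: "beta"), Host(hostName: nil), Host(hostName: "alpha")]
let sortedHosts = hosts.sorted { nilLastPrecedes($0.hostName, $1.hostName) }
let sortedNames = sortedHosts.map { $0.hostName }
```

Unlike the guard version, this predicate never reports a nil name as equivalent to every other name, so two non-nil names cannot end up "tied" through a nil in between.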

Compare three values for equality

You can use the power of tuples and the Transitive Property of Equality.

if (number1, number2) == (number2, number3) {

}

The condition of this if statement is true only when number1 is equal to number2 AND number2 is equal to number3, which means that all three values must be equal.
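
Wrapped in a function, the same idea might look like the sketch below. allEqual is a hypothetical helper name; it relies only on the standard library's == for tuples of Equatable elements.

```swift
// Hypothetical helper wrapping the tuple comparison:
// (a, b) == (b, c) holds exactly when a == b and b == c.
func allEqual<T: Equatable>(_ a: T, _ b: T, _ c: T) -> Bool {
    return (a, b) == (b, c)
}

let allSame = allEqual(7, 7, 7)
let lastDiffers = allEqual(7, 7, 8)
let middleDiffers = allEqual(7, 8, 7)
```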

Objects with 4 optional String? properties with weighted value, how to return an object with the best match to a set of properties

Here's one reasonable solution. See code comments for details.

struct OptionalObject {
    let prop1: String?
    let prop2: String?
    let prop3: String?
    let prop4: String?
}

struct ConcreteObject {
    let prop1: String
    let prop2: String
    let prop3: String
    let prop4: String

    // Determine the score.
    // "matches" counts the number of matching properties.
    // "weight" gives 8 for the 1st property, 4 for the 2nd, 2 for the 3rd, 1 for the 4th. Adjust to suit your needs
    func score(for opt: OptionalObject) -> (matches: Int, weight: Int) {
        var matches = 0
        var weight = 0
        if opt.prop1 == self.prop1 { matches += 1; weight += 8 }
        if opt.prop2 == self.prop2 { matches += 1; weight += 4 }
        if opt.prop3 == self.prop3 { matches += 1; weight += 2 }
        if opt.prop4 == self.prop4 { matches += 1; weight += 1 }

        return (matches, weight)
    }

    // Compares two OptionalObject by getting the score of each
    // against "self".
    func compare(lhs: OptionalObject, rhs: OptionalObject) -> Bool {
        let scoreL = score(for: lhs)
        let scoreR = score(for: rhs)

        // Tuples compare element by element: the number of matches first,
        // then the weight if the number of matches is the same.
        return scoreL > scoreR
    }
}

// Test ConcreteObject
let concrete = ConcreteObject(prop1: "DEF", prop2: "123", prop3: "Hello", prop4: "Goodbye")

// List of OptionalObject
var optionals: [OptionalObject] = [
    OptionalObject(prop1: nil, prop2: nil, prop3: "Hello", prop4: nil),
    OptionalObject(prop1: "DEF", prop2: "456", prop3: nil, prop4: nil),
    OptionalObject(prop1: "ABC", prop2: "123", prop3: "Hello", prop4: "Goodbye"),
    OptionalObject(prop1: nil, prop2: nil, prop3: "Hello", prop4: "Goodbye"),
    OptionalObject(prop1: "DEF", prop2: "456", prop3: "Hello", prop4: "Goodbye"),
    //OptionalObject(prop1: nil, prop2: nil, prop3: nil, prop4: nil),
]

// Sort the list based on the ConcreteObject
let sorted = optionals.sorted { concrete.compare(lhs: $0, rhs: $1) }
print(sorted)

The results are sorted in the desired order. The first object in sorted has the highest score.
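
The ordering comes from how the standard library compares tuples: element by element, left to right. The sketch below replays that with score values worked out by hand from the list above; the Score alias and the constant names are introduced here only for illustration.

```swift
typealias Score = (matches: Int, weight: Int)

// Scores against the concrete object, computed by hand:
let onlyProp1: Score = (matches: 1, weight: 8)     // "DEF", "456", nil, nil
let props3And4: Score = (matches: 2, weight: 3)    // nil, nil, "Hello", "Goodbye"
let props2To4: Score = (matches: 3, weight: 7)     // "ABC", "123", "Hello", "Goodbye"
let props1And3And4: Score = (matches: 3, weight: 11) // "DEF", "456", "Hello", "Goodbye"

// More matches wins even when the weight is lower.
let matchesDecide = props3And4 > onlyProp1

// With the same number of matches, the weight breaks the tie.
let weightBreaksTie = props1And3And4 > props2To4
```

If weight should dominate instead, swapping the two tuple elements in score(for:) would be one way to change the priority.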

swift_class_getInstanceExtents doesn't appear in Xcode - how do you call it?

Joe Groff on swift-users answered this question for me. You add:

@_silgen_name("swift_class_getInstanceExtents") func swift_class_getInstanceExtents(theClass: AnyClass) -> (negative: UInt, positive: UInt)

to your file, and then you can call swift_class_getInstanceExtents.

Compare arrays in swift

You’re right to be slightly nervous about ==:

struct NeverEqual: Equatable { }
func ==(lhs: NeverEqual, rhs: NeverEqual)->Bool { return false }
let x = [NeverEqual()]
var y = x
x == y // this returns true

[NeverEqual()] == [NeverEqual()] // false
x == [NeverEqual()] // false

let z = [NeverEqual()]
x == z // false

x == y // true

y[0] = NeverEqual()
x == y // now false

Why? Swift arrays do not conform to Equatable (this changed in Swift 4.1, where Array gained a conditional conformance for Equatable elements), but they do have an == operator, defined in the standard library as:

func ==<T : Equatable>(lhs: [T], rhs: [T]) -> Bool

This operator loops over the elements in lhs and rhs, comparing the values at each position. It does not do a bitwise compare – it calls the == operator on each pair of elements. That means if you write a custom == for your element, it’ll get called.
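
A well-behaved custom == shows the element-wise behaviour without the surprise. The following is a sketch with a hypothetical element type, CaseInsensitiveName, whose equality ignores case and, unlike NeverEqual, is reflexive.

```swift
// Hypothetical element type whose == ignores case.
struct CaseInsensitiveName: Equatable {
    let raw: String

    static func == (lhs: CaseInsensitiveName, rhs: CaseInsensitiveName) -> Bool {
        return lhs.raw.lowercased() == rhs.raw.lowercased()
    }
}

let first = [CaseInsensitiveName(raw: "Swift"), CaseInsensitiveName(raw: "ICU")]
let second = [CaseInsensitiveName(raw: "swift"), CaseInsensitiveName(raw: "icu")]
let third = [CaseInsensitiveName(raw: "swift"), CaseInsensitiveName(raw: "ucol")]

// The arrays hold different raw strings, so a bitwise comparison would
// differ; the custom element == makes the first two arrays equal.
let sameIgnoringCase = (first == second)
let differentNames = (first == third)
```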

But it contains an optimization – if the underlying buffers for the two arrays are the same, it doesn’t bother, it just returns true (they contain identical elements, of course they’re equal!).

This issue is entirely the fault of the NeverEqual equality operator. Equality should be transitive, symmetric and reflexive, and this one isn't reflexive (x == x is false). But it could still catch you unawares.

Swift arrays are copy-on-write – so when you write var x = y it doesn’t actually make a copy of the array, it just points x’s storage buffer pointer at y’s. Only if x or y are mutated later does it then make a copy of the buffer, so that the unchanged variable is unaffected. This is critical for arrays to behave like value types but still be performant.
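
The observable effect is plain value semantics, as in this small sketch. When the buffer is actually copied is an implementation detail, so the comments describe the expected behaviour rather than a guarantee.

```swift
let original = [1, 2, 3]
var copy = original     // typically shares storage with original at this point
copy.append(4)          // mutating copy must leave original untouched

let originalAfter = original
let copyAfter = copy
```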

In early versions of Swift, you actually could call === on arrays. The mutating behaviour was also a bit different in those versions: if you mutated x, y would also change even though it had been declared with let, which freaked people out, so they changed it.

You can kinda reproduce the old behaviour of === on arrays with this trick (very implementation-dependent, and not to be relied on except for poking and prodding investigations):

let a = [1,2,3]
var b = a

a.withUnsafeBufferPointer { outer in
    b.withUnsafeBufferPointer { inner in
        // println was renamed to print in Swift 2
        print(inner.baseAddress == outer.baseAddress)
    }
}

