Swift Mutable Set: Duplicate Element Found

Swift mutable set: Duplicate element found

The implementation of Swift sets is similar to that of dictionaries, which is described nicely in Exploring Swift Dictionary's Implementation. In particular, the element storage is a list of “buckets“ each of which can be occupied or not. When a new element is inserted into the set, its hash value is used to determine the initial bucket. If that bucket is occupied, a linear search for the next free bucket is done. Similarly, when searching an element in the set, the hash value is used for determining the initial bucket, and then a linear search is done until the element (or an unoccupied bucket) is found.

(The details can be found in the open source implementation, the most relevant source files are
Set.swift, NativeSet.swift, SetStorage.swift and HashTable.swift.)

Mutating the hash value of an inserted element breaks the invariants of the set storage implementation: Locating an element via its initial bucket no longer works. And mutating other properties which affect equality can lead to multiple “equal” elements in the same bucket list.

Therefore I think it is safe to say that

After inserting an instance of a reference type into a set, the properties of that instance must not be modified in a way that affects its hash value or testing for equality.

Examples

First, this is only a problem for sets of a reference type. A set of a value type contains independent copies of the value, and modifying a property of that value after the insertion does not affect the set:

struct Foo: Hashable {
var x: Int
}

var set = Set<Foo>()
var foo = Foo(x: 1)
set.insert(foo)
print(set.map { $0.x }) // [1]
foo.x = 2
print(set.map { $0.x }) // [1]
set.insert(foo)
print(set.map { $0.x }) // [1, 2]

Instances of a reference type are “pointers” to the actual object storage, and modifying a property of that instance does not mutate the reference. It is therefore possible to modify a property of an instance after it has been inserted into a set:

class Bar: Hashable {
var x : Int

init(x: Int) { self.x = x }

static func == (lhs: Bar, rhs: Bar) -> Bool { return lhs.x == rhs.x }

func hash(into hasher: inout Hasher) { hasher.combine(x) }
}

var set = Set<Bar>()
let bar = Bar(x: 1)
set.insert(bar)
print(set.map { $0.x }) // [1]
bar.x = 2
print(set.map { $0.x }) // [2]

However, this leads to crashes easily, e.g. if we insert the same reference again:

set.insert(bar)

Fatal error: Duplicate elements of type 'Bar' were found in a Set.
This usually means either that the type violates Hashable's requirements, or
that members of such a set were mutated after insertion.

Here is another example, where the hash value is identical for all instances, but modifying a property which is used for equality testing leads to a set of two “equal” instances:

class Baz: Hashable {
var x : Int

init(x: Int) { self.x = x }

static func == (lhs: Baz, rhs: Baz) -> Bool { return lhs.x == rhs.x }

func hash(into hasher: inout Hasher) { }
}

var set = Set<Baz>()
let baz1 = Baz(x: 1)
set.insert(baz1)
let baz2 = Baz(x: 2)
set.insert(baz2)
baz1.x = 2

print(set.map { $0.x }) // [2, 2]
print(set.count) // 2
print(Set(Array(set)).count) // 1 br>

Set's symmetric difference in mutable value creates duplicate

The issue is that you are expecting two Options to be equal if they have the same title and key, but by default Swift is also checking the value. Therefore, two Options with different values are considered different.

Symmetric difference returns a new set with the elements that are either in this set or in the given sequence, but not in both. Because you have changed the values, you end up with a union of your two sets because they have nothing in common.

You can fix this by explicitly implementing the hash(into:) function and == functions to ignore the value when checking for equality:

struct Option: Hashable, CustomStringConvertible {
var title: String
var key: String
var value: Int
var description: String { return "{title: \"\(title)\", key: \"\(key)\", value: \(value)}" }

func hash(into hasher: inout Hasher) {
hasher.combine(title)
hasher.combine(key)
}

static func ==(lhs: Option, rhs: Option) -> Bool {
return lhs.title == rhs.title && lhs.key == rhs.key
}
}

var apple = Option(title: "Apple", key: "apple", value: 0)
var grape = Option(title: "Grape", key: "grape", value: 0)
var banana = Option(title: "Banana", key: "banana", value: 0)
var papaya = Option(title: "Papaya", key: "papaya", value: 0)

var options = [apple, grape, banana, papaya]

apple.value = 1
grape.value = 2
var ranked = [apple, grape]

let originalSet: Set<Option> = Set(options)
var rankedSet: Set<Option> = Set(ranked)

let remainingOptions = originalSet.symmetricDifference(rankedSet)
print(remainingOptions)
[{title: "Papaya", key: "papaya", value: 0}, {title: "Banana", key: "banana", value: 0}]

Note: symmetricDifference takes a sequence, so it isn't necessary to convert ranked into a Set, you can just use the array:

let remainingOptions = originalSet.symmetricDifference(ranked)

Another Option: Use Filter

Instead of using Sets and symmetricDifference, you can use map to get an array of keys from the ranked array, and then use filter on the options array to get the remaining options that don't match those keys:

let rankedKeys = ranked.map { $0.key }
let remaining = options.filter { !rankedKeys.contains($0.key) }

This doesn't require you to change the Option struct from your original definition.

How to write to an Element in a Set?

You are getting the error because columns might be a set of struct. So columns.first will give you an immutable value. If you were to use a class, you will get a mutable result from columns.first and your code will work as expected.
Otherwise, you will have to do as explained by @Sweeper in his answer.

How to extend Array to dedup elements using Set?

deduped() must be an instance method, and – more descriptive and not related to the issue – return [Element] rather than Array

extension Array where Element: Hashable {

func deduped() -> [Element] {
return Array(Set(self))
}
}

However here are more efficient ways to remove duplicates, my favorite is

extension Array where Element: Hashable {
func deduped() -> [Element] {
var seen = Set<Element>()
return filter{ seen.insert($0).inserted }
}
}

Removing duplicate elements from an array in Swift

You can roll your own, e.g. like this:

func unique<S : Sequence, T : Hashable>(source: S) -> [T] where S.Iterator.Element == T {
var buffer = [T]()
var added = Set<T>()
for elem in source {
if !added.contains(elem) {
buffer.append(elem)
added.insert(elem)
}
}
return buffer
}

let vals = [1, 4, 2, 2, 6, 24, 15, 2, 60, 15, 6]
let uniqueVals = uniq(vals) // [1, 4, 2, 6, 24, 15, 60]

And as an extension for Array:

extension Array where Element: Hashable {
func uniqued() -> Array {
var buffer = Array()
var added = Set<Element>()
for elem in self {
if !added.contains(elem) {
buffer.append(elem)
added.insert(elem)
}
}
return buffer
}
}

Or more elegantly (Swift 4/5):

extension Sequence where Element: Hashable {
func uniqued() -> [Element] {
var set = Set<Element>()
return filter { set.insert($0).inserted }
}
}

Which would be used:

[1,2,4,2,1].uniqued()  // => [1,2,4]

Find index of all duplicate elements in NSArray

You can fetch the index of duplicates like this:

NSArray *array=@[@"one",@"one",@"one",@"two",@"two",@"four",@"four",@"four"];
[array enumerateObjectsUsingBlock:^(id obj, NSUInteger idx, BOOL *stop)
{
if ([obj isEqualToString:@"one"])
{
NSLog(@"index %d",idx);

}
}];

The best way to remove duplicate values from NSMutableArray in Objective-C?

Your NSSet approach is the best if you're not worried about the order of the objects, but then again, if you're not worried about the order, then why aren't you storing them in an NSSet to begin with?

I wrote the answer below in 2009; in 2011, Apple added NSOrderedSet to iOS 5 and Mac OS X 10.7. What had been an algorithm is now two lines of code:

NSOrderedSet *orderedSet = [NSOrderedSet orderedSetWithArray:yourArray];
NSArray *arrayWithoutDuplicates = [orderedSet array];

If you are worried about the order and you're running on iOS 4 or earlier, loop over a copy of the array:

NSArray *copy = [mutableArray copy];
NSInteger index = [copy count] - 1;
for (id object in [copy reverseObjectEnumerator]) {
if ([mutableArray indexOfObject:object inRange:NSMakeRange(0, index)] != NSNotFound) {
[mutableArray removeObjectAtIndex:index];
}
index--;
}
[copy release];

Best way to remove duplicate objects from multi dimension MutableArray objective-C

You can use the property of NSSets that they will only store a single instance of a given value.

  • First convert each sub-array into a set, that will remove duplicates that are in that array.

  • Then, subtract the set of items that have already been seen.

  • Convert the result back to an array and add it to your output array.

  • Finally, add the items in that sub-array to the set of items that have already been seen.

Something like:

-(NSArray *)removeDuplicatesFromArray:(NSArray *)array {

NSMutableArray *returnArray=[NSMutableArray new];
NSMutableSet *cumulativeSet=[NSMutableSet new];

for (NSArray *innerArray in array) {
NSMutableSet *innerSet = [NSMutableSet setWithArray:innerArray];
[innerSet minusSet:cumulativeSet];
[cumulativeSet unionSet:innerSet];

[returnArray addObject:[innerSet allObjects]];
}

return [returnArray copy];
}


Related Topics



Leave a reply



Submit