Check If Two Unordered Lists Are Equal

Check if two unordered lists are equal

Python has a built-in datatype for an unordered collection of (hashable) things, called a set. If you convert both lists to sets, the comparison will be unordered.

set(x) == set(y)

Documentation on set

EDIT: @mdwhatcott points out that you want to check for duplicates. set ignores these, so you need a similar data structure that also keeps track of the number of items in each list. This is called a multiset; the best approximation in the standard library is a collections.Counter:

>>> import collections
>>> compare = lambda x, y: collections.Counter(x) == collections.Counter(y)
>>> 
>>> compare([1,2,3], [1,2,3,3])
False
>>> compare([1,2,3], [1,2,3])
True
>>> compare([1,2,3,3], [1,2,2,3])
False
>>>

How to efficiently compare two unordered lists (not sets)?

O(n): The Counter() method is best (if your objects are hashable):

def compare(s, t):
    return Counter(s) == Counter(t)

O(n log n): The sorted() method is next best (if your objects are orderable):

def compare(s, t):
    return sorted(s) == sorted(t)

O(n * n): If the objects are neither hashable, nor orderable, you can use equality:

def compare(s, t):
    t = list(t)   # make a mutable copy
    try:
        for elem in s:
            t.remove(elem)
    except ValueError:
        return False
    return not t

Determine if 2 lists have the same elements, regardless of order?

You can simply check whether the multisets with the elements of x and y are equal:

import collections
collections.Counter(x) == collections.Counter(y)

This requires the elements to be hashable; runtime will be in O(n), where n is the size of the lists.

If the elements are also unique, you can also convert to sets (same asymptotic runtime, may be a little bit faster in practice):

set(x) == set(y)

If the elements are not hashable, but sortable, another alternative (runtime in O(n log n)) is

sorted(x) == sorted(y)

If the elements are neither hashable nor sortable you can use the following helper function. Note that it will be quite slow (O(n²)) and should generally not be used outside of the esoteric case of unhashable and unsortable elements.

def equal_ignore_order(a, b):
    """ Use only when elements are neither hashable nor sortable! """
    unmatched = list(b)
    for element in a:
        try:
            unmatched.remove(element)
        except ValueError:
            return False
    return not unmatched

Check if two unordered lists are equal with certain conditions

Judging by your example you need to check if the second list is a subset of the first list. You can do this using issubset:

a = ['one', 'two', 'three']
b = ['one']

print(set(b).issubset(a))

Best method to check whether the contents of two lists are same?

You already have one of the better methods to determine if the content in both lists is identical.

If your condition is that the content must be the same, but the order is optional, then using sort() and comparing them is a perfectly good solution.

Or you could do a method that does not involve sorting both lists and then also comparing them. This assumes the lists contain ints. But something similar could be done for other data types.

Using Counter you do not need to sort them, and you can make sure they have the same number of each element.

>>> from collections import Counter
>>> a = [1,2,3,4]
>>> b = [4,3,2,1]
>>> Counter(a) == Counter(b)
True

SML - See if two unordered lists are equal (have same values, and the values appear the same amount of times)

I think you're overthinking this problem.

fun isEqual [] [] = true
  | isEqual [] _  = false
  | isEqual _  [] = false

This is good so far, and makes sense.

Now, if neither of the lists are empty, then let's pattern match that so that we can get to the head and tail of the first list.

fun isEqual [] [] = true
  | isEqual [] _  = false
  | isEqual _  [] = false
  | isEqual (x::xs) lst2 = ...

Now, if we remove the value x from lst2 and run the same function recursively on xs and what's left of lst2, we should be able to reduce down to a result.

So let's write a helper remove function. Your delete is on the right track, but it's reliant on an index, and indexes aren't really a natural fit with lists.

The option type is a good choice to represent a case where we try to remove a value that is not in a list. Getting NONE would tell us pretty handily that the two are not equal.

fun remove _ [] = NONE
  | remove v (x::xs) =
    if v = x then SOME xs
    else 
      case remove v xs of
        NONE => NONE
      | SOME xs => SOME (x::xs)

I'll let you use this to complete isEqual.

Of course, you could also just sort both lists, and then this becomes much more straightforward.

comparing two unordered lists and finding which elements are mismatched in pyspark

If retaining duplicate matches like row3 in your example is not necessary, then you can use double array_except with concat to get your output, and use 'when,otherwise' to get your boolean based on size of array produced.
(spark2.4+)

df.withColumn("result", F.concat(F.array_except("L1","L2"),F.array_except("L2","L1")))\
  .withColumn("Boolean", F.when(F.size("result")==0,F.lit(True)).otherwise(F.lit(False))).show(truncate=False)

+-----------------+------------------------+-------------+-------+
|L1               |L2                      |result       |Boolean|
+-----------------+------------------------+-------------+-------+
|[one, two, three]|[one, two, three]       |[]           |true   |
|[one, two, three]|[one, three, two]       |[]           |true   |
|[one, two, three]|[one, two, three, three]|[]           |true   |
|[one, two, three]|[one, two, three, four] |[four]       |false  |
|[one, two, three]|[one, two, four]        |[three, four]|false  |
|[one, two, three]|[one]                   |[two, three] |false  |
+-----------------+------------------------+-------------+-------+

You wouldn't need to drop duplicates before compare as array_except does that for you automatically.

How do I test for the equality of two unordered lists?

If you know that there can never be duplicates, you can use a set (HashSet or BTreeSet, depending on your types):

use std::{collections::HashSet, hash::Hash};

fn my_eq<T>(a: &[T], b: &[T]) -> bool
where
    T: Eq + Hash,
{
    let a: HashSet<_> = a.iter().collect();
    let b: HashSet<_> = b.iter().collect();

    a == b
}

fn main() {
    assert!(my_eq(
        &["foo", "bar", "baz", "beh"],
        &["beh", "foo", "baz", "bar"]
    ));
    assert!(!my_eq(
        &["beh", "foo", "baz", "bar"],
        &["beh", "foo", "baz", "baz"]
    ));
}

If you need to handle duplicates, you'll want to count the number of values too:

use std::{collections::HashMap, hash::Hash};

fn my_eq<T>(a: &[T], b: &[T]) -> bool
where
    T: Eq + Hash,
{
    fn count<T>(items: &[T]) -> HashMap<&T, usize>
    where
        T: Eq + Hash,
    {
        let mut cnt = HashMap::new();
        for i in items {
            *cnt.entry(i).or_insert(0) += 1
        }
        cnt
    }

    count(a) == count(b)
}

fn main() {
    assert!(my_eq(
        &["foo", "foo", "baz", "beh"],
        &["beh", "foo", "baz", "foo"]
    ));
    assert!(!my_eq(
        &["foo", "foo", "baz", "beh"],
        &["beh", "foo", "baz"]
    ));
}

If you want to be super fancy, you can create a newtype that adds this type of equality directly:

use std::{collections::HashMap, hash::Hash};

#[derive(Debug, Copy, Clone)]
struct CustomEq<'a, T: 'a>(&'a [T]);

impl<'a, T> CustomEq<'a, T>
where
    T: Eq + Hash,
{
    fn count(&self) -> HashMap<&T, usize> {
        let mut cnt = HashMap::new();
        for i in self.0 {
            *cnt.entry(i).or_insert(0) += 1
        }
        cnt
    }
}

impl<'a, T> PartialEq for CustomEq<'a, T>
where
    T: Eq + Hash,
{
    fn eq(&self, other: &Self) -> bool {
        self.count() == other.count()
    }
}

fn main() {
    assert_eq!(
        CustomEq(&["foo", "bar", "baz", "beh"]),
        CustomEq(&["beh", "foo", "baz", "bar"])
    );
    assert_ne!(
        CustomEq(&["beh", "foo", "baz", "bar"]),
        CustomEq(&["beh", "foo", "baz", "baz"])
    );
}

How do I compare two unordered lists of lists with unordered lists inside of them?

You could simply sort the elements in the sublists, sort the sublists and compare the two lists.

>>> dummy_list_A =  [['A'], ['B'], ['C', 'D']]
>>> dummy_list_B =  [['B'], ['A'], ['D', 'C']]
>>> sorted(map(sorted, dummy_list_A))
[['A'], ['B'], ['C', 'D']]
>>> sorted(map(sorted, dummy_list_B))
[['A'], ['B'], ['C', 'D']]
>>> def signature(l):
...   return sorted(map(sorted, l))
... 
>>> signature(dummy_list_A) == signature(dummy_list_B)
True