Python Find Elements in One List That Are Not in the Other


TL;DR:

SOLUTION (1)

import numpy as np
main_list = np.setdiff1d(list_2,list_1)
# yields the elements in `list_2` that are NOT in `list_1`

SOLUTION (2): if you want a sorted list

def setdiff_sorted(array1, array2, assume_unique=False):
    ans = np.setdiff1d(array1, array2, assume_unique).tolist()
    if assume_unique:
        return sorted(ans)
    return ans
main_list = setdiff_sorted(list_2, list_1)



EXPLANATIONS:

(1) You can use NumPy's setdiff1d(array1, array2, assume_unique=False).

assume_unique tells the function whether the input arrays are ALREADY UNIQUE.
If False, the unique elements are determined first.

If True, the function assumes that the elements are already unique and skips determining the unique elements.

With the default (assume_unique=False), this yields the sorted unique values in array1 that are not in array2.

If you are only concerned with the unique elements (based on the response of Chinny84), simply keep the default assume_unique=False:

import numpy as np
list_1 = ["a", "b", "c", "d", "e"]
list_2 = ["a", "f", "c", "m"]
main_list = np.setdiff1d(list_2,list_1)
# yields the elements in `list_2` that are NOT in `list_1`
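To see what this actually returns, here is the same example as a self-contained run. With the default settings, setdiff1d returns a NumPy array (sorted, each value once); calling .tolist() turns it back into an ordinary Python list.

```python
import numpy as np

list_1 = ["a", "b", "c", "d", "e"]
list_2 = ["a", "f", "c", "m"]

# Default assume_unique=False: result is sorted and contains each value once
diff = np.setdiff1d(list_2, list_1)
print(diff)  # ['f' 'm']

# Convert the NumPy array back to a plain list of Python strings
main_list = diff.tolist()
print(main_list)  # ['f', 'm']
```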


(2)
For those who want answers to be sorted, I've made a custom function:

import numpy as np
def setdiff_sorted(array1, array2, assume_unique=False):
    ans = np.setdiff1d(array1, array2, assume_unique).tolist()
    if assume_unique:
        return sorted(ans)
    return ans

To get the answer, run:

main_list = setdiff_sorted(list_2,list_1)

SIDE NOTES:

(a) Solution 2 (the custom function setdiff_sorted) returns a list, whereas solution 1 returns a NumPy array.

(b) If you aren't sure whether the elements are unique, just use the default setting of NumPy's setdiff1d in both solutions 1 and 2. What could complicate things? See note (c).

(c) Things will be different if either of the two lists is not unique.

Say list_2 is not unique: list_2 = ["a", "f", "c", "m", "m"]. Keep list_1 as is: list_1 = ["a", "b", "c", "d", "e"].
With the default value of assume_unique, both solutions yield ["f", "m"]. HOWEVER, if you set assume_unique=True, both solutions give ["f", "m", "m"]. Why? Because you told the function to ASSUME that the elements are unique, so it skips the de-duplication step. Hence, IT IS BETTER TO KEEP assume_unique at its default value. Note that both answers are sorted.
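The recommended default from note (c) can be checked with a short script. This sketch re-defines setdiff_sorted so it runs on its own, and it only exercises the default assume_unique=False, since passing assume_unique=True with non-unique input is exactly the situation the note warns against.

```python
import numpy as np

def setdiff_sorted(array1, array2, assume_unique=False):
    # Same helper as in solution 2: always returns a sorted Python list
    ans = np.setdiff1d(array1, array2, assume_unique).tolist()
    if assume_unique:
        return sorted(ans)
    return ans

list_1 = ["a", "b", "c", "d", "e"]
list_2 = ["a", "f", "c", "m", "m"]  # "m" appears twice

# With the default assume_unique=False the duplicate "m" is collapsed
print(np.setdiff1d(list_2, list_1).tolist())  # ['f', 'm']
print(setdiff_sorted(list_2, list_1))         # ['f', 'm']
```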


How to find list items that are not in another list?

Can you try using numpy?

import numpy as np
list1 = [.....]
list2 = [.....]
diff = np.setdiff1d(list2,list1)

Find elements present in one list but not in another list (and vice versa)

This happens because when you executed aList = [i for i in aList if i not in bList], you replaced the content of aList, changing it from [1,2] to [1].

Hence, bList still ended up as [2,3]: by the time bList = [i for i in bList if i not in aList] ran, aList was just [1], so 2 was no longer excluded.

To make your logic work, store the filtered results in separate variables. For example:

aList = [1,2]
bList = [2,3]
aListCopy = [i for i in aList if i not in bList ]
bListCopy = [i for i in bList if i not in aList ]
print(aListCopy) # prints: [1]
print(bListCopy) # prints: [3]

However, for your use case, it is better to use set() to find the elements present in one list but not in the other. For example:

# Returns elements present in `aList` but not in `bList`
>>> set(aList) - set(bList)
{1}

# Returns elements present in `bList` but not in `aList`
>>> set(bList) - set(aList)
{3}

Please refer to the set() documentation for more details.
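If you need both directions at once and also want to keep the original order and any duplicates (which set subtraction drops), one option is to compute both results from the untouched original lists and use sets only for fast membership tests. A minimal sketch; the function name diff_both_ways is just an illustrative choice:

```python
def diff_both_ways(a_list, b_list):
    """Return (items only in a_list, items only in b_list), keeping order."""
    a_set, b_set = set(a_list), set(b_list)  # average O(1) membership checks
    only_a = [x for x in a_list if x not in b_set]
    only_b = [x for x in b_list if x not in a_set]
    return only_a, only_b

aList = [1, 2]
bList = [2, 3]
only_a, only_b = diff_both_ways(aList, bList)
print(only_a)  # [1]
print(only_b)  # [3]
```

Because neither input is reassigned, the second comprehension no longer sees a modified aList.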

How do I efficiently find which elements of a list are in another list?

If you want a vectorized approach, you can also use NumPy's isin. It's not the fastest method, as demonstrated by oda's excellent post, but it's definitely an alternative to consider.

import numpy as np

list_1 = [0,0,1,2,0,0]
list_2 = [1,2,3,4,5,6]

a1 = np.array(list_1)
a2 = np.array(list_2)

np.isin(a1, a2)
# array([False, False, True, True, False, False])
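np.isin returns a boolean mask rather than the elements themselves. To get the elements of list_1 that are not in list_2, you can invert the mask with ~ (or pass invert=True) and use it to index the array:

```python
import numpy as np

a1 = np.array([0, 0, 1, 2, 0, 0])
a2 = np.array([1, 2, 3, 4, 5, 6])

mask = np.isin(a1, a2)   # True where an element of a1 also occurs in a2
not_in_a2 = a1[~mask]    # keep only the elements NOT found in a2
print(not_in_a2)         # [0 0 0 0]

# invert=True produces the same mask as ~np.isin(a1, a2)
same = a1[np.isin(a1, a2, invert=True)]
```

Unlike setdiff1d, this keeps duplicates and the original order of a1.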

Python find indices of elements in one list that are not in the other list

You can use a conditional list comprehension:

>>> [i for i, item in enumerate(list_1) if item not in list_2]
[1, 4]

This solution has a time complexity of O(n*m). For bigger lists, it makes sense to convert list_2 to a set, since membership tests on a set are much faster (O(1) on average). The following solution is O(n + m), including the cost of building the set:

>>> set_2 = set(list_2)
>>> [i for i, item in enumerate(list_1) if item not in set_2]
[1, 4]
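The list_1 and list_2 in this answer come from the original question and are not shown, so the lists below are hypothetical inputs chosen to reproduce the [1, 4] output; the helper name indices_not_in is also just for illustration.

```python
def indices_not_in(list_1, list_2):
    """Indices of items in list_1 that do not occur anywhere in list_2."""
    set_2 = set(list_2)  # built once: O(m)
    return [i for i, item in enumerate(list_1) if item not in set_2]  # O(n)

# Hypothetical example data
list_1 = ["a", "x", "b", "c", "y"]
list_2 = ["c", "a", "b"]
print(indices_not_in(list_1, list_2))  # [1, 4]
```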

Finding elements not in a list

Your code is not doing what I think you think it is doing. The line for item in z: will iterate through z, each time making item equal to one single element of z. The original item list is therefore overwritten before you've done anything with it.

I think you want something like this:

item = [0,1,2,3,4,5,6,7,8,9]

for element in item:
    if element not in z:
        print(element)

But you could easily do this like:

[x for x in item if x not in z]

or (if you don't mind losing duplicates of non-unique elements and the original order):

set(item) - set(z)
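Here are the three approaches side by side, using a hypothetical z since the question's actual z is not shown. The loop and the list comprehension preserve the order of item; the set version returns an unordered set, so it is sorted here only to make the results easy to compare.

```python
item = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
z = [2, 4, 6, 8, 10]  # hypothetical; stands in for the question's z

# 1. Explicit loop, collecting results instead of printing them
loop_result = []
for element in item:
    if element not in z:
        loop_result.append(element)

# 2. List comprehension: same result, same order
comp_result = [x for x in item if x not in z]

# 3. Set difference: unordered, duplicates collapsed
set_result = set(item) - set(z)

print(loop_result)         # [0, 1, 3, 5, 7, 9]
print(comp_result)         # [0, 1, 3, 5, 7, 9]
print(sorted(set_result))  # [0, 1, 3, 5, 7, 9]
```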

Python find elements of list 1 which are not in list 2 - simple code not working

What you observe is due to the fact that requirementWithCoverage contains elements that are not in allRequirements. Here is an example:

allRequirements         = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
requirementWithCoverage = [1, 7, 11, 12] # 11 and 12 are unexpectedly there
notCovered = list(set(allRequirements) - set(requirementWithCoverage))

print(len(allRequirements)) # 10
print(len(requirementWithCoverage)) # 4
print(len(notCovered)) # 8 (6 was expected)

You can confirm this by printing the returned value of set(requirementWithCoverage).issubset(set(allRequirements)) which should be False, whereas you expected it to be True.

And even better, you can print the unexpected elements of requirementWithCoverage through:

print(set(requirementWithCoverage) - set(allRequirements))
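Both checks can be combined into one small report. coverage_report below is a hypothetical helper, not part of the original code; it returns the uncovered requirements together with any unexpected entries, using the example data from above.

```python
def coverage_report(all_requirements, covered):
    """Return (not_covered, unexpected) as sorted lists."""
    all_set, covered_set = set(all_requirements), set(covered)
    not_covered = sorted(all_set - covered_set)
    # Empty exactly when covered_set.issubset(all_set) is True
    unexpected = sorted(covered_set - all_set)
    return not_covered, unexpected

allRequirements = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
requirementWithCoverage = [1, 7, 11, 12]
notCovered, unexpected = coverage_report(allRequirements, requirementWithCoverage)
print(notCovered)  # [2, 3, 4, 5, 6, 8, 9, 10]
print(unexpected)  # [11, 12]
```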

Python: Find in list

As for your first question: "if item in my_list:" is perfectly fine and should work if item equals one of the elements inside my_list. The item must exactly match an item in the list; for instance, "abc" and "ABC" do not match. Floating-point values in particular may suffer from inaccuracy: for instance, 1 - 1/3 != 2/3.
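For the floating-point case, an exact `in` check can miss a value that is mathematically equal. One alternative, not prescribed by the answer itself, is a tolerance-based comparison with math.isclose:

```python
import math

values = [0.1, 0.25, 1 - 1/3]
target = 2/3

# Exact membership fails: 1 - 1/3 and 2/3 are not exactly equal as floats
print(target in values)  # False

# Tolerance-based membership (math.isclose defaults to rel_tol=1e-09)
print(any(math.isclose(v, target) for v in values))  # True
```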

As for your second question: there are actually several possible ways of "finding" things in lists.

Checking if something is inside

This is the use case you describe: Checking whether something is inside a list or not. As you know, you can use the in operator for that:

3 in [1, 2, 3] # => True

Filtering a collection

That is, finding all elements in a sequence that meet a certain condition. You can use a list comprehension or a generator expression for that:

matches = [x for x in lst if fulfills_some_condition(x)]
matches = (x for x in lst if x > 6)

The latter returns a generator, which you can imagine as a sort of lazy list that is only built as you iterate through it. By the way, the first one is exactly equivalent to

matches = filter(fulfills_some_condition, lst)

in Python 2. Here you can see higher-order functions at work. In Python 3, filter doesn't return a list, but a lazy iterator (a filter object).
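Since the Python 3 filter object is lazy, wrap it in list() when you need an actual list. The condition below is a hypothetical stand-in for fulfills_some_condition:

```python
def fulfills_some_condition(x):
    # Hypothetical condition used only for this example
    return x > 6

lst = [3, 7, 1, 9, 6, 8]

comp = [x for x in lst if fulfills_some_condition(x)]  # list comprehension
gen = (x for x in lst if fulfills_some_condition(x))   # lazy generator
filt = filter(fulfills_some_condition, lst)            # lazy filter object

print(comp)        # [7, 9, 8]
print(list(gen))   # [7, 9, 8]
print(list(filt))  # [7, 9, 8]
```

Note that gen and filt are exhausted after a single pass; iterating them again yields nothing.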

Finding the first occurrence

If you only want the first thing that matches a condition (but you don't know what it is yet), it's fine to use a for loop (possibly using the else clause as well, which is not really well-known). You can also use

next(x for x in lst if ...)

which will return the first match or raise a StopIteration if none is found. Alternatively, you can use

next((x for x in lst if ...), [default value])
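Both ways of getting the first match look like this in practice. The list and the conditions are made up for illustration, and None is used as the default value, though any sentinel works:

```python
lst = [3, 7, 1, 9, 6, 8]

# for/else: the else block runs only if the loop finished without `break`
for x in lst:
    if x > 6:
        first = x
        break
else:
    first = None
print(first)  # 7

# next() with a default: no StopIteration when nothing matches
print(next((x for x in lst if x > 6), None))    # 7
print(next((x for x in lst if x > 100), None))  # None
```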

Finding the location of an item

For lists, there's also the index method that can sometimes be useful if you want to know where a certain element is in the list:

[1,2,3].index(2) # => 1
[1,2,3].index(4) # => ValueError

However, note that if you have duplicates, .index always returns the lowest index:

[1,2,3,2].index(2) # => 1

If there are duplicates and you want all the indexes then you can use enumerate() instead:

[i for i,x in enumerate([1,2,3,2]) if x==2] # => [1, 3]
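Because .index raises ValueError for a missing item, a small wrapper can turn that into a return value. find_index is a hypothetical helper, not a built-in:

```python
def find_index(lst, value, default=None):
    """Lowest index of value in lst, or default if value is absent."""
    try:
        return lst.index(value)
    except ValueError:
        return default

data = [1, 2, 3, 2]
print(find_index(data, 2))  # 1
print(find_index(data, 4))  # None
print([i for i, x in enumerate(data) if x == 2])  # [1, 3]
```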

How to check if one of the following items is in a list?

>>> L1 = [2,3,4]
>>> L2 = [1,2]
>>> [i for i in L1 if i in L2]
[2]

>>> S1 = set(L1)
>>> S2 = set(L2)
>>> S1.intersection(S2)
{2}

Both empty lists and empty sets are falsy, so you can use the result directly as a truth value.
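So the check can be written as a plain if. set.isdisjoint is another standard option: it returns True when the two collections share no elements, so negating it answers "is any of these items in the list?" without building the intersection.

```python
L1 = [2, 3, 4]
L2 = [1, 2]

# An empty intersection is falsy, a non-empty one is truthy
if set(L1).intersection(L2):
    print("at least one common item")

# Equivalent test using isdisjoint
has_common = not set(L1).isdisjoint(L2)
print(has_common)  # True
```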

Python: Fastest way to find all elements in one large list but not in another

I really like set analysis, where you can do:

set(list2) - set(list1)

Putting list items in a set removes all duplicates and ordering. Set operations let you remove one set of items from another with just the - operator.

If the list is enormous, numpy is a bit faster.

import numpy as np
np.setdiff1d(list2, list1)  # same direction as set(list2) - set(list1)
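Both set(list2) - set(list1) and np.setdiff1d(list2, list1) give the elements of list2 that are not in list1, but the first returns an unordered set and the second a sorted NumPy array. The sketch below uses randomly generated data as a stand-in for large lists and only confirms that the values agree; it makes no timing claim, so measure on your own data (for example with timeit) before choosing one.

```python
import random
import numpy as np

random.seed(0)  # reproducible, made-up data
list1 = [random.randrange(100_000) for _ in range(50_000)]
list2 = [random.randrange(100_000) for _ in range(50_000)]

# Elements of list2 that are not in list1
via_set = set(list2) - set(list1)        # unordered set
via_numpy = np.setdiff1d(list2, list1)   # sorted, unique NumPy array

print(sorted(via_set) == via_numpy.tolist())  # True
```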

