How to Test the Membership of Multiple Values in a List

How to test the membership of multiple values in a list

This does what you want, and will work in nearly all cases:

>>> all(x in ['b', 'a', 'foo', 'bar'] for x in ['a', 'b'])
True

The expression 'a','b' in ['b', 'a', 'foo', 'bar'] doesn't work as expected because Python interprets it as a tuple:

>>> 'a', 'b'
('a', 'b')
>>> 'a', 5 + 2
('a', 7)
>>> 'a', 'x' in 'xerxes'
('a', True)

Other Options

There are other ways to execute this test, but they won't work for as many different kinds of inputs. As Kabie points out, you can solve this problem using sets...

>>> set(['a', 'b']).issubset(set(['a', 'b', 'foo', 'bar']))
True
>>> {'a', 'b'} <= {'a', 'b', 'foo', 'bar'}
True

...sometimes:

>>> {'a', ['b']} <= {'a', ['b'], 'foo', 'bar'}
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

Sets can only be created with hashable elements. But the generator expression all(x in container for x in items) can handle almost any container type. The only requirement is that container be re-iterable (i.e. not a generator). items can be any iterable at all.

>>> container = [['b'], 'a', 'foo', 'bar']
>>> items = (i for i in ('a', ['b']))
>>> all(x in [['b'], 'a', 'foo', 'bar'] for x in items)
True

Speed Tests

In many cases, the subset test will be faster than all, but the difference isn't shocking -- except when the question is irrelevant because sets aren't an option. Converting lists to sets just for the purpose of a test like this won't always be worth the trouble. And converting generators to sets can sometimes be incredibly wasteful, slowing programs down by many orders of magnitude.

Here are a few benchmarks for illustration. The biggest difference comes when both container and items are relatively small. In that case, the subset approach is about an order of magnitude faster:

>>> smallset = set(range(10))
>>> smallsubset = set(range(5))
>>> %timeit smallset >= smallsubset
110 ns ± 0.702 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
>>> %timeit all(x in smallset for x in smallsubset)
951 ns ± 11.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

This looks like a big difference. But as long as container is a set, all is still perfectly usable at vastly larger scales:

>>> bigset = set(range(100000))
>>> bigsubset = set(range(50000))
>>> %timeit bigset >= bigsubset
1.14 ms ± 13.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %timeit all(x in bigset for x in bigsubset)
5.96 ms ± 37 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Using subset testing is still faster, but only by about 5x at this scale. The speed boost is due to Python's fast c-backed implementation of set, but the fundamental algorithm is the same in both cases.

If your items are already stored in a list for other reasons, then you'll have to convert them to a set before using the subset test approach. Then the speedup drops to about 2.5x:

>>> %timeit bigset >= set(bigsubseq)
2.1 ms ± 49.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

And if your container is a sequence, and needs to be converted first, then the speedup is even smaller:

>>> %timeit set(bigseq) >= set(bigsubseq)
4.36 ms ± 31.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

The only time we get disastrously slow results is when we leave container as a sequence:

>>> %timeit all(x in bigseq for x in bigsubseq)
184 ms ± 994 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

And of course, we'll only do that if we must. If all the items in bigseq are hashable, then we'll do this instead:

>>> %timeit bigset = set(bigseq); all(x in bigset for x in bigsubseq)
7.24 ms ± 78 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

That's just 1.66x faster than the alternative (set(bigseq) >= set(bigsubseq), timed above at 4.36).

So subset testing is generally faster, but not by an incredible margin. On the other hand, let's look at when all is faster. What if items is ten-million values long, and is likely to have values that aren't in container?

>>> %timeit hugeiter = (x * 10 for bss in [bigsubseq] * 2000 for x in bss); set(bigset) >= set(hugeiter)
13.1 s ± 167 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>> %timeit hugeiter = (x * 10 for bss in [bigsubseq] * 2000 for x in bss); all(x in bigset for x in hugeiter)
2.33 ms ± 65.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Converting the generator into a set turns out to be incredibly wasteful in this case. The set constructor has to consume the entire generator. But the short-circuiting behavior of all ensures that only a small portion of the generator needs to be consumed, so it's faster than a subset test by four orders of magnitude.

This is an extreme example, admittedly. But as it shows, you can't assume that one approach or the other will be faster in all cases.

The Upshot

Most of the time, converting container to a set is worth it, at least if all its elements are hashable. That's because in for sets is O(1), while in for sequences is O(n).

On the other hand, using subset testing is probably only worth it sometimes. Definitely do it if your test items are already stored in a set. Otherwise, all is only a little slower, and doesn't require any additional storage. It can also be used with large generators of items, and sometimes provides a massive speedup in that case.

Testing if multiple objects are in a list using one in statement (Python)

The answers are correct (at least one of them is). However, if you're doing containment checks and don't care about order, like your example might suggest, the real answer is that you should be using sets, and checking for subsets.

words = {"the", "set", "of", "words"}
if words <= set_of_words:
do_stuff()

Simplest way to check if multiple items are (or are not) in a list?

any and all can be used to check multiple boolean expressions.

a = [1, 2, 3, 4, 5]
b = [1, 2, 4]

print(all(i in a for i in b)) # Checks if all items are in the list
print(any(i in a for i in b)) # Checks if any item is in the list

Checking a list for one of multiple values using in statement

(1 or 2 or 3) in [1, 2, 3] is equivalent to

[1, 2, 3].__contains__(1 or 2 or 3)

not

[1,2,3].__contains__(1) or [1,2,3].__contains__(2) or [1,2,3].__contains__(3)

. The expression 1 or 2 or 3 is evaluated to 1 before __contains__ actually gets called. x or y evaluates to x if bool(x) is true, otherwise it evaluates to y. (Since or is not a regular operator or function, y is only evaluated if bool(x) is false.)

To do what you want, you can use the any function:

print(any(x in [1,2,3] for x in (1, 2, 3)))

How to check if any of multiple elements are in a List in a convenient way?

If you're on java 8 and higher, you can use the following code :

if (javaList.stream().anyMatch(l -> l.matches("aaa|xyz|abc")))

You can use .matches() and pass in the Strings separated by the OR symbol. The .matches() takes in a String, represented as a regex.

Access multiple elements of list knowing their index

You can use operator.itemgetter:

from operator import itemgetter 
a = [-2, 1, 5, 3, 8, 5, 6]
b = [1, 2, 5]
print(itemgetter(*b)(a))
# Result:
(1, 5, 5)

Or you can use numpy:

import numpy as np
a = np.array([-2, 1, 5, 3, 8, 5, 6])
b = [1, 2, 5]
print(list(a[b]))
# Result:
[1, 5, 5]

But really, your current solution is fine. It's probably the neatest out of all of them.

Multiple value checks using 'in' operator (Python)

if any(s in line for s in ('string1', 'string2', ...)):

How to test multiple variables for equality against a single value?

You misunderstand how boolean expressions work; they don't work like an English sentence and guess that you are talking about the same comparison for all names here. You are looking for:

if x == 1 or y == 1 or z == 1:

x and y are otherwise evaluated on their own (False if 0, True otherwise).

You can shorten that using a containment test against a tuple:

if 1 in (x, y, z):

or better still:

if 1 in {x, y, z}:

using a set to take advantage of the constant-cost membership test (i.e. in takes a fixed amount of time whatever the left-hand operand is).

Explanation

When you use or, python sees each side of the operator as separate expressions. The expression x or y == 1 is treated as first a boolean test for x, then if that is False, the expression y == 1 is tested.

This is due to operator precedence. The or operator has a lower precedence than the == test, so the latter is evaluated first.

However, even if this were not the case, and the expression x or y or z == 1 was actually interpreted as (x or y or z) == 1 instead, this would still not do what you expect it to do.

x or y or z would evaluate to the first argument that is 'truthy', e.g. not False, numeric 0 or empty (see boolean expressions for details on what Python considers false in a boolean context).

So for the values x = 2; y = 1; z = 0, x or y or z would resolve to 2, because that is the first true-like value in the arguments. Then 2 == 1 would be False, even though y == 1 would be True.

The same would apply to the inverse; testing multiple values against a single variable; x == 1 or 2 or 3 would fail for the same reasons. Use x == 1 or x == 2 or x == 3 or x in {1, 2, 3}.

How to check if one of the following items is in a list?

>>> L1 = [2,3,4]
>>> L2 = [1,2]
>>> [i for i in L1 if i in L2]
[2]

>>> S1 = set(L1)
>>> S2 = set(L2)
>>> S1.intersection(S2)
set([2])

Both empty lists and empty sets are False, so you can use the value directly as a truth value.



Related Topics



Leave a reply



Submit