Python: Find in List

Python: Find in list

As for your first question: "if item is in my_list:" is perfectly fine and should work if item equals one of the elements inside my_list. The item must exactly match an item in the list. For instance, "abc" and "ABC" do not match. Floating point values in particular may suffer from inaccuracy. For instance, 1 - 1/3 != 2/3.

As for your second question: There's actually several possible ways if "finding" things in lists.

Checking if something is inside

This is the use case you describe: Checking whether something is inside a list or not. As you know, you can use the in operator for that:

3 in [1, 2, 3] # => True

Filtering a collection

That is, finding all elements in a sequence that meet a certain condition. You can use list comprehension or generator expressions for that:

matches = [x for x in lst if fulfills_some_condition(x)]
matches = (x for x in lst if x > 6)

The latter will return a generator which you can imagine as a sort of lazy list that will only be built as soon as you iterate through it. By the way, the first one is exactly equivalent to

matches = filter(fulfills_some_condition, lst)

in Python 2. Here you can see higher-order functions at work. In Python 3, filter doesn't return a list, but a generator-like object.

Finding the first occurrence

If you only want the first thing that matches a condition (but you don't know what it is yet), it's fine to use a for loop (possibly using the else clause as well, which is not really well-known). You can also use

next(x for x in lst if ...)

which will return the first match or raise a StopIteration if none is found. Alternatively, you can use

next((x for x in lst if ...), [default value])

Finding the location of an item

For lists, there's also the index method that can sometimes be useful if you want to know where a certain element is in the list:

[1,2,3].index(2) # => 1
[1,2,3].index(4) # => ValueError

However, note that if you have duplicates, .index always returns the lowest index:......

[1,2,3,2].index(2) # => 1

If there are duplicates and you want all the indexes then you can use enumerate() instead:

[i for i,x in enumerate([1,2,3,2]) if x==2] # => [1, 3]

Finding the index of an item in a list

>>> ["foo", "bar", "baz"].index("bar")
1

Reference: Data Structures > More on Lists

Caveats follow

Note that while this is perhaps the cleanest way to answer the question as asked, index is a rather weak component of the list API, and I can't remember the last time I used it in anger. It's been pointed out to me in the comments that because this answer is heavily referenced, it should be made more complete. Some caveats about list.index follow. It is probably worth initially taking a look at the documentation for it:

list.index(x[, start[, end]])

Return zero-based index in the list of the first item whose value is equal to x. Raises a ValueError if there is no such item.

The optional arguments start and end are interpreted as in the slice notation and are used to limit the search to a particular subsequence of the list. The returned index is computed relative to the beginning of the full sequence rather than the start argument.

Linear time-complexity in list length

An index call checks every element of the list in order, until it finds a match. If your list is long, and you don't know roughly where in the list it occurs, this search could become a bottleneck. In that case, you should consider a different data structure. Note that if you know roughly where to find the match, you can give index a hint. For instance, in this snippet, l.index(999_999, 999_990, 1_000_000) is roughly five orders of magnitude faster than straight l.index(999_999), because the former only has to search 10 entries, while the latter searches a million:

>>> import timeit
>>> timeit.timeit('l.index(999_999)', setup='l = list(range(0, 1_000_000))', number=1000)
9.356267921015387
>>> timeit.timeit('l.index(999_999, 999_990, 1_000_000)', setup='l = list(range(0, 1_000_000))', number=1000)
0.0004404920036904514

Only returns the index of the first match to its argument

A call to index searches through the list in order until it finds a match, and stops there. If you expect to need indices of more matches, you should use a list comprehension, or generator expression.

>>> [1, 1].index(1)
0
>>> [i for i, e in enumerate([1, 2, 1]) if e == 1]
[0, 2]
>>> g = (i for i, e in enumerate([1, 2, 1]) if e == 1)
>>> next(g)
0
>>> next(g)
2

Most places where I once would have used index, I now use a list comprehension or generator expression because they're more generalizable. So if you're considering reaching for index, take a look at these excellent Python features.

Throws if element not present in list

A call to index results in a ValueError if the item's not present.

>>> [1, 1].index(2)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: 2 is not in list

If the item might not be present in the list, you should either

  1. Check for it first with item in my_list (clean, readable approach), or
  2. Wrap the index call in a try/except block which catches ValueError (probably faster, at least when the list to search is long, and the item is usually present.)

Python: finding an element in a list

The best way is probably to use the list method .index.

For the objects in the list, you can do something like:

def __eq__(self, other):
return self.Value == other.Value

with any special processing you need.

You can also use a for/in statement with enumerate(arr)

Example of finding the index of an item that has value > 100.

for index, item in enumerate(arr):
if item > 100:
return index, item

Source

How to find the python list item that start with

You have several options, but most obvious are:

Using list comprehension with a condition:

result = [i for i in some_list if i.startswith('GFS01_')]

Using filter (which returns iterator)

result = filter(lambda x: x.startswith('GFS01_'), some_list)

How do I get the number of elements in a list (length of a list) in Python?

The len() function can be used with several different types in Python - both built-in types and library types. For example:

>>> len([1, 2, 3])
3

How to find all occurrences of an element in a list

You can use a list comprehension with enumerate:

indices = [i for i, x in enumerate(my_list) if x == "whatever"]

The iterator enumerate(my_list) yields pairs (index, item) for each item in the list. Using i, x as loop variable target unpacks these pairs into the index i and the list item x. We filter down to all x that match our criterion, and select the indices i of these elements.

Python find elements in one list that are not in the other

TL;DR:

SOLUTION (1)

import numpy as np
main_list = np.setdiff1d(list_2,list_1)
# yields the elements in `list_2` that are NOT in `list_1`

SOLUTION (2) You want a sorted list

def setdiff_sorted(array1,array2,assume_unique=False):
ans = np.setdiff1d(array1,array2,assume_unique).tolist()
if assume_unique:
return sorted(ans)
return ans
main_list = setdiff_sorted(list_2,list_1)



EXPLANATIONS:

(1) You can use NumPy's setdiff1d (array1,array2,assume_unique=False).

assume_unique asks the user IF the arrays ARE ALREADY UNIQUE.
If False, then the unique elements are determined first.

If True, the function will assume that the elements are already unique AND function will skip determining the unique elements.

This yields the unique values in array1 that are not in array2. assume_unique is False by default.

If you are concerned with the unique elements (based on the response of Chinny84), then simply use (where assume_unique=False => the default value):

import numpy as np
list_1 = ["a", "b", "c", "d", "e"]
list_2 = ["a", "f", "c", "m"]
main_list = np.setdiff1d(list_2,list_1)
# yields the elements in `list_2` that are NOT in `list_1`


(2)
For those who want answers to be sorted, I've made a custom function:

import numpy as np
def setdiff_sorted(array1,array2,assume_unique=False):
ans = np.setdiff1d(array1,array2,assume_unique).tolist()
if assume_unique:
return sorted(ans)
return ans

To get the answer, run:

main_list = setdiff_sorted(list_2,list_1)

SIDE NOTES:

(a) Solution 2 (custom function setdiff_sorted) returns a list (compared to an array in solution 1).

(b) If you aren't sure if the elements are unique, just use the default setting of NumPy's setdiff1d in both solutions A and B. What can be an example of a complication? See note (c).

(c) Things will be different if either of the two lists is not unique.

Say list_2 is not unique: list2 = ["a", "f", "c", "m", "m"]. Keep list1 as is: list_1 = ["a", "b", "c", "d", "e"]
Setting the default value of assume_unique yields ["f", "m"] (in both solutions). HOWEVER, if you set assume_unique=True, both solutions give ["f", "m", "m"]. Why? This is because the user ASSUMED that the elements are unique). Hence, IT IS BETTER TO KEEP assume_unique to its default value. Note that both answers are sorted.

pythonnumpy

Find object in list that has attribute equal to some value (that meets any condition)

next((x for x in test_list if x.value == value), None)

This gets the first item from the list that matches the condition, and returns None if no item matches. It's my preferred single-expression form.

However,

for x in test_list:
if x.value == value:
print("i found it!")
break

The naive loop-break version, is perfectly Pythonic -- it's concise, clear, and efficient. To make it match the behavior of the one-liner:

for x in test_list:
if x.value == value:
print("i found it!")
break
else:
x = None

This will assign None to x if you don't break out of the loop.

How do I efficiently find which elements of a list are in another list?

If you want to use a vector approach you can also use Numpy isin. It's not the fastest method, as demonstrated by oda's excellent post, but it's definitely an alternative to consider.

import numpy as np

list_1 = [0,0,1,2,0,0]
list_2 = [1,2,3,4,5,6]

a1 = np.array(list_1)
a2 = np.array(list_2)

np.isin(a1, a2)
# array([False, False, True, True, False, False])


Related Topics



Leave a reply



Submit