Fast Way of Getting Index of Match in List

Fastest way to find Indexes of item in list?

def find(target, myList):
    for i in range(len(myList)):
        if myList[i] == target:
            yield i

def find_with_list(myList, target):
    inds = []
    for i in range(len(myList)):
        if myList[i] == target:
            inds.append(i)
    return inds

In [8]: x = range(50)*200
In [9]: %timeit [i for i,j in enumerate(x) if j == 3]
1000 loops, best of 3: 598 us per loop

In [10]: %timeit list(find(3,x))
1000 loops, best of 3: 607 us per loop
In [11]: %timeit find(3,x)
1000000 loops, best of 3: 375 ns per loop

In [55]: %timeit find_with_list(x,3)
1000 loops, best of 3: 618 us per loop

Assuming you want a list as your output:

  • All options exhibited similar performance in my test, with the list comprehension being (barely) the fastest.

If a generator is acceptable, it looks far faster than the other approaches — but only because calling find(3, x) merely constructs the generator object; no element is examined until it is iterated. The generator also does not store the indices, so they cannot be traversed a second time.
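The gap between the last two timings above comes from laziness alone. A minimal Python 3 sketch (the transcript above is Python 2, so list(range(50)) * 200 stands in for range(50)*200):

```python
def find(target, myList):
    # Lazily yield each index whose element equals target.
    for i, value in enumerate(myList):
        if value == target:
            yield i

x = list(range(50)) * 200

g = find(3, x)        # constant time: no element has been examined yet
first = next(g)       # the scan only starts here
rest = list(g)        # consumes the remaining matches; g is now exhausted

print(first)            # 3
print(len(rest) + 1)    # 200 occurrences of 3 in total
```

Once exhausted, list(g) returns an empty list — hence the note above that the indices cannot be iterated a second time.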

Finding the index of an item in a list

>>> ["foo", "bar", "baz"].index("bar")
1

See the documentation for the built-in .index() method of the list:

list.index(x[, start[, end]])

Return zero-based index in the list of the first item whose value is equal to x. Raises a ValueError if there is no such item.

The optional arguments start and end are interpreted as in the slice notation and are used to limit the search to a particular subsequence of the list. The returned index is computed relative to the beginning of the full sequence rather than the start argument.

Caveats

Linear time-complexity in list length

An index call checks every element of the list in order, until it finds a match. If the list is long, and if there is no guarantee that the value will be near the beginning, this can slow down the code.

This problem can only be completely avoided by using a different data structure. However, if the element is known to be within a certain part of the list, the start and end parameters can be used to narrow the search.

For example:

>>> import timeit
>>> timeit.timeit('l.index(999_999)', setup='l = list(range(0, 1_000_000))', number=1000)
9.356267921015387
>>> timeit.timeit('l.index(999_999, 999_990, 1_000_000)', setup='l = list(range(0, 1_000_000))', number=1000)
0.0004404920036904514

The second call is orders of magnitude faster, because it only has to search through 10 elements, rather than all 1 million.

Only the index of the first match is returned

A call to index searches through the list in order until it finds a match, and stops there. If there could be more than one occurrence of the value, and all indices are needed, index cannot solve the problem:

>>> [1, 1].index(1) # the second occurrence, at index 1, is never reported.
0

Instead, use a list comprehension or generator expression to do the search, with enumerate to get indices:

>>> # A list comprehension gives a list of indices directly:
>>> [i for i, e in enumerate([1, 2, 1]) if e == 1]
[0, 2]
>>> # A generator comprehension gives us an iterable object...
>>> g = (i for i, e in enumerate([1, 2, 1]) if e == 1)
>>> # which can be used in a `for` loop, or manually iterated with `next`:
>>> next(g)
0
>>> next(g)
2

The list comprehension and generator expression techniques still work if there is only one match, and are more generalizable.

Raises an exception if there is no match

As noted in the documentation above, using .index will raise an exception if the searched-for value is not in the list:

>>> [1, 1].index(2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: 2 is not in list

If this is a concern, either explicitly check first using item in my_list, or handle the exception with try/except as appropriate.

The explicit check is simple and readable, but it means iterating the list twice on a successful lookup. See What is the EAFP principle in Python? for more guidance on this choice.
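The two styles can be compared side by side — a minimal sketch:

```python
my_list = [10, 20, 30]

# LBYL: check membership first (scans the list twice on a successful lookup)
if 40 in my_list:
    pos = my_list.index(40)
else:
    pos = -1

# EAFP: just try, and handle the failure (a single scan)
try:
    pos_eafp = my_list.index(40)
except ValueError:
    pos_eafp = -1

print(pos, pos_eafp)    # -1 -1
```

The sentinel value -1 here is an arbitrary choice; anything that cannot be a valid index works.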

Fast way of getting index of match in list

Here's one possibility using match:

a <- list(1:3, 4:5, 6:9)
b <- c(2, 3, 5, 8)
g <- rep(seq_along(a), sapply(a, length))
g[match(b, unlist(a))]
#> [1] 1 1 2 3

findInterval is another option:

findInterval(match(b, unlist(a)), cumsum(c(0, sapply(a, length))) + 1)
#> [1] 1 1 2 3

For returning a list, try this:

a <- list(1:3, 4:5, 5:9)
b <- c(2, 3, 5, 8, 5)
g <- rep(seq_along(a), sapply(a, length))
aa <- unlist(a)
au <- unique(aa)
af <- factor(aa, levels = au)
gg <- split(g, af)
gg[match(b, au)]
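For readers following along in Python rather than R, the first-match idea above can be sketched with a dictionary from each value to the (1-based, to mirror the R output) index of the first sub-list containing it — the names here are illustrative:

```python
a = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
b = [2, 3, 5, 8]

# Map each value to the index of the first sub-list containing it.
first_group = {}
for group_index, sub in enumerate(a, start=1):
    for value in sub:
        first_group.setdefault(value, group_index)

result = [first_group[value] for value in b]
print(result)    # [1, 1, 2, 3]
```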

Finding the indices of matching elements in list in Python

You are using .index(), which only finds the first occurrence of the value in the list. So if 1.0 occurs at index 2 and again at index 9, .index(1.0) will always return 2, no matter how many times 1.0 appears.

Use enumerate() to add indices to your loop instead:

def find(lst, a, b):
    result = []
    for i, x in enumerate(lst):
        if x < a or x > b:
            result.append(i)
    return result

You can collapse this into a list comprehension:

def find(lst, a, b):
    return [i for i, x in enumerate(lst) if x < a or x > b]
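A quick usage sketch of the comprehension version, with made-up data:

```python
def find(lst, a, b):
    # Indices of elements falling outside the closed range [a, b].
    return [i for i, x in enumerate(lst) if x < a or x > b]

print(find([5, 1, 8, 3, 12], 2, 10))    # [1, 4]
```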

Fastest way to find matching index between two lists in python?

You can try it like this. Dictionary lookups run in constant time on average, so the solution builds a dictionary mapping each number to its index.

In [1]: import re                                                                        

In [2]: listA = ['123', '345', '678']

In [3]: listB = ['ABC123', 'CDE455', 'GHK678', 'CGH345']

In [4]: # Mapping b/w number in listB to related index

In [5]: mapping = {re.sub(r'\D+', '', value).strip(): index for index, value in enumerate(listB)}

In [6]: mapping # Print mapping dictionary
Out[6]: {'123': 0, '455': 1, '678': 2, '345': 3}

In [7]: # Find the desired output

In [8]: output = [mapping.get(item) for item in listA]

In [9]: output
Out[9]: [0, 3, 2]

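One caveat worth noting: mapping.get(item) silently yields None for any item of listA with no counterpart in listB. A sketch with an explicit sentinel default instead (the extra value '999' is made up for illustration):

```python
import re

listA = ['123', '345', '999']    # '999' has no match in listB
listB = ['ABC123', 'CDE455', 'GHK678', 'CGH345']

mapping = {re.sub(r'\D+', '', value): index
           for index, value in enumerate(listB)}

# A sentinel default makes missing items explicit instead of silent Nones.
output = [mapping.get(item, -1) for item in listA]
print(output)    # [0, 3, -1]
```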

How to efficiently find the indices of matching elements in two lists

Without duplicates

If your objects are hashable and your lists have no duplicates, you can create an inverted index of the first list and then traverse the second list. This traverses each list only once and thus is O(n).

def find_matching_index(list1, list2):
    inverse_index = {element: index for index, element in enumerate(list1)}

    return [(index, inverse_index[element])
            for index, element in enumerate(list2) if element in inverse_index]

find_matching_index([1, 2, 3], [3, 2, 1])  # [(0, 2), (1, 1), (2, 0)]

With duplicates

You can extend the previous solution to account for duplicates. You can keep track of multiple indices with a set.

def find_matching_index(list1, list2):
    # Create an inverse index whose values are now sets of indices
    inverse_index = {}
    for index, element in enumerate(list1):
        if element not in inverse_index:
            inverse_index[element] = {index}
        else:
            inverse_index[element].add(index)

    # Traverse the second list
    matching_index = []
    for index, element in enumerate(list2):
        # Create one pair per index in the set of the inverse index
        if element in inverse_index:
            matching_index.extend([(x, index) for x in inverse_index[element]])

    return matching_index

find_matching_index([1, 1, 2], [2, 2, 1])  # [(2, 0), (2, 1), (0, 2), (1, 2)]

Unfortunately, this is no longer O(n). Consider the input [1, 1] and [1, 1]: the output is [(0, 0), (0, 1), (1, 0), (1, 1)]. The size of the output alone means the worst case cannot be better than O(n^2).

Although, this solution is still O(n) if there are no duplicates.

Non-hashable objects

Now comes the case where your objects are not hashable, but comparable. The idea here is to sort your lists in a way that preserves the original index of each element. Then we can group runs of equal elements to get matching indices.

Since we make heavy use of groupby and product in the following code, I made find_matching_index return a generator for memory efficiency on long lists.

from itertools import groupby, product

def find_matching_index(list1, list2):
    sorted_list1 = sorted((element, index) for index, element in enumerate(list1))
    sorted_list2 = sorted((element, index) for index, element in enumerate(list2))

    list1_groups = groupby(sorted_list1, key=lambda pair: pair[0])
    list2_groups = groupby(sorted_list2, key=lambda pair: pair[0])

    for element1, group1 in list1_groups:
        try:
            element2, group2 = next(list2_groups)
            while element1 > element2:
                element2, group2 = next(list2_groups)
        except StopIteration:
            break

        if element2 > element1:
            continue

        indices_product = product((i for _, i in group1), (i for _, i in group2))

        yield from indices_product

        # In versions prior to 3.3, the line above must be
        # for x in indices_product:
        #     yield x

list1 = [[], [1, 2], []]
list2 = [[1, 2], []]

list(find_matching_index(list1, list2)) # [(0, 1), (2, 1), (1, 0)]

It turns out that time complexity does not suffer that much. Sorting of course takes O(n log(n)), but then groupby provides generators that can recover all elements by traversing our lists only twice. The conclusion is that our complexity is primarily bound by the size of the output of product, giving a best case of O(n log(n)) and a worst case that is once again O(n^2).

How to find all occurrences of an element in a list

You can use a list comprehension with enumerate:

indices = [i for i, x in enumerate(my_list) if x == "whatever"]

The iterator enumerate(my_list) yields pairs (index, item) for each item in the list. Using i, x as the loop target unpacks these pairs into the index i and the list item x. We filter down to all x that match our criterion, and select the indices i of those elements.

python - Fastest way to find index of an element in list by comparing to elements of another list

Check which of the pairs in cx & cy equal the pairs in x & y

mask = (cx == x[:, None]) & (cy == y[:, None])

to get index of elements in cx & cy present in x & y use

np.vstack(np.where(mask))[1]
# outputs: array([1, 3], dtype=int64)

to get index of elements in x & y present in cx & cy use

np.vstack(np.where(mask))[0]
# outputs: array([0, 1], dtype=int64)

benchmarking code:

%timeit op(cx, cy, x, y)
44.6 µs ± 616 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%timeit hal(cx, cy, x, y)
8.57 µs ± 90.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

A speed-up of 5.2x on the sample data.

# test methods
def op(cx, cy, x, y):
IndexList = np.zeros(len(x))
for i in range(len(x)):
px = np.where(cx == x[i])
py = np.where(cy == y[i])
Index = np.intersect1d(px, py)
IndexList[i] = Index
return IndexList

def hal(cx, cy, x, y):
mask = ( cx == x[:, None] ) & ( cy == y[:, None] )
return np.vstack(np.where(mask))[1]
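The broadcasting trick can be reproduced on sample data like the following (these arrays are made up, not the original poster's):

```python
import numpy as np

# Hypothetical sample data: each (x[i], y[i]) pair occurs once in (cx, cy).
cx = np.array([3, 1, 4, 1, 5, 9, 2, 6])
cy = np.array([2, 7, 1, 8, 2, 8, 1, 8])
x = np.array([1, 9])
y = np.array([7, 8])

# Row i of mask flags positions where (cx, cy) equals (x[i], y[i]).
mask = (cx == x[:, None]) & (cy == y[:, None])
idx = np.vstack(np.where(mask))[1]
print(idx)    # [1 5]
```

Here (1, 7) is found at position 1 of cx & cy, and (9, 8) at position 5.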

list match in python: get indices of a sub-list in a larger list

A fast method (when a is a large list) would be using a dict to map values in a to indices:

>>> index_dict = dict((value, idx) for idx,value in enumerate(a))
>>> [index_dict[x] for x in b]
[0, 2, 0]

This will take linear time in the average case, compared to using a.index which would take quadratic time.
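One caveat to note: when a contains duplicates, the dict comprehension above keeps the last index of each value, whereas a.index would return the first. A sketch of both behaviors, with made-up data:

```python
a = [10, 20, 10, 30]
b = [10, 30]

# A plain dict comprehension keeps the *last* index of a duplicated value:
last_index = {value: idx for idx, value in enumerate(a)}

# To mirror a.index() (first occurrence), keep the first index instead:
first_index = {}
for idx, value in enumerate(a):
    first_index.setdefault(value, idx)

print([last_index[x] for x in b])     # [2, 3]
print([first_index[x] for x in b])    # [0, 3]
```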


