Fastest way to find Indexes of item in list?
def find(target, myList):
    for i in range(len(myList)):
        if myList[i] == target:
            yield i
def find_with_list(myList, target):
    inds = []
    for i in range(len(myList)):
        if myList[i] == target:
            inds.append(i)
    return inds
In [8]: x = range(50)*200
In [9]: %timeit [i for i,j in enumerate(x) if j == 3]
1000 loops, best of 3: 598 us per loop
In [10]: %timeit list(find(3,x))
1000 loops, best of 3: 607 us per loop
In [11]: %timeit find(3,x)
1000000 loops, best of 3: 375 ns per loop
In [55]: %timeit find_with_list(x,3)
1000 loops, best of 3: 618 us per loop
Assuming you want a list as your output:
- All options exhibited similar time performance in my test, with the list comprehension being the fastest (barely).
If using a generator is acceptable, it's way faster than the other approaches. Note, however, that this timing only measures creating the generator: it doesn't account for actually iterating over the indices, nor does it store them, so the indices cannot be iterated over a second time.
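The 375 ns figure for find(3, x) illustrates this point: calling a generator function only builds the generator object, and no elements are inspected until it is consumed. A minimal sketch (using list(range(50)) * 200 as the Python 3 equivalent of the benchmark's x):

```python
def find(target, myList):
    # Generator: yields matching indices lazily, one at a time.
    for i, value in enumerate(myList):
        if value == target:
            yield i

x = list(range(50)) * 200   # Python 3 equivalent of range(50)*200

g = find(3, x)              # returns instantly; nothing has been searched yet
print(next(g))              # 3   -- first occurrence, found on demand
print(next(g))              # 53  -- search resumes where it left off
print(len(list(g)))         # 198 -- remaining occurrences; g is now exhausted
```

Once exhausted, calling next(g) raises StopIteration; materialize with list(find(3, x)) if you need to iterate the indices more than once.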
Finding the index of an item in a list
>>> ["foo", "bar", "baz"].index("bar")
1
See the documentation for the built-in .index() method of the list:

list.index(x[, start[, end]])

Return zero-based index in the list of the first item whose value is equal to x. Raises a ValueError if there is no such item. The optional arguments start and end are interpreted as in the slice notation and are used to limit the search to a particular subsequence of the list. The returned index is computed relative to the beginning of the full sequence rather than the start argument.
Caveats
Linear time-complexity in list length
An index call checks every element of the list in order until it finds a match. If the list is long, and there is no guarantee that the value will be near the beginning, this can slow down the code.
This problem can only be completely avoided by using a different data structure. However, if the element is known to be within a certain part of the list, the start and end parameters can be used to narrow the search.
For example:
>>> import timeit
>>> timeit.timeit('l.index(999_999)', setup='l = list(range(0, 1_000_000))', number=1000)
9.356267921015387
>>> timeit.timeit('l.index(999_999, 999_990, 1_000_000)', setup='l = list(range(0, 1_000_000))', number=1000)
0.0004404920036904514
The second call is orders of magnitude faster, because it only has to search through 10 elements, rather than all 1 million.
Only the index of the first match is returned
A call to index searches through the list in order until it finds a match, and stops there. If there could be more than one occurrence of the value, and all indices are needed, index cannot solve the problem:
>>> [1, 1].index(1) # the `1` index is not found.
0
Instead, use a list comprehension or generator expression to do the search, with enumerate to get indices:
>>> # A list comprehension gives a list of indices directly:
>>> [i for i, e in enumerate([1, 2, 1]) if e == 1]
[0, 2]
>>> # A generator comprehension gives us an iterable object...
>>> g = (i for i, e in enumerate([1, 2, 1]) if e == 1)
>>> # which can be used in a `for` loop, or manually iterated with `next`:
>>> next(g)
0
>>> next(g)
2
The list comprehension and generator expression techniques still work if there is only one match, and are more generalizable.
Raises an exception if there is no match
As noted in the documentation above, using .index will raise an exception if the searched-for value is not in the list:
>>> [1, 1].index(2)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: 2 is not in list
If this is a concern, either explicitly check first using item in my_list, or handle the exception with try/except as appropriate.
The explicit check is simple and readable, but it must iterate the list a second time. See What is the EAFP principle in Python? for more guidance on this choice.
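The two styles can be sketched as follows (the list and sentinel value here are illustrative):

```python
my_list = [1, 1]
target = 2

# LBYL ("look before you leap"): check membership first.
# Simple and readable, but iterates the list an extra time.
if target in my_list:
    idx = my_list.index(target)
else:
    idx = -1          # sentinel for "not found"
print(idx)            # -1

# EAFP ("easier to ask forgiveness than permission"):
# attempt the lookup and handle the failure.
try:
    idx = my_list.index(target)
except ValueError:
    idx = -1
print(idx)            # -1
```

When the value is usually present, EAFP avoids the redundant scan; when misses are common, raising and catching exceptions adds overhead of its own.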
Fast way of getting index of match in list
Here's one possibility using match:
a <- list(1:3, 4:5, 6:9)
b <- c(2, 3, 5, 8)
g <- rep(seq_along(a), sapply(a, length))
g[match(b, unlist(a))]
#> [1] 1 1 2 3
findInterval is another option:
findInterval(match(b, unlist(a)), cumsum(c(0, sapply(a, length))) + 1)
#> [1] 1 1 2 3
For returning a list, try this:
a <- list(1:3, 4:5, 5:9)
b <- c(2, 3, 5, 8, 5)
g <- rep(seq_along(a), sapply(a, length))
aa <- unlist(a)
au <- unique(aa)
af <- factor(aa, levels = au)
gg <- split(g, af)
gg[match(b, au)]
Finding the indices of matching elements in list in Python
You are using .index(), which only finds the first occurrence of the value in the list. So if the value 1.0 appears at index 2 and at index 9, .index(1.0) will always return 2, no matter how many times 1.0 occurs in the list. Use enumerate() to add indices to your loop instead:
def find(lst, a, b):
    result = []
    for i, x in enumerate(lst):
        if x < a or x > b:
            result.append(i)
    return result
You can collapse this into a list comprehension:
def find(lst, a, b):
    return [i for i, x in enumerate(lst) if x < a or x > b]
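For example, with illustrative values for lst, a and b, this collects the indices of elements outside the closed interval [a, b]:

```python
def find(lst, a, b):
    # Indices of elements that fall outside [a, b]
    return [i for i, x in enumerate(lst) if x < a or x > b]

print(find([1.5, 0.2, 3.0, 0.9, 2.5], 1, 2))  # [1, 2, 3, 4]
```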
Fastest way to find matching index between two lists in python?
You can try it like this. Lookups in a dictionary are fast, so the solution builds a dictionary to do the matching.
In [1]: import re
In [2]: listA = ['123', '345', '678']
In [3]: listB = ['ABC123', 'CDE455', 'GHK678', 'CGH345']
In [4]: # Mapping b/w number in listB to related index
In [5]: mapping = {re.sub(r'\D+', '', value).strip(): index for index, value in enumerate(listB)}
In [6]: mapping # Print mapping dictionary
Out[6]: {'123': 0, '455': 1, '678': 2, '345': 3}
In [7]: # Find the desired output
In [8]: output = [mapping.get(item) for item in listA]
In [9]: output
Out[9]: [0, 3, 2]
How to efficiently find the indices of matching elements in two lists
Without duplicates
If your objects are hashable and your lists have no duplicates, you can create an inverted index of the first list and then traverse the second list. This traverses each list only once and thus is O(n).
def find_matching_index(list1, list2):
    inverse_index = {element: index for index, element in enumerate(list1)}
    return [(index, inverse_index[element])
            for index, element in enumerate(list2) if element in inverse_index]
find_matching_index([1,2,3], [3,2,1]) # [(0, 2), (1, 1), (2, 0)]
With duplicates
You can extend the previous solution to account for duplicates by keeping track of multiple indices with a set.
def find_matching_index(list1, list2):
    # Create an inverse index whose values are now sets of indices
    inverse_index = {}
    for index, element in enumerate(list1):
        if element not in inverse_index:
            inverse_index[element] = {index}
        else:
            inverse_index[element].add(index)

    # Traverse the second list
    matching_index = []
    for index, element in enumerate(list2):
        # We have to create one pair per element in the set of the inverse index
        if element in inverse_index:
            matching_index.extend([(x, index) for x in inverse_index[element]])

    return matching_index
find_matching_index([1, 1, 2], [2, 2, 1]) # [(2, 0), (2, 1), (0, 2), (1, 2)]
Unfortunately, this is no longer O(n). Consider the case where you input [1, 1] and [1, 1]: the output is [(0, 0), (0, 1), (1, 0), (1, 1)]. By the size of the output alone, the worst case cannot be better than O(n^2). That said, this solution is still O(n) if there are no duplicates.
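The manual if/else used to build the inverse index can also be written with collections.defaultdict; the behaviour is identical, this is just a stylistic alternative:

```python
from collections import defaultdict

def find_matching_index(list1, list2):
    # defaultdict(set) creates an empty set the first time a key is seen
    inverse_index = defaultdict(set)
    for index, element in enumerate(list1):
        inverse_index[element].add(index)

    # One (list1_index, list2_index) pair per match
    return [(x, index)
            for index, element in enumerate(list2)
            if element in inverse_index
            for x in inverse_index[element]]

print(find_matching_index([1, 1, 2], [2, 2, 1]))
# [(2, 0), (2, 1), (0, 2), (1, 2)]  (order within a set may vary)
```

Note the membership test `element in inverse_index` does not create an entry; only subscripting a missing key triggers the default factory.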
Non-hashable objects
Now comes the case where your objects are not hashable, but comparable. The idea here will be to sort your lists in a way that preserves the origin index of each element. Then we can group sequences of elements that are equal to get matching indices.
Since we make heavy use of groupby and product in the following code, I made find_matching_index return a generator for memory efficiency on long lists.
from itertools import groupby, product

def find_matching_index(list1, list2):
    sorted_list1 = sorted((element, index) for index, element in enumerate(list1))
    sorted_list2 = sorted((element, index) for index, element in enumerate(list2))

    list1_groups = groupby(sorted_list1, key=lambda pair: pair[0])
    list2_groups = groupby(sorted_list2, key=lambda pair: pair[0])

    for element1, group1 in list1_groups:
        try:
            element2, group2 = next(list2_groups)
            while element1 > element2:
                element2, group2 = next(list2_groups)
        except StopIteration:
            break

        if element2 > element1:
            continue

        indices_product = product((i for _, i in group1), (i for _, i in group2))
        yield from indices_product
        # In versions prior to 3.3, the line above must be
        # for x in indices_product:
        #     yield x
list1 = [[], [1, 2], []]
list2 = [[1, 2], []]
list(find_matching_index(list1, list2)) # [(0, 1), (2, 1), (1, 0)]
It turns out that time complexity does not suffer that much. Sorting of course takes O(n log(n)), but then groupby provides generators that can recover all elements by traversing our lists only twice. The conclusion is that our complexity is primarily bound by the size of the output of product, giving a best case of O(n log(n)) and a worst case that is once again O(n^2).
How to find all occurrences of an element in a list
You can use a list comprehension with enumerate:
indices = [i for i, x in enumerate(my_list) if x == "whatever"]
The iterator enumerate(my_list) yields pairs (index, item) for each item in the list. Using i, x as the loop-variable target unpacks these pairs into the index i and the list item x. We filter down to all x that match our criterion, and select the indices i of these elements.
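A quick demonstration with a concrete list:

```python
my_list = ["whatever", "spam", "whatever", "eggs"]

# enumerate yields (index, item) pairs: (0, 'whatever'), (1, 'spam'), ...
indices = [i for i, x in enumerate(my_list) if x == "whatever"]
print(indices)  # [0, 2]
```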
python - Fastest way to find index of an element in list by comparing to elements of another list
Check which of the pairs in cx & cy equal the pairs in x & y:

mask = (cx == x[:, None]) & (cy == y[:, None])
To get the indices of elements in cx & cy that are present in x & y, use:

np.vstack(np.where(mask))[1]
# outputs: array([1, 3], dtype=int64)
To get the indices of elements in x & y that are present in cx & cy, use:

np.vstack(np.where(mask))[0]
# outputs: array([0, 1], dtype=int64)
benchmarking code:
%timeit op(cx, cy, x, y)
44.6 µs ± 616 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit hal(cx, cy, x, y)
8.57 µs ± 90.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
speed up of 5.2x with sample data
# test methods
def op(cx, cy, x, y):
    IndexList = np.zeros(len(x))
    for i in range(len(x)):
        px = np.where(cx == x[i])
        py = np.where(cy == y[i])
        Index = np.intersect1d(px, py)
        IndexList[i] = Index
    return IndexList
def hal(cx, cy, x, y):
    mask = (cx == x[:, None]) & (cy == y[:, None])
    return np.vstack(np.where(mask))[1]
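The snippets above assume cx, cy, x and y already exist; here is a self-contained sketch with made-up sample arrays (the values are illustrative, not the question's original data):

```python
import numpy as np

# Coordinate pairs to search in (cx, cy) ...
cx = np.array([5, 2, 7, 4])
cy = np.array([50, 20, 70, 40])
# ... and the pairs to look for (x, y)
x = np.array([2, 4])
y = np.array([20, 40])

# Broadcasting: x[:, None] has shape (2, 1), cx has shape (4,), so
# mask has shape (2, 4) and mask[i, j] is True when
# (x[i], y[i]) == (cx[j], cy[j])
mask = (cx == x[:, None]) & (cy == y[:, None])

rows, cols = np.where(mask)
print(cols)  # indices into cx/cy: [1 3]
print(rows)  # indices into x/y:   [0 1]
```

This replaces the per-element np.where/np.intersect1d loop with a single vectorized comparison, which is where the speedup in the benchmark comes from.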
list match in python: get indices of a sub-list in a larger list
A fast method (when a is a large list) would be using a dict to map values in a to indices:
>>> index_dict = dict((value, idx) for idx,value in enumerate(a))
>>> [index_dict[x] for x in b]
[0, 2, 0]
This will take linear time in the average case, compared to using a.index, which would take quadratic time.
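A self-contained run (the values of a and b here are illustrative). Note that when a contains duplicates, later occurrences overwrite earlier ones in the dict comprehension, so each value maps to its last index in a; build the dict in reverse if the first index is required:

```python
a = ["x", "y", "x", "z"]
b = ["z", "x"]

# Dict comprehension: duplicate "x" keeps its LAST index (2)
index_dict = {value: idx for idx, value in enumerate(a)}
print([index_dict[v] for v in b])   # [3, 2]

# Inserting in reverse makes earlier indices overwrite later ones,
# so each value keeps its FIRST index instead
first_index = {value: idx for idx, value in reversed(list(enumerate(a)))}
print([first_index[v] for v in b])  # [3, 0]
```

Either way, each lookup in b is O(1) on average after the one-time O(n) dict construction.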