Fastest way to search a list in python
Also note that the list of values I'll have won't have duplicate data and I don't actually care about the order it's in; I just need to be able to check for the existence of a value.
Don't use a list; use a set() instead. It has exactly the properties you want, including a blazing fast membership test (x in s).
I've seen speedups of 20x and higher in places (mostly heavy number crunching) where one list was changed for a set.
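A minimal sketch of the list-to-set swap, assuming the values are hashable and distinct as the question states. The data and the target value are hypothetical, and the timings it prints will vary by machine and Python version, so treat them as illustrative only.

```python
import timeit

# Hypothetical data: 100,000 distinct integers (no duplicates, order irrelevant).
data = list(range(100_000))
lookup = set(data)  # one-off conversion; elements must be hashable

target = 99_999  # worst case for the list: it sits at the very end

# Both containers give the same answer...
assert (target in data) == (target in lookup)

# ...but the list is scanned element by element, while the set hashes the value.
list_time = timeit.timeit(lambda: target in data, number=100)
set_time = timeit.timeit(lambda: target in lookup, number=100)
print(f"list: {list_time:.4f}s, set: {set_time:.6f}s")
```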
What's the fastest way to locate a list element within a list in python?
No. Without iterating you cannot find it, unless the list is already sorted. You can improve your code like this, with enumerate and a list comprehension:
[index for index, item in enumerate(thelist) if item[0] == "332"]
This will give the indices of all the elements whose first element is "332".
If you know that "332" occurs only once, you can do this instead, which stops at the first match:
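def getIndex(thelist):
    # Return the index of the first item whose first element is "332".
    for index, item in enumerate(thelist):
        if item[0] == "332":
            return index
    return None  # "332" was not found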
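The same early-exit search can also be written as a one-liner with next() and a generator expression; this is an alternative to the function, not part of the original answer. The sample thelist below is hypothetical, shaped like the question's data (inner lists whose first element is a string).

```python
# Hypothetical sample data: each item's first element is a string key.
thelist = [["100", "a"], ["332", "b"], ["500", "c"], ["332", "d"]]

# The generator stops at the first match; next()'s second argument is
# returned instead of raising StopIteration when nothing matches.
first = next((i for i, item in enumerate(thelist) if item[0] == "332"), None)
missing = next((i for i, item in enumerate(thelist) if item[0] == "999"), None)

print(first, missing)  # 1 None
```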
What is the fastest way to find an item in a list in python?
NumPy's searchsorted is usually the tool for these cases (it requires the input to be sorted in ascending order):
import numpy as np
np.searchsorted([1, 2, 8, 9], 5)  # your case
# -> 2
np.searchsorted([1, 2, 8, 9], (-1, 2, 100))  # other cases
# -> array([0, 1, 4])
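For values that are not in the array, the returned index points to the nearest element on the right (the insertion point that keeps the array sorted). If that is not what you need, it can be adjusted to give the nearest position on the left instead.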
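A small sketch of both adjustments, assuming a 1-D NumPy array sorted in ascending order. The helper names sorted_contains and nearest_left are hypothetical; they only wrap np.searchsorted, using its side parameter for the left-neighbour case.

```python
import numpy as np

def sorted_contains(arr, value):
    # Insertion point with side="left"; value is present only if it sits there.
    idx = np.searchsorted(arr, value)
    return bool(idx < len(arr) and arr[idx] == value)

def nearest_left(arr, value):
    # Index of the largest element <= value, or -1 if value < arr[0].
    return int(np.searchsorted(arr, value, side="right")) - 1

arr = np.array([1, 2, 8, 9])
print(sorted_contains(arr, 8), sorted_contains(arr, 5))       # True False
print(nearest_left(arr, 5), nearest_left(arr, 2), nearest_left(arr, 0))  # 1 1 -1
```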
Most efficient way for a lookup/search in a huge list (python)
Don't create a list, create a set. It does lookups in (average) constant time.
If you don't want the memory overhead of a set, keep a sorted list and search through it with the bisect module.
from bisect import bisect_left

def bi_contains(lst, item):
    """Efficient `item in lst` for sorted lists."""
    # An empty list contains nothing (and lst[-1] would raise IndexError).
    # If item is larger than the last element it's not in the list, but bisect
    # would return len(lst) as the insertion index, so check that first.
    # Otherwise, if item is in the list it has to be at bisect_left(lst, item).
    return bool(lst) and item <= lst[-1] and lst[bisect_left(lst, item)] == item
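To make the memory trade-off a little more concrete, here is a rough sketch comparing container sizes with sys.getsizeof. It only measures the container itself, not the int objects both containers reference, and exact numbers depend on the Python version; the data is hypothetical.

```python
import sys
from bisect import bisect_left

values = list(range(10_000))
as_set = set(values)
as_sorted_list = sorted(values)

# sys.getsizeof reports only the container (its hash table or pointer array),
# which is exactly where a set and a list differ.
print("set :", sys.getsizeof(as_set))
print("list:", sys.getsizeof(as_sorted_list))

# The sorted list still supports fast O(log n) lookups via binary search.
i = bisect_left(as_sorted_list, 4_321)
found = i < len(as_sorted_list) and as_sorted_list[i] == 4_321
print(found)  # True
```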
Fastest way to check if a list is present in a list of lists
Use a list comprehension together with a set.
Example:
a=[[1,2,3,4,5,6],[7,8,9,10,11,12]]
b=[[5, 9, 25, 31, 33, 36],[7,8,9,10,11,12],[10, 13, 22, 24, 33, 44]]
setA = set(map(tuple, a))
setB = set(map(tuple, b))
print([i for i in setA if i not in setB])
Output:
[(1, 2, 3, 4, 5, 6)]
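Converting a to a set as well discards its order and turns its rows into tuples. If you want to keep a's order and its rows as lists, a variant is to convert only b and test each row of a against it; the same set also answers the title question of whether one particular list is present.

```python
a = [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]]
b = [[5, 9, 25, 31, 33, 36], [7, 8, 9, 10, 11, 12], [10, 13, 22, 24, 33, 44]]

# Lists are unhashable, so convert each inner list of b to a tuple once.
seen_in_b = set(map(tuple, b))

# Keep a's original order and row type; only the lookup side is a set.
missing = [row for row in a if tuple(row) not in seen_in_b]
print(missing)  # [[1, 2, 3, 4, 5, 6]]

# Checking a single list: convert it to a tuple before the lookup.
print(tuple([7, 8, 9, 10, 11, 12]) in seen_in_b)  # True
```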
Python: Fastest way to find all elements in one large list but not in another
I really like set analysis, where you can do:
set(list2) - set(list1)
Putting the list items in a set removes all duplicates and ordering. Set operations let you remove one set of items from another with just the - operator.
If the lists are enormous, NumPy can be a bit faster:
import numpy as np
np.setdiff1d(list2, list1)  # same direction as set(list2) - set(list1)
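A short sketch contrasting the result types, using hypothetical sample lists. All three compute the elements of list2 that are not in list1 (the same difference as set(list2) - set(list1)), but they differ in ordering and duplicate handling: np.setdiff1d returns sorted unique values as an array, and the list comprehension is an order-preserving alternative that is not part of the original answer.

```python
import numpy as np

list1 = [1, 2, 3, 4]
list2 = [3, 4, 5, 6, 6]

as_set = set(list2) - set(list1)        # unordered, duplicates dropped
as_numpy = np.setdiff1d(list2, list1)   # sorted unique values, as an ndarray

s1 = set(list1)                         # build the lookup set once
in_order = [x for x in list2 if x not in s1]  # keeps order and duplicates

print(as_set)    # {5, 6}
print(as_numpy)  # [5 6]
print(in_order)  # [5, 6, 6]
```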
Python searching a large list speed
Two things that might provide some small help:
1) Use the approach in this SO answer to read through your large file as efficiently as possible.
2) Change your code from
for x in headwordList:
    m = SequenceMatcher(None, y.lower(), x)
to
yLower = y.lower()
for x in headwordList:
    m = SequenceMatcher(None, yLower, x)
You're converting each sentence to lowercase 650,000 times (once per headword). There's no need for that; once per sentence is enough.
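Going one step further, the difflib documentation notes that SequenceMatcher caches information about its second sequence, so one matcher can be reused with set_seq2() called once and set_seq1() called per candidate. The sketch below assumes it is acceptable to pass the sentence as the second sequence (ratio() can differ slightly when the arguments are swapped); best_match and the sample headwords are hypothetical.

```python
from difflib import SequenceMatcher

def best_match(sentence, headwords):
    # Lowercase once, outside the loop.
    target = sentence.lower()
    # The matcher caches analysis of the second sequence, so set it once...
    matcher = SequenceMatcher(None)
    matcher.set_seq2(target)
    best, best_ratio = None, -1.0
    for word in headwords:
        # ...and only swap in the first sequence for each candidate.
        matcher.set_seq1(word)
        r = matcher.ratio()
        if r > best_ratio:
            best, best_ratio = word, r
    return best, best_ratio

print(best_match("Hello", ["help", "hello", "world"]))  # ('hello', 1.0)
```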
Searching a sorted list?
Python:
import bisect

def find_in_sorted_list(elem, sorted_list):
    'Locate the leftmost value exactly equal to elem; return -1 if absent.'
    # https://docs.python.org/3/library/bisect.html
    i = bisect.bisect_left(sorted_list, elem)
    if i != len(sorted_list) and sorted_list[i] == elem:
        return i
    return -1

def in_sorted_list(elem, sorted_list):
    i = bisect.bisect_left(sorted_list, elem)
    return i != len(sorted_list) and sorted_list[i] == elem

L = ["aaa", "bcd", "hello", "world", "zzz"]
print(find_in_sorted_list("hello", L))  # 2
print(find_in_sorted_list("hellu", L))  # -1
print(in_sorted_list("hellu", L))  # False
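These helpers assume the list is already sorted. One way to keep it sorted as items arrive is bisect.insort, and the same bisect functions then support range queries. The incoming words and the "b"/"i" bounds below are hypothetical.

```python
import bisect

# Hypothetical input arriving in arbitrary order.
incoming = ["world", "aaa", "zzz", "hello", "bcd"]

words = []
for w in incoming:
    # insort locates the position by binary search, but the insertion itself
    # still shifts later elements, so each call is O(n) overall.
    bisect.insort(words, w)

print(words)  # ['aaa', 'bcd', 'hello', 'world', 'zzz']

# Range query on the sorted list: every word w with "b" <= w < "i".
lo = bisect.bisect_left(words, "b")
hi = bisect.bisect_left(words, "i")
print(words[lo:hi])  # ['bcd', 'hello']
```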
Fastest way to check if a value exists in a list
7 in a
Clearest and fastest way to do it.
You can also consider using a set, but constructing that set from your list may take more time than the faster membership testing will save. The only way to be certain is to benchmark properly (the answer also depends on which operations you need).
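A rough benchmark sketch along those lines, which charges the set's construction cost to the set strategy. The data sizes and query workloads are hypothetical, and the break-even point will depend on your data and machine.

```python
import timeit

def via_list(data, queries):
    # No conversion: every query is a linear scan of the list.
    return [q in data for q in queries]

def via_set(data, queries):
    # Pays the O(n) construction cost up front, then ~O(1) per query.
    lookup = set(data)
    return [q in lookup for q in queries]

data = list(range(10_000))
workloads = {
    "one query": [9_999],
    "1,000 queries": list(range(0, 20_000, 20)),  # about half are misses
}

for name, queries in workloads.items():
    # Sanity check: both strategies must agree before comparing speed.
    assert via_list(data, queries) == via_set(data, queries)
    t_list = timeit.timeit(lambda: via_list(data, queries), number=3)
    t_set = timeit.timeit(lambda: via_set(data, queries), number=3)
    print(f"{name}: list {t_list:.4f}s, set {t_set:.4f}s")
```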