Complexity of *in* operator in Python
The complexity of `in` depends entirely on what `L` is. `e in L` evaluates to `L.__contains__(e)`.
See the Python wiki's time complexity document for the complexity of several built-in types.
Here is the summary for `in`:
- list - Average: O(n)
- set/dict - Average: O(1), Worst: O(n)
The O(n) worst case for sets and dicts is very uncommon, but it can happen if `__hash__` is implemented poorly. It only occurs when everything in your set has the same hash value.
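To see how that worst case arises, here is a minimal sketch (the `BadHash` class is a made-up illustration, not part of the original answer):

```python
# A deliberately bad __hash__ makes every entry land in the same bucket,
# so set lookups degrade toward the O(n) worst case instead of the usual
# O(1) average.
class BadHash:
    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return 42  # every instance gets the same hash value

    def __eq__(self, other):
        return isinstance(other, BadHash) and self.value == other.value


good = set(range(1000))                  # ints hash well: O(1) average lookup
bad = {BadHash(i) for i in range(1000)}  # all collide: lookup scans entries

print(999 in good)          # True
print(BadHash(999) in bad)  # True, but only after probing colliding entries
```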
'in' operator functionality in Python
`in` does not necessarily use loops behind the scenes. For example:
r = range(100000000000)
print(333 in r) # prints True immediately without looping
If you were to loop over `r`, it would take quite a long time, so clearly that is not what happens.
`in` basically calls the object's `__contains__` method behind the scenes. For some iterables it will in fact "loop" through everything, but that's not always the case.
This example is basically the same as calling:
r.__contains__(333)
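As an illustration (the `Evens` class below is hypothetical, not part of the original answer), any object can decide membership arithmetically by defining `__contains__`, just as `range` does:

```python
# Hypothetical example: membership is decided by arithmetic,
# so `in` never loops over anything.
class Evens:
    def __contains__(self, item):
        return isinstance(item, int) and item % 2 == 0


evens = Evens()
print(4 in evens)  # True
print(7 in evens)  # False
```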
As pointed out in the comments, `str` objects specifically use a smarter search algorithm than a plain loop; see the linked CPython source, the linked example answer, and the documentation.
Because a real-world scenario would probably mean that `string1` can be arbitrarily long while the characters to be removed form a small, finite set, it will probably be much more efficient to simply collect all the characters that aren't in `string2`. Something like this:
def removeChars(string1, string2):
    result = ''
    for char in string1:
        if char not in string2:
            result += char
    return result
This involves looping over `string1` just once, with multiple `in` checks against `string2`. It can be further simplified (avoiding the repeated `+=` concatenation by joining over a generator):
def removeChars(string1, string2):
    return ''.join(char for char in string1 if char not in string2)
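As a further optimization not mentioned in the answer above (a hypothetical variant, with a made-up name): if `string2` can grow large, converting it to a set first makes each membership check O(1) on average instead of O(len(string2)):

```python
def remove_chars_fast(string1, string2):
    # Build the lookup set once; each `in` check is then O(1) on average.
    to_remove = set(string2)
    return ''.join(char for char in string1 if char not in to_remove)


print(remove_chars_fast("hello world", "lo"))  # he wrd
```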
Time Complexity of in (containment operator)
Your analysis is correct.
- List containment is O(n), and doing an O(n) operation O(n) times is O(n²).
- Dictionary lookups are O(1), and doing an O(1) operation O(n) times is O(n).
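A sketch of the pattern this analysis describes (the deduplication functions below are my own illustration, assuming the analyzed code repeatedly tests containment inside a loop):

```python
def dedupe_with_list(items):
    # O(n^2): each `x not in seen` is an O(n) list scan, done n times.
    seen = []
    for x in items:
        if x not in seen:
            seen.append(x)
    return seen


def dedupe_with_dict(items):
    # O(n): each lookup is an O(1)-average dict operation, done n times.
    seen = {}
    for x in items:
        seen.setdefault(x, True)
    return list(seen)


print(dedupe_with_list([1, 2, 1, 3]))  # [1, 2, 3]
print(dedupe_with_dict([1, 2, 1, 3]))  # [1, 2, 3]
```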
Python in operator time complexity on range()
In python-3.x, a `range(..)` is an object; it does not construct a list. If you perform membership checks with an `int` as the item, it can do so quite fast. The algorithm is a bit complicated, since there are both positive and negative steps. You can look it up on GitHub. A simple algorithm for a positive step count (c > 0), for `x in range(a, b, c)`, is something like:
x ≥ a ∧ x < b ∧ mod(x-a, c) = 0.
For a negative step count (c < 0), it is quite similar:
x ≤ a ∧ x > b ∧ mod(x-a, c) = 0.
If we consider the comparisons and the modulo computation to be O(1), it is an O(1) algorithm. In reality, for huge numbers it scales with the number of digits of the numbers, so it is an O(log n) algorithm.
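The two formulas above can be sketched in Python like this (`in_range` is a simplified illustration, not CPython's actual implementation):

```python
def in_range(x, a, b, c):
    # O(1) membership test for `x in range(a, b, c)`, ints only.
    if c > 0:
        return a <= x < b and (x - a) % c == 0
    else:
        return b < x <= a and (x - a) % c == 0


print(in_range(333, 0, 100000000000, 1))  # True, with no iteration
```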
This however only works for `int`s. Indeed, if you use `float`s, `complex`, or other numerical or non-numerical types, it does not perform the above calculations; it falls back on iterating over the `range(..)` object, which of course can take considerable time. If you have a range of a million elements, it will iterate over all of them and eventually reach the end of the range and return `False`, or find a match and return `True`. In the future, perhaps some extra functionality will be implemented for some numerical types, but one cannot do this in general, since you can define your own numerical type with an equality operation that works differently.
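For example, a small illustration of the fallback described above:

```python
r = range(10 ** 6)
print(999_999 in r)    # True: exact ints take the O(1) arithmetic path
print(999_999.0 in r)  # True as well, but found by iterating the range
print(0.5 in r)        # False, determined only after iterating everything
```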
In python-2.x, range(..)
is a function that returns a list. Indeed:
>>> range(15)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
>>> type(range(15))
<type 'list'>
In order to check whether an element is a member of a list, it will iterate over the list and check equality for all items until it finds an equal element, or the list is exhausted. If we consider checking whether two items are equal to be constant time, this takes linear time O(n). In reality, for huge numbers, checking whether two numbers are equal scales linearly with the number of "digits", so O(log m), with m the value of that number.
python-2.x has an `xrange(..)` object as well, but it does not use the fast-path trick demonstrated above for membership checks.
Python string 'in' operator implementation algorithm and time complexity
It's a combination of Boyer-Moore and Horspool.
You can view the C code in CPython's `Objects/stringlib/fastsearch.h`:
Fast search/count implementation, based on a mix between Boyer-Moore and Horspool, with a few more bells and whistles on the top. For some more background, see: https://web.archive.org/web/20201107074620/http://effbot.org/zone/stringlib.htm.
From the link above:
When designing the new algorithm, I used the following constraints:
- should be faster than the current brute-force algorithm for all test cases (based on real-life code), including Jim Hugunin’s worst-case test
- small setup overhead; no dynamic allocation in the fast path (O(m) for speed, O(1) for storage)
- sublinear search behaviour in good cases (O(n/m))
- no worse than the current algorithm in worst case (O(nm))
- should work well for both 8-bit strings and 16-bit or 32-bit Unicode strings (no O(σ) dependencies)
- many real-life searches should be good, very few should be worst case
- reasonably simple implementation
What is Time complexity of set and if item in array in Python?
The time complexity of the `in` operator for sets in Python is O(1) on average and O(N) only in the worst case, since Python sets use a hash table internally.
So your function's time complexity should be O(N) on average and O(N²) only in the worst case, where N is the length of the array.
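As a sketch of the pattern being analyzed (the function below is my own illustration; the original question's code is not shown here, so assume something of this shape):

```python
def common_items(array, other):
    lookup = set(other)  # built once, O(len(other))
    # N iterations, each an O(1)-average set lookup -> O(N) on average.
    return [x for x in array if x in lookup]


print(common_items([1, 2, 3, 4], [2, 4, 6]))  # [2, 4]
```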
More here https://wiki.python.org/moin/TimeComplexity
What is the time complexity of bitwise operations in python?
You really give all the ingredients for the answer.
Quoting from a blog by Victor Skvortsov:
Everything related to the representation of Python integers can be found in `Include/longintrepr.h`. Technically, Python integers are instances of `PyLongObject`, which is defined in `Include/longobject.h`, but `PyLongObject` is actually a typedef for `struct _longobject`, which is defined in `Include/longintrepr.h`:

struct _longobject {
    PyVarObject ob_base; /* expansion of PyObject_VAR_HEAD macro */
    digit ob_digit[1];
};

This struct extends `PyVarObject`, which in turn extends `PyObject`:

typedef struct {
    PyObject ob_base;
    Py_ssize_t ob_size; /* Number of items in variable part */
} PyVarObject;

So, besides a reference count and a type that all Python objects have, an integer object has two other members: `ob_size`, which comes from `PyVarObject`, and `ob_digit`, which is defined in `struct _longobject`.

The `ob_digit` member is a pointer to an array of digits. On 64-bit platforms, each digit is a 30-bit integer that takes values between 0 and 2^30-1 and is stored as an unsigned 32-bit int (`digit` is a typedef for `uint32_t`). On 32-bit platforms, each digit is a 15-bit integer that takes values between 0 and 2^15-1 and is stored as an unsigned 16-bit int (`digit` is a typedef for `unsigned short`).
This confirms what you wrote, i.e. that Python integers are represented as an array of fixed-size words.
So bit operations have a time complexity of O(k), where k is the total size of such array(s). Or, if we want to express it in terms of the value n of the integer involved, it is O(log n).
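A quick sketch of that scaling (the helper below is hypothetical; it just mirrors the 30-bit-digit layout described above for 64-bit builds):

```python
def cpython_digit_count(n):
    # ceil(bit_length / 30): the number of 30-bit digits needed to store n
    # (a bitwise operation has to touch each of these digits).
    return max(1, -(-n.bit_length() // 30))


print(cpython_digit_count(1))         # 1
print(cpython_digit_count(2 ** 100))  # 4
```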