Complexity of *In* Operator in Python

The complexity of in depends entirely on what L is: the expression e in L is evaluated as L.__contains__(e).
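
For example, a minimal sketch with a made-up class (EvenNumbers is purely illustrative) shows the dispatch to __contains__:

class EvenNumbers:
    """Pretends to 'contain' every even integer."""
    def __contains__(self, item):
        return isinstance(item, int) and item % 2 == 0

evens = EvenNumbers()
print(4 in evens)   # True  -- in calls EvenNumbers.__contains__(evens, 4)
print(7 in evens)   # False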

See the Python wiki's TimeComplexity page (https://wiki.python.org/moin/TimeComplexity) for the complexity of several built-in types.

Here is the summary for in:

  • list - Average: O(n)
  • set/dict - Average: O(1), Worst: O(n)

The O(n) worst case for sets and dicts is very uncommon: it only happens if __hash__ is implemented so poorly that everything in your set hashes to the same value.
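
As an illustration of that worst case (the BadHash class below is hypothetical), forcing every element into the same bucket turns each membership test into a linear scan:

class BadHash:
    """Every instance hashes to the same value, so all of them collide."""
    def __init__(self, value):
        self.value = value
    def __hash__(self):
        return 1
    def __eq__(self, other):
        return isinstance(other, BadHash) and self.value == other.value

items = {BadHash(i) for i in range(1000)}
# The lookup has to walk the chain of colliding entries, so it is O(n)
# here instead of the usual O(1):
print(BadHash(999) in items)   # True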

'in' operator functionality in Python

in does not necessarily use loops behind the scenes. For example:

r = range(100000000000)
print(333 in r) # prints True immediately without looping

If you were to loop over r, it would take quite a long time, so clearly that is not what happens.

in basically calls (behind the scenes) the object's __contains__ method. For some iterables it will in fact "loop" through everything, but that's not always the case.

This example is basically the same as calling:

r.__contains__(333)

As pointed out in the comments, str objects in particular use a smarter algorithm than a plain loop; see the question on the string 'in' operator further down for the details.

Because a real-world scenario would probably mean that string1 can be arbitrarily long while the characters to be removed form a small, finite set, it will probably be much more efficient to simply collect all the characters that aren't in string2. Something like this:

def removeChars(string1, string2):
    result = ''
    for char in string1:
        if char not in string2:
            result += char
    return result

This involves looping over string1 just once, with a membership check against string2 via in for each character. It can be simplified further (avoiding the explicit += accumulation of the result):

def removeChars(string1, string2):
    return ''.join(char for char in string1 if char not in string2)
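
For example, with arbitrary inputs, and with a hedged variant that converts string2 to a set so that each membership test is O(1) on average (remove_chars_set is just an illustrative name):

print(removeChars("hello world", "lo"))       # -> 'he wrd'

def remove_chars_set(string1, string2):
    banned = set(string2)                     # O(1) average lookups
    return ''.join(char for char in string1 if char not in banned)

print(remove_chars_set("hello world", "lo"))  # -> 'he wrd'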

Time Complexity of in (containment operator)

Your analysis is correct.

  • List containment is O(n), and doing an O(n) operation O(n) times is O(n^2).
  • Dictionary lookups are O(1), and doing an O(1) operation O(n) times is O(n).
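
A rough way to see the difference in practice (the timings are illustrative and depend on the machine):

import timeit

data_list = list(range(100_000))
data_set = set(data_list)

# Worst case for the list: the element sits at the end, so every test scans O(n) items.
print(timeit.timeit(lambda: 99_999 in data_list, number=1_000))
# The set does a hash lookup, O(1) on average regardless of size.
print(timeit.timeit(lambda: 99_999 in data_set, number=1_000))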

Python in operator time complexity on range()

In python-3.x, a range(..) is an object; it does not construct a list. If you perform membership checks with an int as the item, it can do that quite fast. The algorithm is a bit complicated, since there are both positive and negative steps, but you can look it up on GitHub. A simple check for x in range(a, b, c) with a positive step count (c > 0) is something like:

x ≥ a ∧ x < b ∧ mod(x-a, c) = 0.

For a negative step count (c < 0) it is quite similar:

x ≤ a ∧ x > b ∧ mod(x-a, c) = 0.

If we consider the comparisons and the modulo calculation to be O(1), this is an O(1) algorithm. In reality, for huge numbers it scales with the number of digits of the numbers involved, so it is an O(log n) algorithm.
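
A minimal Python sketch of that check (not the actual CPython code, which is written in C and handles more edge cases):

def in_range(x, a, b, c):
    """Constant-time membership test for x in range(a, b, c), ints only."""
    if c > 0:
        return a <= x < b and (x - a) % c == 0
    else:
        return b < x <= a and (x - a) % c == 0

print(in_range(8, 0, 10, 2), 8 in range(0, 10, 2))     # True True
print(in_range(7, 0, 10, 2), 7 in range(0, 10, 2))     # False False
print(in_range(4, 10, 0, -2), 4 in range(10, 0, -2))   # True True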

This however only works for ints. If you use floats, complex numbers, or other numerical or non-numerical types, it does not perform the above calculations; it falls back on iterating over the range(..) object, which can of course take considerable time. If the range has a million elements, it will iterate over all of them and eventually reach the end of the range and return False, or find a match and return True. In the future extra functionality could perhaps be implemented for some numerical types, but this cannot be done in general, since you can define your own numerical type with an equality operation that works differently.

In python-2.x, range(..) is a function that returns a list. Indeed:

>>> range(15)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
>>> type(range(15))
<type 'list'>

In order to check if an element is a member of a list, it will iterate over the list and check equality for each item until it finds an element that is equal, or the list is exhausted. If we consider checking whether two items are equal to be constant time, this takes linear time O(n). In reality, for huge numbers, checking whether two numbers are equal scales linearly with the number of "digits", so O(log m) with m the value of that number.

python-2.x has an xrange(..) object as well, but it does not perform membership checks with the trick demonstrated above.

Python string 'in' operator implementation algorithm and time complexity

It's a combination of Boyer-Moore and Horspool.

You can view the C code in CPython's stringlib implementation:

Fast search/count implementation, based on a mix between Boyer-Moore and Horspool, with a few more bells and whistles on the top. For some more background, see: https://web.archive.org/web/20201107074620/http://effbot.org/zone/stringlib.htm.

From the link above:

When designing the new algorithm, I used the following constraints:

  • should be faster than the current brute-force algorithm for all test cases (based on real-life code), including Jim Hugunin’s worst-case test
  • small setup overhead; no dynamic allocation in the fast path (O(m) for speed, O(1) for storage)
  • sublinear search behaviour in good cases (O(n/m))
  • no worse than the current algorithm in worst case (O(nm))
  • should work well for both 8-bit strings and 16-bit or 32-bit Unicode strings (no O(σ) dependencies)
  • many real-life searches should be good, very few should be worst case
  • reasonably simple implementation
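
A heavily simplified, pure-Python sketch of the Horspool half of the idea (not the actual CPython implementation, which is written in C and layers the Boyer-Moore-style refinements described above on top):

def horspool_find(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    if m > n:
        return -1
    # For each pattern character except the last, record how far the window
    # may slide when that character ends the current window.
    shift = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}
    i = 0
    while i <= n - m:
        if text[i:i + m] == pattern:
            return i
        # Characters that never occur in the pattern allow the maximal shift m.
        i += shift.get(text[i + m - 1], m)
    return -1

print(horspool_find("the quick brown fox", "brown"))   # 10
print("brown" in "the quick brown fox")                # True (same idea, done in C)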

What is Time complexity of set and if item in array in Python?

The time complexity of the 'in' operator for sets in Python is on average O(1) and only in the worst case O(N), since sets in Python use a hash table internally.

So your function's time complexity should be O(N) on average, and only in the worst case O(N^2), where N is the length of the array.

More details here: https://wiki.python.org/moin/TimeComplexity

What is the time complexity of bitwise operations in python?

You have really given all the ingredients for the answer.

Quoting from a blog by Victor Skvortsov:

Everything related to the representation of Python integers can be found in Include/longintrepr.h. Technically, Python integers are instances of PyLongObject, which is defined in Include/longobject.h, but PyLongObject is actually a typedef for struct _longobject that is defined in Include/longintrepr.h:

struct _longobject {
    PyVarObject ob_base; // expansion of PyObject_VAR_HEAD macro
    digit ob_digit[1];
};

This struct extends PyVarObject, which in turn extends PyObject:

typedef struct {
    PyObject ob_base;
    Py_ssize_t ob_size; /* Number of items in variable part */
} PyVarObject;

So, besides a reference count and a type that all Python objects have, an integer object has two other members:

  • ob_size that comes from PyVarObject; and
  • ob_digit that is defined in struct _longobject.

The ob_digit member is a pointer to an array of digits. On 64-bit platforms, each digit is a 30-bit integer that takes values between 0 and 2^30-1 and is stored as an unsigned 32-bit int (digit is a typedef for uint32_t). On 32-bit platforms, each digit is a 15-bit integer that takes values between 0 and 2^15-1 and is stored as an unsigned 16-bit int (digit is a typedef for unsigned short).

This confirms what you wrote, i.e. that Python integers are represented as an array of fixed sized words.

So bitwise operations will have a time complexity of O(k), where k is the total size of such array(s); or, if we want to express this in terms of the value n of the integer involved, O(log n).
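
A quick way to observe that growth (the exponents are chosen arbitrarily):

import sys

for exponent in (10, 100, 1_000, 10_000):
    n = 2 ** exponent
    # bit_length grows with log2(n); the object size grows with the number
    # of 30-bit digits needed to store it.
    print(exponent, n.bit_length(), sys.getsizeof(n))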


