Why Is "1000000000000000 in Range(1000000000000001)" So Fast in Python 3

Why is 1000000000000000 in range(1000000000000001) so fast in Python 3?

The Python 3 range() object doesn't produce numbers immediately; it is a smart sequence object that produces numbers on demand. All it contains is your start, stop and step values, then as you iterate over the object the next integer is calculated each iteration.

The object also implements the object.__contains__ hook, and calculates if your number is part of its range. Calculating is a (near) constant time operation *. There is never a need to scan through all possible integers in the range.

From the range() object documentation:

The advantage of the range type over a regular list or tuple is that a range object will always take the same (small) amount of memory, no matter the size of the range it represents (as it only stores the start, stop and step values, calculating individual items and subranges as needed).

So at a minimum, your range() object would do:

class my_range:
def __init__(self, start, stop=None, step=1, /):
if stop is None:
start, stop = 0, start
self.start, self.stop, self.step = start, stop, step
if step < 0:
lo, hi, step = stop, start, -step
else:
lo, hi = start, stop
self.length = 0 if lo > hi else ((hi - lo - 1) // step) + 1

def __iter__(self):
current = self.start
if self.step < 0:
while current > self.stop:
yield current
current += self.step
else:
while current < self.stop:
yield current
current += self.step

def __len__(self):
return self.length

def __getitem__(self, i):
if i < 0:
i += self.length
if 0 <= i < self.length:
return self.start + i * self.step
raise IndexError('my_range object index out of range')

def __contains__(self, num):
if self.step < 0:
if not (self.stop < num <= self.start):
return False
else:
if not (self.start <= num < self.stop):
return False
return (num - self.start) % self.step == 0

This is still missing several things that a real range() supports (such as the .index() or .count() methods, hashing, equality testing, or slicing), but should give you an idea.

I also simplified the __contains__ implementation to only focus on integer tests; if you give a real range() object a non-integer value (including subclasses of int), a slow scan is initiated to see if there is a match, just as if you use a containment test against a list of all the contained values. This was done to continue to support other numeric types that just happen to support equality testing with integers but are not expected to support integer arithmetic as well. See the original Python issue that implemented the containment test.


* Near constant time because Python integers are unbounded and so math operations also grow in time as N grows, making this a O(log N) operation. Since it’s all executed in optimised C code and Python stores integer values in 30-bit chunks, you’d run out of memory before you saw any performance impact due to the size of the integers involved here.

Strange speeds of `x in range(...)` checks

In CPython, the implementation first checks the value against both endpoints:

if (cmp1 == 1) { /* positive steps: start <= ob < stop */
cmp2 = PyObject_RichCompareBool(r->start, ob, Py_LE);
cmp3 = PyObject_RichCompareBool(ob, r->stop, Py_LT);
}

If either is false, it can exit immediately:

if (cmp2 == 0 || cmp3 == 0) { /* ob outside of range */
result = 0;
goto end;
}

If both checks are true, though, it then has to check if the stride rejects the value. (For example, 2000 in range(1501, 2500) is true, but 2000 in range(1501, 2500, 2) is false.)

tmp1 = PyNumber_Subtract(ob, r->start);
if (tmp1 == NULL)
goto end;
tmp2 = PyNumber_Remainder(tmp1, r->step);
if (tmp2 == NULL)
goto end;
/* result = ((int(ob) - start) % step) == 0 */
result = PyObject_RichCompareBool(tmp2, zero, Py_EQ);

It's this last step that makes 2000 in range(1500, 2500) slower than either out-of-bound check.

(See https://github.com/python/cpython/blob/main/Objects/rangeobject.c#L381 for the full implementation.)

Python: how to print range a-z?

>>> import string
>>> string.ascii_lowercase[:14]
'abcdefghijklmn'
>>> string.ascii_lowercase[:14:2]
'acegikm'

To do the urls, you could use something like this

[i + j for i, j in zip(list_of_urls, string.ascii_lowercase[:14])]

What is the meaning of int(a[::-1]) in Python?

Assuming a is a string. The Slice notation in python has the syntax -

list[<start>:<stop>:<step>]

So, when you do a[::-1], it starts from the end towards the first taking each element. So it reverses a. This is applicable for lists/tuples as well.

Example -

>>> a = '1234'
>>> a[::-1]
'4321'

Then you convert it to int and then back to string (Though not sure why you do that) , that just gives you back the string.

Why does range(start, end) not include end?

Because it's more common to call range(0, 10) which returns [0,1,2,3,4,5,6,7,8,9] which contains 10 elements which equals len(range(0, 10)). Remember that programmers prefer 0-based indexing.

Also, consider the following common code snippet:

for i in range(len(li)):
pass

Could you see that if range() went up to exactly len(li) that this would be problematic? The programmer would need to explicitly subtract 1. This also follows the common trend of programmers preferring for(int i = 0; i < 10; i++) over for(int i = 0; i <= 9; i++).

If you are calling range with a start of 1 frequently, you might want to define your own function:

>>> def range1(start, end):
... return range(start, end+1)
...
>>> range1(1, 10)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

Why is looping over range() in Python faster than using a while loop?

see the disassembly of python byte code, you may get a more concrete idea

use while loop:

1           0 LOAD_CONST               0 (0)
3 STORE_NAME 0 (i)

2 6 SETUP_LOOP 28 (to 37)
>> 9 LOAD_NAME 0 (i) # <-
12 LOAD_CONST 1 (100000000) # <-
15 COMPARE_OP 0 (<) # <-
18 JUMP_IF_FALSE 14 (to 35) # <-
21 POP_TOP # <-

3 22 LOAD_NAME 0 (i) # <-
25 LOAD_CONST 2 (1) # <-
28 INPLACE_ADD # <-
29 STORE_NAME 0 (i) # <-
32 JUMP_ABSOLUTE 9 # <-
>> 35 POP_TOP
36 POP_BLOCK

The loop body has 10 op

use range:

1           0 SETUP_LOOP              23 (to 26)
3 LOAD_NAME 0 (range)
6 LOAD_CONST 0 (0)
9 LOAD_CONST 1 (100000000)
12 CALL_FUNCTION 2
15 GET_ITER
>> 16 FOR_ITER 6 (to 25) # <-
19 STORE_NAME 1 (n) # <-

2 22 JUMP_ABSOLUTE 16 # <-
>> 25 POP_BLOCK
>> 26 LOAD_CONST 2 (None)
29 RETURN_VALUE

The loop body has 3 op

The time to run C code is much shorter than intepretor and can be ignored.

Check if something is (not) in a list in Python

The bug is probably somewhere else in your code, because it should work fine:

>>> 3 not in [2, 3, 4]
False
>>> 3 not in [4, 5, 6]
True

Or with tuples:

>>> (2, 3) not in [(2, 3), (5, 6), (9, 1)]
False
>>> (2, 3) not in [(2, 7), (7, 3), "hi"]
True


Related Topics



Leave a reply



Submit