How Do Chained Comparisons in Python Actually Work

How do chained comparisons in Python actually work?

You can simply let Python tell you what bytecode is produced with the dis module:

>>> import dis
>>> def f(): return 1 < input("Value:") < 10
...
>>> dis.dis(f)
1 0 LOAD_CONST 1 (1)
3 LOAD_GLOBAL 0 (input)
6 LOAD_CONST 2 ('Value:')
9 CALL_FUNCTION 1
12 DUP_TOP
13 ROT_THREE
14 COMPARE_OP 0 (<)
17 JUMP_IF_FALSE_OR_POP 27
20 LOAD_CONST 3 (10)
23 COMPARE_OP 0 (<)
26 RETURN_VALUE
>> 27 ROT_TWO
28 POP_TOP
29 RETURN_VALUE

Python uses a stack; the CALL_FUNCTION bytecode uses items on the stack (the input global and the 'Value:' string) to call a function with one argument, to replace those two items on the stack with the result of the function call. Before the function call, the the constant 1 was loaded on the stack.

So by the time input was called the stack looks like:

input_result
1

and DUP_TOP duplicates the top value, before rotating the top three stack values to arrive at:

1
input_result
input_result

and a COMPARE_OP to test the top two items with <, replacing the top two items with the result.

If that result was False the JUMP_IF_FALSE_OR_POP bytecode jumps over to 27, which rotates the False on top with the remaining input_result to clear that out with a POP_TOP, to then return the remaining False top value as the result.

If the result True however, that value is popped of the stack by the JUMP_IF_FALSE_OR_POP bytecode and in it's place the 10 value is loaded on top and we get:

10    
input_result

and another comparison is made and returned instead.

In summary, essentially Python then does this:

stack_1 = stack_2 = input('Value:')
if 1 < stack_1:
result = False
else:
result = stack_2 < 10

with the stack_* values cleared again.

The stack, then, holds the unnamed intermediate result to compare

What does it mean that Python comparison operators chain/group left to right?

Grouping (this is what non-comparison operators do):

a + b + c   means   (a + b) + c

Chaining (this is what comparison operators do):

a < b < c   means   (a < b) and (b < c)

Grouping left to right (this is the way things are grouped):

5 - 2 - 1   means   (5 - 2) - 1 == 2

as opposed to grouping right to left (this would produce a different result):

5 - (2 - 1) == 4

Chaining left to right

Chaining is left to right, so in a < b < c, the expression a < b is evaluated before b < c, and if a < b is falsey, b < c is not evaluated.

(2 < 1 < f()) gives the value False without calling the function f, because 2 < 1 evaluates to false, so the second comparison does not need to be performed.

f() > 1 > g() calls f() in order to evaluate the first comparison, and depending on the result, it might or might not need to evaluate the second condition, which requires calling g().

NB. Each operand is evaluated at most once. So in the expression 1 < f() < 2, the function f() is only called once, and the value it gives is used in both comparisons (if necessary).

https://en.wikipedia.org/wiki/Short-circuit_evaluation

Precedence of chained comparisons?

Python has an Abstract Syntax Tree module to show you what's happening:

import ast
t = ast.parse('not a < b < c')
print(ast.dump(t))

It gives (cleaned up a bit):

[Expr(value=UnaryOp(
op=Not(),
operand=Compare(
left=Name(id='a'),
ops=[Lt(), Lt()],
comparators=[Name(id='b'), Name(id='c')]
)
))]

And indeed, the documentation says that not has lower precedence than <.

Why are chained operator expressions slower than their expanded equivalent?

In CPython's stack-based bytecode execution engine, saving an extra reference to b for the chained comparison isn't free. It's at the "seriously, don't worry about it" level of cheap, but it's not literally free, and you're comparing it to the slightly cheaper operation of loading a local variable.

The COMPARE_OP opcode removes the objects it's comparing from the stack, so for the chained comparison, Python has to create another reference to b (DUP_TOP) and shove it two places down in the stack (ROT_THREE) to get it out of the way.

In a <= b and b <= c, instead of the above reference shuffling, Python just copies another reference to b out of the stack frame's fastlocals array. This involves less pointer shuffling and one less trip around the bytecode evaluation loop, so it's slightly cheaper.

Can you simplify chained comparisons with not equal or equal in Python?

this is functionnaly equivalent, but when this:

10 < x < 40

is nice to read, mixing different operators to use chained comparisons isn't the best choice.

Is that really the same? let's disassemble to find out:

def f1(x,y):
if y > x and x != -1:
return 0

def f2(x,y):
if y > x != -1:
return 0

import dis

print("function 1")
dis.dis(f1)
print("function 2")
dis.dis(f2)

result:

function 1
2 0 LOAD_FAST 1 (y)
3 LOAD_FAST 0 (x)
6 COMPARE_OP 4 (>)
9 POP_JUMP_IF_FALSE 28
12 LOAD_FAST 0 (x)
15 LOAD_CONST 3 (-1)
18 COMPARE_OP 3 (!=)
21 POP_JUMP_IF_FALSE 28

3 24 LOAD_CONST 2 (0)
27 RETURN_VALUE
>> 28 LOAD_CONST 0 (None)
31 RETURN_VALUE
function 2
6 0 LOAD_FAST 1 (y)
3 LOAD_FAST 0 (x)
6 DUP_TOP
7 ROT_THREE
8 COMPARE_OP 4 (>)
11 JUMP_IF_FALSE_OR_POP 23
14 LOAD_CONST 3 (-1)
17 COMPARE_OP 3 (!=)
20 JUMP_FORWARD 2 (to 25)
>> 23 ROT_TWO
24 POP_TOP
>> 25 POP_JUMP_IF_FALSE 32

7 28 LOAD_CONST 2 (0)
31 RETURN_VALUE
>> 32 LOAD_CONST 0 (None)
35 RETURN_VALUE
>>>

Surprisingly they aren't the same, and the chained version has more instructions.

Not sure what's going on here (some took some more time to explain it better: How do chained comparisons in Python actually work?), but really I'd stick to the and version which shortcuts and is so much readable (think of future maintainers too...).

That said, one interesting thing about chained comparisons would be if the central argument is computed/takes a long time to compute/has a side effect in the computation and you don't want to store it in a variable:

if y > super_long_computation(x) != -1:

In that case, the central argument is only evaluated once. In the case of and, you'd have to store it beforehand.

Simplify Chained Comparison

In Python you can "chain" comparison operations which just means they are "and"ed together. In your case, it'd be like this:

if start <= x <= end:

Reference: https://docs.python.org/3/reference/expressions.html#comparisons

Custom chained comparisons

Python allows expressions like x > y > z, which, according to the docs, is equivalent to (x > y) and (y > z) except y is only evaluated once.

According to this, low > high > low will be equivalent to (low > high) and (high > low).

>>> x = low > high   # CompareList([False])
>>> y = high > low # CompareList([True])
>>> x and y
CompareList([True])

More from the documentation on x and y:

x and y: if x is false, then x, else y

In the above case:

>>> x is False
False
>>> x if x is False else y # x and y
CompareList([True])

so when you do x and y it returns the y which is CompareList([True]).

How does three operands comparison work in Python under the hood?

The comparison grammar is not that interesting here, it simply lets you append multiple comparators to an operator:

comparison    ::=  or_expr ( comp_operator or_expr )*
comp_operator ::= "<" | ">" | "==" | ">=" | "<=" | "!="
| "is" ["not"] | ["not"] "in"

So lets ask the Python parser directly, by using the ast module (which simply asks the Python compiler itself to only return the Abstract Syntax Tree):

>>> import ast
>>> ast.dump(ast.parse('a > b > c', mode='eval'))
"Expression(body=Compare(left=Name(id='a', ctx=Load()), ops=[Gt(), Gt()], comparators=[Name(id='b', ctx=Load()), Name(id='c', ctx=Load())]))"

So there is just a single Compare node, with multiple operators and comparators:

Compare(
left=Name(id='a'),
ops=[Gt(), Gt()],
comparators=[Name(id='b'), Name(id='c')])

(I omitted the Expression and ctx parts).

This lets the interpreter evaluate comparators as needed (e.g. if a < b is false, the remaining comparators don't have to be considered).

The resulting bytecode uses conditional jumps to skip remaining comparisons:

>>> import dis
>>> dis.dis(compile('a > b > c', '', 'eval'))
1 0 LOAD_NAME 0 (a)
2 LOAD_NAME 1 (b)
4 DUP_TOP
6 ROT_THREE
8 COMPARE_OP 4 (>)
10 JUMP_IF_FALSE_OR_POP 18
12 LOAD_NAME 2 (c)
14 COMPARE_OP 4 (>)
16 RETURN_VALUE
>> 18 ROT_TWO
20 POP_TOP
22 RETURN_VALUE

Why 5 = -15 == 5 = 1 != 20 is False?

Python's operator precedence is documented here. Contrary to what you claim, all of those comparison operators have the same precedence.

Normally, one looks to associativity to handle ties in precedence. But the comparison operators are special. Python supports chaining. This means that

x relop1 y relop2 z

is short for

x relop1 y and y relop2 z

(Except y is only evaluated once.)

So,

    5 <= -15 == 5 >= 1 != 20
= ( 5 <= -15 ) and ( -15 == 5 ) and ( 5 >= 1 ) and ( 1 != 20 )
= False and False and True and True
= False


Related Topics



Leave a reply



Submit