Why Does the 'is' Operator Behave Differently in a Script vs. the REPL

Confused about `is` operator with strings

I believe it has to do with string interning. In essence, the idea is to store only a single copy of each distinct string, to increase performance on some operations.

Basically, a is b is True because (as you may have guessed) both names refer to a single immutable string object. Not every string is shared this way; it depends on the string's contents and on how it was created (I don't understand all of the factors), which is why your second example returns False.
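
To see interning at work, here is a minimal sketch, assuming CPython. The helper name build_sentence is made up for illustration. It builds two equal strings at runtime, so in practice they start out as separate objects, and then uses sys.intern to map both onto one shared copy:

```python
import sys


def build_sentence(words):
    # Hypothetical helper: joins at runtime, so the result is not a compile-time literal.
    return " ".join(words)


words = ["today", "is", "a", "fine", "day"]
first = build_sentence(words)
second = build_sentence(words)

print(first == second)  # equal values
print(first is second)  # identity of two runtime-built strings; usually False on CPython

# sys.intern returns the canonical copy from the interpreter's intern table,
# so equal strings map to the very same object.
first_interned = sys.intern(first)
second_interned = sys.intern(second)
print(first_interned is second_interned)
```

Explicit interning is rarely needed in everyday code; the point here is only that "one shared copy per distinct string" is something the interpreter can do on request, and sometimes does on its own.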

EDIT: And in fact, the odd behavior seems to be a side-effect of the interactive environment. If you take your same code and place it into a Python script, both a is b and ktr is ptr return True.

a = "poi"
b = "poi"
print(a is b)  # Prints 'True'

ktr = "today is a fine day"
ptr = "today is a fine day"
print(ktr is ptr)  # Prints 'True'

This makes sense, since it'd be easy for Python to parse a source file and look for duplicate string literals within it. If you create the strings dynamically, then it behaves differently even in a script.

a = "p" + "oi"
b = "po" + "i"
print(a is b)  # Oddly enough, prints 'True'

ktr = "today is" + " a fine day"
ptr = "today is a f" + "ine day"
print(ktr is ptr)  # Prints 'False' here; newer CPython versions may print 'True'

As for why a is b still results in True: perhaps short, identifier-like strings such as "poi" get interned automatically, whereas the longer string containing spaces does not? Either way, this is an implementation detail and not something to rely on.

Different behavior in Python script and Python IDLE?

When Python executes a script file, the whole file is parsed first. You can notice that when you introduce a syntax error somewhere: Regardless of where it is, it will prevent any line from executing.

So since Python parses the file first, literals can be loaded into memory efficiently. Since Python knows that these are constants, all variables that represent those constant values can point to the same object in memory. So the object is shared.

This works for ints and floats, and even for strings, even when there is a constant expression that needs to be evaluated first:

a = "foo"
b = "foo"
c = "fo" + "o"
print(a is b)
print(a is c)

Now in IDLE, the behavior is very different: as an interactive interpreter, IDLE executes every line separately. So a = 1.1 and b = 1.1 are executed in separate contexts, which makes it impossible (or just very hard) to figure out that they both share the same constant literal value and could share the memory. So instead, the interpreter allocates two different objects, which causes the identity check using is to fail.
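
The difference can be imitated inside a single program. The following is a rough sketch, assuming CPython: it compiles the same two lines once as a whole (like a script) and once line by line (roughly what an interactive shell does). The names SOURCE_LINES, script_ns and repl_ns are only illustrative.

```python
SOURCE_LINES = ["a = 1.1", "b = 1.1"]

# "Script" mode: the whole source is compiled into one code object.
script_ns = {}
script_code = compile("\n".join(SOURCE_LINES) + "\n", "<script>", "exec")
exec(script_code, script_ns)
shared_in_script = script_ns["a"] is script_ns["b"]

# "Interactive" mode: every line is compiled on its own, as a REPL would do.
repl_ns = {}
for line in SOURCE_LINES:
    exec(compile(line + "\n", "<repl>", "single"), repl_ns)
shared_in_repl = repl_ns["a"] is repl_ns["b"]

print(script_code.co_consts)  # 1.1 is stored only once for the whole unit
print(shared_in_script)       # True on CPython: one constant, two names
print(shared_in_repl)         # usually False on CPython: one constant per compiled line
```

In both cases a == b holds; only the identity check depends on how the source was compiled.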

For small integers, the situation is a bit different. Because they are used so often, CPython keeps a preallocated set of integers (in the range from -5 to 256) and makes every occurrence of one of these values point to the same int object. That’s why you get a different result for small integers than for any other object. See also the following questions:

  • "is" operator behaves unexpectedly with integers
  • What's with the Integer Cache inside Python?
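
The small-integer cache can be observed without any literals being involved. This is a small sketch that assumes CPython's default build; fresh_int is a hypothetical helper that parses the number at runtime, so the compiler has no constant to share.

```python
def fresh_int(text):
    # Hypothetical helper: parse at runtime so the compiler cannot share a constant.
    return int(text)


small_a, small_b = fresh_int("256"), fresh_int("256")
large_a, large_b = fresh_int("1000000"), fresh_int("1000000")

print(small_a is small_b)  # True on CPython: 256 comes from the preallocated cache
print(large_a is large_b)  # False on CPython: two separate int objects
print(large_a == large_b)  # True: the values are equal either way
```

The exact cache range is an implementation detail of CPython, so code should never depend on it.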

What is the difference between destructured assignment and normal assignment?

I know nothing about Python but I was curious.

First, this happens when assigning an array too:

x = [-10,-10]
x[0] is x[1] # True

It also happens with strings, which are immutable.

x = ['foo', 'foo']
x[0] is x[1] # True

Disassembly of the first function:

         0 LOAD_CONST               1 (-10)
         3 LOAD_CONST               1 (-10)
         6 BUILD_LIST               2
         9 STORE_FAST               0 (x)

The LOAD_CONST (consti) op pushes constant co_consts[consti] onto the stack. But both ops here have consti=1, so the same object is being pushed to the stack twice. If the numbers in the array were different, it would disassemble to this:

         0 LOAD_CONST               1 (-10)
         3 LOAD_CONST               2 (-20)
         6 BUILD_LIST               2
         9 STORE_FAST               0 (x)

Here, constants of index 1 and 2 are pushed.

co_consts is a tuple of the constants used by a compiled piece of Python code (a code object). Evidently literals with the same value are only stored once per code object.
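
The same check can be done programmatically instead of reading the disassembly by eye. This is a sketch under the assumption that the running CPython version still emits LOAD_CONST for each list element, as in the listings above (bytecode details change between versions); make_pair is an illustrative name.

```python
import dis


def make_pair():
    x = [-10, -10]
    return x


# Collect the co_consts index used by every LOAD_CONST that pushes -10.
indexes = [
    ins.arg
    for ins in dis.get_instructions(make_pair)
    if ins.opname == "LOAD_CONST" and ins.argval == -10
]

print(indexes)  # both loads are expected to use the same index
pair = make_pair()
print(pair[0] is pair[1])
```

If both loads report the same index, both list elements come from a single entry in co_consts, which is why the identity check succeeds even though -10 lies outside the small-integer cache.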

As for why 'normal' assignment works - you're using the REPL, so I assume each line is compiled separately. If you put

x = -10
y = -10
print(x is y)

into a test script, you'll get True. So normal assignment and destructured assignment both work the same in this regard :)

Python 3.6.5 is and == for integers beyond caching interval

CPython detects constant values in your code and re-uses them to save memory. These constants are stored on code objects, and can even be accessed from within Python:

>>> codeobj = compile('999 is 999', '<stdin>', 'exec')
>>> codeobj
<code object <module> at 0x7fec489ef420, file "<stdin>", line 1>
>>> codeobj.co_consts
(999, None)

Both operands of your is refer to this very same 999 integer. We can confirm this by dissecting the code with the dis module:

>>> import dis
>>> dis.dis(codeobj)
  1           0 LOAD_CONST               0 (999)
              2 LOAD_CONST               0 (999)
              4 COMPARE_OP               8 (is)
              6 POP_TOP
              8 LOAD_CONST               1 (None)
             10 RETURN_VALUE

As you can see, the first two LOAD_CONST instructions both load the constant with index 0, which is the 999 number.

However, this only happens if the two numbers are compiled at the same time. If you create each number in a separate code object, they will no longer be identical:

>>> x = 999
>>> x is 999
False

Different results in IDLE and python shell using 'is'

You should not rely on is for comparison of values when you want to test equality.

The is keyword compares the identities (ids) of the two objects, that is, it checks whether they are the same object. This is not the same as checking whether they have the same value. For integers, is only reliably agrees with == in the range [-5, 256] in CPython, because these values are cached and reused instead of a new object being created each time. See What's with the integer cache maintained by the interpreter?

As for why it behaves differently in a REPL environment versus a passed script, see Different behavior in python script and python idle?. The gist of it is that a passed script is parsed as a whole file first, while a REPL environment like ipython or an IDLE shell reads lines one at a time. a=10.24 and b=10.24 are executed in different contexts, so the shell doesn't know that they should be the same value.

Why should num3 is num4 result in False?

is returns True if two variables point to the same object.

== returns True if the objects referred to by the variables are equal.
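
A short illustration of the two definitions, using lists so that no caching or constant sharing is involved (the variable names are only for the example):

```python
first = [1, 2, 3]
second = [1, 2, 3]  # a separate list with equal contents
alias = first       # another name for the same list

print(first == second)  # True: same contents
print(first is second)  # False: two distinct list objects
print(first is alias)   # True: both names refer to one object

alias.append(4)
print(first)            # [1, 2, 3, 4]: the change is visible through either name
```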

Run in script:

is returns True

Because both num3 and num4 point to the same object:

# id() to get the unique identifier of an object    
print(id(num3) , id(num4))

55080624 55080624

== also returns True

num3 = 257
num4 = 257

as both refer to <class 'int'> 257

Run in REPL :

is returns False

Because num3 and num4 point to different objects:

# id() to get the unique identifier of an object    
print(id(num3) , id(num4))

34836272 39621264

== returns True

num3 = 257
num4 = 257

as both refer to <class 'int'> 257

The reason you get different results is explained in Why does the `is` operator behave differently in a script vs the REPL?:

When you run code in a .py script, the entire file is compiled into a
code object before executing it. In this case, CPython is able to make
certain optimizations - like reusing the same instance for the integer
300.

So in your case, both num3 and num4 refer to <class 'int'> 257. In the REPL they are different objects with different ids, whereas in a script the file is compiled as a whole and the two literals are optimized into the same object, so the ids match.

Regarding the different behavior of 256 and 257, see:

"is" operator behaves unexpectedly with integers

What's with the integer cache maintained by the interpreter?

In short, objects representing the values from -5 to +256 are created at startup, so for any number in the range -5 to 256 you get the same object id in the REPL. Any int < -5 or > 256 is assigned a new object each time it is created.

For example:

num5 = -6
num6 = -6

print(id(num5),id(num6))

39621232 39621136

num7 = 258
num8 = 258

print(id(num7),id(num8))

39621296 39621328

