Confused about `is` operator with strings
I believe it has to do with string interning. In essence, the idea is to store only a single copy of each distinct string, to increase performance on some operations.
Basically, a is b returns True because (as you may have guessed) both names refer to a single immutable string object. When a string is large (and most likely depending on some other factors that I don't understand), this isn't done, which is why your second example returns False.
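To make the interning idea concrete, here is a minimal sketch using sys.intern from the standard library. The variable names are made up for illustration, and the identity result for the two non-interned strings is a CPython implementation detail, not something the language guarantees.

```python
import sys

# Build two equal strings at runtime so the compiler cannot share a literal.
parts = ["today is", " a fine day"]
first = "".join(parts)
second = "".join(parts)

print(first == second)  # True: same characters
print(first is second)  # False on CPython: two separate objects

# sys.intern returns the single canonical copy for a given string value.
first_interned = sys.intern(first)
second_interned = sys.intern(second)
print(first_interned is second_interned)  # True: both are the interned copy
```

Interning only changes which object you get back; == gives the same answer either way.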
EDIT: In fact, the odd behavior seems to be a side effect of the interactive environment. If you take the same code and put it into a Python script, both a is b and ktr is ptr return True.
a = "poi"
b = "poi"
print(a is b)  # Prints 'True'
ktr = "today is a fine day"
ptr = "today is a fine day"
print(ktr is ptr)  # Prints 'True'
This makes sense, since it'd be easy for Python to parse a source file and look for duplicate string literals within it. If you create the strings dynamically, then it behaves differently even in a script.
a = "p" + "oi"
b = "po" + "i"
print(a is b)  # Oddly enough, prints 'True'
ktr = "today is" + " a fine day"
ptr = "today is a f" + "ine day"
print(ktr is ptr)  # Prints 'False' here; newer CPython versions may print 'True'
As for why a is b still results in True, perhaps the allocated string is small enough to warrant a quick search through the interned collection, whereas the other one is not?
Different behavior in Python script and Python IDLE?
When Python executes a script file, the whole file is parsed first. You can notice that when you introduce a syntax error somewhere: Regardless of where it is, it will prevent any line from executing.
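A small sketch of that parse-first behavior, using the built-in compile and exec to stand in for running a script file. The helper run_source and the two source strings are hypothetical names used only for this illustration.

```python
def run_source(source):
    """Compile the whole source first, then execute it; report what ran."""
    executed = []
    try:
        code = compile(source, "<script>", "exec")
    except SyntaxError:
        # Compilation failed, so not a single line was executed.
        return executed, "SyntaxError"
    exec(code, {"executed": executed})
    return executed, "ok"

good = "executed.append('line 1')\nexecuted.append('line 2')\n"
bad = "executed.append('line 1')\nx = = 1\n"  # syntax error on the last line

print(run_source(good))  # (['line 1', 'line 2'], 'ok')
print(run_source(bad))   # ([], 'SyntaxError')
```

Even though the error sits on the last line of bad, the first line never runs, because the whole source has to compile before anything executes.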
So since Python parses the file first, literals can be loaded into memory efficiently. Since Python knows that these are constants, all variables that represent those constant values can point to the same object in memory. So the object is shared.
This works for ints and floats, and also for strings, even when the value comes from a constant expression that has to be evaluated first:
a = "foo"
b = "foo"
c = "fo" + "o"
print(a is b)
print(a is c)
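The same check can be reproduced without a separate file by compiling one source string as a single unit. This is only a sketch: the namespace dict and the result names are assumptions for illustration, and the results reflect CPython behavior, not a language guarantee.

```python
source = '''
a = "foo"
b = "foo"
c = "fo" + "o"
x = 1.1
y = 1.1
'''

# Compile everything as one unit, the way a script file is handled.
namespace = {}
exec(compile(source, "<script>", "exec"), namespace)

same_literal = namespace["a"] is namespace["b"]
same_folded = namespace["a"] is namespace["c"]
same_float = namespace["x"] is namespace["y"]
print(same_literal, same_folded, same_float)  # True True True on CPython
```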
Now in IDLE, the behavior is very different: as an interactive interpreter, IDLE executes every line separately. So a = 1.1 and b = 1.1 are executed in separate contexts, which makes it impossible (or just very hard) to figure out that they share the same constant literal value and could share memory. So instead, the interpreter allocates two different objects, which causes the identity check using is to fail.
For small integers, the situation is a bit different. Because they are used so often, CPython keeps a static set of integer objects (in the range from -5 to 256) and makes every occurrence of one of these values point to the same int object. That’s why you get a different result for small integers than for any other object. See also the following questions:
- "is" operator behaves unexpectedly with integers
- What's with the Integer Cache inside Python?
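A quick way to see the small-integer cache without literals getting in the way is to build the integers at runtime. The helper same_object is a hypothetical name for this sketch, and the results are specific to CPython.

```python
def same_object(text):
    """Parse the same text twice and report whether one int object came back."""
    left = int(text)
    right = int(text)
    return left is right

print(same_object("256"))  # True on CPython: 256 is in the cached range
print(same_object("257"))  # False on CPython: a new object each time
print(same_object("-5"))   # True on CPython: lower edge of the cache
print(same_object("-6"))   # False on CPython: just outside the cache
```

Going through int() keeps the compiler from sharing a constant, so only the cache can make the two objects identical.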
What is the difference between destructured assignment and normal assignment?
I know nothing about Python but I was curious.
First, this happens when assigning an array too:
x = [-10,-10]
x[0] is x[1] # True
It also happens with strings, which are immutable.
x = ['foo', 'foo']
x[0] is x[1] # True
Disassembly of the first function:
0 LOAD_CONST 1 (-10)
3 LOAD_CONST 1 (-10)
6 BUILD_LIST 2
9 STORE_FAST 0 (x)
The LOAD_CONST (consti) op pushes constant co_consts[consti] onto the stack. But both ops here have consti=1, so the same object is being pushed to the stack twice. If the numbers in the array were different, it would disassemble to this:
0 LOAD_CONST 1 (-10)
3 LOAD_CONST 2 (-20)
6 BUILD_LIST 2
9 STORE_FAST 0 (x)
Here, constants of index 1 and 2 are pushed. co_consts is a tuple of the constants used by a compiled piece of Python code. Evidently literals with the same value are only stored once.
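The disassembly above can be checked programmatically with the dis module. This is a rough sketch: make_list is a hypothetical function, 100000 is chosen only because it lies outside the small-integer cache, and the exact bytecode differs between CPython versions.

```python
import dis

def make_list():
    x = [100000, 100000]
    return x

# Collect the constant-table index used by each LOAD_CONST that loads 100000.
const_indices = [
    instr.arg
    for instr in dis.get_instructions(make_list)
    if instr.opname == "LOAD_CONST" and instr.argval == 100000
]
print(const_indices[0] == const_indices[1])  # True: same co_consts slot twice

values = make_list()
print(values[0] is values[1])  # True on CPython: one shared constant object
```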
As for why 'normal' assignment works - you're using the REPL, so I assume each line is compiled separately. If you put
x = -10
y = -10
print(x is y)
into a test script, you'll get True. So normal assignment and destructured assignment both work the same in this regard :)
Python 3.6.5 is and == for integers beyond caching interval
CPython detects constant values in your code and re-uses them to save memory. These constants are stored on code objects, and can even be accessed from within Python:
>>> codeobj = compile('999 is 999', '<stdin>', 'exec')
>>> codeobj
<code object <module> at 0x7fec489ef420, file "<stdin>", line 1>
>>> codeobj.co_consts
(999, None)
Both operands of your is refer to this very same 999 integer. We can confirm this by disassembling the code with the dis module:
>>> import dis
>>> dis.dis(codeobj)
1 0 LOAD_CONST 0 (999)
2 LOAD_CONST 0 (999)
4 COMPARE_OP 8 (is)
6 POP_TOP
8 LOAD_CONST 1 (None)
10 RETURN_VALUE
As you can see, the first two LOAD_CONST instructions both load the constant with index 0, which is the 999 number.
However, this only happens if the two numbers are compiled at the same time. If you create each number in a separate code object, they will no longer be identical:
>>> x = 999
>>> x is 999
False
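The difference between one code object and several can be sketched outside the REPL as well. The helpers run_separately and run_together are hypothetical names for this illustration. The first compiles each line on its own, roughly the way an interactive prompt does, and the second compiles all lines at once. The results shown are CPython behavior.

```python
def run_separately(*lines):
    """Compile and run each line as its own code object, like a REPL."""
    namespace = {}
    for line in lines:
        exec(compile(line, "<stdin>", "single"), namespace)
    return namespace

def run_together(*lines):
    """Compile all lines into one code object, like a script file."""
    namespace = {}
    exec(compile("\n".join(lines), "<script>", "exec"), namespace)
    return namespace

separate = run_separately("x = 999", "y = 999")
together = run_together("x = 999", "y = 999")

print(separate["x"] is separate["y"])  # False on CPython: two code objects
print(together["x"] is together["y"])  # True on CPython: one shared constant
```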
Different results in IDLE and python shell using 'is'
You should not rely on is for comparison of values when you want to test equality.
The is operator compares the identities (ids) of the objects the variables refer to, i.e. it checks whether they are the same object. For integers, this only coincides with equality in the range [-5, 256] in CPython, because these values are cached as singletons and reused instead of a new object being created each time. See What's with the integer cache maintained by the interpreter? This is not the same as checking whether they have the same value.
As for why it behaves differently in a REPL environment versus a passed script, see Different behavior in python script and python idle?. The gist of it is that a passed script parses the entire file first, while a REPL environment like ipython or an IDLE shell reads lines one at a time. a=10.24 and b=10.24 are executed in different contexts, so the shell doesn't know that they should be the same value.
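The distinction between equality and identity is easiest to see with mutable objects, where no caching is involved. A minimal sketch with made-up variable names:

```python
first = [1, 2, 3]
second = [1, 2, 3]  # equal contents, but a separate list object
alias = first       # a second name for the same object

print(first == second)  # True: the values are equal
print(first is second)  # False: two distinct objects
print(first is alias)   # True: both names refer to one object
```

So use == to compare values, and keep is for identity checks such as x is None.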
Why should num3 is num4 result in False?
is returns True if two variables point to the same object.
== returns True if the objects referred to by the variables are equal.
Run in script:
is returns True
Because both num3 and num4 point to the same object:
# id() to get the unique identifier of an object
print(id(num3) , id(num4))
55080624 55080624
== also returns True
num3 = 257
num4 = 257
as both refer to <class 'int'> 257
Run in REPL :
is returns False
Because num3 and num4 point to different objects:
# id() to get the unique identifier of an object
print(id(num3) , id(num4))
34836272 39621264
== returns True
num3 = 257
num4 = 257
as both refer to <class 'int'> 257
The reason you get different results is explained in Why does the `is` operator behave differently in a script vs the REPL?
When you run code in a .py script, the entire file is compiled into a
code object before executing it. In this case, CPython is able to make
certain optimizations - like reusing the same instance for the integer
300.
So in your case, both num3 and num4 refer to an int object with the value 257. In the REPL they get different object ids, whereas in a script the file is compiled as a whole and optimized so that both names end up with the same object id.
Regarding the different behavior of 256 and 257:
"is" operator behaves unexpectedly with integers
What's with the integer cache maintained by the interpreter?
In short, objects representing the values from -5 to +256 are created at startup time, so any number in the range -5 to 256 gets the same object id in the REPL. Any int below -5 or above 256 is assigned a new object with a new id.
For example:
num5 = -6
num6 = -6
print(id(num5),id(num6))
39621232 39621136
num7 = 258
num8 = 258
print(id(num7),id(num8))
39621296 39621328