What's With the Integer Cache Maintained by the Interpreter

What's with the integer cache maintained by the interpreter?

Python caches integers in the range [-5, 256], so integers in that range are usually but not always identical.
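
For instance (observed on CPython 2.7; the exact range and behaviour are implementation details), the cache applies even to values constructed at run time, not just to literals:

>>> a = 256
>>> b = int("256")          # constructed at run time, still the cached 256 object
>>> a is b
True
>>> a = 257
>>> b = int("257")          # outside the cache: a new object every time
>>> a is b
False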

What you see for 257 is the Python compiler optimizing identical literals when compiled in the same code object.

When you type into the Python shell, each line is a completely separate statement, parsed and compiled on its own, thus:

>>> a = 257
>>> b = 257
>>> a is b
False

But if you put the same code into a file:

$ echo 'a = 257
> b = 257
> print a is b' > testing.py
$ python testing.py
True

This happens whenever the compiler has a chance to analyze the literals together, for example when defining a function in the interactive interpreter:

>>> def test():
...     a = 257
...     b = 257
...     print a is b
...
>>> import dis
>>> dis.dis(test)
  2           0 LOAD_CONST               1 (257)
              3 STORE_FAST               0 (a)

  3           6 LOAD_CONST               1 (257)
              9 STORE_FAST               1 (b)

  4          12 LOAD_FAST                0 (a)
             15 LOAD_FAST                1 (b)
             18 COMPARE_OP               8 (is)
             21 PRINT_ITEM
             22 PRINT_NEWLINE
             23 LOAD_CONST               0 (None)
             26 RETURN_VALUE
>>> test()
True
>>> test.func_code.co_consts
(None, 257)

Note how the compiled code contains a single constant for the 257.

In conclusion, the Python bytecode compiler cannot perform the kind of massive optimizations a compiler for a statically typed language can, but it does more than you might think. One of those things is analyzing the use of literals and avoiding duplicates.
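
You can see the same thing at module level by compiling a small snippet by hand (a quick check I ran on CPython 2.7; the filename string is arbitrary and the exact ordering of co_consts may differ between versions):

>>> code = compile("a = 257\nb = 257", "<literals>", "exec")
>>> code.co_consts
(257, None)
>>> exec code
>>> a is b
True

Only one 257 appears among the constants, so both assignments load the same object.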

Note that this has nothing to do with the integer cache, because it also works for floats, which have no such cache:

>>> a = 5.0
>>> b = 5.0
>>> a is b
False
>>> a = 5.0; b = 5.0
>>> a is b
True

For more complex literals, like tuples, it "doesn't work":

>>> a = (1,2)
>>> b = (1,2)
>>> a is b
False
>>> a = (1,2); b = (1,2)
>>> a is b
False

But the literals inside the tuple are shared:

>>> a = (257, 258)
>>> b = (257, 258)
>>> a[0] is b[0]
False
>>> a[1] is b[1]
False
>>> a = (257, 258); b = (257, 258)
>>> a[0] is b[0]
True
>>> a[1] is b[1]
True

(Note that constant folding and the peephole optimizer can change behaviour even between bugfix versions, so which examples return True or False is basically arbitrary and will change in the future).
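
As a quick illustration of that folding (output from CPython 2.7; the constant indices and the set of expressions that get folded vary between versions), the peephole optimizer replaces an arithmetic expression of constants with its precomputed result:

>>> import dis
>>> dis.dis(compile("a = 24 * 60", "<fold>", "exec"))
  1           0 LOAD_CONST               3 (1440)
              3 STORE_NAME               0 (a)
              6 LOAD_CONST               2 (None)
              9 RETURN_VALUE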


Regarding why you see that two PyInt_Objects are created, I'd guess this is done to avoid comparing literals. For example, the number 257 can be expressed by multiple literals:

>>> 257
257
>>> 0x101
257
>>> 0b100000001
257
>>> 0o401
257

The parser has two choices:

  • Convert the literals to some common base before creating the integer, and check whether the literals are equivalent; if so, create a single integer object.
  • Create the integer objects and check whether they are equal. If they are, keep only a single object and assign it to all the literals; otherwise, you already have the integers to assign.

The Python parser probably uses the second approach, which avoids rewriting the conversion code and is also easier to extend (for example, it works with floats as well).
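
This fits what you can observe: within one code object, different spellings of the same value end up as a single shared constant (a session from CPython 2.7; the hex spelling 0x101 parses to the same int as 257):

>>> def spellings():
...     a = 257
...     b = 0x101
...     return a is b
...
>>> spellings()
True
>>> spellings.func_code.co_consts
(None, 257)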


Reading the Python/ast.c file, the function that parses all numbers is parsenumber, which calls PyOS_strtoul to obtain the integer value (for integers) and falls back to PyLong_FromString for values that do not fit in a plain int:

    x = (long) PyOS_strtoul((char *)s, (char **)&end, 0);
    if (x < 0 && errno == 0) {
        return PyLong_FromString((char *)s,
                                 (char **)0,
                                 0);
    }

As you can see, the parser does not check whether it has already found an integer with the given value, which explains why you see two int objects being created. It also means my guess was correct: the parser first creates the constants, and only afterwards is the bytecode optimized to use the same object for all equal constants.

The code that does this check must be somewhere in Python/compile.c or Python/peephole.c, since these are the files that transform the AST into bytecode.

In particular, the compiler_add_o function seems to be the one that does it. There is this comment in compiler_lambda:

/* Make None the first constant, so the lambda can't have a
   docstring. */
if (compiler_add_o(c, c->u->u_consts, Py_None) < 0)
    return 0;

So it seems that compiler_add_o is used to insert constants for functions, lambdas, etc.
The compiler_add_o function stores the constants in a dict object, from which it immediately follows that equal constants fall into the same slot, resulting in a single constant in the final bytecode.
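
To make the idea concrete, here is a rough Python sketch of that dict-based interning (add_const is a hypothetical helper; the real logic in compiler_add_o also keys on the type, so that e.g. 1 and 1.0 stay distinct):

def add_const(consts, value):
    # consts maps (value, type) -> index into the eventual co_consts tuple
    key = (value, type(value))
    if key in consts:
        return consts[key]          # an equal constant already has a slot: reuse it
    index = len(consts)
    consts[key] = index
    return index

consts = {}
assert add_const(consts, 257) == 0      # first time 257 is seen
assert add_const(consts, 0x101) == 0    # same value and type: the slot is reused
assert add_const(consts, 257.0) == 1    # equal value but different type: new slot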


Unexpected id method behaviour

Because, while it is true that

a is a

id(a) returns a large integer, and two such integers created separately are distinct objects, so they do not compare identical under is:

>>> a = 10
>>> print(id(a) is id(a))
False
>>> print(id(a),id(a))
(44337068, 44337068)
>>>

Check https://docs.python.org/2/c-api/int.html#c.PyInt_FromLong for which integers compare identical under is - but remember that this is an implementation detail, so don't rely on it (always compare ints with ==):

>>> a = 256
>>> b = 256
>>> a is b
True
>>> a = 257
>>> b = 257
>>> a is b
False

(from "is" operator behaves unexpectedly with integers)

For further motivation to always use ==, consider the following:

a = 300
b = 300
print(a is b)
print(a is 300)
def c(a): return a is 300
def d(a): return a is b
print(c(a))
print(d(a))

when saved to a file and run, prints True, True, False, and True: the two module-level 300 literals and the bare 300 in the expression a is 300 all share one constant in the module's code object, whereas the 300 inside c belongs to c's own code object and is therefore a different object.
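
One way to confirm this, appended to the same hypothetical script (the __code__ attribute works on both Python 2.6+ and 3):

print(c.__code__.co_consts)   # (None, 300): c's code object carries its own 300
print(d.__code__.co_consts)   # (None,): d has no numeric literal, it loads the global b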

Computer stops working after I run my code in Python

You're falling foul of the integer cache in Python. The is not operator is not the way to test whether integers differ; it only happens to work for values between -5 and 256, because Python maintains a cache of objects for those values.

Change:
while len(el) is not 784:

To:
while len(el) < 784:
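
To see why the original loop can spin forever (observed on CPython; identity of ints outside the cached range is not guaranteed): once the list reaches 784 elements, len(el) returns a fresh int object that is equal to, but not identical with, the literal 784, so the is not condition stays True and the loop never exits:

>>> el = [0] * 784
>>> len(el) == 784          # value comparison: equal
True
>>> len(el) is not 784      # identity comparison: a fresh int vs. the literal
True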


