Why Is the Id of a Python Class Not Unique When Called Quickly

Why is the id of a Python class not unique when called quickly?

The id of an object is only guaranteed to be unique during that object's lifetime, not over the entire lifetime of a program. The two someClass objects you create only exist for the duration of the call to print - after that, they are available for garbage collection (and, in CPython, deallocated immediately). Since their lifetimes don't overlap, it is valid for them to share an id.

It is also unsurprising in this case, because of a combination of two CPython implementation details: first, it does garbage collection by reference counting (with some extra magic to avoid problems with circular references), and second, the id of an object is related to the value of the underlying pointer for the variable (i.e., its memory location). So, the first object, which was the most recent object allocated, is immediately freed - it isn't too surprising that the next object allocated will end up in the same spot (although this potentially also depends on details of how the interpreter was compiled).
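A quick way to see both behaviors side by side (the exact reuse pattern depends on CPython's allocator, so the first comparison is only *likely* to be True):

```python
class SomeClass:
    pass

# Non-overlapping lifetimes: each instance is freed as soon as id() returns,
# so the next allocation often lands in the same memory slot.
print(id(SomeClass()) == id(SomeClass()))  # usually True in CPython

# Overlapping lifetimes: keeping references forces distinct addresses.
a, b = SomeClass(), SomeClass()
print(id(a) == id(b))  # always False while both objects are alive
```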

If you are relying on several objects having distinct ids, you might keep them around - say, in a list - so that their lifetimes overlap. Otherwise, you might implement a class-specific id that has different guarantees - e.g.:

class SomeClass:
    next_id = 0

    def __init__(self):
        self.id = SomeClass.next_id
        SomeClass.next_id += 1
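A variant of the same counter idea, sketched here with itertools.count so the bookkeeping can't be forgotten (the Tagged name is just illustrative):

```python
import itertools

class Tagged:
    _ids = itertools.count()  # class-level counter shared by all instances

    def __init__(self):
        # monotonically increasing and never reused, unlike id()
        self.id = next(Tagged._ids)

a, b = Tagged(), Tagged()
print(a.id, b.id)  # 0 1
```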

How unique is Python's id()?

Yes, CPython re-uses id() values. Do not count on these being unique in a Python program.

This is clearly documented:

Return the “identity” of an object. This is an integer which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.

Note especially the last sentence: the id is unique only as long as an object is alive. Objects that have no references left to them are removed from memory, allowing the id() value to be re-used for another object, hence the non-overlapping lifetimes wording.

Note that this applies to CPython only, the standard implementation provided by python.org. There are other Python implementations, such as IronPython, Jython and PyPy, that make their own choices about how to implement id(), because they each can make distinct choices on how to handle memory and object lifetimes.

To address your specific questions:

  1. In CPython, id() is the memory address. New objects will be slotted into the next available memory space, so if a specific memory address has enough space to hold the next new object, the memory address will be reused. You can see this in the interpreter when creating new objects that are the same size:

    >>> id(1234)
    4546982768
    >>> id(4321)
    4546982768

    The 1234 literal creates a new integer object, for which id() produces a numeric value. As there are no further references to the int value, it is removed from memory again. Execute the same expression again with a different integer literal, and chances are you'll see the same id() value. (A garbage collection run breaking cyclic references could free up more memory in between, in which case you might not see the same id() again.)

    So it's not random, but in CPython it is a function of the memory allocation algorithms.

  2. If you need to check on a specific object, keep your own reference to it. That can be a weak reference (from the weakref module) if all you need to assure is that the object is still 'alive'.

    For example, recording an object reference first, then later checking it:

    import weakref

    # record
    object_ref = weakref.ref(some_object)

    # check if it's the same object still
    some_other_reference is object_ref() # only true if they are the same object

    The weak reference won't keep the object alive, but if it is alive then the object_ref() will return it (it'll return None otherwise).

    You could use such a mechanism to generate really unique identifiers, see below.

  3. All you have to do to 'destroy' an object is to remove all references to it. Variables (local and global) are references. So are attributes on other objects, and entries in containers such as lists, tuples, dictionaries, sets, etc.

    The moment all references to an object are gone, the reference count on the object drops to 0 and it is deleted, there and then.

    Garbage collection is only needed to break cyclic references: objects that reference one another, with no outside references to the cycle. Because such a cycle will never reach a reference count of 0 without help, the garbage collector periodically checks for such cycles and breaks one of the references to help clear those objects from memory.

    So you can cause any object to be deleted from memory (freed), by removing all references to it. How you achieve that depends on how the object is referenced. You can ask the interpreter to tell you what objects are referencing a given object with the gc.get_referrers() function, but note that this doesn't give you variable names. It gives you objects, such as the dictionary object that is the __dict__ attribute of a module that references the object as a global, etc. For code fully under your control, at most use gc.get_referrers() as a tool to remind yourself what places the object is referenced from as you write the code to remove those.
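A small illustration of gc.get_referrers() (note that the result can also include internal objects, such as the calling stack frame or the module's globals dictionary):

```python
import gc

data = {"payload": [1, 2, 3]}  # the dict holds a reference to the list
target = data["payload"]

referrers = gc.get_referrers(target)
print(any(r is data for r in referrers))  # True: the dict is among the referrers
```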

If you must have unique identifiers for the lifetime of the Python application, you'd have to implement your own facility. If your objects are hashable and support weak references, then you could just use a WeakKeyDictionary instance to associate arbitrary objects with UUIDs:

from weakref import WeakKeyDictionary
from collections import defaultdict
from uuid import uuid4

class UniqueIdMap(WeakKeyDictionary):
    def __init__(self, dict=None):
        super().__init__()
        # replace data with a defaultdict to generate uuids
        self.data = defaultdict(uuid4)
        if dict is not None:
            self.update(dict)

uniqueidmap = UniqueIdMap()

def uniqueid(obj):
    """Produce a unique integer id for the object.

    Object must be *hashable*. Id is a UUID and should be unique
    across Python invocations.

    """
    return uniqueidmap[obj].int

This still produces integers, but as they are UUIDs they are not strictly guaranteed to be unique. Still, the likelihood that you'll ever encounter the same ID during your lifetime is smaller than that of being hit by a meteorite. See How unique is UUID?

This then gives you unique ids even for objects with non-overlapping lifetimes:

>>> class Foo:
...     pass
...
>>> id(Foo())
4547149104
>>> id(Foo()) # memory address reused
4547149104
>>> uniqueid(Foo())
151797163173960170410969562162860139237
>>> uniqueid(Foo()) # but you still get a unique UUID
188632072566395632221804340107821543671

Understanding Python id() uniqueness

Ids are guaranteed to be unique for the lifetime of the object. If an object gets deleted, a new object can acquire the same id. CPython will delete items immediately when their refcount drops to zero. The garbage collector is only needed to break up reference cycles.

CPython may also cache and re-use certain immutable objects like small integers and strings defined by literals that are valid identifiers. This is an implementation detail that you should not rely upon. It is generally considered improper to use is checks on such objects.

There are certain exceptions to this rule, for example, using an is check on possibly-interned strings as an optimization before comparing them with the normal == operator is fine. The dict builtin uses this strategy for lookups to make them faster for identifiers.

a is b or a == b  # This is OK

If the string happens to be interned, then the above can return true with a simple id comparison instead of a slower character-by-character comparison, but it still returns true if and only if a == b (because if a is b, then a == b must also be true). However, a good implementation of .__eq__() would already do an is check internally, so at best you only avoid the overhead of the .__eq__() call.
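The idiom can be wrapped up like this (fast_eq is a made-up helper name, not a standard function):

```python
def fast_eq(a, b):
    # identity first: a cheap pointer comparison, and a is b implies a == b
    return a is b or a == b

s = "hello"
t = "".join(["hel", "lo"])  # equal value, but typically a distinct object
print(fast_eq(s, s))  # True via the identity shortcut
print(fast_eq(s, t))  # True via the == fallback
```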


Thanks for the answer, would you elaborate around the uniqueness for user-defined objects, are they always unique?

The id of any object (be it user-defined or not) is unique for the lifetime of the object. It's important to distinguish objects from variables. It's possible to have two or more variables refer to the same object.

>>> a = object()
>>> b = a
>>> c = object()
>>> a is b
True
>>> a is c
False

Caching optimizations mean that you are not always guaranteed to get a new object in cases where one might naively think one should, but this does not in any way violate the uniqueness guarantee of IDs. Builtin types like int and str may have some caching optimizations, but they follow exactly the same rules: If they are live at the same time, and their IDs are the same, then they are the same object.

Caching is not unique to builtin types. You can implement caching for your own objects.

>>> def the_one(it=object()):
...     return it
...
>>> the_one() is the_one()
True

Even user-defined classes can cache instances. For example, this class only makes one instance of itself.

>>> class TheOne:
...     _the_one = None
...     def __new__(cls):
...         if not cls._the_one:
...             cls._the_one = super().__new__(cls)
...         return cls._the_one
...
>>> TheOne() is TheOne() # There can be only one TheOne.
True
>>> id(TheOne()) == id(TheOne()) # This is what an is-check does.
True

Note that each construction expression evaluates to an object with the same id as the other. But this id is unique to the object. Both expressions reference the same object, so of course they have the same id.

The above class only keeps one instance, but you could also cache some other number. Perhaps recently used instances, or those configured in a way you expect to be common (as ints do), etc.
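For example, a sketch of per-value caching along the lines of what ints do (the Point class and its _cache dict are hypothetical):

```python
class Point:
    _cache = {}  # (x, y) -> cached instance

    def __new__(cls, x, y):
        key = (x, y)
        if key not in cls._cache:
            cls._cache[key] = super().__new__(cls)
        return cls._cache[key]

    def __init__(self, x, y):
        # note: __init__ re-runs even for cached instances; harmless here
        self.x, self.y = x, y

print(Point(1, 2) is Point(1, 2))  # True: the same cached instance
print(Point(1, 2) is Point(3, 4))  # False: different cache keys
```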

Confused about Python’s id()

In general, as soon as you use an integer or a string or any other literal, Python creates a new object in memory for you. It is guaranteed to have the same id for the lifetime of the object, that is, while its reference count is not zero.

When you write something like:

>>> id(1000)
140497411829680

Python creates the integer 1000 and returns its id (the memory address of the object in CPython). After this is done, the reference count of the integer object 1000 drops to zero and it is deleted. This ensures that you cannot keep filling memory just by writing id(something) (and not binding any variable name to the object).

Typically, you cannot predict when reuse will happen, but in my Python shell it happens quite consistently:

>>> id(1000)
140697307078576
>>> id(1001)
140697307078576
>>> id(1002)
140697307078576
>>> id(1003)
140697307078576

You can see that the same memory address gets used again and again as each new integer is created. However, if you prevent the reference count from dropping to zero, you can see that new memory is used instead:

>>> a = 1000
>>> id(a)
140697307078576
>>> b = 1001
>>> id(b)
140697306008368

In CPython, the integers -5 through 256 inclusive are special-cased in that they always exist (and so always have the same id during a Python runtime). This is an optimisation to avoid repeated creation and destruction of commonly-used integers.
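You can observe the boundary of that cache directly, using int(str) to force runtime construction rather than compile-time constants (CPython-specific behavior):

```python
a = 256
b = int("256")  # constructed at runtime, yet the cached object is returned
print(a is b)   # True: 256 is inside CPython's small-int cache

c = 257
d = int("257")  # outside the cache: a genuinely new object
print(c is d)   # False in CPython
```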

id() of numpy.float64 objects are the same, even if their values differ?

This looks like a quirk of memory reuse rather than a NumPy bug.

The line

id(numpy.float64(100)) == id(numpy.float64(10))

first creates a float numpy.float64(100) and then calls the id function on it. This memory is then immediately freed because the object's reference count drops to zero; no garbage-collector pass is needed. The memory slot is then free to be reused by any new objects that are created.

When numpy.float64(10) is created, it occupies the same memory location, hence the memory addresses returned by id compare equal.
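The same effect is easy to reproduce without NumPy, using plain floats (the second comparison depends on allocator behavior, so it is only usually True):

```python
# Overlapping lifetimes: both objects alive at once, ids must differ.
x = float("100.0")
y = float("10.0")
print(id(x) == id(y))  # False

# Non-overlapping lifetimes: the freed slot is typically reused at once.
print(id(float("100.0")) == id(float("10.0")))  # usually True in CPython
```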


This chain of events is perhaps clearer when you look at the bytecode:

>>> dis.dis('id(numpy.float64(100)) ==  id(numpy.float64(10))')
0 LOAD_NAME 0 (id)
3 LOAD_NAME 1 (numpy)
6 LOAD_ATTR 2 (float64)
9 LOAD_CONST 0 (100)
12 CALL_FUNCTION 1 (1 positional, 0 keyword pair) # call numpy.float64(100)
15 CALL_FUNCTION 1 (1 positional, 0 keyword pair) # get id of object

# refcount of numpy.float64(100) drops to zero and its memory is freed

18 LOAD_NAME 0 (id)
21 LOAD_NAME 1 (numpy)
24 LOAD_ATTR 2 (float64)
27 LOAD_CONST 1 (10)
30 CALL_FUNCTION 1 (1 positional, 0 keyword pair) # call numpy.float64(10)
33 CALL_FUNCTION 1 (1 positional, 0 keyword pair) # get id of object

36 COMPARE_OP 2 (==) # compare the two ids
39 RETURN_VALUE

Unnamed Python objects have the same id

From the doc of id(object):

Return the “identity” of an object. This is an integer which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.

Since the two ranges inside the id() calls have non-overlapping lifetimes, their id values may be the same.

The two ranges assigned to variables have overlapping lifetimes so they must have different id values.
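A sketch of the difference:

```python
# Non-overlapping lifetimes: each range dies right after id() returns,
# so the freed address may be handed to the next range object.
print(id(range(10)) == id(range(10)))  # may be True

# Overlapping lifetimes: both objects alive, ids must differ.
r1 = range(10)
r2 = range(10)
print(id(r1) == id(r2))  # always False
```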

Edit:

A look into the C sources shows us builtin_id:

static PyObject *
builtin_id(PyObject *self, PyObject *v)
{
    return PyLong_FromVoidPtr(v);
}

and PyLong_FromVoidPtr:

PyObject *
PyLong_FromVoidPtr(void *p)
{
#if SIZEOF_VOID_P <= SIZEOF_LONG
    return PyLong_FromUnsignedLong((unsigned long)(Py_uintptr_t)p);
#else

#ifndef HAVE_LONG_LONG
#   error "PyLong_FromVoidPtr: sizeof(void*) > sizeof(long), but no long long"
#endif
#if SIZEOF_LONG_LONG < SIZEOF_VOID_P
#   error "PyLong_FromVoidPtr: sizeof(PY_LONG_LONG) < sizeof(void*)"
#endif
    return PyLong_FromUnsignedLongLong((unsigned PY_LONG_LONG)(Py_uintptr_t)p);
#endif /* SIZEOF_VOID_P <= SIZEOF_LONG */
}

So the ID is a memory address.
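Because the id is the address in CPython, you can even round-trip it back to the object with ctypes. This is strictly CPython-only, and never safe with a stale id, since the object at that address may already have been freed:

```python
import ctypes

x = ["some", "object"]
addr = id(x)  # in CPython: the object's memory address
# ctypes.cast treats the integer as a raw address and reinterprets it
# as a Python object pointer.
recovered = ctypes.cast(addr, ctypes.py_object).value
print(recovered is x)  # True: the very same object
```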


