What Is the Id( ) Function Used For

What is the id( ) function used for?

Your post asks several questions:

What is the number returned from the function?

It is "an integer (or long integer) which is guaranteed to be unique and constant for this object during its lifetime." (Python Standard Library - Built-in Functions) A unique number. Nothing more, and nothing less. Think of it as a social-security number or employee id number for Python objects.

Is it the same with memory addresses in C?

Conceptually, yes, in that they are both guaranteed to be unique in their universe during their lifetime. And in CPython, the reference implementation of Python, it actually is the memory address of the corresponding C object.

If yes, why doesn't the number increase instantly by the size of the data type (I assume that it would be int)?

Because a list is not an array, and a list element is a reference, not an object.
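A minimal sketch of that point, assuming CPython (where id() happens to be an address). The variable names are made up for illustration. Each list slot holds a reference to an object that lives somewhere else, so the differences between consecutive ids say nothing about any element size:

```python
# A list stores references; the objects themselves live elsewhere in memory.
first = "spam"
second = [1, 2, 3]
items = [first, second, first]

# Each slot refers to an existing object rather than holding a copy of it.
same_string = items[0] is first and items[2] is first
same_list = items[1] is second

# Gaps between consecutive ids depend on where the objects were allocated,
# not on the size of a list slot.
gaps = [id(b) - id(a) for a, b in zip(items, items[1:])]
print(same_string, same_list, gaps)
```

Slots 0 and 2 refer to the same string object, so the two gaps cancel out exactly, which an array of inline values could never do.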

When do we really use id( ) function?

Hardly ever. You can test if two references are the same by comparing their ids, but the is operator has always been the recommended way of doing that. id( ) is only really useful in debugging situations.
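As a small illustration (names are arbitrary), comparing ids of two live objects gives the same answer as the is operator, which is why is reads better in real code:

```python
a = [1, 2, 3]
b = a            # a second reference to the same list
c = [1, 2, 3]    # equal contents, but a distinct object

same_by_is = a is b
same_by_id = id(a) == id(b)
equal_but_distinct = (a == c) and (a is not c)

print(same_by_is, same_by_id, equal_but_distinct)  # True True True
```

Note that the id comparison is only meaningful while both objects are alive at the same time.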

What do people use the identity function for?

Remember that in Haskell functions are first class values, and can be used as data the same way as other values, and passed as arguments to other functions. Often you build the functions you really want to use by applying other functions to each other. Occasionally you will find that the function you want to use in a spot happens to be nothing more complicated than id.

For example, here is a function that negates every second element of a list:

negateEverySecond = zipWith id (cycle [id, negate])

python id() function implementation

There are multiple implementations of Python. In CPython, all objects have a standard header and the id is the memory address of that header. References to objects are C pointers to their object header (the same memory address that is the id). You can't use a dunder method to find an object because you need the object pointer to find the dunder methods.

Python is compiled into byte code, and that byte code is executed by an interpreter written in C. When you call a function like id, that function can be more byte code, but it can also be a C function. Search for "builtin_id" in bltinmodule.c and you'll see the C implementation of id(some_object).

static PyObject *
builtin_id(PyModuleDef *self, PyObject *v)
/*[clinic end generated code: output=0aa640785f697f65 input=5a534136419631f4]*/
{
    PyObject *id = PyLong_FromVoidPtr(v);

    if (id && PySys_Audit("builtins.id", "O", id) < 0) {
        Py_DECREF(id);
        return NULL;
    }

    return id;
}

The id function is called with PyObject *v, a pointer to the object whose id should be taken. PyObject is the standard object header used by all Python objects. It includes the information needed to figure out what type the object really is. The id function turns the object pointer into a Python integer with PyLong_FromVoidPtr (the name "long" for a Python int is somewhat historical). That's the id you see at the Python level.
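One way to observe this from the Python side, as a sketch that assumes CPython: the default object repr prints the object's address in hex, and that number matches id(). The Plain class is a hypothetical placeholder, and other implementations may format the repr differently.

```python
import platform

class Plain:
    """Hypothetical class with no custom __repr__."""

obj = Plain()

# On CPython the default repr looks like "<module.Plain object at 0x7f...>".
default_repr = object.__repr__(obj)
address_text = default_repr.rsplit(" at ", 1)[1].rstrip(">")
address_from_repr = int(address_text, 16)

# Assumption: this equality is a CPython implementation detail.
matches = address_from_repr == id(obj)
print(platform.python_implementation(), default_repr, matches)
```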

You can get the CPython source on GitHub, and you can read up on the C API in the Python docs at Extending and Embedding the Python Interpreter and the Python/C API Reference Manual.

Uses for Haskell id function

It's useful as an argument to higher order functions (functions which take functions as arguments), where you want some particular value left unchanged.

Example 1: Leave a value alone if it is in a Just, otherwise, return a default of 7.

Prelude Data.Maybe> :t maybe
maybe :: b -> (a -> b) -> Maybe a -> b

Prelude Data.Maybe> maybe 7 id (Just 2)
2

Example 2: building up a function via a fold:

Prelude Data.Maybe> :t foldr (.) id [(+2), (*7)]
foldr (.) id [(+2), (*7)] :: (Num a) => a -> a

Prelude Data.Maybe> let f = foldr (.) id [(+2), (*7)]

Prelude Data.Maybe> f 7
51

We built a new function f by folding a list of functions together with (.), using id as the base case.
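For readers more at home in Python, here is a rough analogue of the same idea, offered as a sketch rather than a translation. The helper names identity, compose and compose_all are invented for this example. functools.reduce plays the role of the fold, and the identity function is the starting value, so an empty list of functions still yields a usable function.

```python
from functools import reduce

def identity(x):
    """Return the argument unchanged (the counterpart of Haskell's id)."""
    return x

def compose(f, g):
    """Return the function x -> f(g(x)), like Haskell's (.)."""
    return lambda x: f(g(x))

def compose_all(functions):
    """Compose a list of functions, using identity as the base case."""
    return reduce(compose, functions, identity)

f = compose_all([lambda x: x + 2, lambda x: x * 7])
print(f(7))                   # 51, i.e. (7 * 7) + 2
print(compose_all([])(7))     # 7, the empty composition is identity
```

reduce folds from the left while foldr folds from the right, but because function composition is associative the resulting function is the same.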

Example 3: the base case for functions as monoids (simplified).

instance Monoid (a -> a) where
    mempty = id
    f `mappend` g = (f . g)

Similar to our example with fold, functions can be treated as concatenable values, with id serving for the empty case, and (.) as append.

Example 4: a trivial hash function.

Data.HashTable> h <- new (==) id :: IO (HashTable Data.Int.Int32 Int)

Data.HashTable> insert h 7 2

Data.HashTable> Data.HashTable.lookup h 7
Just 2

Hashtables require a hashing function. But what if your key is already hashed? Then pass the id function, to fill in as your hashing method, with zero performance overhead.

What's the purpose of `id` function in the FSharp.Core?

When working with higher-order functions (i.e. functions that return other functions and/or take other functions as parameters), you always have to provide something as parameter, but there isn't always an actual data transformation that you'd want to apply.

For example, the function Seq.collect flattens a sequence of sequences, and takes a function that returns the "nested" sequence for each element of the "outer" sequence. For example, this is how you might get the list of all grandchildren of a UI control of some sort:

let control = ...
let allGrandChildren = control.Children |> Seq.collect (fun c -> c.Children)

But a lot of times, each element of the sequence will already be a sequence by itself - for example, you may have a list of lists:

let l = [ [1;2]; [3;4]; [5;6] ]

In this case, the parameter function that you pass to Seq.collect needs to just return the argument:

let flattened = [ [1;2]; [3;4]; [5;6] ] |> Seq.collect (fun x -> x)

The expression fun x -> x is a function that just returns its argument, also known as the "identity function". FSharp.Core provides it under the name id, so the same line can be written as:

let flattened = [ [1;2]; [3;4]; [5;6] ] |> Seq.collect id

Its usage crops up so often when working with higher-order functions (such as Seq.collect above) that it deserves a place in the standard library.

Another compelling example is Seq.choose - a function that filters a sequence of Option values and unwraps them at the same time. For example, this is how you might parse all strings as numbers and discard those that can't be parsed:

let tryParse s = match System.Int32.TryParse s with | true, x -> Some x | _ -> None
let strings = [ "1"; "2"; "foo"; "42" ]
let numbers = strings |> Seq.choose tryParse // numbers = [1;2;42]

But what if you're already given a list of Option values to start with? The identity function to the rescue!

let toNumbers optionNumbers =
    optionNumbers |> Seq.choose id

How unique is Python's id()?

Yes, CPython re-uses id() values. Do not count on these being unique in a Python program.

This is clearly documented:

Return the “identity” of an object. This is an integer which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.

The last sentence is the important part. The id is unique only as long as an object is alive. Objects that have no references left to them are removed from memory, allowing the id() value to be re-used for another object, hence the non-overlapping lifetimes wording.

Note that this applies to CPython only, the standard implementation provided by python.org. There are other Python implementations, such as IronPython, Jython and PyPy, that make their own choices about how to implement id(), because they each can make distinct choices on how to handle memory and object lifetimes.
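A quick sketch of the difference between overlapping and non-overlapping lifetimes. Token is a hypothetical throwaway class. The first result is guaranteed by the documentation. The second depends on the implementation's allocator, so no particular count is promised.

```python
class Token:
    """Hypothetical throwaway class used only for this demonstration."""

# Overlapping lifetimes: all 1000 objects are alive at once,
# so their ids are guaranteed to be distinct.
alive = [Token() for _ in range(1000)]
distinct_while_alive = len({id(token) for token in alive}) == len(alive)

# Non-overlapping lifetimes: each temporary can be freed before the next
# one is created, so ids may repeat (on CPython they usually do).
temporary_ids = [id(Token()) for _ in range(1000)]
distinct_temporaries = len(set(temporary_ids))

print(distinct_while_alive, distinct_temporaries)
```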

To address your specific questions:

  1. In CPython, id() is the memory address. New objects will be slotted into the next available memory space, so if a specific memory address has enough space to hold the next new object, the memory address will be reused. You can see this in the interpreter when creating new objects that are the same size:

    >>> id(1234)
    4546982768
    >>> id(4321)
    4546982768

    The 1234 literal creates a new integer object, for which id() produces a numeric value. As there are no further references to the int value, it is removed from memory again. Execute the same expression again with a different integer literal, and chances are you'll see the same id() value (although a garbage collection run breaking cyclic references could free up more memory, so you might not see the same id() again).

    So it's not random, but in CPython it is a function of the memory allocation algorithms.

  2. If you need to check specific objects, keep your own references to them. A weak reference (from the weakref module) is enough if all you need to establish is that the object is still 'alive'.

    For example, recording an object reference first, then later checking it:

    import weakref

    # record
    object_ref = weakref.ref(some_object)

    # check if it's the same object still
    some_other_reference is object_ref() # only true if they are the same object

    The weak reference won't keep the object alive, but if it is alive then the object_ref() will return it (it'll return None otherwise).

    You could use such a mechanism to generate really unique identifiers, see below.

  3. All you have to do to 'destroy' an object is to remove all references to it. Variables (local and global) are references. So are attributes on other objects, and entries in containers such as lists, tuples, dictionaries, sets, etc.

    The moment all references to an object are gone, the reference count on the object drops to 0 and it is deleted, there and then.

    Garbage collection is only needed to break cyclic references: objects that reference only one another, with no further references to the cycle from outside. Because such a cycle will never reach a reference count of 0 without help, the garbage collector periodically checks for such cycles and breaks one of the references to help clear those objects from memory.

    So you can cause any object to be deleted from memory (freed), by removing all references to it. How you achieve that depends on how the object is referenced. You can ask the interpreter to tell you what objects are referencing a given object with the gc.get_referrers() function, but take into account that doesn't give you variable names. It gives you objects, such as the dictionary object that is the __dict__ attribute of a module that references the object as a global, etc. For code fully under your control, at most use gc.get_referrers() as a tool to remind yourself what places the object is referenced from as you write the code to remove those.
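Points 2 and 3 can be tried out together in a few lines. This is a sketch with a hypothetical Resource class. On CPython the object is freed as soon as its last reference goes away, and the gc.collect() call is only there to cover implementations that free objects lazily.

```python
import gc
import weakref

class Resource:
    """Hypothetical class; plain class instances support weak references."""

resource = Resource()
resource_ref = weakref.ref(resource)   # does not keep the object alive

alive_before = resource_ref() is resource

del resource     # remove the only strong reference
gc.collect()     # not needed for CPython's reference counting

alive_after = resource_ref() is not None
print(alive_before, alive_after)
```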

If you must have unique identifiers for the lifetime of the Python application, you'd have to implement your own facility. If your objects are hashable and support weak references, then you could just use a WeakKeyDictionary instance to associate arbitrary objects with UUIDs:

from weakref import WeakKeyDictionary
from uuid import uuid4

class UniqueIdMap(WeakKeyDictionary):
    """Weak-keyed map that assigns a fresh UUID to an object on first lookup."""

    def __getitem__(self, obj):
        try:
            return super().__getitem__(obj)
        except KeyError:
            # Store through __setitem__ so the entry is dropped when obj dies.
            new_id = self[obj] = uuid4()
            return new_id

uniqueidmap = UniqueIdMap()

def uniqueid(obj):
    """Produce a unique integer id for the object.

    Object must be *hashable* and support weak references. Id is a UUID
    and should be unique across Python invocations.

    """
    return uniqueidmap[obj].int

This still produces integers. Because they are UUIDs they are not strictly guaranteed to be unique, but the likelihood that you'll ever encounter the same ID twice during your lifetime is smaller than that of being hit by a meteorite. See How unique is UUID?

This then gives you unique ids even for objects with non-overlapping lifetimes:

>>> class Foo:
...     pass
...
>>> id(Foo())
4547149104
>>> id(Foo()) # memory address reused
4547149104
>>> uniqueid(Foo())
151797163173960170410969562162860139237
>>> uniqueid(Foo()) # but you still get a unique UUID
188632072566395632221804340107821543671

Confused about Python’s id()

In general, as soon as you use an integer or a string or any other literal, Python creates a new object in memory for you. It is guaranteed to have the same id for the lifetime of the object, that is, while its reference count is not zero.

When you write something like:

>>> id(1000)
140497411829680

Python creates the integer 1000 and returns its id (the memory address of the object in CPython). After this is done, the reference count of the integer object 1000 is zero and it is deleted. This ensures that you cannot keep filling memory just by writing id(something) (and not binding any variable name to the object).

Typically, you cannot predict when reuse will happen, but in my Python shell it happens quite consistently:

>>> id(1000)
140697307078576
>>> id(1001)
140697307078576
>>> id(1002)
140697307078576
>>> id(1003)
140697307078576

You can see that the same memory address gets used again and again as each new integer is created. However, if you prevent the reference count from dropping to zero, you can see that new memory is used instead:

>>> a = 1000
>>> id(a)
140697307078576
>>> b = 1001
>>> id(b)
140697306008368

In CPython, the integers -5 through 256 are special cases in that they always exist (and so always have the same id during a Python runtime). This is an optimisation to avoid repeated creation and destruction of commonly-used integers.
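A small sketch of that cache at work, assuming CPython. This is an implementation detail, so other interpreters may print something different. The integers are built with int() from strings so that the compiler cannot merge equal literals into one constant.

```python
import platform

# Inside the cached range: both calls hand back the same pre-built object.
small_a, small_b = int("256"), int("256")
# Outside the cached range: each call builds a new integer object.
large_a, large_b = int("257"), int("257")

small_shared = small_a is small_b
large_shared = large_a is large_b

print(platform.python_implementation(), small_shared, large_shared)
```

On CPython this shows the small value shared and the large one not, which is also why integers should be compared with == rather than is.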


