Subclassing int in Python

int is immutable, so you can't modify it after it has been created (for example in __init__); override __new__ instead:

    class TestClass(int):
        def __new__(cls, *args, **kwargs):
            # ignore the arguments and always create the int 5
            return super(TestClass, cls).__new__(cls, 5)

    print(TestClass())  # 5
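A slightly more realistic sketch of the same idea computes the value inside __new__ and then hands it to int.__new__. The ClampedInt name and the 0..255 range are illustrative assumptions. Note that inherited arithmetic still returns plain int objects.

```python
class ClampedInt(int):
    """Hypothetical int subclass whose value is clamped to 0..255."""

    def __new__(cls, value=0):
        # The value must be fixed here, before the immutable int object exists
        clamped = max(0, min(255, int(value)))
        return super().__new__(cls, clamped)

x = ClampedInt(300)
print(x, type(x).__name__)   # 255 ClampedInt
print(type(x + 1).__name__)  # int: inherited int.__add__ returns a plain int
```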

python: subclass of subclass of int

Comments have supplied information that helps answer "Why the difference?" and "What's the best way?"


First: Why the difference?

In the original definitions of uint16 and Offset16, the __new__ method uses super(cls, cls). As @juanpa.arrivillaga pointed out, calling Offset16.__new__ leads to uint16.__new__ calling itself recursively. Having Offset16.__new__ use super(uint16, cls) instead changes which __new__ gets called, so the call no longer goes through uint16.__new__ at all.

Some additional explanation may help to understand:

The cls argument passed into Offset16.__new__ is the Offset16 class itself, so whenever the method's implementation refers to cls, it refers to Offset16. Thus,

    return super(cls, cls).__new__(cls, val)

is equivalent in that case to

    return super(Offset16, Offset16).__new__(Offset16, val)

Now we might think of super as returning the base class, but its semantics when arguments are provided are more subtle: super resolves a reference to a method, and the arguments affect how that resolution happens. Without arguments (Python 3), super() inside a method of class C behaves like super(C, <the method's first argument>), which in a simple single-inheritance chain means the immediate superclass. For super(type1, type2) in particular, Python looks up type1 in the MRO (method resolution order) of type2 and starts the search with the class that follows type1 in that sequence.

(This is explained in the documentation of super, though the wording could be clearer.)
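A small, self-contained illustration of that rule, with hypothetical classes unrelated to the question: the second argument supplies the MRO, the first argument says where in it the search begins. Classmethods are used so that, as in the question, the second argument is a class rather than an instance.

```python
class Base:
    @classmethod
    def who(cls):
        return "Base"

class Middle(Base):
    @classmethod
    def who(cls):
        return "Middle"

class Leaf(Middle):
    @classmethod
    def who(cls):
        return "Leaf"

print([c.__name__ for c in Leaf.__mro__])  # ['Leaf', 'Middle', 'Base', 'object']
# super(type1, type2): search type2's MRO, starting *after* type1
print(super(Leaf, Leaf).who())    # Middle
print(super(Middle, Leaf).who())  # Base
```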

The MRO for Offset16 is (Offset16, uint16, int, object). Therefore

    return super(Offset16, Offset16).__new__(Offset16, val)

resolves to

    return uint16.__new__(Offset16, val)

When uint16.__new__ is called this way, the class argument passed to it is Offset16, not uint16. As a result, when its implementation has

    return super(cls, cls).__new__(cls, val)

that once again will resolve to

    return uint16.__new__(Offset16, val)

This is why we end up with an infinite loop.
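The question's exact code isn't reproduced here, so the following is a reconstruction under assumptions: only the super(cls, cls) calls come from the text above, and the range check is hypothetical. It reproduces the loop, which Python reports as a RecursionError.

```python
class uint16(int):
    def __new__(cls, val):
        # Hypothetical range check; only the super(cls, cls) call is from the text
        if not 0 <= val <= 0xFFFF:
            raise ValueError(f"{val} is out of range for uint16")
        return super(cls, cls).__new__(cls, val)

class Offset16(uint16):
    def __new__(cls, val):
        return super(cls, cls).__new__(cls, val)

print(uint16(42))  # 42: for uint16 itself, super(uint16, uint16) finds int.__new__

recursed = False
try:
    Offset16(42)
except RecursionError:
    # super(Offset16, Offset16) -> uint16.__new__(Offset16, 42)
    # -> super(Offset16, Offset16) again -> ... until the recursion limit
    recursed = True
print(recursed)  # True
```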

But in the changed definition of Offset16,

    class Offset16(uint16):
        def __new__(cls, val):
            return super(uint16, cls).__new__(cls, val)

the last line is equivalent to

    return super(uint16, Offset16).__new__(Offset16, val)

and per the MRO for Offset16 and the semantics for super mentioned above, that resolves to

    return int.__new__(Offset16, val)

That explains why the changed definition results in a different behaviour.
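One consequence follows from the same MRO rule: with super(uint16, cls), Offset16 jumps straight to int.__new__, so any logic inside uint16.__new__ (a range check, say) never runs for Offset16. The sketch below makes that visible; the calls list is hypothetical instrumentation added only for this illustration.

```python
calls = []

class uint16(int):
    def __new__(cls, val):
        calls.append(cls.__name__)  # record whenever uint16.__new__ actually runs
        return super(cls, cls).__new__(cls, val)

class Offset16(uint16):
    def __new__(cls, val):
        # Start the MRO search after uint16, i.e. at int
        return super(uint16, cls).__new__(cls, val)

x = Offset16(42)
print(x, type(x).__name__)  # 42 Offset16
print(calls)                # []: uint16.__new__ was skipped entirely
```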


Second: What's the best way to do this?

Different alternatives were provided in comments that might fit different situations.

@juanpa.arrivillaga suggested (assuming Python 3) simply using super() without arguments. For the approach taken in the question, this makes sense. The reason for passing arguments to super would be to manipulate the MRO search, and in this simple class hierarchy that isn't needed.
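A minimal sketch of that suggestion, reusing the question's class names with a hypothetical range check: with zero-argument super() each class simply defers to the next class in the MRO, so the call chain is Offset16.__new__, then uint16.__new__, then int.__new__, and the check in uint16 does run for Offset16.

```python
class uint16(int):
    def __new__(cls, val):
        # Hypothetical range check, here only to show that this method now runs
        if not 0 <= val <= 0xFFFF:
            raise ValueError(f"{val} is out of range for uint16")
        return super().__new__(cls, val)  # equivalent to super(uint16, cls)

class Offset16(uint16):
    def __new__(cls, val):
        return super().__new__(cls, val)  # equivalent to super(Offset16, cls)

x = Offset16(42)
print(x, type(x).__name__)  # 42 Offset16

try:
    Offset16(70000)
except ValueError as exc:
    print(exc)  # 70000 is out of range for uint16
```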

@Jason Yang suggested referring directly to the specific superclass rather than using super. For instance:

    class Offset16(uint16):
        def __new__(cls, val):
            return uint16.__new__(cls, val)

That is perfectly fine for this simple situation, but it might not be the best choice in scenarios with more complex class relationships. Note, for instance, that uint16 is duplicated in the above. If the subclass had several methods that wrapped (rather than replaced) the superclass method, there would be many duplicate references, and changing the class hierarchy could lead to hard-to-analyze bugs. Avoiding such problems is one of the intended benefits of using super.
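To make the duplication concern concrete, here is a sketch in which a hypothetical Checked16 class is inserted into the hierarchy after the subclasses were written. The hard-coded uint16.__new__ call quietly bypasses it, while super() picks it up. All class names besides uint16 are illustrative.

```python
class uint16(int):
    def __new__(cls, val):
        return super().__new__(cls, val)

class Checked16(uint16):
    # Hypothetical class inserted between uint16 and the subclasses later on
    def __new__(cls, val):
        if not 0 <= val <= 0xFFFF:
            raise ValueError(f"{val} is out of range")
        return super().__new__(cls, val)

class OffsetDirect(Checked16):
    def __new__(cls, val):
        return uint16.__new__(cls, val)   # stale hard-coded base: skips Checked16

class OffsetSuper(Checked16):
    def __new__(cls, val):
        return super().__new__(cls, val)  # follows the MRO, so Checked16 runs

print(OffsetDirect(70000))  # 70000: the range check silently never ran
try:
    OffsetSuper(70000)
except ValueError as exc:
    print(exc)  # 70000 is out of range
```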

Finally, @Adam.Er8 suggested simply using

    Offset16 = uint16

That's very simple, indeed. The one caveat is that Offset16 is truly no more than an alias for uint16; it's not a separate class. For example:

    >>> Offset16 = uint16
    >>> x = Offset16(24)
    >>> type(x)
    <class '__main__.uint16'>

So, this may be fine so long as there's never a need in the app to have an actual type distinction.
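The difference between the alias and a real subclass can be checked directly. Offset16Alias is a hypothetical name used here so that both variants can coexist in one snippet.

```python
class uint16(int):
    pass

Offset16Alias = uint16      # a second name for the very same class object

class Offset16(uint16):     # a genuinely separate subclass
    pass

print(Offset16Alias is uint16)            # True
print(type(Offset16Alias(24)) is uint16)  # True
print(type(Offset16(24)).__name__)        # Offset16
print(isinstance(uint16(24), Offset16))   # False: only the subclass is distinguishable
```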

Subclassing int - unexpected behaviour with range

Don't subclass int and then override all of the methods. If you do that, the base class will think you have one value, and the subclass will think you have a different value. Instead, subclass numbers.Integral and implement all of the abstract methods. Then you can be sure your implementation is the only game in town.
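Here is a small sketch of the "two different values" problem, using a hypothetical Shifted class that overrides only __int__. int() goes through the override, but range() and the inherited arithmetic read the value stored in the int base, so the two views disagree.

```python
class Shifted(int):
    """Hypothetical int subclass that claims to be one larger than it is."""

    def __int__(self):
        return super().__int__() + 1

s = Shifted(3)
print(int(s))         # 4: int() calls the overridden __int__
print(len(range(s)))  # 3: range() uses the int value stored in the base class
print(s + 0)          # 3: inherited int.__add__ uses the stored value too
```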

How to subclass int and make it mutable

Is it possible to subclass int and make it mutable?

Sort of. You can add all the mutable parts you want, but you can't touch the int parts, so the degree of mutability you can add won't help you.

Instead, don't use an int subclass. Use a regular object that stores an int. If you want to be able to pass it to struct.pack like an int, implement an __index__ method to define how to interpret your object as an int:

    class IntLike(object):  # not IntLike(int):
        def __init__(self, value=0):
            self.value = value
        def __index__(self):
            return self.value
        ...

You can implement additional methods like __or__ for | and __ior__ for in-place, mutative |=. Don't try to push too hard for complete interoperability with ints, though; for example, don't try to make your objects usable as dict keys. They're mutable, after all.
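A fuller sketch along those lines: __or__ and __ior__ are the methods named above, while __repr__ and the explicit __hash__ = None (which makes instances unusable as dict keys) are additions for this illustration. struct.pack accepts the object because integer formats fall back to __index__.

```python
import operator
import struct

class IntLike:
    """Mutable stand-in for an int (a sketch, not a full replacement)."""

    def __init__(self, value=0):
        self.value = value

    def __index__(self):
        # Lets struct.pack, bin(), hex(), slicing, etc. read the current value
        return self.value

    def __or__(self, other):
        # a | b builds a new object; neither operand changes
        return IntLike(self.value | operator.index(other))

    def __ior__(self, other):
        # a |= b mutates a in place and must return self
        self.value |= operator.index(other)
        return self

    __hash__ = None  # mutable, so deliberately unhashable

    def __repr__(self):
        return f"IntLike({self.value})"

flags = IntLike(0b0001)
alias = flags
flags |= 0b0100                 # in place: alias sees the change
print(alias)                    # IntLike(5)
print(struct.pack("B", flags))  # b'\x05'
```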

If it's really important to you that your class is an int subclass, you're going to have to sacrifice the c.sixth_property = True syntax you want. You'll have to pick an alternative like c = c.with_sixth_property(True), and implement things non-mutatively.
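A minimal sketch of that non-mutative style, assuming (hypothetically, since the question's layout isn't shown here) that the sixth property is stored in bit 5 of the value. Each "change" returns a new instance, and the caller rebinds its name.

```python
class Flags(int):
    """Immutable int subclass: 'setting' a flag returns a new instance."""

    SIXTH = 1 << 5  # hypothetical: assume the sixth property lives in bit 5

    @property
    def sixth_property(self):
        return bool(self & self.SIXTH)

    def with_sixth_property(self, on):
        # Compute a new value instead of mutating; type(self) preserves subclasses
        value = (self | self.SIXTH) if on else (self & ~self.SIXTH)
        return type(self)(value)

c = Flags(0)
c = c.with_sixth_property(True)
print(int(c), c.sixth_property)  # 32 True
```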

Automatic counter as a subclass of integer?

If you really, really, really need to mangle an immutable and built-in type, then you can create a kind-of "pointer" to it:

    class AutomaticCounter(int):
        def __new__(cls, *args, **kwargs):
            # create a new instance of int()
            self = super().__new__(cls, *args, **kwargs)
            # create a member "ptr" storing a reference to the instance
            self.ptr = self
            # return the normal instance
            return self

        def __str__(self):
            # first, create a copy via int(),
            # which "decays" from your subclass to an ordinary int(),
            # then stringify it to obtain the normal __str__() value
            value = str(int(self.ptr))

            # afterwards, store a new instance of your subclass
            # that is incremented by 1
            self.ptr = AutomaticCounter(self.ptr + 1)
            return value

    n = AutomaticCounter(0)
    print(n)  # 0
    print(n)  # 1
    print(n)  # 2

    # to increment first and print later, use this __str__() instead:
    def __str__(self):
        self.ptr = AutomaticCounter(self.ptr + 1)
        return str(int(self.ptr))

This, however, doesn't make the type mutable per se. If you do print(f"{self=}") at the beginning of __str__(), you'll see that the instance itself never changes, so you effectively pay for about two ints per object (plus some overhead) and reach the real value via self.ptr.

It wouldn't work with self alone, because self is just an ordinary parameter: a reference to the instance (created via __new__()) that is passed to the instance's methods as the first argument. So something like this:

    def func(instance):
        instance = 42  # rebinds the local name only; the caller's object is unchanged

would, as Daniel mentioned, simply assign a new value to the local variable named instance (self is just the conventional name for that reference); it doesn't change the instance at all. The next-closest thing is a "pointer", and since you want to manipulate it the way you described, I "hid" it in a custom member called ptr.


As user2357112 pointed out, because the instance itself is immutable, its real value and self.ptr drift out of sync. So if you choose the self.ptr hack, you'll need to update the magic methods (__*__()) as well; for example, the code below updates __add__(). Note the int() calls: they convert to a plain int() to prevent recursion.

    class AutomaticCounter(int):
        def __new__(cls, *args, **kwargs):
            self = super().__new__(cls, *args, **kwargs)
            self.ptr = self
            return self

        def __str__(self):
            value = int(self.ptr)
            self.ptr = AutomaticCounter(int(self.ptr) + 1)
            return str(value)

        def __add__(self, other):
            value = other
            if hasattr(other, "ptr"):
                value = int(other.ptr)
            self.ptr = AutomaticCounter(int(self.ptr) + value)
            return int(self.ptr)

        def __rmul__(self, other):
            # [1, 2, 3] * your_object
            return other * int(self.ptr)

    n = AutomaticCounter(0)
    print(n)  # 0
    print(n)  # 1
    print(n)  # 2
    print(n + n)  # 6

However, anything that reads the raw value directly, for example through the C API, will most likely get the stale number. Reflected operations with immutable built-ins are the typical case (e.g. 1 + n ends up in the inherited int.__radd__()), because you can't reliably edit the built-ins' magic methods so that the correction applies in all modules and all scopes.

Example:

    # will work fine because it's your class
    a <operator> b  ->  a.__operator__(b)
    # vs.
    # will break everything because it's using the raw value, not the self.ptr hack
    b <operator> a  ->  b.__operator__(a)

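The exception is list.__mul__(), e.g. [1, 2, 3] * n, and only because __rmul__() is overridden above: CPython's multiplication (PyNumber_Multiply()) tries the numeric multiplication methods of both operands before falling back to sequence repetition, and since list has none, your __rmul__() runs first.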
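The sketch below is a trimmed, self-contained copy of the class above (only the ptr plumbing, __add__ and __rmul__) that shows which operations see self.ptr and which see the stale raw value. Setting n.ptr by hand is a shortcut standing in for a few print(n) calls.

```python
class AutomaticCounter(int):
    def __new__(cls, *args, **kwargs):
        self = super().__new__(cls, *args, **kwargs)
        self.ptr = self
        return self

    def __add__(self, other):
        value = int(other.ptr) if hasattr(other, "ptr") else other
        self.ptr = AutomaticCounter(int(self.ptr) + value)
        return int(self.ptr)

    def __rmul__(self, other):
        return other * int(self.ptr)

n = AutomaticCounter(0)
n.ptr = AutomaticCounter(5)  # pretend the counter has advanced to 5

print([0] * n)   # [0, 0, 0, 0, 0]: the __rmul__ override used self.ptr
print(1 + n)     # 1: the inherited int.__radd__ used the raw value 0
print(range(n))  # range(0, 0): range() also reads the raw value
print(n + 1)     # 6: n is the left operand, so the __add__ override runs
```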


Or, a saner solution would be to create a custom, mutable object, keep the value in a member of it, and manipulate that; __str__ then returns it, stringified:

    class AutomaticCounter:  # a plain object, not an int subclass
        def __init__(self, start=0):
            self.item = start

        def __str__(self):
            # increment first, then return the new value as a string
            self.item += 1
            return str(self.item)
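Because the value now lives in an ordinary attribute, nothing can drift out of sync. If the counter also needs to be accepted where an integer is expected, the __index__ approach from the mutable-int answer above can be added; that method is an assumption of this sketch, not part of the original answer.

```python
class AutomaticCounter:
    """Plain mutable object holding the current count (not an int subclass)."""

    def __init__(self, start=0):
        self.item = start

    def __str__(self):
        # increment first, then return the new value as a string
        self.item += 1
        return str(self.item)

    def __index__(self):
        # Addition for this sketch: range(), bin(), hex(), etc. see the current count
        return self.item

n = AutomaticCounter()
print(n)               # 1
print(n)               # 2
print(list(range(n)))  # [0, 1]
```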

