Stored Variable of Self Type (Especially When Subclassing)

Let me demonstrate why Swift disallows this. If it did allow you to use Self like that, you could in theory do:

let doublyNode = DoublyNode(1)
let linkedNode: LinkedNode<Int> = doublyNode // this assignment should work, right?
linkedNode.next = LinkedNode(2) // "linkedNode" is of type LinkedNode, so Self is LinkedNode, so I can do this assignment, right?

Now what happens? The didSet of next in DoublyNode gets called, and it tries to access previous of LinkedNode(2). The thing is, LinkedNode doesn't even have a previous property! Therefore, allowing you to use Self this way is unsafe.

I don't think DoublyNode should inherit from LinkedNode at all. Alexander's answer explains this very well: it violates the Liskov Substitution Principle (LSP). One thing you could do to relate these two classes is to use a protocol:

protocol LinkedNodeProtocol {
    associatedtype Data

    var value: Data { get set }
    var next: Self? { get set }

    init(_ value: Data)
}

final class LinkedNode<Data>: LinkedNodeProtocol {
    var value: Data
    var next: LinkedNode?

    init(_ value: Data) {
        self.value = value
    }
}

final class DoublyNode<Data>: LinkedNodeProtocol {
    var value: Data
    weak var previous: DoublyNode?
    var next: DoublyNode? {
        didSet { next?.previous = self }
    }

    init(_ value: Data) {
        self.value = value
    }
}

Enforcing Class Variables in a Subclass

Abstract Base Classes allow you to declare a property abstract, which forces all implementing classes to provide it. I am only providing this example for completeness; many Pythonistas think your proposed solution is more Pythonic.

import abc

class Base(object):
    __metaclass__ = abc.ABCMeta

    @abc.abstractproperty
    def value(self):
        return 'Should never get here'

class Implementation1(Base):

    @property
    def value(self):
        return 'concrete property'

class Implementation2(Base):
    pass  # doesn't have the required property

Trying to instantiate the first implementing class:

Implementation1()
Out[6]: <__main__.Implementation1 at 0x105c41d90>

Trying to instantiate the second implementing class:

Implementation2()
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-4-bbaeae6b17a6> in <module>()
----> 1 Implementation2()

TypeError: Can't instantiate abstract class Implementation2 with abstract methods value
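
The example above uses Python 2 syntax (__metaclass__, abc.abstractproperty, print statements). For reference, a rough Python 3 sketch of the same idea looks like this; abc.abstractproperty is deprecated there in favour of stacking @property on @abc.abstractmethod:

import abc

class Base(abc.ABC):
    @property
    @abc.abstractmethod
    def value(self):
        """Subclasses must provide a concrete 'value' property."""
        raise NotImplementedError

class Implementation1(Base):
    @property
    def value(self):
        return 'concrete property'

class Implementation2(Base):
    pass  # still missing the required property

print(Implementation1().value)   # concrete property
# Implementation2()              # TypeError: Can't instantiate abstract class ...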

How dangerous is setting self.__class__ to something else?

Here's a list of things I can think of that make this dangerous, in rough order from worst to least bad:

  • It's likely to be confusing to someone reading or debugging your code.
  • You won't have gotten the right __init__ method, so you probably won't have all of the instance variables initialized properly (or even at all).
  • The differences between 2.x and 3.x are significant enough that it may be painful to port.
  • There are some edge cases with classmethods, hand-coded descriptors, hooks to the method resolution order, etc., and they're different between classic and new-style classes (and, again, between 2.x and 3.x).
  • If you use __slots__, all of the classes must have identical slots. (And if you have compatible but different slots, it may appear to work at first but do horrible things…)
  • Special method definitions in new-style classes may not change. (In fact, this will work in practice with all current Python implementations, but it's not documented to work, so…)
  • If you use __new__, things will not work the way you naively expect.
  • If the classes have different metaclasses, things will get even more confusing.

Meanwhile, in many cases where you'd think this is necessary, there are better options:

  • Use a factory to create an instance of the appropriate class dynamically, instead of creating a base instance and then munging it into a derived one.
  • Use __new__ or other mechanisms to hook the construction.
  • Redesign things so you have a single class with some data-driven behavior, instead of abusing inheritance.

As the most common specific case of the last one, just put all of the "variable methods" into classes whose instances are kept as a data member of the "parent", rather than into subclasses. Instead of changing self.__class__ = OtherSubclass, just do self.member = OtherSubclass(self). If you really need methods to magically change, automatic forwarding (e.g., via __getattr__) is a much more common and Pythonic idiom than changing classes on the fly.
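
To sketch that last idiom (the class and attribute names below are purely illustrative, not from the question):

class SortBehavior:
    def act(self):
        return "sorting"

class ShuffleBehavior:
    def act(self):
        return "shuffling"

class Widget:
    def __init__(self):
        # Keep the "variable methods" in a member instead of switching self.__class__.
        self.behavior = SortBehavior()

    def use_shuffle(self):
        self.behavior = ShuffleBehavior()

    def __getattr__(self, name):
        # Forward unknown attribute lookups to the current behavior object.
        return getattr(self.behavior, name)

w = Widget()
print(w.act())   # "sorting"
w.use_shuffle()
print(w.act())   # "shuffling"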

How do I declare a variable that contains a subclass of a class which implements an interface?

There is no way in Java to declare the variable the way you would like to do it.

You could use SelectableChannel for the type of the variable (since this is a supertype of both SocketChannel and DatagramChannel), and cast it to a ByteChannel whenever you need to call methods from that interface. Simple example:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ByteChannel;
import java.nio.channels.SelectableChannel;

class MyClass {
    private SelectableChannel channel; // either a SocketChannel or a DatagramChannel

    public int readStuff(ByteBuffer buffer) throws IOException {
        // Cast it to a ByteChannel when necessary
        return ((ByteChannel) channel).read(buffer);
    }
}

(Or the other way around: declare the variable as a ByteChannel and cast to a SelectableChannel when necessary - whichever is more convenient in your case).

Python: self vs type(self) and the proper use of class variables

After speaking with others offline (and per @wwii's comment on one of the answers here), it turns out the best way to do this without embedding the class name explicitly is to use self.__class__.attribute.

(While some people use type(self).attribute instead, it can cause other problems.)
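
A minimal sketch of the self.__class__.attribute approach (the Animal/Dog names and the population counter are hypothetical, not from the question):

class Animal:
    population = 0  # class variable shared by all Animal instances

    def __init__(self):
        # Increment the class variable without naming the class explicitly.
        # On a subclass instance, self.__class__ is the subclass, so each
        # subclass keeps its own count once it defines its own 'population'.
        self.__class__.population += 1

class Dog(Animal):
    population = 0  # give the subclass its own counter

Animal(); Animal()
Dog()
print(Animal.population)  # 2
print(Dog.population)     # 1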

Why is an instance variable of the superclass not overridden by a subclass?

Why is an instance variable of a superclass not overridden in a subclass method? See my code below ...

Because instance variables CANNOT be overridden in Java. In Java, only methods can be overridden.

When you declare a field with the same name as an existing field in a superclass, the new field hides the existing field. The existing field from the superclass is still present in the subclass, and can even be used ... subject to the normal Java access rules.

(In your example, an instance of C has two distinct fields called a, containing distinct values.)



Because instance variables CANNOT be overridden in Java, but why? Why is it done in this manner in Java? What's the reason?

Why did they design it that way?

  1. Because overriding variables would fundamentally break code in the superclass. For example, if an override changes the variable's type, that is likely to change the behavior of methods declared in the parent class that used the original variable. At worst, it renders them uncompilable.

    For example:

       public class Sup {
           private int foo;

           public int getFoo() {
               return foo;
           }
       }

       public class Sub extends Sup {
           private int[] foo;
           ...
       }

    If Sub.foo overrides (i.e. replaces) Sup.foo, how can getFoo() work? In the subclass context, it would be trying to return a value of a field of the wrong type!

  2. If fields that were overridden were not private, it would be even worse. That would break the Liskov Substitution Principle (LSP) in a pretty fundamental way. That removes the basis for polymorphism.

  3. On the flipside, overriding fields would not achieve anything that cannot be done better in other ways. For example, a good design declares all instance variables as private and provides getters/setters for them as required. The getters/setters can be overridden, and the parent class can "protect" itself against undesirable overrides by using the private fields directly, or declaring the getters/setters final.


References:

  • Java Tutorial - Hiding Fields
  • JLS Example 8.3.1.1-3 - Hiding of Instance Fields.

Why is it possible to define a variable with the type of a superclass, but assign it an object of a subclass?

It seems to me you are puzzled by two facets of this. One has to do with space allocation, which a number of other answers have addressed. To sum up, declaring a reference in an object-oriented system doesn't allocate space to hold an object of that type, it simply allocates space to hold a pointer.

The other issue you seem to be confused about is what typing actually means in an object-oriented system.

I would understand it the other way around: big fat NSMutableArray on the left side, and tiny NSArray on the right.

Think of any class as comprising two separate things: an interface, which is the set of methods and members that it exposes, and an implementation, which determines what accesses to those methods and members will actually do.

When you declare a reference to NSArray, it means that the reference will point to objects that support the same interface as NSArray. Because NSMutableArray is a subclass of NSArray, instances of NSMutableArray will always support NSArray's interface. Therefore the assignment is safe.

Conversely, if you declare a reference to NSMutableArray, it must point to objects that support the subclass's interface. Therefore, you cannot assign a pointer to an NSArray instance to that variable, because NSArray doesn't support the full interface of NSMutableArray. (In a weakly typed system, it is possible that you can do the assignment but then get a runtime error if you try to invoke a method that exists in the interface but not in the instantiated object. I don't know how strongly typed Objective C is.)

When and why to use self.__dict__ instead of self.variable

Almost all of the time, you shouldn't use self.__dict__.

If you're accessing an attribute like self.client, i.e. the attribute name is known and fixed, then the only difference between that and self.__dict__['client'] is that the latter won't look up the attribute on the class if it's missing on the instance. There is very rarely any reason to do this, but the difference is demonstrated below:

>>> class A:
... b = 3 # class attribute, not an instance attribute
...
>>> A.b # the class has this attribute
3
>>> a = A()
>>> a.b # the instance doesn't have this attribute, fallback to the class
3
>>> a.__dict__['b'] # the instance doesn't have this attribute, but no fallback
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 'b'

The main use-case for self.__dict__ is when you don't want to access a fixed, known attribute name. In almost all code, you always know which attribute you want to access; and if you do need to look something up dynamically using an unknown string, you should create a dictionary yourself, and write self.that_dict[key] instead of self.__dict__[key].
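
To sketch that advice (the Config/settings names here are hypothetical, used only for illustration):

class Config:
    def __init__(self):
        # A dedicated dict for dynamic keys keeps them separate from
        # real attributes like 'path', so arbitrary keys cannot shadow them.
        self.path = '/tmp/app'
        self.settings = {}

    def set_option(self, key, value):
        self.settings[key] = value      # not: self.__dict__[key] = value

    def get_option(self, key, default=None):
        return self.settings.get(key, default)

cfg = Config()
cfg.set_option('retries', 3)
print(cfg.get_option('retries'))  # 3
print(cfg.path)                   # '/tmp/app', untouched by option keys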

So the only times you should really use __dict__ is when you are writing code which needs to work regardless of which attributes the instance might have; i.e. you specifically want code which will work even if you change the class's structure or its attribute names, or code which will work across multiple classes with different structures. I'll show one example below.

The __repr__ method

The __repr__ method is meant to return a string representing the instance, for the programmer's convenience when using a REPL. For debugging/testing purposes this string usually contains information about the object's state. Here's a common way to implement it:

class Foo:
    def __init__(self, foo, bar, baz):
        self.foo = foo
        self.bar = bar
        self.baz = baz

    def __repr__(self):
        return 'Foo({!r}, {!r}, {!r})'.format(self.foo, self.bar, self.baz)

This means if you write obj = Foo(1, 'y', True) to create an instance, then repr(obj) will be the string "Foo(1, 'y', True)", which is convenient because it shows the instance's entire state, and also the string itself is Python code which creates an instance with the same state.

But there are a few issues with the above implementation: we have to change it if the class's attributes change, it won't give useful results for instances of subclasses, and we have to write lots of similar code for different classes with different attributes. If we use __dict__ instead, we can solve all of those problems:

    def __repr__(self):
        return '{}({})'.format(
            self.__class__.__name__,
            ', '.join('{}={!r}'.format(k, v) for k, v in self.__dict__.items())
        )

Now repr(obj) will be Foo(foo=1, bar='y', baz=True), which also shows the instance's entire state, and is also executable Python code. This generalised __repr__ method will still work if the structure of Foo changes, it can be shared between multiple classes via inheritance, and it returns executable Python code for any class whose attributes are accepted as keyword arguments by __init__.
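
For example, assuming a hypothetical subclass that adds one extra attribute, the inherited generalised __repr__ reports both the new class name and the new field:

class SubFoo(Foo):
    def __init__(self, foo, bar, baz, qux):
        super().__init__(foo, bar, baz)
        self.qux = qux

obj = SubFoo(1, 'y', True, 42)
print(repr(obj))  # SubFoo(foo=1, bar='y', baz=True, qux=42)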

Python: how to type annotate a method that returns self?

After a lot of research and experimentation, I have found a way that works in mypy, though PyCharm still guesses the type wrong sometimes.

The trick is to make self a type var:

from __future__ import annotations

from typing import TypeVar

T = TypeVar('T')

class M:
    def set_width(self: T, width: int) -> T:
        self.width = width
        return self

    def set_height(self: T, height: int) -> T:
        self.height = height
        return self

    def copy(self) -> M:
        return M().set_width(self.width).set_height(self.height)

class M3D(M):
    def set_depth(self: T, depth: int) -> T:
        self.depth = depth
        return self

box = M().set_width(5).set_height(10) # box has correct type
cube = M3D().set_width(2).set_height(3).set_depth(5) # cube has correct type
attemptToTreatBoxAsCube = M3D().copy().set_depth(4) # Mypy gets angry as expected

The last line specifically works fine in mypy, but PyCharm will still sometimes autocomplete set_depth, even though .copy() actually returns an M when called on an M3D.
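
As an aside not covered in the original answer: on Python 3.11+ (or on older versions via typing_extensions), typing.Self expresses the same intent without a manual TypeVar. A minimal sketch, trimmed to two methods:

from typing import Self  # Python 3.11+; use typing_extensions.Self on older versions

class M:
    def set_width(self, width: int) -> Self:
        self.width = width
        return self

    def set_height(self, height: int) -> Self:
        self.height = height
        return self

class M3D(M):
    def set_depth(self, depth: int) -> Self:
        self.depth = depth
        return self

cube = M3D().set_width(2).set_height(3).set_depth(5)  # inferred as M3D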


