Why Is Python 3.X's Super() Magic

Why is Python 3.x's super() magic?

The new magic super() behaviour was added to avoid violating the D.R.Y. (Don't Repeat Yourself) principle (see PEP 3135). Having to explicitly name the class by referencing it as a global is also prone to the same rebinding issues you discovered with super() itself:

class Foo(Bar):
    def baz(self):
        return super(Foo, self).baz() + 42

Spam = Foo
Foo = something_else()

Spam().baz()  # liable to blow up
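A runnable version of the rebinding problem, as a minimal sketch: `Bar`, `Fixed`, `Ham` and `call_baz` are hypothetical names introduced here, and `something_else()` is replaced by a plain `object()` instance. The explicit two-argument call fails once the global name `Foo` no longer refers to the class, while the zero-argument form keeps working.

```python
class Bar:
    def baz(self):
        return 1

class Foo(Bar):
    def baz(self):
        # Looks up the *global* name Foo every time baz() runs
        return super(Foo, self).baz() + 42

class Fixed(Bar):
    def baz(self):
        # Zero-argument super() reads the __class__ cell instead
        return super().baz() + 42

Spam, Ham = Foo, Fixed
Foo = Fixed = object()    # hypothetical rebinding of both global names

def call_baz(obj):
    try:
        return obj.baz()
    except TypeError as exc:  # super() argument 1 must be a type
        return exc
```

Here `call_baz(Spam())` returns a TypeError, while `call_baz(Ham())` still returns 43.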

The same applies to using class decorators where the decorator returns a new object, which rebinds the class name:

@class_decorator_returning_new_class
class Foo(Bar):
    def baz(self):
        # Now `Foo` is a *different class*
        return super(Foo, self).baz() + 42
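The decorator case can be sketched the same way. `make_new_class` is a hypothetical decorator that returns a fresh subclass; because the inherited `baz` then runs `super(Foo, self)` with `Foo` bound to that subclass, the lookup finds the original `baz` again and recurses until Python raises RecursionError.

```python
def make_new_class(cls):
    # Hypothetical decorator: returns a *new* subclass instead of cls itself
    return type(cls.__name__, (cls,), {})

class Bar:
    def baz(self):
        return 1

@make_new_class
class Foo(Bar):
    def baz(self):
        # The global name Foo now refers to the decorator's new subclass
        return super(Foo, self).baz() + 42

@make_new_class
class Fixed(Bar):
    def baz(self):
        # The __class__ cell still holds the class built by the class statement
        return super().baz() + 42

def call_baz(obj):
    try:
        return obj.baz()
    except RecursionError as exc:
        return exc
```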

The magic super() __class__ cell sidesteps these issues nicely by giving you access to the original class object.
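You can inspect that cell directly. In this sketch (class names are illustrative), a method that uses zero-argument super() gets a free variable named `__class__`, and its closure cell holds the class object created by the class statement.

```python
class Bar:
    def baz(self):
        return 1

class Foo(Bar):
    def baz(self):
        return super().baz() + 42

# Names of the closure variables the compiler gave Foo.baz
freevars = Foo.baz.__code__.co_freevars
# The matching cell holds the original class object
cell_value = Foo.baz.__closure__[0].cell_contents
```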

The PEP was kicked off by Guido, who initially envisioned super becoming a keyword, and the idea of using a cell to look up the current class was also his. Certainly, the idea to make it a keyword was part of the first draft of the PEP.

However, it was in fact Guido himself who then stepped away from the keyword idea as 'too magical', proposing the current implementation instead. He anticipated that using a different name for super() could be a problem:

My patch uses an intermediate solution: it assumes you need __class__
whenever you use a variable named 'super'. Thus, if you (globally)
rename super to supper and use supper but not super, it won't work
without arguments (but it will still work if you pass it either
__class__ or the actual class object); if you have an unrelated
variable named super, things will work but the method will use the
slightly slower call path used for cell variables.

So, in the end, it was Guido himself who proclaimed that using a super keyword did not feel right, and that providing a magic __class__ cell was an acceptable compromise.

I agree that the magic, implicit behaviour of the implementation is somewhat surprising, but super() is one of the most mis-applied functions in the language. Just take a look at all the misapplied super(type(self), self) or super(self.__class__, self) invocations found on the Internet; if any of that code was ever called from a derived class you'd end up with an infinite recursion exception. At the very least the simplified super() call, without arguments, avoids that problem.

As for renaming super to super_: just reference __class__ in your method as well and it'll work again. The cell is created if you reference either the super or the __class__ name in your method:

>>> super_ = super
>>> class A(object):
...     def x(self):
...         print("No flipping")
... 
>>> class B(A):
...     def x(self):
...         __class__  # just referencing it is enough
...         super_().x()
... 
>>> B().x()
No flipping
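For contrast, here is a sketch of the failing case, assuming a current CPython 3 release: with no reference to `super` or `__class__` in the method body, the compiler creates no cell and the call fails at runtime. Recent versions raise RuntimeError; the 3.3 source quoted further down raised SystemError instead, so the sketch accepts either.

```python
super_ = super   # renamed alias: the compiler never sees the name 'super'

class A:
    def x(self):
        return "No flipping"

class B(A):
    def x(self):
        # Neither 'super' nor '__class__' appears here, so no __class__ cell
        return super_().x()

def call_x(obj):
    try:
        return obj.x()
    except (RuntimeError, SystemError) as exc:  # SystemError on older 3.x
        return exc
```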

Python super() arguments: why not super(obj)?

The two-argument form is only needed in Python 2. The class cannot simply be derived from the object, because self.__class__ always refers to the "leaf" class in the inheritance tree (that is, the most specific class of the object), but when you call super you need to tell it which implementation is currently being invoked, so it can invoke the next one in the inheritance tree.

Suppose you have:

class A(object):
    def foo(self):
        pass

class B(A):
    def foo(self):
        super(self.__class__, self).foo()

class C(B):
    def foo(self):
        super(self.__class__, self).foo()

c = C()

Note that c.__class__ is C, always. Now think about what happens if you call c.foo().

When you call super(self.__class__, self) in a method of C, it will be like calling super(C, self), which means "call the version of this method inherited by C". That will call B.foo, which is fine. But when you call super(self.__class__, self) from B, it's still like calling super(C, self), because it's the same self, so self.__class__ is still C. The result is that the call in B will again call B.foo and an infinite recursion occurs.
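The recursion is easy to reproduce in Python 3 as well, since the explicit two-argument form still exists there. This sketch mirrors the classes above, with `foo` returning a string so the outcome can be checked; RecursionError is caught only to make the failure observable.

```python
class A:
    def foo(self):
        return "A.foo"

class B(A):
    def foo(self):
        # Buggy: self.__class__ is the *leaf* class, not necessarily B
        return super(self.__class__, self).foo()

class C(B):
    def foo(self):
        return super(self.__class__, self).foo()

try:
    C().foo()                 # C.foo -> B.foo -> B.foo -> ... forever
    outcome = "no error"
except RecursionError:
    outcome = "RecursionError"

# Called on a B instance, self.__class__ really is B, so it happens to work
direct = B().foo()
```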

Of course, what you really want is to be able to call super(classThatDefinedTheImplementationThatIsCurrentlyExecuting, self), and that is effectively what the Python 3 super() does.

In Python 3, you can just do super().foo() and it does the right thing. It's not clear to me what you mean about super(self) being a shortcut. In Python 2, it doesn't work for the reason I described above. In Python 3, it would be a "longcut" because you can just use plain super() instead.
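A minimal sketch of the same hierarchy using the zero-argument form; the `calls` list is added only to record the order in which the implementations run.

```python
calls = []

class A:
    def foo(self):
        calls.append("A")

class B(A):
    def foo(self):
        calls.append("B")
        super().foo()   # equivalent to super(B, self).foo()

class C(B):
    def foo(self):
        calls.append("C")
        super().foo()   # equivalent to super(C, self).foo()

C().foo()
```

Each method starts the search after the class that defines it, so `calls` ends up as `["C", "B", "A"]` with no recursion.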

The super(type) and super(type1, type2) uses might still be needed occasionally in Python 3, but those were always more esoteric usages for unusual situations.

Why does a classmethod's super need a second argument?

super() returns a descriptor, and needs two items:

A starting point from which to search the class hierarchy.
The argument to bind the returned methods to.

For the two-argument (and implicit zero-argument ^*) case, the second argument is used to bind to, but if you do not pass in a second argument, super() cannot invoke the descriptor protocol to bind the returned functions, classmethods, properties or other descriptors. Classmethods are still descriptors and are bound; they bind to a class and not an instance, but super() does not know how the descriptor will use the context to which you bind.

super() should not and cannot know that you are looking up a class method instead of a regular method; class methods only differ from regular methods because their .__get__() method acts differently.

Why are class methods bound? Because when you subclass Foo but do not override .hello(), calling Bar.hello() looks up the classmethod object stored in Foo.__dict__['hello'] and binds it to Bar, so the first argument to hello(cls) will be that subclass, not Foo.
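A small sketch of that binding behaviour, with hypothetical classes `Foo`, `Bar` and `Baz`: the classmethod object stored on `Foo` binds to whichever class it is accessed through, the two-argument `super(Bar, cls)` passes that class along, and an unbound `super(Bar)` has nothing to bind to, so attribute lookup on it does not search the MRO at all.

```python
class Foo:
    @classmethod
    def hello(cls):
        return cls.__name__

class Bar(Foo):
    @classmethod
    def hello(cls):
        # cls is the second argument: the class super() binds Foo.hello to
        return 'Bar -> ' + super(Bar, cls).hello()

class Baz(Bar):
    pass                      # inherits Bar.hello; cls will be Baz

raw = Foo.__dict__['hello']   # the classmethod object itself

try:
    super(Bar).hello          # unbound super: no MRO search happens
    unbound_lookup = 'found'
except AttributeError:
    unbound_lookup = 'AttributeError'
```

Here `Bar.hello()` gives `'Bar -> Bar'` and `Baz.hello()` gives `'Bar -> Baz'`.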

Without a second argument, super() returns an unbound object that can manually be bound later on. You can do the binding yourself using the .__get__() method provided by the super() instance:

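class Foo(object):
    @classmethod
    def hello(cls):
        print('hello, foo')

class Bar(Foo):
    @classmethod
    def hello(cls):
        print('hello, bar')
        super(Bar).__get__(cls, None).hello()  # bind the unbound super to cls by hand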

super().__get__() on an instance without a context effectively returns a new super() instance with the context set. On an instance with a context .__get__() just returns self; it is already bound.
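The binding step can be observed through the `__self__` attribute of super objects. This is a sketch with illustrative class names; it relies only on the `__get__` behaviour just described (a new object when binding an unbound super, the same object when it is already bound).

```python
class Foo:
    @classmethod
    def hello(cls):
        return 'hello from Foo, cls=' + cls.__name__

class Bar(Foo):
    pass

unbound = super(Bar)                 # no second argument: nothing bound yet
bound = unbound.__get__(Bar, None)   # returns a *new* super object bound to Bar
again = bound.__get__(Bar, None)     # already bound: returns itself
```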

^* In Python 3, calling super() without arguments from inside a method will use the calling frame to discover, implicitly, what the type and bound object are, so you no longer have to explicitly pass in the type and object arguments in that case. Python 3 actually adds an implicit __class__ closure variable to methods that use super or __class__ for this purpose. See PEP 3135 and Why is Python 3.x's super() magic?

Does any magic happen when I call `super(some_cls)`?

In both cases, super(A) gives an unbound super object. When you call __init__() on that, it's being called with no arguments. When super.__init__ is called with no arguments, super_init tries to infer the arguments from the calling frame: (from typeobject.c line 7434, latest source)

static int
super_init(PyObject *self, PyObject *args, PyObject *kwds)
{
    superobject *su = (superobject *)self;
    PyTypeObject *type = NULL;
    PyObject *obj = NULL;
    PyTypeObject *obj_type = NULL;

    if (!_PyArg_NoKeywords("super", kwds))
        return -1;
    if (!PyArg_ParseTuple(args, "|O!O:super", &PyType_Type, &type, &obj))
        return -1;

    if (type == NULL) {
        /* Call super(), without args -- fill in from __class__
           and first local variable on the stack. */

A few lines later: (ibid, line 7465)

    f = PyThreadState_GET()->frame;
...
    co = f->f_code;
...
    if (co->co_argcount == 0) {
        PyErr_SetString(PyExc_RuntimeError,
                        "super(): no arguments");
        return -1;
    }

When you call super(A), this inferring behavior is bypassed because type is not NULL. When you then call __init__() on the unbound super object (because it isn't bound, this __init__ call isn't proxied to a base class), the type argument is NULL and super_init attempts to infer it. Inside a method defined in the class, the self argument is present and is used for this purpose. Outside, no arguments are available, so the exception is raised.

In other words, super(A) is not behaving differently depending on where it is called; it's super.__init__() that's behaving differently, and that's exactly what the documentation suggests.
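A sketch of both situations, assuming a recent CPython 3 release (class and function names are hypothetical). Inside a method that mentions `super`, re-running `__init__()` on an unbound super object fills in `__class__` and `self` from that frame; inside a function with no parameters there is nothing to infer from. Depending on the version, that error is a RuntimeError or, as in the 3.3 source, a SystemError.

```python
class A:
    def inside(self):
        s = super(A)      # unbound: type is A, no object yet
        s.__init__()      # no arguments: infer __class__ and self from this frame
        return s

def outside():
    s = super(A)
    try:
        s.__init__()      # this frame has no arguments to infer from
    except (RuntimeError, SystemError) as exc:  # SystemError in older 3.x
        return exc

a = A()
bound = a.inside()        # now behaves like super(A, a)
err = outside()
```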

How is super() in Python 3 implemented?

How is super() implemented? Here's the code for Python 3.3:

/* Cooperative 'super' */

typedef struct {
    PyObject_HEAD
    PyTypeObject *type;
    PyObject *obj;
    PyTypeObject *obj_type;
} superobject;

static PyMemberDef super_members[] = {
    {"__thisclass__", T_OBJECT, offsetof(superobject, type), READONLY,
     "the class invoking super()"},
    {"__self__",  T_OBJECT, offsetof(superobject, obj), READONLY,
     "the instance invoking super(); may be None"},
    {"__self_class__", T_OBJECT, offsetof(superobject, obj_type), READONLY,
     "the type of the instance invoking super(); may be None"},
    {0}
};

static void
super_dealloc(PyObject *self)
{
    superobject *su = (superobject *)self;

    _PyObject_GC_UNTRACK(self);
    Py_XDECREF(su->obj);
    Py_XDECREF(su->type);
    Py_XDECREF(su->obj_type);
    Py_TYPE(self)->tp_free(self);
}

static PyObject *
super_repr(PyObject *self)
{
    superobject *su = (superobject *)self;

    if (su->obj_type)
        return PyUnicode_FromFormat(
            "<super: <class '%s'>, <%s object>>",
            su->type ? su->type->tp_name : "NULL",
            su->obj_type->tp_name);
    else
        return PyUnicode_FromFormat(
            "<super: <class '%s'>, NULL>",
            su->type ? su->type->tp_name : "NULL");
}

static PyObject *
super_getattro(PyObject *self, PyObject *name)
{
    superobject *su = (superobject *)self;
    int skip = su->obj_type == NULL;

    if (!skip) {
        /* We want __class__ to return the class of the super object
           (i.e. super, or a subclass), not the class of su->obj. */
        skip = (PyUnicode_Check(name) &&
            PyUnicode_GET_LENGTH(name) == 9 &&
            PyUnicode_CompareWithASCIIString(name, "__class__") == 0);
    }

    if (!skip) {
        PyObject *mro, *res, *tmp, *dict;
        PyTypeObject *starttype;
        descrgetfunc f;
        Py_ssize_t i, n;

        starttype = su->obj_type;
        mro = starttype->tp_mro;

        if (mro == NULL)
            n = 0;
        else {
            assert(PyTuple_Check(mro));
            n = PyTuple_GET_SIZE(mro);
        }
        for (i = 0; i < n; i++) {
            if ((PyObject *)(su->type) == PyTuple_GET_ITEM(mro, i))
                break;
        }
        i++;
        res = NULL;
        /* keep a strong reference to mro because starttype->tp_mro can be
           replaced during PyDict_GetItem(dict, name)  */
        Py_INCREF(mro);
        for (; i < n; i++) {
            tmp = PyTuple_GET_ITEM(mro, i);
            if (PyType_Check(tmp))
                dict = ((PyTypeObject *)tmp)->tp_dict;
            else
                continue;
            res = PyDict_GetItem(dict, name);
            if (res != NULL) {
                Py_INCREF(res);
                f = Py_TYPE(res)->tp_descr_get;
                if (f != NULL) {
                    tmp = f(res,
                        /* Only pass 'obj' param if
                           this is instance-mode super
                           (See SF ID #743627)
                        */
                        (su->obj == (PyObject *)
                                    su->obj_type
                            ? (PyObject *)NULL
                            : su->obj),
                        (PyObject *)starttype);
                    Py_DECREF(res);
                    res = tmp;
                }
                Py_DECREF(mro);
                return res;
            }
        }
        Py_DECREF(mro);
    }
    return PyObject_GenericGetAttr(self, name);
}

static PyTypeObject *
supercheck(PyTypeObject *type, PyObject *obj)
{
    /* Check that a super() call makes sense.  Return a type object.

       obj can be a class, or an instance of one:

       - If it is a class, it must be a subclass of 'type'.      This case is
         used for class methods; the return value is obj.

       - If it is an instance, it must be an instance of 'type'.  This is
         the normal case; the return value is obj.__class__.

       But... when obj is an instance, we want to allow for the case where
       Py_TYPE(obj) is not a subclass of type, but obj.__class__ is!
       This will allow using super() with a proxy for obj.
    */

    /* Check for first bullet above (special case) */
    if (PyType_Check(obj) && PyType_IsSubtype((PyTypeObject *)obj, type)) {
        Py_INCREF(obj);
        return (PyTypeObject *)obj;
    }

    /* Normal case */
    if (PyType_IsSubtype(Py_TYPE(obj), type)) {
        Py_INCREF(Py_TYPE(obj));
        return Py_TYPE(obj);
    }
    else {
        /* Try the slow way */
        PyObject *class_attr;

        class_attr = _PyObject_GetAttrId(obj, &PyId___class__);
        if (class_attr != NULL &&
            PyType_Check(class_attr) &&
            (PyTypeObject *)class_attr != Py_TYPE(obj))
        {
            int ok = PyType_IsSubtype(
                (PyTypeObject *)class_attr, type);
            if (ok)
                return (PyTypeObject *)class_attr;
        }

        if (class_attr == NULL)
            PyErr_Clear();
        else
            Py_DECREF(class_attr);
    }

    PyErr_SetString(PyExc_TypeError,
                    "super(type, obj): "
                    "obj must be an instance or subtype of type");
    return NULL;
}

static PyObject *
super_descr_get(PyObject *self, PyObject *obj, PyObject *type)
{
    superobject *su = (superobject *)self;
    superobject *newobj;

    if (obj == NULL || obj == Py_None || su->obj != NULL) {
        /* Not binding to an object, or already bound */
        Py_INCREF(self);
        return self;
    }
    if (Py_TYPE(su) != &PySuper_Type)
        /* If su is an instance of a (strict) subclass of super,
           call its type */
        return PyObject_CallFunctionObjArgs((PyObject *)Py_TYPE(su),
                                            su->type, obj, NULL);
    else {
        /* Inline the common case */
        PyTypeObject *obj_type = supercheck(su->type, obj);
        if (obj_type == NULL)
            return NULL;
        newobj = (superobject *)PySuper_Type.tp_new(&PySuper_Type,
                                                 NULL, NULL);
        if (newobj == NULL)
            return NULL;
        Py_INCREF(su->type);
        Py_INCREF(obj);
        newobj->type = su->type;
        newobj->obj = obj;
        newobj->obj_type = obj_type;
        return (PyObject *)newobj;
    }
}

static int
super_init(PyObject *self, PyObject *args, PyObject *kwds)
{
    superobject *su = (superobject *)self;
    PyTypeObject *type = NULL;
    PyObject *obj = NULL;
    PyTypeObject *obj_type = NULL;

    if (!_PyArg_NoKeywords("super", kwds))
        return -1;
    if (!PyArg_ParseTuple(args, "|O!O:super", &PyType_Type, &type, &obj))
        return -1;

    if (type == NULL) {
        /* Call super(), without args -- fill in from __class__
           and first local variable on the stack. */
        PyFrameObject *f = PyThreadState_GET()->frame;
        PyCodeObject *co = f->f_code;
        Py_ssize_t i, n;
        if (co == NULL) {
            PyErr_SetString(PyExc_SystemError,
                            "super(): no code object");
            return -1;
        }
        if (co->co_argcount == 0) {
            PyErr_SetString(PyExc_SystemError,
                            "super(): no arguments");
            return -1;
        }
        obj = f->f_localsplus[0];
        if (obj == NULL) {
            PyErr_SetString(PyExc_SystemError,
                            "super(): arg[0] deleted");
            return -1;
        }
        if (co->co_freevars == NULL)
            n = 0;
        else {
            assert(PyTuple_Check(co->co_freevars));
            n = PyTuple_GET_SIZE(co->co_freevars);
        }
        for (i = 0; i < n; i++) {
            PyObject *name = PyTuple_GET_ITEM(co->co_freevars, i);
            assert(PyUnicode_Check(name));
            if (!PyUnicode_CompareWithASCIIString(name,
                                                  "__class__")) {
                Py_ssize_t index = co->co_nlocals +
                    PyTuple_GET_SIZE(co->co_cellvars) + i;
                PyObject *cell = f->f_localsplus[index];
                if (cell == NULL || !PyCell_Check(cell)) {
                    PyErr_SetString(PyExc_SystemError,
                      "super(): bad __class__ cell");
                    return -1;
                }
                type = (PyTypeObject *) PyCell_GET(cell);
                if (type == NULL) {
                    PyErr_SetString(PyExc_SystemError,
                      "super(): empty __class__ cell");
                    return -1;
                }
                if (!PyType_Check(type)) {
                    PyErr_Format(PyExc_SystemError,
                      "super(): __class__ is not a type (%s)",
                      Py_TYPE(type)->tp_name);
                    return -1;
                }
                break;
            }
        }
        if (type == NULL) {
            PyErr_SetString(PyExc_SystemError,
                            "super(): __class__ cell not found");
            return -1;
        }
    }

    if (obj == Py_None)
        obj = NULL;
    if (obj != NULL) {
        obj_type = supercheck(type, obj);
        if (obj_type == NULL)
            return -1;
        Py_INCREF(obj);
    }
    Py_INCREF(type);
    su->type = type;
    su->obj = obj;
    su->obj_type = obj_type;
    return 0;
}

PyDoc_STRVAR(super_doc,
"super() -> same as super(__class__, <first argument>)\n"
"super(type) -> unbound super object\n"
"super(type, obj) -> bound super object; requires isinstance(obj, type)\n"
"super(type, type2) -> bound super object; requires issubclass(type2, type)\n"
"Typical use to call a cooperative superclass method:\n"
"class C(B):\n"
"    def meth(self, arg):\n"
"        super().meth(arg)\n"
"This works for class methods too:\n"
"class C(B):\n"
"    @classmethod\n"
"    def cmeth(cls, arg):\n"
"        super().cmeth(arg)\n");

static int
super_traverse(PyObject *self, visitproc visit, void *arg)
{
    superobject *su = (superobject *)self;

    Py_VISIT(su->obj);
    Py_VISIT(su->type);
    Py_VISIT(su->obj_type);

    return 0;
}

PyTypeObject PySuper_Type = {
    PyVarObject_HEAD_INIT(&PyType_Type, 0)
    "super",                                    /* tp_name */
    sizeof(superobject),                        /* tp_basicsize */
    0,                                          /* tp_itemsize */
    /* methods */
    super_dealloc,                              /* tp_dealloc */
    0,                                          /* tp_print */
    0,                                          /* tp_getattr */
    0,                                          /* tp_setattr */
    0,                                          /* tp_reserved */
    super_repr,                                 /* tp_repr */
    0,                                          /* tp_as_number */
    0,                                          /* tp_as_sequence */
    0,                                          /* tp_as_mapping */
    0,                                          /* tp_hash */
    0,                                          /* tp_call */
    0,                                          /* tp_str */
    super_getattro,                             /* tp_getattro */
    0,                                          /* tp_setattro */
    0,                                          /* tp_as_buffer */
    Py_TPFLAGS_DEFAULT | Py_TPFLAGS_HAVE_GC |
        Py_TPFLAGS_BASETYPE,                    /* tp_flags */
    super_doc,                                  /* tp_doc */
    super_traverse,                             /* tp_traverse */
    0,                                          /* tp_clear */
    0,                                          /* tp_richcompare */
    0,                                          /* tp_weaklistoffset */
    0,                                          /* tp_iter */
    0,                                          /* tp_iternext */
    0,                                          /* tp_methods */
    super_members,                              /* tp_members */
    0,                                          /* tp_getset */
    0,                                          /* tp_base */
    0,                                          /* tp_dict */
    super_descr_get,                            /* tp_descr_get */
    0,                                          /* tp_descr_set */
    0,                                          /* tp_dictoffset */
    super_init,                                 /* tp_init */
    PyType_GenericAlloc,                        /* tp_alloc */
    PyType_GenericNew,                          /* tp_new */
    PyObject_GC_Del,                            /* tp_free */
};
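The three members declared in super_members, the special case for `__class__` in super_getattro, and the checks in supercheck can all be observed from Python. A sketch with illustrative classes, covering instance mode, class mode and the unbound form:

```python
class Base:
    pass

class Child(Base):
    pass

child = Child()
instance_mode = super(Base, child)   # second argument is an instance
class_mode = super(Base, Child)      # second argument is a subclass
unbound = super(Base)                # no second argument

def bad_call():
    try:
        super(Child, Base())         # Base() is not an instance of Child
    except TypeError as exc:         # raised by supercheck
        return exc
```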

You can see that super_init eventually checks type == NULL and then raises the error that you see. It is not normal to have NULLs around, so there's probably a bug somewhere in super (and note that super already had bugs in previous releases). At the very least, I would have thought that SystemError should only be raised by some "internal" failure of the interpreter or other C code, not by Python code.

Also, this did not happen only to you; you can find a post in which this behaviour is considered a bug.

Why do you need to call super class inside constructor?

Your class inherits from beam.DoFn. Presumably that class needs to set up some things in its __init__ method, or it won't work properly. Thus, if you override __init__, you need to call the parent class's __init__ or your instance may not function as intended.
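A minimal sketch of why this matters. `Base` below is a hypothetical stand-in for a framework class such as beam.DoFn, not Beam's actual implementation: its methods rely on state created in its own `__init__`, so a subclass that overrides `__init__` without calling it produces broken instances.

```python
class Base:
    # Hypothetical stand-in for a framework base class
    def __init__(self):
        self.counter = 0          # state that Base.process depends on

    def process(self, item):
        self.counter += 1
        return item

class Forgetful(Base):
    def __init__(self, topic):
        self.topic = topic        # Base.__init__ never runs: no counter

class Cooperative(Base):
    def __init__(self, topic):
        super().__init__()        # let Base initialise its own state first
        self.topic = topic

def try_process(fn):
    try:
        return fn.process('x')
    except AttributeError as exc:
        return exc
```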

I'd note that your current super call is actually subtly buggy. It's not appropriate to use self.__class__ as the first argument to super. You either need to write out the name of the current class explicitly, or not pass any arguments at all (the no-argument form of super is only valid in Python 3). Using self.__class__ might work for now, but it will break if you subclass PublishFn any further, and override __init__ again in the grandchild class.

An example about C3

First of all, the form super() in Python 3 is really the same thing as super(<CurrentClass>, self), where the Python compiler provides enough information for super() to determine what the correct class to use is. So in E.foo(), super().foo() can be read as super(E, self).foo().

To understand what is going on, you need to look at the class.__mro__ attribute:

This attribute is a tuple of classes that are considered when looking for base classes during method resolution.

It is this tuple that shows you what the C3 Method Resolution Order is for any given class hierarchy. For your class E, that order is:

>>> E.__mro__
(<class '__main__.E'>, <class '__main__.D'>, <class '__main__.B'>, <class '__main__.C'>, <class '__main__.A'>, <class 'object'>)
>>> for cls in E.__mro__:  # print out just the names, for easier readability.
...     print(cls.__name__)
...
E
D
B
C
A
object

The super() object bases everything on that ordered sequence of classes. The call

super(SomeClass, self).foo()

results in the following series of steps:

The super() object retrieves the MRO tuple of self's class, type(self).__mro__.
super() locates the index for the SomeClass class in that tuple.
Accessing the foo attribute on the super() object triggers a search for a class that has a foo attribute on the MRO, starting at the next index after the SomeClass index.
If the attribute found this way is a descriptor object, super() binds it to self. Functions are descriptors: binding produces a bound method, and this is how Python passes in the self reference when you call a method.

Expressed as simplified Python code that ignores edge cases and other uses for super(), that would look like:

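class Super:
    def __init__(self, type_, obj_or_type):
        # Instances have no __mro__ of their own; use their class's MRO
        self.start = obj_or_type if isinstance(obj_or_type, type) else type(obj_or_type)
        self.mro = self.start.__mro__
        # Start searching just *after* type_ in the MRO
        self.idx = self.mro.index(type_) + 1
        self.obj_or_type = obj_or_type

    def __getattr__(self, name):
        for cls in self.mro[self.idx:]:
            attrs = vars(cls)
            if name in attrs:
                result = attrs[name]
                if hasattr(result, '__get__'):
                    # Bind to the instance, or to None when obj_or_type is a class
                    obj = None if self.obj_or_type is self.start else self.obj_or_type
                    result = result.__get__(obj, self.start)
                return result
        raise AttributeError(name)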

Combining those two pieces of information, you can see what happens when you call e.foo():

print('foo in E') is executed, resulting in foo in E
super().foo() is executed, effectively the same thing as super(E, self).foo().
- The MRO is searched, starting at the next index past E, so at D (no foo attribute), moving on to B (no foo attribute), then C (attribute found). C.foo is returned, bound to self.
- C.foo(self) is called, resulting in foo of C
super(B, self).foo() is executed.
- The MRO is searched, starting at the next index past B, so at C (attribute found). C.foo is returned, bound to self.
- C.foo(self) is called, resulting in foo of C
super(C, self).foo() is executed.
- The MRO is searched, starting at the next index past C, so at A (attribute found). A.foo is returned, bound to self.
- A.foo(self) is called, resulting in foo of A
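The original class definitions are not reproduced in this section, so the following is a reconstruction under an assumption: a hierarchy where only A, C and E define foo, with B(A), C(A), D(B) and E(D, C), which yields exactly the MRO shown above. Running it prints the four lines traced in the walkthrough.

```python
import io
from contextlib import redirect_stdout

class A:
    def foo(self):
        print('foo of A')

class B(A):
    pass

class C(A):
    def foo(self):
        print('foo of C')

class D(B):
    pass

class E(D, C):
    def foo(self):
        print('foo in E')
        super().foo()          # same as super(E, self).foo(): skips D, B; finds C
        super(B, self).foo()   # search starts after B: finds C
        super(C, self).foo()   # search starts after C: finds A

buf = io.StringIO()
with redirect_stdout(buf):     # capture the printed lines for inspection
    E().foo()
output = buf.getvalue().splitlines()
```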
