Python Method Name with Double-Underscore Is Overridden

Python method name with double-underscore is overridden?

Identifiers matching the pattern __* are class-private names.

http://docs.python.org/reference/lexical_analysis.html#reserved-classes-of-identifiers

Quoting:

Names in this category, when used within the context of a class definition, are re-written to use a mangled form to help avoid name clashes between “private” attributes of base and derived classes

Private name mangling (emphasis added):

Private name mangling: When an identifier that textually occurs in a class definition begins with two or more underscore characters and does not end in two or more underscores, it is considered a private name of that class. Private names are transformed to a longer form before code is generated for them. The transformation inserts the class name in front of the name, with leading underscores removed, and a single underscore inserted in front of the class name. For example, the identifier __spam occurring in a class named Ham will be transformed to _Ham__spam. This transformation is independent of the syntactical context in which the identifier is used. If the transformed name is extremely long (longer than 255 characters), implementation defined truncation may happen. If the class name consists only of underscores, no transformation is done.

http://docs.python.org/reference/expressions.html#atom-identifiers

This means that behind the scenes, a reference to __a() written inside class B is transformed to something like _B__a().
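As a rough illustration of that rewrite, here is a minimal sketch. The class B and the method __a are stand-ins (the original question's code is not reproduced here), and call_a is a hypothetical helper added only to trigger the call from inside the class:

```python
class B:
    def __a(self):
        return "called __a"

    def call_a(self):
        # Inside the class body this is compiled as self._B__a()
        return self.__a()


b = B()
print(b.call_a())             # works: the mangled name is used behind the scenes
print("_B__a" in vars(B))     # the method is stored under the mangled name
print(hasattr(b, "__a"))      # the unmangled name is not an attribute at all
```

Note that the string "__a" passed to hasattr is never mangled; only identifiers that appear textually inside a class body are.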

How to override a function which starts with two underscores in python?

What you read is correct: methods prefixed with a double underscore become private through name mangling. This process rewrites the names inside the class definition to point to the new name. The new name is constructed as you said: _<class name><method name>, where the method name keeps its two leading underscores. Consider the following example:

class A():
    def public(self): print('public() called')
    def _internal_use(self): print('_internal_use() called')
    def __private(self): print('__private() called')
    def f(self): self.__private()

Now let's take a look at A.__dict__, which is the structure where methods are stored by name:

>>> A.__dict__
mappingproxy({
'f': <function A.f at 0x1028908c8>,
'_A__private': <function A.__private at 0x1028906a8>,
'_internal_use': <function A._internal_use at 0x102890620>,
'__weakref__': <attribute '__weakref__' of 'A' objects>,
'__dict__': <attribute '__dict__' of 'A' objects>,
'__module__': '__main__',
'__doc__': None,
'public': <function A.public at 0x102890598>
})

Among others, notice you have _A__private, _internal_use and public.

Notice that these names do not exist in the module scope; they only exist inside the __dict__ of the class. When Python resolves a member access, it looks inside the __dict__ of the object. If the name is not found there, it looks into the class's __dict__ and then the superclasses' __dict__.

a = A()
a.public
# Python looks into a.__dict__. If not found, it looks into type(a).__dict__

This way you can access public or _internal_use, but you cannot find __private because that name does not even exist. What you can access is _A__private:

a.public         # works
a.f              # works
a._internal_use  # works
a.__private      # AttributeError!
a._A__private    # works again
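The lookup order described above can be sketched as follows. This is a simplified picture that ignores descriptors and __getattr__, and the names Base, Child, greeting and shout are made up for the illustration:

```python
class Base:
    greeting = "from Base"


class Child(Base):
    pass


def shout():                   # a module-level function, not a class member
    return "module level"


c = Child()
print(c.greeting)              # not in c.__dict__ or Child.__dict__; found on Base
c.greeting = "from instance"   # now stored in c.__dict__
print(c.greeting)              # the instance dict is consulted first
print(hasattr(c, "shout"))     # module globals are never part of the lookup
```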

Notice that none of these names are defined in the module's global scope either:

public          # NameError!
_internal_use   # NameError!
__private       # NameError!
_A__private     # NameError!

But you tried to override the function by simply defining a module function with that name. Well, Python member access resolution will never look into the global scope. So you have a couple of options:

  1. You can create another inheriting class and redefine that function:

    class B(A):
        def _A__private(self): print('New __private() called')

    a = A()
    a.f() # prints __private() called
    b = B()
    b.f() # prints New __private() called

  2. You can override the method directly with any other function (even a lambda):

    A._A__private = lambda self: print('This is a lambda')
    a = A()
    a.f() # prints This is a lambda

What is the meaning of single and double underscore before an object name?

Single Underscore

In a class, names with a leading underscore indicate to other programmers that the attribute or method is intended to be used inside that class. However, privacy is not enforced in any way.
Using a leading underscore for a function in a module indicates that it should not be imported from somewhere else.

From the PEP-8 style guide:

_single_leading_underscore: weak "internal use" indicator. E.g. from M import * does not import objects whose name starts with an underscore.

Double Underscore (Name Mangling)

From the Python docs:

Any identifier of the form __spam (at least two leading underscores, at most one trailing underscore) is textually replaced with _classname__spam, where classname is the current class name with leading underscore(s) stripped. This mangling is done without regard to the syntactic position of the identifier, so it can be used to define class-private instance and class variables, methods, variables stored in globals, and even variables stored in instances. private to this class on instances of other classes.

And a warning from the same page:

Name mangling is intended to give classes an easy way to define “private” instance variables and methods, without having to worry about instance variables defined by derived classes, or mucking with instance variables by code outside the class. Note that the mangling rules are designed mostly to avoid accidents; it still is possible for a determined soul to access or modify a variable that is considered private.
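To make the "avoid accidents" point concrete, here is a small sketch in which a base class and a derived class both use the same private name without clashing. The class names Account and AuditedAccount are hypothetical and chosen only for this illustration:

```python
class Account:
    def __init__(self):
        self.__balance = 100           # stored as _Account__balance

    def balance(self):
        return self.__balance


class AuditedAccount(Account):
    def __init__(self):
        super().__init__()
        self.__balance = "n/a"         # stored as _AuditedAccount__balance


acct = AuditedAccount()
before = acct.balance()                # the base class still sees its own value
print(before)
print(sorted(vars(acct)))              # two separate attributes on the instance

acct._Account__balance = 0             # a determined caller can still reach it
print(acct.balance())
```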

Example

>>> class MyClass():
...     def __init__(self):
...         self.__superprivate = "Hello"
...         self._semiprivate = ", world!"
...
>>> mc = MyClass()
>>> print mc.__superprivate
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: MyClass instance has no attribute '__superprivate'
>>> print mc._semiprivate
, world!
>>> print mc.__dict__
{'_MyClass__superprivate': 'Hello', '_semiprivate': ', world!'}

What does the Python naming convention with single/double underscore before a function mean?

Usually, this kind of design is used in two related (but nearly-opposite) patterns, which I don't know the "design patterns" names for. (I think they both include "engine", and one includes "template", if that helps.)


For the first, the idea is to allow a subclass to override the public catch method to, say, add a bit of extra work before or after the core implementation, but still call the _catch method to do most of the work. For example:

class Pokemon(object):
    def __init__(self, name):
        self.name = name

    def _catch(self, pokeball):
        ''' actual implementation here'''
        # hundreds of lines of complex code
        print(pokeball)
        return pokeball

    def catch(self, pokeball):
        print('Gotta catch em all')
        return self._catch(pokeball)

class Pikachu(Pokemon):
    def catch(self, pokeball):
        print('Pikachu')
        return self._catch(pokeball)

This allows Pikachu to override the "non-core" part of the implementation, which is only a few lines, without overriding the "core" part, which is hundreds of lines.

This pattern isn't nearly as common in Python as it is in, say, Java, but it does sometimes make sense.


For the other, the idea is to have the base class break the implementation up into separate pieces, each of which can be overridden by the subclass without having to replace everything else. For example:

class Pokemon(object):
    def catch(self, pokeball):
        self._wake_up()
        if not self._check_ready():
            return False
        try:
            return self._catch(pokeball)
        except SillyError:
            return False
        finally:
            self.consider_sleeping()
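The snippet above leaves the individual pieces undefined, so here is one way it might be filled in to show subclasses overriding a single piece each. Everything beyond the catch method is an assumption made for illustration: the SillyError class, the default bodies of the helper methods, the awake attribute, and the Snorlax and Gengar subclasses.

```python
class SillyError(Exception):
    """Placeholder exception; the original snippet does not define it."""


class Pokemon:
    def catch(self, pokeball):
        self._wake_up()
        if not self._check_ready():
            return False
        try:
            return self._catch(pokeball)
        except SillyError:
            return False
        finally:
            self.consider_sleeping()

    # Default pieces (assumed bodies); each can be overridden on its own.
    def _wake_up(self):
        self.awake = True

    def _check_ready(self):
        return True

    def _catch(self, pokeball):
        return pokeball

    def consider_sleeping(self):
        self.awake = False


class Snorlax(Pokemon):
    def _check_ready(self):
        return False              # only this step changes


class Gengar(Pokemon):
    def _catch(self, pokeball):
        raise SillyError("slipped through the ball")


print(Pokemon().catch("great ball"))
print(Snorlax().catch("great ball"))
print(Gengar().catch("great ball"))
```

Each subclass replaces one step and inherits the overall flow of catch unchanged, which is the point of splitting the implementation up.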

So, why use a leading underscore?

The leading single underscore means "private by convention". For a method name, in particular,* it's a hint to the human reader that something is not part of the API. Anyone who wants to use a Pokemon object should not call _catch on it, because that method is an implementation detail—it may change or even go away in future versions, it may make assumptions about the object's state that aren't guaranteed to always be true, etc. But catch should always be safe to call.

Often this is a good match for something that you'd make a protected method in Java or C++, which is exactly what you'd use for both of these design patterns in those languages, even though it doesn't really mean the same thing.


A leading double underscore (without a trailing double underscore) means something different. In a method name or other attribute, it means the name should be "mangled" so that it's harder for a subclass or superclass to accidentally call it, or override it, when it intended to define and use its own private name instead.

Often, this is a good match for something that you'd make a private method or member in Java or C++, but it's even farther from that than a single underscore is from protected.


* In a few other places, it actually does have a tiny bit more meaning. For example, a module global with a leading underscore will be skipped by from mod import * if you didn't specify an __all__ in mod.
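The footnote's point about from mod import * can be checked with a throwaway in-memory module. This is only a sketch: the module name demo_mod and the helper star_import are invented for the demonstration, and a real module would normally live in its own file.

```python
import sys
import types


def star_import(source):
    """Build a temporary module from source and list what `import *` exposes."""
    mod = types.ModuleType("demo_mod")       # hypothetical module name
    exec(source, mod.__dict__)
    sys.modules["demo_mod"] = mod
    try:
        namespace = {}
        exec("from demo_mod import *", namespace)
    finally:
        del sys.modules["demo_mod"]
    namespace.pop("__builtins__", None)      # added by exec, not by the import
    return sorted(namespace)


# Without __all__, the underscore-prefixed name is skipped.
print(star_import("public = 1\n_hidden = 2\n"))

# With __all__, exactly the listed names are imported, underscore or not.
print(star_import("__all__ = ['_hidden']\npublic = 1\n_hidden = 2\n"))
```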

Instance attribute that has a name starting with two underscores is weirdly renamed

What's happening in this code?

The code above seems fine, but has some behaviour that might seem unusual. If we type this in an interactive console:

c = Catalog()
# vars() returns the instance dict of an object,
# showing us the value of all its attributes at this point in time.
vars(c)

Then the result is this:

{'_Catalog__product_names': {}}

That's pretty weird! In our class definition, we didn't give any attribute the name _Catalog__product_names. We named one attribute __product_names, but that attribute appears to have been renamed.

What's going on

This behaviour isn't a bug; it's actually a feature of Python known as private name mangling. For all attributes that you define in a class definition, if the attribute name begins with two leading underscores (and does not end with two trailing underscores), then the attribute will be renamed like this. An attribute named __foo in class Bar will be renamed _Bar__foo; an attribute named __spam in class Breakfast will be renamed _Breakfast__spam; etc, etc.

The renaming is applied at compile time to every such name that appears inside the class body. Methods within the class can therefore still refer to the attribute by the "private" name that you wrote in __init__, because both the definition and the reference are mangled in the same way. Code outside the class is not mangled, so from there the attribute can only be reached under its mangled name.
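One way to check where the renaming actually takes place is to compare a method written inside the class body with a function written outside it. The Catalog class below is a stand-in (the code under discussion is not reproduced here), and the functions inside and outside are hypothetical names for the demonstration:

```python
class Catalog:
    def __init__(self):
        self.__product_names = {"x": 1}      # compiled as _Catalog__product_names

    def inside(self):
        return self.__product_names          # mangled the same way, so it matches


def outside(self):
    # Written outside any class body, so this name is NOT mangled.
    return self.__product_names


Catalog.outside = outside                    # attaching it later changes nothing

c = Catalog()
print(c.inside())
print(Catalog.inside.__code__.co_names)      # contains the mangled attribute name
print(outside.__code__.co_names)             # contains the name exactly as written

try:
    c.outside()
except AttributeError as exc:
    print("AttributeError:", exc)
```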

Why would you ever want this?

I personally have never found a use case for this feature, and am somewhat sceptical of it. Its main use cases are for situations where you want a method or an attribute to be privately accessible within a class, but not accessible by the same name to functions outside of the class or to other classes inheriting from this class.

  • Some use cases here:
    What is the benefit of private name mangling?

  • And here's a good YouTube talk that includes some use cases for
    this feature about 34 minutes in.

(N.B. The YouTube talk is from 2013, and the examples in the talk are written in Python 2, so some of the syntax in the examples is a little different from modern Python: print is still a statement rather than a function, etc.)

Here is an illustration of how private name mangling works when using class inheritance:

>>> class Foo:
...     def __init__(self):
...         self.__private_attribute = 'No one shall ever know'
...     def baz_foo(self):
...         print(self.__private_attribute)
...
>>> class Bar(Foo):
...     def baz_bar(self):
...         print(self.__private_attribute)
...
>>>
>>> b = Bar()
>>> b.baz_foo()
No one shall ever know
>>>
>>> b.baz_bar()
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "<string>", line 3, in baz_bar
AttributeError: 'Bar' object has no attribute '_Bar__private_attribute'
>>>
>>> vars(b)
{'_Foo__private_attribute': 'No one shall ever know'}
>>>
>>> b._Foo__private_attribute
'No one shall ever know'

The methods defined in the base class Foo are able to access the private attribute using its private name that was defined in Foo. The methods defined in the subclass Bar, however, are only able to access the private attribute by using its mangled name; anything else leads to an exception.

collections.OrderedDict is a good example of a class in the standard library that makes extensive use of name-mangling to ensure that subclasses of OrderedDict do not accidentally override certain methods in OrderedDict that are important to the way OrderedDict works.
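The general technique can be sketched with the small Mapping example from the Python tutorial's section on private variables. It illustrates the idea of keeping a private copy of a method; it is not a copy of OrderedDict's actual source.

```python
class Mapping:
    def __init__(self, iterable):
        self.items_list = []
        self.__update(iterable)        # always calls Mapping's own update

    def update(self, iterable):
        for item in iterable:
            self.items_list.append(item)

    __update = update                  # private copy, stored as _Mapping__update


class MappingSubclass(Mapping):
    def update(self, keys, values):    # new signature; does not break __init__
        for item in zip(keys, values):
            self.items_list.append(item)


m = MappingSubclass(["a", "b"])        # __init__ still uses the original update
m.update(["k"], ["v"])                 # callers get the overridden version
print(m.items_list)
```

Without the private copy, __init__ would call the subclass's two-argument update with a single argument and fail.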

How do I fix this?

The obvious solution here is to rename your attribute so that it only has a single leading underscore, like so. This still sends a clear signal to external users that this is a private attribute that should not be directly modified by functions or classes outside of the class, but does not lead to any weird name mangling behaviour:

from abc import ABC, abstractmethod

class Search(ABC):
    @abstractmethod
    def search_products_by_name(self, name):
        print('found', name)

class Catalog(Search):
    def __init__(self):
        self._product_names = {}

    def search_products_by_name(self, name):
        super().search_products_by_name(name)
        return self._product_names.get(name)

x = Catalog()
x._product_names = {'x': 1, 'y':2}
print(x.search_products_by_name('x'))

Another solution is to roll with the name mangling, either like this:

from abc import ABC, abstractmethod

class Search(ABC):
    @abstractmethod
    def search_products_by_name(self, name):
        print('found', name)

class Catalog(Search):
    def __init__(self):
        self.__product_names = {}

    def search_products_by_name(self, name):
        super().search_products_by_name(name)
        return self.__product_names.get(name)

x = Catalog()
# we have to use the mangled name when accessing it from outside the class
x._Catalog__product_names = {'x': 1, 'y':2}
print(x.search_products_by_name('x'))

Or — and this is probably better, since it's just a bit weird to be accessing an attribute using its mangled name from outside the class — like this:

from abc import ABC, abstractmethod

class Search(ABC):
    @abstractmethod
    def search_products_by_name(self, name):
        print('found', name)

class Catalog(Search):
    def __init__(self):
        self.__product_names = {}

    def search_products_by_name(self, name):
        super().search_products_by_name(name)
        return self.__product_names.get(name)

    def set_product_names(self, product_names):
        # we can still use the private name from within the class
        self.__product_names = product_names

x = Catalog()
x.set_product_names({'x': 1, 'y':2})
print(x.search_products_by_name('x'))

Double underscore in python

Leading double underscore names are private (meaning not available to derived classes)

This is not foolproof. It is implemented by mangling the name. Python Documentation says:

Any identifier of the form __spam (at least two leading underscores,
at most one trailing underscore) is textually replaced with
_classname__spam, where classname is the current class name with leading underscore(s) stripped. This mangling is done without regard
to the syntactic position of the identifier, so it can be used to
define class-private instance and class variables, methods, variables
stored in globals, and even variables stored in instances. private to
this class on instances of other classes.

Thus __get is actually mangled to _A__get in class A. When class B attempts to reference __get, it gets mangled to _B__get which doesn't match.

In other words __plugh defined in class Xyzzy means "unless you are running as class Xyzzy, thou shalt not touch the __plugh."

Underscore vs Double underscore with variables and methods

From PEP 8:

  • _single_leading_underscore: weak "internal use" indicator. E.g.

    from M import *

    does not import objects whose name starts with an underscore.

  • single_trailing_underscore_: used by convention to avoid conflicts with Python keyword, e.g.

    Tkinter.Toplevel(master, class_='ClassName')

  • __double_leading_underscore: when naming a class attribute, invokes name
    mangling (inside class FooBar, __boo becomes _FooBar__boo; see below).

  • __double_leading_and_trailing_underscore__: "magic" objects or
    attributes that live in user-controlled namespaces. E.g. __init__,
    __import__ or __file__. Never invent such names; only use them
    as documented.

Also, from David Goodger's Code Like a Pythonista:

Attributes: interface, _internal, __private

But try to avoid the __private form. I never use it. Trust me. If you
use it, you WILL regret it later.

Explanation:

People coming from a C++/Java background are especially prone to
overusing/misusing this "feature". But __private names don't work the
same way as in Java or C++. They just trigger a name mangling whose
purpose is to prevent accidental namespace collisions in subclasses:
MyClass.__private just becomes MyClass._MyClass__private. (Note that
even this breaks down for subclasses with the same name as the
superclass, e.g. subclasses in different modules.) It is possible to
access __private names from outside their class, just inconvenient and
fragile (it adds a dependency on the exact name of the superclass).

The problem is that the author of a class may legitimately think "this
attribute/method name should be private, only accessible from within
this class definition" and use the __private convention. But later on,
a user of that class may make a subclass that legitimately needs
access to that name. So either the superclass has to be modified
(which may be difficult or impossible), or the subclass code has to
use manually mangled names (which is ugly and fragile at best).

There's a concept in Python: "we're all consenting adults here". If
you use the __private form, who are you protecting the attribute from?
It's the responsibility of subclasses to use attributes from
superclasses properly, and it's the responsibility of superclasses to
document their attributes properly.

It's better to use the single-leading-underscore convention,
_internal. This isn't name mangled at all; it just indicates to
others to "be careful with this, it's an internal implementation
detail; don't touch it if you don't fully understand it". It's only a
convention though.
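The parenthetical remark about subclasses that share the superclass's name can be reproduced in a few lines. In this sketch the factory function make_base stands in for "a class defined in another module", and the names Widget, BaseWidget and base_state are made up for the example:

```python
def make_base():
    class Widget:                        # stands in for a class from another module
        def __init__(self):
            self.__state = "base"        # stored as _Widget__state

        def base_state(self):
            return self.__state

    return Widget


BaseWidget = make_base()


class Widget(BaseWidget):                # same class name as its superclass
    def __init__(self):
        super().__init__()
        self.__state = "subclass"        # also _Widget__state, so it collides


w = Widget()
print(w.base_state())                    # the base class now sees the subclass value
print(vars(w))                           # only one attribute exists
```

Because mangling uses only the bare class name, both classes end up writing to the same attribute and the protection against accidental clashes is lost.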

Is there any way to override the double-underscore (magic) methods of arbitrary objects in Python?

As millimoose says, an implicit __foo__ call never goes through __getattribute__. The only thing you can do is actually add the appropriate functions to your wrapper class.

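class Wrapper(object):
    def __init__(self, wrapped):
        self.wrapped = wrapped

    # Forward each listed special method to the wrapped object.
    # The tuple is only a sample; extend it with the dunders you need.
    for dunder in ('__add__', '__sub__', '__len__'):
        # _name=dunder captures the current name; it is keyword-only so positional arguments cannot overwrite it
        locals()[dunder] = lambda self, *args, _name=dunder, **kwargs: getattr(self.wrapped, _name)(*args, **kwargs)

obj = [1,2,3]
w = Wrapper(obj)
print(len(w))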

Class bodies are executed code like any other block (well, except def); you can put loops and whatever else you want inside. They're only magical in that the entire local scope is passed to type() at the end of the block to create the class.

This is, perhaps, the only case where assigning to locals() is even remotely useful.
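To see why the methods have to be defined on the class itself, consider a wrapper that relies on attribute delegation alone. This is a minimal sketch: the Delegator class is hypothetical, and it uses __getattr__ rather than __getattribute__ for brevity, but implicit special-method calls skip both hooks in the same way.

```python
class Delegator:
    """Forwards unknown attribute lookups to the wrapped object."""

    def __init__(self, wrapped):
        self.wrapped = wrapped

    def __getattr__(self, name):
        return getattr(self.wrapped, name)


d = Delegator([1, 2, 3])
print(d.__len__())       # explicit attribute access goes through __getattr__

try:
    len(d)               # implicit lookup is done on type(d) and bypasses it
except TypeError as exc:
    print("TypeError:", exc)
```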


