Python Name Mangling

Python name mangling

When in doubt, leave it "public" - I mean, do not add anything to obscure the name of your attribute. If you have a class with some internal value, do not worry about it. Instead of writing:

class Stack(object):

    def __init__(self):
        self.__storage = [] # Too uptight

    def push(self, value):
        self.__storage.append(value)

write this by default:

class Stack(object):

    def __init__(self):
        self.storage = [] # No mangling

    def push(self, value):
        self.storage.append(value)

This is for sure a controversial way of doing things. Python newbies hate it, and even some old Python guys despise this default - but it is the default anyway, so I recommend that you follow it, even if you feel uncomfortable.

If you really want to send the message "Can't touch this!" to your users, the usual way is to precede the variable with one underscore. This is just a convention, but people understand it and take extra care when dealing with such stuff:

class Stack(object):

    def __init__(self):
        self._storage = [] # This is OK; Pythonistas tend to be relaxed about it

    def push(self, value):
        self._storage.append(value)

This can be useful, too, for avoiding conflict between property names and attribute names:

 class Person(object):
     def __init__(self, name, age):
         self.name = name
         self._age = age if age >= 0 else 0
     
     @property
     def age(self):
         return self._age
     
     @age.setter
     def age(self, age):
         if age >= 0:
             self._age = age
         else:
             self._age = 0

What about the double underscore? Well, we use the double underscore magic mainly to avoid accidental overriding of methods and name conflicts with superclasses' attributes. It can be pretty valuable if you write a class to be extended many times.

If you want to use it for other purposes, you can, but it is neither usual nor recommended.
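As a minimal sketch of that use case (the class and attribute names Base, Child and __token are made up for illustration), here is how mangling keeps a subclass from clobbering an attribute of its superclass by accident:

```python
class Base:
    def __init__(self):
        self.__token = "base"      # stored on the instance as _Base__token

    def base_token(self):
        return self.__token        # reads _Base__token


class Child(Base):
    def __init__(self):
        super().__init__()
        self.__token = "child"     # stored as _Child__token, so no clash

    def child_token(self):
        return self.__token        # reads _Child__token


c = Child()
print(c.base_token())    # base
print(c.child_token())   # child
print(sorted(vars(c)))   # ['_Base__token', '_Child__token']
```

Both attributes live side by side on the same instance, each under its own mangled name.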

EDIT: Why is this so? Well, the usual Python style does not emphasize making things private - on the contrary! There are many reasons for that - most of them controversial... Let us see some of them.

Python has properties

Today, most OO languages use the opposite approach: what should not be used should not be visible, so attributes should be private. Theoretically, this would yield more manageable, less coupled classes because no one would change the objects' values recklessly.

However, it is not so simple. For example, Java classes have many getters that only get the values and setters that only set the values. You need, let us say, seven lines of code to declare a single attribute - which a Python programmer would say is needlessly complex. Also, you end up writing a lot of code to get what is, in practice, a public field, since anyone can change its value through the getters and setters.

So why follow this private-by-default policy? Just make your attributes public by default. Of course, this is problematic in Java because if you decide to add some validation to your attribute, it would require you to change all:

person.age = age;

in your code to, let us say,

person.setAge(age);

setAge() being:

public void setAge(int age) {
    if (age >= 0) {
        this.age = age;
    } else {
        this.age = 0;
    }
}

So in Java (and other languages), the default is to use getters and setters anyway: they are annoying to write, but they can spare you a lot of time if you find yourself in the situation I've described.

However, you do not need to do it in Python since Python has properties. If you have this class:

 class Person(object):
     def __init__(self, name, age):
         self.name = name
         self.age = age

...and then you decide to validate ages, you do not need to change the person.age = age pieces of your code. Just add a property, as shown below:

 class Person(object):
     def __init__(self, name, age):
         self.name = name
         self._age = age if age >= 0 else 0
     
     @property
     def age(self):
         return self._age
     
     @age.setter
     def age(self, age):
         if age >= 0:
             self._age = age
         else:
             self._age = 0

If you can do that and still use person.age = age, why would you add private fields and getters and setters?

(Also, see Python is not Java and this article about the harms of using getters and setters.)
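To see that calling code really stays the same, here is a small self-contained sketch. It is a condensed variant of the class above (the sample values are invented, and __init__ assigns through the property instead of repeating the check, which is my own shortcut):

```python
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age          # goes through the setter below

    @property
    def age(self):
        return self._age

    @age.setter
    def age(self, age):
        # Same validation as above: negative ages are clamped to 0.
        self._age = age if age >= 0 else 0


person = Person("Ann", 30)
person.age = -5          # same syntax callers used before the property existed
print(person.age)        # 0
person.age = 31
print(person.age)        # 31
```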

Everything is visible anyway - and trying to hide complicates your work

Even in languages with private attributes, you can access them through some reflection/introspection library. And people do it a lot, in frameworks and for solving urgent needs. The problem is that introspection libraries are just a complicated way of doing what you could do with public attributes.

Since Python is a very dynamic language, adding this burden to your classes is counterproductive.
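For instance, a double underscore does not actually hide anything. In this sketch, which reuses the Stack class from the beginning of the answer, the "private" list is still reachable from outside under its mangled name:

```python
class Stack:
    def __init__(self):
        self.__storage = []           # stored as _Stack__storage

    def push(self, value):
        self.__storage.append(value)


s = Stack()
s.push(1)
print(s._Stack__storage)   # [1]

try:
    s.__storage            # outside the class, so the name is not mangled
except AttributeError as exc:
    print("AttributeError:", exc)
```

So the mangled name only makes access more awkward; it does not prevent it.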

The problem is not being able to see - it is being required to see

For a Pythonista, encapsulation is not the inability to see the internals of classes but the possibility of avoiding looking at them. Encapsulation is the property of a component that lets the user use it without being concerned about the internal details. If you can use a component without bothering yourself about its implementation, then it is encapsulated (in the opinion of a Python programmer).

Now, if you wrote a class that you can use without thinking about implementation details, there is no problem if you want to look inside the class for some reason. The point is: your API should be good, and the rest is details.

Guido said so

Well, this is not controversial: he said so, actually. (Look for "open kimono.")

This is culture

Yes, there are some reasons, but no critical reason. This is primarily a cultural aspect of programming in Python. Frankly, it could be the other way, too - but it is not. Also, you could just as easily ask the other way around: why do some languages use private attributes by default? For the same main reason as for the Python practice: because it is the culture of these languages, and each choice has advantages and disadvantages.

Since there already is this culture, you are well-advised to follow it. Otherwise, you will get annoyed by Python programmers telling you to remove the __ from your code when you ask a question on Stack Overflow :)

Can someone illustrate the procedure of python name mangling?

You've made me curious, so you've made me look this up. First, when you look at the resulting byte code, it already uses STORE_NAME with the "mangled" name.

So who does it, and when? https://github.com/python/cpython/blob/master/Python/compile.c holds the answer:

This file compiles an abstract syntax tree (AST) into Python bytecode.

And the corresponding function would be _Py_Mangle ("Name mangling: __private becomes _classname__private.")
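You can check this from Python without reading the C source. The following is only a sketch (the class Ham and the attribute __spam are borrowed from the docs example): it compiles a class definition without running it and looks at the names stored in the class body's code object.

```python
import types

source = """
class Ham:
    __spam = 1
"""

# Compile only; nothing in `source` is executed here.
module_code = compile(source, "<example>", "exec")

# The class body is compiled to its own code object, which is stored
# among the module code's constants.
class_body = next(
    const for const in module_code.co_consts
    if isinstance(const, types.CodeType)
)

print("_Ham__spam" in class_body.co_names)  # True
print("__spam" in class_body.co_names)      # False
```

The mangled name is already there before any of the code runs, which matches the claim that the compiler does the work.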

python name mangling and global variables

Name mangling happens anywhere in a class body, according to the docs:

When an identifier that textually occurs in a class definition begins with two or more underscore characters and does not end in two or more underscores, it is considered a private name of that class ... This transformation is independent of the syntactical context in which the identifier is used.

The tutorial example shows that the mangling happens to attributes within the class too.

But the real problem is that your code is confusing namespaces. The constructor you showed originally

def __init__(self, x):
    __x = x

creates a local variable _test__x, which gets discarded as soon as the constructor returns.

To properly assign an instance attribute:

def __init__(self, x):
    self.__x = x

This will create an attribute whose name is actually _test__x.

If you actually want to assign the global:

def __init__(self, x):
    global __x
    __x = x

Or, equivalently, spelling out the mangled name:

def __init__(self, x):
    global _test__x
    _test__x = x

The getter needs to access the instance attribute just as the constructor needs to set it. The current version is accessing the global because the name _test__x does not exist in the local namespace:

def rex(self):
    return __x

To return an attribute, add the namespace:

def rex(self):
    return self.__x
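Put together, a minimal version of the corrected class might look like this (the class name test is taken from the mangled name _test__x above; the value 5 is just an example):

```python
class test:
    def __init__(self, x):
        self.__x = x          # stored on the instance as _test__x

    def rex(self):
        return self.__x       # compiled as self._test__x


t = test(5)
print(t.rex())        # 5
print(vars(t))        # {'_test__x': 5}
```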

All that being said, if you had a module-level attribute with a leading double underscore (which you just shouldn't do), you would have a very hard time accessing it in the class. You would have to go through globals, something like this:

globals()['__x']
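Here is a small sketch of that situation, assuming the code runs as a module or script (the method names direct and via_globals are invented for the example):

```python
__x = 42  # module-level name with a leading double underscore (not recommended)


class test:
    def direct(self):
        return __x                 # compiled as _test__x, which does not exist

    def via_globals(self):
        return globals()["__x"]    # string keys are never mangled


t = test()
print(t.via_globals())   # 42

try:
    t.direct()
except NameError as exc:
    print("NameError:", exc)
```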

Shameless plug of my question about that from a long time ago: How to access private variable of Python module from class.

Another fun fact: name mangling won't happen at all if your class name is all underscores. You can escape all your problems by naming classes _, __, ___, ____, etc.

Attribute name mangled with parent class name (at first parsing) instead of child class name (at instantiation/call)

Answered with help from the comments:

Python's name mangling occurs when the method is 'read' by the interpreter (that is, when the class body is compiled), not when it is called. The tests show the process that happens in the background.

When __var is first encountered, it is inside the body of test_parent and is name-mangled:

class test_parent(object):
    __var = 1

becomes:

class test_parent(object):
    _test_parent__var = 1

A similar thing happens when __var is encountered in test_child, becoming _test_child__var. The previously unexpected behaviour came about because the same thing happened inside getvar.

class test_parent(object):
    ...
    def getvar(self):
        ...
        a = self.__class__.__var

becomes:

class test_parent(object):
    ...
    def getvar(self):
        ...
        a = self.__class__._test_parent__var

That is why the test code b.getvar() returns 3 once 3 is assigned to b.__class__._test_parent__var: that is the attribute the getvar method actually reads.
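The original test code is not reproduced here, so the following is only a reconstruction under assumptions: the class names come from the snippets above, while the values and the body of getvar are guesses made for illustration.

```python
class test_parent(object):
    __var = 1                          # stored as _test_parent__var

    def getvar(self):
        # Compiled as self.__class__._test_parent__var, even when
        # self is an instance of a subclass.
        return self.__class__.__var


class test_child(test_parent):
    __var = 2                          # stored as _test_child__var


b = test_child()
print(b.getvar())                      # 1, inherited from test_parent

b.__class__._test_parent__var = 3      # set the name getvar really reads
print(b.getvar())                      # 3
print(test_child._test_child__var)     # 2, untouched
```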

Is this summary of the purpose of name mangling correct?

Name mangling is unnecessary for a child class to override a method with more parameters. The purpose is to prevent overriding of a method, particularly the overriding of a "private" method that serves some internal purpose for the class and whose behaviour should not be changed by subclasses.

The idea is that if a subclass declares a method of the "same" name, it will be mangled to a different one to avoid overriding the superclass's method. Compare and contrast with Java, where a private method cannot be overridden or even called from within the subclass.

class A:
    def foo(self):
        self.bar()
        self.__baz()
    def bar(self):
        print('A.bar')
    def __baz(self):
        print('A.__baz')

class B(A):
    def bar(self):
        print('B.bar')
    def __baz(self):
        print('B.__baz')

B().foo()

Output:

B.bar
A.__baz

Note that this is about overriding, not overloading. Python does not have method overloading: the number or types of the arguments in a method call are not used to determine which method is invoked; only the method name is used for that.
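To illustrate the last point with a throwaway example (Greeter and greet are invented names): defining the same method name twice in one class does not create two overloads; the second definition simply replaces the first.

```python
class Greeter:
    def greet(self):
        return "hello"

    def greet(self, name):          # rebinds the name; the first greet is gone
        return "hello, " + name


g = Greeter()
print(g.greet("Ann"))   # hello, Ann

try:
    g.greet()           # there is no zero-argument version any more
except TypeError as exc:
    print("TypeError:", exc)
```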

Python name mangling allows access both ways

Whenever you write __c inside a class, it will be textually replaced by _<classname>__c. This is not performed dynamically; it is done when the code is compiled. Hence, the running interpreter never sees __c, only _<classname>__c. That's why only _C__c appears in dir(instance).

Quoting the docs:

[...] Private names are transformed to a longer form before code is generated for them. The transformation inserts the class name, with leading underscores removed and a single underscore inserted, in front of the name. For example, the identifier __spam occurring in a class named Ham will be transformed to _Ham__spam. This transformation is independent of the syntactical context in which the identifier is used. [...]

For that reason, it only applies to identifiers written in the source, such as dotted attribute access (x.y), not to attribute names passed as strings to (get|set)attr:

>>> class Foo:
...     def __init__(self):
...         setattr(self, '__x', 'test')
... 
>>> Foo().__x
'test'
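The flip side, as a rough sketch (the attribute and method names are made up): if you mix the two styles inside a class, the string passed to getattr/setattr has to spell out the mangled name by hand, and a dotted access will not find an attribute that was stored under its plain name.

```python
class Foo:
    def __init__(self):
        self.__x = "dotted"              # stored as _Foo__x
        setattr(self, "__y", "dynamic")  # stored as __y; strings are not mangled

    def read_x(self):
        return getattr(self, "_Foo__x")  # the string must be the mangled name

    def read_y(self):
        return getattr(self, "__y")

    def read_y_dotted(self):
        return self.__y                  # compiled as self._Foo__y


foo = Foo()
print(sorted(vars(foo)))   # ['_Foo__x', '__y']
print(foo.read_x())        # dotted
print(foo.read_y())        # dynamic

try:
    foo.read_y_dotted()
except AttributeError as exc:
    print("AttributeError:", exc)
```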

Python double underscore mangling

Name mangling applies to names that appear textually inside a class statement, when that statement is compiled. In the case of Bar, the __cache attribute is not defined as part of the class, but rather added to a specific object after the fact.

(Strictly speaking, the mangling is done by the compiler when it processes the class body, not while the class statement or a method such as __new__ runs. But regardless, your __cache is added explicitly to a single object, not added by the class code.)
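The question's code is not shown here, so this is just a sketch of the likely situation (Bar and __cache come from the text; the method name lookup and the values are assumptions):

```python
class Bar:
    def lookup(self):
        return self.__cache      # compiled as self._Bar__cache


bar = Bar()
bar.__cache = {"k": 1}           # outside any class: stored as plain __cache

try:
    bar.lookup()                 # looks for _Bar__cache, which was never set
except AttributeError as exc:
    print("AttributeError:", exc)

bar._Bar__cache = {"k": 2}       # now the name matches what lookup() reads
print(bar.lookup())              # {'k': 2}
print(sorted(vars(bar)))         # ['_Bar__cache', '__cache']
```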

no need of name mangling for another object of the same class in python?

Name mangling works at compile time: inside a class body, any identifier that starts with two underscores and does not end with two underscores is replaced with the mangled version.

Consider the following:

>>> class Foo:
...     def __init__(self):
...         bar.__x += 1
... 
>>> import dis
>>> dis.dis(Foo.__init__)
  3           0 LOAD_GLOBAL              0 (bar)
              3 DUP_TOP             
              4 LOAD_ATTR                1 (_Foo__x)
              7 LOAD_CONST               1 (1)
             10 INPLACE_ADD         
             11 ROT_TWO             
             12 STORE_ATTR               1 (_Foo__x)
             15 LOAD_CONST               0 (None)
             18 RETURN_VALUE        
>>>

As you can see, the code for bar.__x has been compiled as if it were bar._Foo__x. This happens no matter what the type of bar turns out to be when that code runs. In this case, name mangling happens even when accessing members of a global.
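The exact bytecode differs between Python versions, so here is a version-independent sketch of the same point. The class Account and its method are hypothetical; the example shows one instance reading the "private" attribute of another instance of the same class, and then inspects the names stored in the compiled method.

```python
class Account:
    def __init__(self, balance):
        self.__balance = balance

    def richer_than(self, other):
        # other.__balance is compiled as other._Account__balance,
        # so it works for any object that has that attribute.
        return self.__balance > other.__balance


print(Account(10).richer_than(Account(5)))        # True

# The attribute names the compiled method refers to.
names = Account.richer_than.__code__.co_names
print("_Account__balance" in names)               # True
print("__balance" in names)                       # False
```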

Note that if you find yourself using name mangling a lot then probably you're doing something wrong. A Python programmer can live happily for quite a while without ever needing private data members.

This happens also because in Python derivation is not needed as often as in other languages thanks to delegation and duck typing...