Why Is Name Mangling Not Standardized

Why is name mangling not standardized?

The standard does not address implementation details. There are many,
many things which depend on the implementation, and which prevent
programs from working together: how the classes are laid out, the
structure of the vtable, etc. In general, compilers will change the
name mangling if they change any of these. This is intentional, as it
prevents code which would not work from linking.

It is possible for a given platform to define a C++ ABI; all compilers
which adhere to it would use compatible implementations, and have a
common name mangling. This is an issue for the platform vendors,
however; for whatever reasons, very few vendors have defined a C++ ABI.

The reason extern "C" works is that almost all platforms define a C ABI.

Why can't C functions be name-mangled?

It was sort of answered above, but I'll try to put things into context.

First, C came first. As such, what C does is, sort of, the "default". It does not mangle names because it just doesn't. A function name is a function name. A global is a global, and so on.

Then C++ came along. C++ wanted to be able to use the same linker as C, and to be able to link with code written in C. But C++ could not leave the C "mangling" (or lack thereof) as is. Check out the following example:

int function(int a);
int function();

In C++, these are distinct functions, with distinct bodies. If neither of them is mangled, both will be called "function" (or "_function"), and the linker will complain about the redefinition of a symbol. The C++ solution was to mangle the argument types into the function name. So, one is called _function_int and the other is called _function_void (not the actual mangling scheme) and the collision is avoided.
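As a rough illustration of that idea, here is a minimal Python sketch of the made-up scheme above. The helper name toy_mangle is hypothetical, and the output is not what any real compiler emits:

```python
def toy_mangle(name, param_types):
    """Build a link-time name from a function name and its parameter types.

    Mirrors the illustrative "_function_int" scheme from the text; real
    compilers use different, more elaborate encodings.
    """
    # An empty parameter list is encoded as "void" so it still gets a suffix.
    suffix = "_".join(param_types) if param_types else "void"
    return "_" + name + "_" + suffix


print(toy_mangle("function", ["int"]))  # _function_int
print(toy_mangle("function", []))       # _function_void
```

The two overloads end up with different link-time names, so a linker that only compares strings never sees a duplicate.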

Now we're left with a problem. If int function(int a) was defined in a C module, and we're merely taking its header (i.e. declaration) in C++ code and using it, the compiler will generate an instruction to the linker to import _function_int. When the function was defined, in the C module, it was not called that. It was called _function. This will cause a linker error.

To avoid that error, during the declaration of the function, we tell the compiler it is a function designed to be linked with, or compiled by, a C compiler:

extern "C" int function(int a);

The C++ compiler now knows to import _function rather than _function_int, and all is well.
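The lookup failure and its fix can be simulated in a few lines of Python. This is only a sketch: import_name and c_module_symbols are hypothetical names, and the naming scheme is the illustrative one from the text, not a real compiler's:

```python
def import_name(name, param_types, c_linkage=False):
    """Return the symbol a hypothetical C++ compiler would ask the linker for."""
    if c_linkage:
        # extern "C": use the plain C name, no parameter types appended.
        return "_" + name
    suffix = "_".join(param_types) if param_types else "void"
    return "_" + name + "_" + suffix


# Symbols defined by the C module: C does not mangle, so only "_function" exists.
c_module_symbols = {"_function"}

plain = import_name("function", ["int"])
with_extern_c = import_name("function", ["int"], c_linkage=True)

print(plain, plain in c_module_symbols)                  # _function_int False
print(with_extern_c, with_extern_c in c_module_symbols)  # _function True
```

The first lookup corresponds to the linker error described above; the second corresponds to the declaration marked extern "C".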

What is name mangling, and how does it work?

In the programming language of your choice, if an identifier is exported from a separately compiled unit, it needs a name by which it is known at link time. Name mangling solves the problem of overloaded identifiers in programming languages. (An identifier is "overloaded" if the same name is used in more than one context or with more than one meaning.)

Some examples:

In C++, function or method get may be overloaded at multiple types.
In Ada or Modula-3, function get may appear in multiple modules.

Multiple types and multiple modules cover the usual contexts.

Typical strategies:

Map each type to a string and use the combined high-level identifier and "type string" as the link-time name. Common in C++ (especially easy since overloading is permitted only for functions/methods and only on argument types) and Ada (where you can overload result types as well).
If an identifier is used in more than one module or namespace, join the name of the module with the name of the identifier, e.g., List_get instead of List.get.

Depending on what characters are legal in link-time names, you may have to do additional mangling; for example, it may be necessary to use the underscore as an 'escape' character, so you can distinguish

List_my.get -> List__my_get

from

List.my_get -> List_my__get

(Admittedly this example is reaching, but as a compiler writer, I have to guarantee that distinct identifiers in the source code map to distinct link-time names. That's the whole reason and purpose for name mangling.)
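The escaping trick above can be sketched in Python as follows. The function name link_name is hypothetical, and the sketch assumes that neither the module name nor the identifier begins or ends with an underscore (otherwise this simple scheme can still produce collisions):

```python
def link_name(module, identifier):
    """Join module and identifier with "_", doubling any existing underscores.

    Assumption: neither part starts or ends with "_"; with such names this
    simple scheme is no longer guaranteed to be collision-free.
    """
    def escape(part):
        # "_" is the escape character, so a literal underscore becomes "__".
        return part.replace("_", "__")

    return escape(module) + "_" + escape(identifier)


print(link_name("List_my", "get"))  # List__my_get
print(link_name("List", "my_get"))  # List_my__get
```

Without the doubling step, both source names would map to List_my_get, which is exactly the collision the compiler writer has to rule out.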

Conventions for name mangling

Do all compilers follow a single standard scheme?

No. Which answers your second question too.

Name mangling is a function of an ABI. And there are standard ABIs, which multiple compilers generate code for. The Itanium ABI, for example, has its own name mangling scheme that several compilers implement.

But the C++ standard gives no requirements for name mangling. Indeed, it doesn't even say that such name mangling is required.
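For a flavor of the Itanium scheme mentioned above, here is a small Python sketch that handles only plain global functions with a handful of built-in parameter types. The function name itanium_like_mangle and the type table are assumptions made for illustration; the real ABI also covers namespaces, templates, qualifiers, substitutions and much more:

```python
# A tiny subset of the Itanium ABI's codes for built-in types.
BUILTIN_CODES = {
    "void": "v",
    "bool": "b",
    "char": "c",
    "int": "i",
    "long": "l",
    "float": "f",
    "double": "d",
}


def itanium_like_mangle(name, param_types):
    """Mangle a global, non-template function in the Itanium style.

    Layout for this simple case: "_Z", then the length-prefixed name,
    then one code per parameter type ("v" for an empty parameter list).
    """
    params = param_types if param_types else ["void"]
    codes = "".join(BUILTIN_CODES[t] for t in params)
    return "_Z" + str(len(name)) + name + codes


print(itanium_like_mangle("function", ["int"]))       # _Z8functioni
print(itanium_like_mangle("function", []))            # _Z8functionv
print(itanium_like_mangle("foo", ["int", "double"]))  # _Z3fooid
```

A compiler targeting a different ABI would produce entirely different strings for the same declarations, which is why object files from such compilers do not link together.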

Should I use name mangling in Python?

When in doubt, leave it "public" - I mean, do not add anything to obscure the name of your attribute. If you have a class with some internal value, do not bother about it. Instead of writing:

class Stack(object):

    def __init__(self):
        self.__storage = [] # Too uptight

    def push(self, value):
        self.__storage.append(value)

write this by default:

class Stack(object):

    def __init__(self):
        self.storage = [] # No mangling

    def push(self, value):
        self.storage.append(value)

This is certainly a controversial way of doing things. Python newbies hate it, and even some old Python guys despise this default - but it is the default anyway, so I recommend that you follow it, even if you feel uncomfortable.

If you really want to send the message "Can't touch this!" to your users, the usual way is to prefix the attribute name with one underscore. This is just a convention, but people understand it and take extra care when dealing with such stuff:

class Stack(object):

    def __init__(self):
        self._storage = [] # This is OK, but Pythonistas tend to be relaxed about it

    def push(self, value):
        self._storage.append(value)

This can be useful, too, for avoiding conflict between property names and attribute names:

class Person(object):
    def __init__(self, name, age):
        self.name = name
        self._age = age if age >= 0 else 0

    @property
    def age(self):
        return self._age

    @age.setter
    def age(self, age):
        if age >= 0:
            self._age = age
        else:
            self._age = 0

What about the double underscore? Well, we use the double underscore magic mainly to avoid accidental overriding of methods and name conflicts with superclasses' attributes. It can be pretty valuable if you write a class to be extended many times.

If you want to use it for other purposes, you can, but it is neither usual nor recommended.
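A short sketch of what the double underscore buys you in a class hierarchy (the class and attribute names here are made up for illustration):

```python
class Base:
    def __init__(self):
        self.__token = "base"      # stored as _Base__token

    def base_token(self):
        return self.__token        # compiled as self._Base__token


class Child(Base):
    def __init__(self):
        super().__init__()
        self.__token = "child"     # stored as _Child__token, so no clash

    def child_token(self):
        return self.__token


c = Child()
print(c.base_token())    # base
print(c.child_token())   # child
print(sorted(vars(c)))   # ['_Base__token', '_Child__token']
```

Had both classes used a single underscore (_token), the subclass would have silently overwritten the value the base class relies on.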

EDIT: Why is this so? Well, the usual Python style does not emphasize making things private - on the contrary! There are many reasons for that - most of them controversial... Let us see some of them.

Python has properties

Today, most OO languages use the opposite approach: what should not be used should not be visible, so attributes should be private. Theoretically, this would yield more manageable, less coupled classes because no one would change the objects' values recklessly.

However, it is not so simple. For example, Java classes have many getters that only get the values and setters that only set the values. You need, let us say, seven lines of code to declare a single attribute - which a Python programmer would say is needlessly complex. Also, after writing all that code you effectively have a public field anyway, since in practice its value can be changed through the getter and setter.

So why follow this private-by-default policy? Just make your attributes public by default. Of course, this is problematic in Java because if you decide to add some validation to your attribute, it would require you to change all:

person.age = age;

in your code to, let us say,

person.setAge(age);

setAge() being:

public void setAge(int age) {
    if (age >= 0) {
        this.age = age;
    } else {
        this.age = 0;
    }
}

So in Java (and other languages), the default is to use getters and setters anyway: they can be annoying to write, but they can spare you much time if you find yourself in the situation I've described.

However, you do not need to do it in Python since Python has properties. If you have this class:

class Person(object):
    def __init__(self, name, age):
        self.name = name
        self.age = age

...and then you decide to validate ages, you do not need to change the person.age = age pieces of your code. Just add a property (as shown below)

class Person(object):
    def __init__(self, name, age):
        self.name = name
        self._age = age if age >= 0 else 0

    @property
    def age(self):
        return self._age

    @age.setter
    def age(self, age):
        if age >= 0:
            self._age = age
        else:
            self._age = 0

If you can do that and still use person.age = age, why would you add private fields and getters and setters?

(Also, see "Python is not Java" and this article about the harms of using getters and setters.)

Everything is visible anyway - and trying to hide complicates your work

Even in languages with private attributes, you can access them through some reflection/introspection library. And people do it a lot, in frameworks and for solving urgent needs. The problem is that introspection libraries are just a complicated way of doing what you could do with public attributes.

Since Python is a very dynamic language, adding this burden to your classes is counterproductive.
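In Python's case, even a name-mangled attribute stays within easy reach, as this small sketch (reusing the Stack example from earlier) suggests:

```python
class Stack:
    def __init__(self):
        self.__storage = []  # mangled to _Stack__storage

    def push(self, value):
        self.__storage.append(value)


s = Stack()
s.push(42)

# The plain name is not found from outside the class...
print(hasattr(s, "__storage"))   # False
# ...but the mangled name is an ordinary, reachable attribute.
print(s._Stack__storage)         # [42]
```

So the double underscore renames the attribute; it does not hide it.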

The problem is not being able to see - it is being required to see

For a Pythonista, encapsulation is not the inability to see the internals of classes but the possibility of avoiding looking at them. Encapsulation is the property of a component that lets the user use it without being concerned about the internal details. If you can use a component without bothering yourself about its implementation, then it is encapsulated (in the opinion of a Python programmer).

Now, if you wrote a class that can be used without thinking about implementation details, there is no problem if you want to look inside the class for some reason. The point is: your API should be good, and the rest is details.

Guido said so

Well, this is not controversial: he said so, actually. (Look for "open kimono.")

This is culture

Yes, there are some reasons, but no critical reason. This is primarily a cultural aspect of programming in Python. Frankly, it could be the other way, too - but it is not. Also, you could just as easily ask the other way around: why do some languages use private attributes by default? For the same main reason as for the Python practice: because it is the culture of these languages, and each choice has advantages and disadvantages.

Since this culture already exists, you are well-advised to follow it. Otherwise, you will get annoyed by Python programmers telling you to remove the __ from your code when you ask a question on Stack Overflow :)

How can I avoid name mangling?

You can't. It's built into compilers to allow you to overload functions, to have functions with the same name in different classes, and so on. But you can write functions that are mangled like C functions. Those can be called from C code. But they can't be overloaded, and, strictly by the standard, a pointer to such a function has a different type from a "normal" C++ function pointer (though most compilers do not enforce the distinction):

extern "C" void foo() {

}

The above function will be mangled like C functions for your compiler. That may include no change at all to the name, or some changes like a leading "_" in front of it or so.

How does a static library in C++ work with name mangling?

While C++ offers overloading (i.e. different things can have the same name in the same scope, with other features such as parameter types distinguishing them), this is not supported by linkers (neither in static nor in dynamic linking). So, C++ compilers use name mangling. (The other features are used to decorate the original identifier to produce a unique name.)

A C++ compiler compiles each C++ file (aka. translation unit) on its own. Hence, it is obvious that the name mangling has to be done in a unique, reproducible way by this compiler. I.e. the same declaration has to be mapped to the same symbol always.
Otherwise, it would be impossible for the linker to resolve symbols that were declared (only) in one file and defined in another.

However, there isn't a general name-mangling standard (e.g. as part of the C++ standard).

So, even on the same platform, the binary codes produced by two different compilers may be incompatible due to different name-mangling (as well as other details).
(For MS Visual C++, this makes even binaries from distinct versions incompatible.)

To overcome this, there exist Application Binary Interfaces (ABIs) for certain platforms (e.g. Linux).
One detail of an ABI is a standardized name-mangling.

Is C++ name mangling (decoration) deterministic?

It isn't specified by the standard, and has certainly changed between versions of the same compiler in my experience, though it has to be deterministic over some fixed set of circumstances, because otherwise there would be no way to link two separately compiled modules.

If you're using GetProcAddress, it would be far cleaner to export the functions as extern "C" so their names are not mangled.
