Why Is the Value of __name__ Changing After Assignment to sys.modules[__name__]

Why is the value of __name__ changing after assignment to sys.modules[__name__]?

This happens because you overwrote your module when you did sys.modules[__name__] = _test(). The module object was then deleted, because nothing referenced it anymore and its reference count dropped to zero. In the meantime the interpreter still has the byte code, so the rest of the script keeps running, but every variable in your module now evaluates to None (this is because Python, at least in the Python 2 version these examples use, sets all the variables in a module to None when the module is deleted).

class _test(object): pass

import sys
print sys.modules['__main__']
# <module '__main__' from 'test.py'>  <<< the test.py is the name of this module
sys.modules[__name__] = _test()
# This is the same as doing sys.modules['__main__'] = _test(). But wait a
# minute: sys.modules['__main__'] was the reference to this module, so
# oops, I just overwrote this module's entry and the module will be deleted.
# It's as if I did:
#
#   import test
#   __main__ = test
#   del test
#   __main__ = _test()
#   test will be deleted because the only reference to it was __main__ at
#   that point.

print sys, __name__
# None, None

import sys   # I have to import sys again.
print sys.modules['__main__']
# <__main__._test instance at 0x7f031fcb5488>  <<< my new module reference.

EDIT:

A fix is to keep another reference to the module, like this:

class _test(object): pass

import sys
ref = sys.modules[__name__]  # Create another reference of this module.
sys.modules[__name__] = _test()   # Now when it's overwritten it will not be
                                  # deleted because a reference to it still
                                  # exists.

print __name__, _test
# __main__ <class '__main__._test'>

Hope this will explain things.
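
The same idea carries over to Python 3. The following is a minimal sketch, not the original poster's code: the names demo_mod and _Proxy are made up for illustration, and a throwaway module built with types.ModuleType stands in for the real script. It shows that the replacement object itself can hold the reference that keeps the original module alive.

```python
import sys
import types


class _Proxy:
    """Hypothetical stand-in object that replaces a module in sys.modules."""

    def __init__(self, module):
        # Holding the module here keeps a reference to it alive.
        self._module = module

    def __getattr__(self, name):
        # Only called when normal lookup fails; delegate to the real module.
        return getattr(self._module, name)


# Build a throwaway module instead of touching the real __main__.
original = types.ModuleType("demo_mod")
original.answer = 42
sys.modules["demo_mod"] = original

# Replace the entry; the proxy still references the original module.
sys.modules["demo_mod"] = _Proxy(original)
del original

import demo_mod  # the import system returns whatever sys.modules holds

print(type(demo_mod).__name__, demo_mod.answer)
```

Because the proxy stores the module, dropping the name original does not free it, and attribute access keeps working through the proxy.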

What does if __name__ == "__main__": do?

Short Answer

It's boilerplate code that protects users from accidentally invoking the script when they didn't intend to. Here are some common problems when the guard is omitted from a script:

If you import the guardless script in another script (e.g. import my_script_without_a_name_eq_main_guard), then the latter script will trigger the former to run at import time, using the second script's command line arguments. This is almost always a mistake.
If you have a custom class in the guardless script and save it to a pickle file, then unpickling it in another script will trigger an import of the guardless script, with the same problems outlined in the previous bullet.

Long Answer

To better understand why and how this matters, we need to take a step back to understand how Python initializes scripts and how this interacts with its module import mechanism.

Whenever the Python interpreter reads a source file, it does two things:

it sets a few special variables like __name__, and then
it executes all of the code found in the file.

Let's see how this works and how it relates to your question about the __name__ checks we always see in Python scripts.

Code Sample

Let's use a slightly different code sample to explore how imports and scripts work. Suppose the following is in a file called foo.py.

# Suppose this is foo.py.

print("before import")
import math

print("before function_a")
def function_a():
    print("Function A")

print("before function_b")
def function_b():
    print("Function B {}".format(math.sqrt(100)))

print("before __name__ guard")
if __name__ == '__main__':
    function_a()
    function_b()
print("after __name__ guard")

Special Variables

When the Python interpreter reads a source file, it first defines a few special variables. In this case, we care about the __name__ variable.

When Your Module Is the Main Program

If you are running your module (the source file) as the main program, e.g.

python foo.py

the interpreter will assign the hard-coded string "__main__" to the __name__ variable, i.e.

# It's as if the interpreter inserts this at the top
# of your module when run as the main program.
__name__ = "__main__"

When Your Module Is Imported By Another

On the other hand, suppose some other module is the main program and it imports your module. This means there's a statement like this in the main program, or in some other module the main program imports:

# Suppose this is in some other main program.
import foo

The interpreter will search for your foo.py file (along with searching for a few other variants), and prior to executing that module, it will assign the name "foo" from the import statement to the __name__ variable, i.e.

# It's as if the interpreter inserts this at the top
# of your module when it's imported from another module.
__name__ = "foo"

Executing the Module's Code

After the special variables are set up, the interpreter executes all the code in the module, one statement at a time. You may want to open another window on the side with the code sample so you can follow along with this explanation.

Always

It prints the string "before import" (without quotes).
It loads the math module and assigns it to a variable called math. This is equivalent to replacing import math with the following (note that __import__ is a low-level function in Python that takes a string and triggers the actual import):

# Find and load a module given its string name, "math",
# then assign it to a local variable called math.
math = __import__("math")

It prints the string "before function_a".
It executes the def block, creating a function object, then assigning that function object to a variable called function_a.
It prints the string "before function_b".
It executes the second def block, creating another function object, then assigning it to a variable called function_b.
It prints the string "before __name__ guard".
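
The import and def steps in the list above can be checked directly. This is a small sketch, assuming nothing beyond the standard library; the function body is simplified to return a string so that it is easy to inspect.

```python
import sys
import types

# `import math` behaves roughly like this assignment.
math = __import__("math")
print(math is sys.modules["math"])


# A def statement builds a function object and binds it to a name.
def function_a():
    return "Function A"


print(isinstance(function_a, types.FunctionType))

# The name is an ordinary variable, so it can be aliased like any other.
alias = function_a
print(alias())
```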

Only When Your Module Is the Main Program

If your module is the main program, then it will see that __name__ was indeed set to "__main__" and it calls the two functions, printing the strings "Function A" and "Function B 10.0".

Only When Your Module Is Imported by Another

If instead your module is not the main program but was imported by another one, then __name__ will be "foo", not "__main__", and it'll skip the body of the if statement.

Always

It will print the string "after __name__ guard" in both situations.

Summary

In summary, here's what'd be printed in the two cases:

# What gets printed if foo is the main program
before import
before function_a
before function_b
before __name__ guard
Function A
Function B 10.0
after __name__ guard

# What gets printed if foo is imported as a regular module
before import
before function_a
before function_b
before __name__ guard
after __name__ guard
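
If you want to reproduce the two cases without switching between the command line and an import, one possible approach is the standard library's runpy module, which lets you choose the value of __name__ explicitly. This is only a sketch: the source below is a shortened, hypothetical version of foo.py, written to a temporary directory.

```python
import contextlib
import io
import os
import runpy
import tempfile

# Shortened stand-in for foo.py (an assumption made for this demo).
SOURCE = '''
print("before __name__ guard")
if __name__ == "__main__":
    print("Function A")
print("after __name__ guard")
'''


def run_as(run_name):
    """Execute SOURCE with __name__ set to run_name; return printed lines."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "foo.py")
        with open(path, "w") as handle:
            handle.write(SOURCE)
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            runpy.run_path(path, run_name=run_name)
        return buffer.getvalue().splitlines()


as_main = run_as("__main__")   # like: python foo.py
as_module = run_as("foo")      # like: import foo

print(as_main)
print(as_module)
```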

Why Does It Work This Way?

You might naturally wonder why anybody would want this. Well, sometimes you want to write a .py file that can be used by other programs and/or modules as a module, and can also be run as the main program itself. Examples:

Your module is a library, but you want to have a script mode where it runs some unit tests or a demo.
Your module is only used as a main program, but it has some unit tests, and the testing framework works by importing .py files like your script and running special test functions. You don't want it to try running the script just because it's importing the module.
Your module is mostly used as a main program, but it also provides a programmer-friendly API for advanced users.

Beyond those examples, it's elegant that running a script in Python is just setting up a few magic variables and importing the script. "Running" the script is a side effect of importing the script's module.

Food for Thought

Question: Can I have multiple __name__ checking blocks? Answer: it's strange to do so, but the language won't stop you.
Suppose the following is in foo2.py. What happens if you say python foo2.py on the command-line? Why?

# Suppose this is foo2.py.
import os, sys; sys.path.insert(0, os.path.dirname(__file__)) # needed for some interpreters

def function_a():
    print("a1")
    from foo2 import function_b
    print("a2")
    function_b()
    print("a3")

def function_b():
    print("b")

print("t1")
if __name__ == "__main__":
    print("m1")
    function_a()
    print("m2")
print("t2")

Now, figure out what will happen if you remove the __name__ check in foo3.py:

# Suppose this is foo3.py.
import os, sys; sys.path.insert(0, os.path.dirname(__file__)) # needed for some interpreters

def function_a():
    print("a1")
    from foo3 import function_b
    print("a2")
    function_b()
    print("a3")

def function_b():
    print("b")

print("t1")
print("m1")
function_a()
print("m2")
print("t2")

What will this do when used as a script? When imported as a module?

# Suppose this is in foo4.py
__name__ = "__main__"

def bar():
    print("bar")
    
print("before __name__ guard")
if __name__ == "__main__":
    bar()
print("after __name__ guard")

How is the __name__ variable in a Python module defined?

It is set to the absolute name of the module as imported. If you imported it as foo.bar, then __name__ is set to 'foo.bar'.
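
A quick way to see this is with a standard library package. The sketch below uses json.decoder purely as a convenient example; binding the module to a different local name does not change its __name__.

```python
import json.decoder
import json.decoder as dec  # a local alias for the same module object

print(json.__name__)
print(json.decoder.__name__)
print(dec.__name__)  # the alias does not affect __name__
```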

The name is determined in the import.c module, but because that module handles various different types of imports (including zip imports, bytecode-only imports and extension modules) there are several code paths to trace through.

Normally, import statements are translated to a call to __import__, which is by default implemented as a call to PyImport_ImportModuleLevelObject. See the __import__() documentation to get a feel for what the arguments mean. Within PyImport_ImportModuleLevelObject relative names are resolved, so you can chase down the name variables there if you want to.

The rest of the module handles the actual imports, with PyImport_AddModuleObject creating the actual namespace object and setting the __name__ key, but you can trace that name value back to PyImport_ImportModuleLevelObject. When a module object is created, its __name__ value is set in the moduleobject.c object constructor.
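
From Python code, the module type is exposed as types.ModuleType, so the effect of the constructor can be observed without reading the C source. A minimal sketch, where the name "foo.bar" and the docstring are arbitrary values chosen for illustration:

```python
import types

# No file or import is involved; this only constructs a module object.
mod = types.ModuleType("foo.bar", "hypothetical docstring")

print(mod.__name__)
print(mod.__dict__["__name__"])  # the same value, stored in the namespace
print(mod.__doc__)
```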

__getattr__ on a module

A while ago, Guido declared that all special method lookups on new-style classes bypass __getattr__ and __getattribute__. Dunder methods had previously worked on modules: you could, for example, use a module as a context manager simply by defining __enter__ and __exit__, before those tricks broke.

Recently some historical features have made a comeback, the module __getattr__ among them, so the existing hack (a module replacing itself with a class in sys.modules at import time) should no longer be necessary.

In Python 3.7+, you just use the one obvious way. To customize attribute access on a module, define a __getattr__ function at the module level which should accept one argument (name of attribute), and return the computed value or raise an AttributeError:

# my_module.py

from typing import Any

def __getattr__(name: str) -> Any:
    ...

This will also allow hooks into "from" imports, i.e. you can return dynamically generated objects for statements such as from my_module import whatever.

On a related note, along with the module getattr you may also define a __dir__ function at module level to respond to dir(my_module). See PEP 562 for details.
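
Here is a self-contained sketch of both hooks, assuming Python 3.7 or later. To keep it runnable as a single file, the module is built in memory from a source string; the module name lazy_demo and the _LAZY table are invented for the demo and are not part of PEP 562.

```python
import sys
import types

# Source of a hypothetical module that computes attributes on demand.
SOURCE = '''
_LAZY = {"answer": 42}

def __getattr__(name):
    # Called only when normal module attribute lookup fails.
    try:
        return _LAZY[name]
    except KeyError:
        raise AttributeError(
            f"module {__name__!r} has no attribute {name!r}"
        ) from None

def __dir__():
    # Advertise the lazy names alongside the real globals.
    return sorted(list(globals()) + list(_LAZY))
'''

demo = types.ModuleType("lazy_demo")
exec(SOURCE, demo.__dict__)
sys.modules["lazy_demo"] = demo

from lazy_demo import answer  # served by the module-level __getattr__

print(answer)
print("answer" in dir(demo))
print(hasattr(demo, "missing"))
```

Raising AttributeError for unknown names matters: hasattr and the import machinery both rely on it to detect attributes that do not exist.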

Difference between global and import __main__

This is related to how Python translates your code to bytecode (the compilation step).

When compiling a function, Python treats every variable that is assigned in it as a local variable and performs an optimisation to reduce the number of name lookups it has to do. Each local variable is assigned an index, and when the function is called the values are stored in a per-call array of locals addressed by index. The compiler emits the LOAD_FAST and STORE_FAST opcodes to access such variables.

The global statement instead tells the compiler that, even though the variable is assigned a value, it should not be treated as a local variable and should not be assigned an index. The compiler uses the LOAD_GLOBAL and STORE_GLOBAL opcodes to access the variable instead. Those opcodes are slower because they look the name up by key in a dictionary (the module globals, with the builtins as a fallback for loads).

If a variable is only read in a function and never assigned there, the compiler emits LOAD_GLOBAL: nothing marks the name as local, so it is assumed to be global.

So, in your first function, using global x informs the compiler that you want it to treat the write access to x as writing to a global variable instead of a local variable. The opcodes for the function make it clear:

>>> dis.dis(changeXto1)
  3           0 LOAD_CONST               1 (1)
              3 STORE_GLOBAL             0 (x)
              6 LOAD_CONST               0 (None)
              9 RETURN_VALUE

In your third example, you import the __main__ module into a local variable named __main__ and then assign to its x attribute. Since modules are objects that store all their top-level bindings as attributes, you are assigning to the variable x in the __main__ module. And as you found, the attributes of the __main__ module map directly to the values in the globals() dictionary, because your code is defined in the __main__ module. The opcodes show that you don't access x directly:

>>> dis.dis(changeXto3)
  2           0 LOAD_CONST               1 (-1)
              3 LOAD_CONST               0 (None)
              6 IMPORT_NAME              0 (__main__)
              9 STORE_FAST               0 (__main__)

  3          12 LOAD_CONST               2 (3)
             15 LOAD_FAST                0 (__main__)
             18 STORE_ATTR               1 (x)
             21 LOAD_CONST               0 (None)
             24 RETURN_VALUE

The second example is interesting. Since you assign a value to the x variable, the compiler assumes it is a local variable and applies the optimisation. Then from __main__ import x imports the module __main__ and creates a new binding from the local variable named x to the value of x in the module __main__. This is always the case: from ${module} import ${name} just creates a new binding in the current namespace. When you assign a new value to the variable x, you only change the current binding, not the unrelated binding in the module __main__ (though if the value is mutable and you mutate it, the change will be visible through all the bindings). Here are the opcodes:

>>> dis.dis(changeXto2)
  2           0 LOAD_CONST               1 (-1)
              3 LOAD_CONST               2 (('x',))
              6 IMPORT_NAME              0 (__main__)
              9 IMPORT_FROM              1 (x)
             12 STORE_FAST               0 (x)
             15 POP_TOP             

  3          16 LOAD_CONST               3 (2)
             19 STORE_FAST               0 (x)
             22 LOAD_CONST               0 (None)
             25 RETURN_VALUE
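
The difference between rebinding a copied name and assigning to a module attribute can be reproduced without touching the real __main__. In this sketch a throwaway module object named cfg_demo (a hypothetical stand-in, as are the function names) plays the role of __main__.

```python
import types

# Hypothetical module standing in for __main__.
cfg = types.ModuleType("cfg_demo")
cfg.x = 0
cfg.items = []


def rebind_copy():
    x = cfg.x  # like `from cfg_demo import x`: a new local binding
    x = 2      # rebinds the local name only
    return x


def mutate_shared():
    items = cfg.items     # another binding to the same list object
    items.append("seen")  # a mutation is visible through every binding


def set_attribute():
    cfg.x = 3  # like `__main__.x = 3`: rebinds the module attribute


rebind_copy()
print(cfg.x)      # unchanged by the local rebinding
mutate_shared()
print(cfg.items)  # the shared list was mutated
set_attribute()
print(cfg.x)      # the module attribute was rebound
```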

A good way to think about this is that in Python every assignment binds a name to a value in a dictionary, and every dereference is a dictionary lookup (this is a rough approximation, but pretty close to the conceptual model). When you write obj.field, you are looking up the "field" key in the hidden dictionary of obj (accessible via obj.__dict__).
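
For a plain instance, the model is easy to observe. This is a sketch with a made-up Point class; class attributes, descriptors and __slots__ add rules on top of this simple picture.

```python
class Point:
    pass


p = Point()
p.x = 1              # stores the binding in p.__dict__
print(p.__dict__)

p.__dict__["y"] = 2  # writing to the dictionary creates an attribute
print(p.y)
```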

When you have a bare variable name, it is looked up in the locals() dictionary, then in the globals() dictionary if that is a different one (they are the same when the code is executed at module level). An assignment always puts the binding in the locals() dictionary, unless you declared that you want global access with global ${name} (this syntax also works at top level).

So, translating your functions, it is almost as if you had written:

# NOTE: this is valid Python code, but is less optimal than
# the original code. It is here only for demonstration; in CPython,
# writing to locals() inside a function does not actually rebind locals.

def changeXto1():
    globals()['x'] = 1

def changeXto2():
    locals()['x'] = __import__('__main__').__dict__['x']
    locals()['x'] = 2

def changeXto3():
    locals()['__main__'] = __import__('__main__')
    locals()['__main__'].__dict__['x'] = 3
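
Keep in mind that the translation is conceptual. In CPython, writing to globals() really does rebind a module-level name, but writing to locals() inside a function does not change the function's local variable. A small sketch with made-up function names:

```python
x = 0


def via_globals():
    globals()["x"] = 1  # really rebinds the module-level x


def via_locals():
    x = 10
    locals()["x"] = 20  # does not rebind the local variable in CPython
    return x


via_globals()
print(x)
print(via_locals())
```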
