Compulsory usage of if __name__==__main__ in windows while using multiprocessing
Expanding a bit on the good answer you already got, it helps if you understand what Linux-y systems do. They spawn new processes using fork(), which has two good consequences:
- All data structures existing in the main program are visible to the child processes. They actually work on copies of the data.
- The child processes start executing at the instruction immediately following the fork() in the main program - so any module-level code already executed in the module will not be executed again.
fork() isn't possible in Windows, so on Windows each module is imported anew by each child process. So:
- On Windows, no data structures existing in the main program are visible to the child processes; and,
- All module-level code is executed in each child process.
So you need to think a bit about which code you want executed only in the main program. The most obvious example is that you want code that creates child processes to run only in the main program - so that should be protected by __name__ == '__main__'. For a subtler example, consider code that builds a gigantic list, which you intend to pass out to worker processes to crawl over. You probably want to protect that too, because there's no point in this case in making each worker process waste RAM and time building its own useless copy of the gigantic list.
Note that it's a Good Idea to use __name__ == "__main__" appropriately even on Linux-y systems, because it makes the intended division of work clearer. Parallel programs can be confusing - every little bit helps ;-)
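Here is a minimal sketch of that division of work. The names square and build_big_list are made up for illustration and are not from the question. The worker function lives at module level so that every child can import it, while the expensive list and the pool creation sit behind the guard:

```python
import multiprocessing as mp


def square(n):
    """Work done inside each child process (hypothetical example task)."""
    return n * n


def build_big_list(size):
    """Stand-in for the 'gigantic list'; only the main program should build it."""
    return list(range(size))


def main():
    data = build_big_list(1000)          # built once, in the main program only
    with mp.Pool(processes=2) as pool:   # children are created only from here
        results = pool.map(square, data)
    print(sum(results))


if __name__ == "__main__":
    main()
```

On Windows the children re-import this module, see a __name__ other than "__main__", and so never reach main(); they only pick up the definition of square.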
python multiprocessing on windows, if __name__ == __main__
You do not have to call Process() from the "top level" of the module. It is perfectly fine to call Process() from a class method.
The only caveat is that you cannot allow Process() to be called if or when the module is imported.
Since Windows has no fork, the multiprocessing module starts a new Python process and imports the calling module. If Process() gets called upon import, then this sets off an infinite succession of new processes (or at least a succession that lasts until your machine runs out of resources). This is the reason for hiding calls to Process() inside if __name__ == "__main__", since statements inside this if-statement will not get called upon import.
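To illustrate, here is a small sketch in which Process() is called from a method rather than from the top level. The class Launcher and the function greet are hypothetical names chosen for the example. What matters is that nothing creates a process at import time; the only call chain that leads to Process() starts inside the guard:

```python
import multiprocessing as mp


def greet(name, queue):
    """Runs in the child process; reports back through the queue."""
    queue.put("hello from {}".format(name))


class Launcher:
    def run(self, name):
        # Process() is called from a method, which is fine: this code
        # only executes when somebody explicitly calls run().
        queue = mp.Queue()
        proc = mp.Process(target=greet, args=(name, queue))
        proc.start()
        message = queue.get(timeout=30)
        proc.join()
        return message


if __name__ == "__main__":
    # The only place where run() is invoked, so importing this module
    # (as the Windows child processes do) never starts a process.
    print(Launcher().run("worker"))
```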
What does if __name__ == __main__: do?
Short Answer
It's boilerplate code that protects users from accidentally invoking the script when they didn't intend to. Here are some common problems when the guard is omitted from a script:
- If you import the guardless script in another script (e.g. import my_script_without_a_name_eq_main_guard), then the latter script will trigger the former to run at import time, using the second script's command line arguments. This is almost always a mistake.
- If you have a custom class in the guardless script and save it to a pickle file, then unpickling it in another script will trigger an import of the guardless script, with the same problems outlined in the previous bullet.
Long Answer
To better understand why and how this matters, we need to take a step back to understand how Python initializes scripts and how this interacts with its module import mechanism.
Whenever the Python interpreter reads a source file, it does two things:
- it sets a few special variables like __name__, and then
- it executes all of the code found in the file.
Let's see how this works and how it relates to your question about the __name__ checks we always see in Python scripts.
Code Sample
Let's use a slightly different code sample to explore how imports and scripts work. Suppose the following is in a file called foo.py.
# Suppose this is foo.py.

print("before import")
import math

print("before function_a")
def function_a():
    print("Function A")

print("before function_b")
def function_b():
    print("Function B {}".format(math.sqrt(100)))

print("before __name__ guard")
if __name__ == '__main__':
    function_a()
    function_b()
print("after __name__ guard")
Special Variables
When the Python interpreter reads a source file, it first defines a few special variables. In this case, we care about the __name__ variable.
When Your Module Is the Main Program
If you are running your module (the source file) as the main program, e.g.
python foo.py
the interpreter will assign the hard-coded string "__main__" to the __name__ variable, i.e.
# It's as if the interpreter inserts this at the top
# of your module when run as the main program.
__name__ = "__main__"
When Your Module Is Imported By Another
On the other hand, suppose some other module is the main program and it imports your module. This means there's a statement like this in the main program, or in some other module the main program imports:
# Suppose this is in some other main program.
import foo
The interpreter will search for your foo.py file (along with searching for a few other variants), and prior to executing that module, it will assign the name "foo" from the import statement to the __name__ variable, i.e.
# It's as if the interpreter inserts this at the top
# of your module when it's imported from another module.
__name__ = "foo"
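If you want to check both cases yourself, the following sketch writes a throwaway module to a temporary directory, imports it, and then executes the same file the way a script would be run. The module name name_demo_mod and the variable seen_name are invented for this demo; runpy.run_path is used here only as a convenient stand-in for typing python name_demo_mod.py at the command line.

```python
import importlib
import os
import runpy
import sys
import tempfile

# The throwaway module just records whatever __name__ it was given.
SOURCE = "seen_name = __name__\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "name_demo_mod.py")
    with open(path, "w") as f:
        f.write(SOURCE)

    # Case 1: import it like any other module.
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        imported_name = importlib.import_module("name_demo_mod").seen_name
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("name_demo_mod", None)

    # Case 2: execute the same file as if it were the main program.
    script_name = runpy.run_path(path, run_name="__main__")["seen_name"]

print("imported:", imported_name)
print("run as script:", script_name)
```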
Executing the Module's Code
After the special variables are set up, the interpreter executes all the code in the module, one statement at a time. You may want to open another window on the side with the code sample so you can follow along with this explanation.
Always
- It prints the string "before import" (without quotes).
- It loads the math module and assigns it to a variable called math. This is equivalent to replacing import math with the following (note that __import__ is a low-level function in Python that takes a string and triggers the actual import):
# Find and load a module given its string name, "math",
# then assign it to a local variable called math.
math = __import__("math")
- It prints the string "before function_a".
- It executes the def block, creating a function object, then assigning that function object to a variable called function_a.
- It prints the string "before function_b".
- It executes the second def block, creating another function object, then assigning it to a variable called function_b.
- It prints the string "before __name__ guard".
Only When Your Module Is the Main Program
- If your module is the main program, then it will see that __name__ was indeed set to "__main__" and it calls the two functions, printing the strings "Function A" and "Function B 10.0".
Only When Your Module Is Imported by Another
- (instead) If your module is not the main program but was imported by another one, then __name__ will be "foo", not "__main__", and it'll skip the body of the if statement.
Always
- It will print the string "after __name__ guard" in both situations.
Summary
In summary, here's what'd be printed in the two cases:
# What gets printed if foo is the main program
before import
before function_a
before function_b
before __name__ guard
Function A
Function B 10.0
after __name__ guard
# What gets printed if foo is imported as a regular module
before import
before function_a
before function_b
before __name__ guard
after __name__ guard
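The two listings above can be reproduced with a short harness. This is only a sketch: it writes the foo.py sample to a temporary file and executes it twice with runpy.run_path, once with __name__ preset to "__main__" and once preset to "foo". The second run is an approximation of a real import foo, chosen so both cases fit in one file. The helper name run_with_name is invented for the example.

```python
import contextlib
import io
import os
import runpy
import tempfile

FOO_SOURCE = '''\
print("before import")
import math

print("before function_a")
def function_a():
    print("Function A")

print("before function_b")
def function_b():
    print("Function B {}".format(math.sqrt(100)))

print("before __name__ guard")
if __name__ == '__main__':
    function_a()
    function_b()
print("after __name__ guard")
'''


def run_with_name(path, run_name):
    """Execute the file with __name__ preset to run_name; return the printed lines."""
    captured = io.StringIO()
    with contextlib.redirect_stdout(captured):
        runpy.run_path(path, run_name=run_name)
    return captured.getvalue().splitlines()


with tempfile.TemporaryDirectory() as tmp:
    foo_path = os.path.join(tmp, "foo.py")
    with open(foo_path, "w") as f:
        f.write(FOO_SOURCE)
    as_script = run_with_name(foo_path, "__main__")
    as_module = run_with_name(foo_path, "foo")

print("\n".join(as_script))
print("---")
print("\n".join(as_module))
```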
Why Does It Work This Way?
You might naturally wonder why anybody would want this. Well, sometimes you want to write a .py file that can be both used by other programs and/or modules as a module, and can also be run as the main program itself. Examples:
- Your module is a library, but you want to have a script mode where it runs some unit tests or a demo.
- Your module is only used as a main program, but it has some unit tests, and the testing framework works by importing .py files like your script and running special test functions. You don't want it to try running the script just because it's importing the module.
- Your module is mostly used as a main program, but it also provides a programmer-friendly API for advanced users.
Beyond those examples, it's elegant that running a script in Python is just setting up a few magic variables and importing the script. "Running" the script is a side effect of importing the script's module.
Food for Thought
- Question: Can I have multiple __name__ checking blocks? Answer: it's strange to do so, but the language won't stop you.
- Suppose the following is in foo2.py. What happens if you say python foo2.py on the command-line? Why?
# Suppose this is foo2.py.
import os, sys; sys.path.insert(0, os.path.dirname(__file__)) # needed for some interpreters

def function_a():
    print("a1")
    from foo2 import function_b
    print("a2")
    function_b()
    print("a3")

def function_b():
    print("b")

print("t1")
if __name__ == "__main__":
    print("m1")
    function_a()
    print("m2")
print("t2")
- Now, figure out what will happen if you remove the __name__ check in foo3.py:
# Suppose this is foo3.py.
import os, sys; sys.path.insert(0, os.path.dirname(__file__)) # needed for some interpreters

def function_a():
    print("a1")
    from foo3 import function_b
    print("a2")
    function_b()
    print("a3")

def function_b():
    print("b")

print("t1")
print("m1")
function_a()
print("m2")
print("t2")
- What will this do when used as a script? When imported as a module?
# Suppose this is in foo4.py
__name__ = "__main__"

def bar():
    print("bar")

print("before __name__ guard")
if __name__ == "__main__":
    bar()
print("after __name__ guard")
Workaround for using __name__=='__main__' in Python multiprocessing
The main module is imported (but with __name__ != '__main__') because multiprocessing on Windows is trying to simulate fork-like behavior on a system that doesn't have forking. multiprocessing has no way to know that you didn't do anything important in your main module, so the import is done "just in case" to create an environment similar to the one in your main process. If it didn't do this, all sorts of stuff that happens by side effect in main (e.g. imports, configuration calls with persistent side effects, etc.) might not be properly performed in the child processes.
As such, if they're not protecting their __main__, the code is not multiprocessing safe (nor is it unittest safe, import safe, etc.). The if __name__ == '__main__': protective wrapper should be part of all correct main modules. Go ahead and distribute it, with a note about requiring multiprocessing-safe main module protection.
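You can watch the re-import happen on any platform by forcing the 'spawn' start method. The sketch below writes a small guarded script to a temporary file and runs it in a separate interpreter; the file name spawn_demo.py and the helper run_script are invented for this demo. In CPython the child's copy of the main module reports its name as __mp_main__, which is why the guarded block is skipped there.

```python
import os
import subprocess
import sys
import tempfile

DEMO_SCRIPT = '''\
import multiprocessing as mp

# Module-level code: runs in the parent and again in the spawned child.
print("module-level code sees __name__ =", __name__, flush=True)

def work():
    pass

if __name__ == "__main__":
    ctx = mp.get_context("spawn")  # force the Windows-style start method
    child = ctx.Process(target=work)
    child.start()
    child.join()
'''


def run_script(source):
    """Write source to a temporary file, run it, and return the finished process."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "spawn_demo.py")
        with open(path, "w") as f:
            f.write(source)
        return subprocess.run(
            [sys.executable, path],
            capture_output=True, text=True, timeout=120,
        )


result = run_script(DEMO_SCRIPT)
print(result.stdout)
```

The module-level print shows up twice, once per process, while the guarded block runs only in the parent.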
In Python multiprocessing.Process , do we have to use `__name__ == __main__`?
As described in the multiprocessing guidelines under the heading "Safe importing of main module", some forms of multiprocessing need to import your main module, and thus your program may run amok in a fork bomb if the __name__ == '__main__' check is missing. In particular, this is the case on Windows, where CPython cannot fork. So it is not safe to skip it. The test belongs at the top (global) level of your module, not inside some class. Its purpose is to stop the module from automatically running tasks (as opposed to defining classes, functions etc.) when it is imported rather than run directly.
RuntimeError on windows trying python multiprocessing
On Windows the subprocesses will import (i.e. execute) the main module at start. You need to insert an if __name__ == '__main__': guard in the main module to avoid creating subprocesses recursively.
Modified testMain.py:
import parallelTestModule

if __name__ == '__main__':
    extractor = parallelTestModule.ParallelExtractor()
    extractor.runInParallel(numProcesses=2, numThreads=4)
Check if calling script used if __name__ == __main__ (to comply with multiprocessing requirement)
You can use the traceback module to inspect the stack and find the information you're looking for. Parse the top frame, and look for the main shield in the code.
I assume this will fail when you're working with a .pyc file and don't have access to the source code, but I assume developers will test their code in the regular fashion first before doing any kind of packaging, so I think it's safe to assume your error message will get printed when needed.
Version with verbose messages:
import traceback
import re

def called_from_main_shield():
    print("Calling introspect")
    tb = traceback.extract_stack()
    print(traceback.format_stack())
    print(f"line={tb[0].line} lineno={tb[0].lineno} file={tb[0].filename}")
    try:
        with open(tb[0].filename, mode="rt") as f:
            found_main_shield = False
            # Stack line numbers are 1-based, so count file lines from 1 too.
            for i, line in enumerate(f, start=1):
                if re.search(r"__name__.*['\"]__main__['\"]", line):
                    found_main_shield = True
                if i == tb[0].lineno:
                    print(f"found_main_shield={found_main_shield}")
                    return found_main_shield
    except Exception:
        print("Couldn't inspect stack, let's pretend the code is OK...")
        return True

print(called_from_main_shield())

if __name__ == "__main__":
    print(called_from_main_shield())
In the output, we see that the first call to called_from_main_shield returns False, while the second returns True:
$ python3 introspect.py
Calling introspect
[' File "introspect.py", line 24, in <module>\n print(called_from_main_shield())\n', ' File "introspect.py", line 7, in called_from_main_shield\n print(traceback.format_stack())\n']
line=print(called_from_main_shield()) lineno=24 file=introspect.py
found_main_shield=False
False
Calling introspect
[' File "introspect.py", line 27, in <module>\n print(called_from_main_shield())\n', ' File "introspect.py", line 7, in called_from_main_shield\n print(traceback.format_stack())\n']
line=print(called_from_main_shield()) lineno=27 file=introspect.py
found_main_shield=True
True
More concise version:
import traceback
import re

def called_from_main_shield():
    tb = traceback.extract_stack()
    try:
        with open(tb[0].filename, mode="rt") as f:
            found_main_shield = False
            # Stack line numbers are 1-based, so count file lines from 1 too.
            for i, line in enumerate(f, start=1):
                if re.search(r"__name__.*['\"]__main__['\"]", line):
                    found_main_shield = True
                if i == tb[0].lineno:
                    return found_main_shield
    except Exception:
        return True
Now, it's not super elegant to use re.search() like I did, but it should be reliable enough. Warning: since I defined this function in my main script, I had to make sure that line didn't match itself, which is why I used ['\"] to match the quotes instead of using a simpler RE like __name__.*__main__. Whatever you choose, just make sure it's flexible enough to match all legal variants of that code, which is what I aimed for.
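If the regular expression feels too loose, one alternative (a different technique from the one above, sketched here under my own assumptions) is to parse the source with the ast module and ask whether a given line number falls inside the body of a top-level if __name__ == "__main__" block. The function names below are invented for the example, and end_lineno requires Python 3.8 or later.

```python
import ast


def _is_main_test(test):
    """True for `__name__ == "__main__"` with the operands in either order."""
    if not (isinstance(test, ast.Compare)
            and len(test.ops) == 1
            and isinstance(test.ops[0], ast.Eq)):
        return False
    operands = [test.left, test.comparators[0]]
    has_name = any(isinstance(o, ast.Name) and o.id == "__name__" for o in operands)
    has_main = any(isinstance(o, ast.Constant) and o.value == "__main__" for o in operands)
    return has_name and has_main


def guard_body_ranges(source):
    """Return (first_line, last_line) for the body of each top-level main guard."""
    ranges = []
    for node in ast.parse(source).body:
        if isinstance(node, ast.If) and _is_main_test(node.test):
            ranges.append((node.body[0].lineno, node.body[-1].end_lineno))
    return ranges


def line_is_guarded(source, lineno):
    """True if the 1-based line number lies inside a main-guard body."""
    return any(first <= lineno <= last for first, last in guard_body_ranges(source))


sample = 'print("top")\nif __name__ == "__main__":\n    print("guarded")\n'
print(line_is_guarded(sample, 1), line_is_guarded(sample, 3))
```

Unlike the regex, this ignores matches inside comments or strings and does not count the else branch of the guard, but it only recognizes a plain equality test at the top level of the module and needs the source to be parseable.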
Why can't SQL Server run on a Novell server?
NOW I see your problem! Sorry dude!
Yes, VERY easy. Kinda.
SQL Server used to be able to talk IPX (the NetWare protocol), but I think NetWare will now talk TCP/IP, and you can run IPX and TCP/IP on the same network without an issue - Windows clients can run both at the same time, 99% of routers handle all protocols etc.
Windows (XP/2003/etc) can run the NetWare client, so it can talk to shares etc.
Use the SQL Server logins (rather than Windows integrated logins), and it'll work from anything - we have Java on Linux talking to SQL Server on Windows just fine :) It's all in the connection string: userid=username;pwd=whatever;server=yourserverhere; etc. But you MUST use the SQL Server Configuration Manager to set these up - the default is shared memory, so you have to enable TCP/IP etc.
Python2.7 Exception The freeze_support() line can be omitted if the program
This error message is displayed when using multiprocessing with the 'spawn' start method (the default on platforms lacking fork, like Windows) without protecting your code with an if __name__ == '__main__' guard.
The reason is that with the 'spawn' start method a new Python process is spawned, which in turn has to import the __main__ module before it can proceed to do its work. If your program does not have the mentioned guard, that subprocess would try to execute the same code as the parent process again, spawning another process and so on, until your program (or computer) crashes.
The message is not meant to tell you to add the freeze_support() line, but to guard your program:
import Queue
from multiprocessing.managers import BaseManager

def main():
    BaseManager.register('get_queue', callable=lambda: Queue.Queue())
    manager = BaseManager(address=('', 5000), authkey='abc')
    manager.start()
    manager.shutdown()

if __name__ == '__main__':
    # freeze_support() here if program needs to be frozen
    main()  # execute this only when run directly, not when imported!
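To see the error on a current Python 3 without touching your own program, here is a sketch that runs a deliberately unguarded script in a separate interpreter and captures what the child prints. The file name unguarded_demo.py and the helper run_script are invented for the demo, and 'spawn' is forced explicitly so it behaves the same way on Linux. Recent CPython versions detect the nested start attempt and raise the RuntimeError in the child rather than spawning endlessly.

```python
import os
import subprocess
import sys
import tempfile

UNGUARDED_SCRIPT = '''\
import multiprocessing as mp

def work():
    pass

# No guard: this also runs when the spawned child re-imports the main module.
ctx = mp.get_context("spawn")
child = ctx.Process(target=work)
child.start()
child.join()
'''


def run_script(source):
    """Write source to a temporary file, run it, and return the finished process."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "unguarded_demo.py")
        with open(path, "w") as f:
            f.write(source)
        return subprocess.run(
            [sys.executable, path],
            capture_output=True, text=True, timeout=120,
        )


result = run_script(UNGUARDED_SCRIPT)
# The child's traceback ends up on stderr and contains the freeze_support() hint.
print(result.stderr)
```

Wrapping the last four lines of the script in if __name__ == '__main__': makes the traceback go away.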