is python capable of running on multiple cores?
The answer is "Yes, but..."
CPython cannot use multiple cores when you use regular threads for concurrency, because of the Global Interpreter Lock (GIL). You can either use something like multiprocessing, celery, or mpi4py to split the parallel work into other processes, or you can use an alternative interpreter that doesn't have a GIL, such as Jython or IronPython.
A softer solution is to use libraries that don't run afoul of the GIL for heavy CPU tasks. For instance, numpy can do the heavy lifting while releasing the GIL, so other Python threads can proceed. You can use the ctypes library in the same way.
If you are not doing CPU-bound work, you can (mostly) ignore the GIL issue entirely, since Python releases the GIL while it is waiting for IO.
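To illustrate the IO-bound point above, here is a minimal sketch showing that CPython threads can still overlap waits, because the GIL is released while a thread blocks. `time.sleep` stands in for real IO (sockets, file reads); the function and timings are illustrative, not from the original answer.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io(seconds):
    # The GIL is released during the sleep, just as it is during real IO.
    time.sleep(seconds)
    return seconds

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fake_io, [0.2] * 4))
elapsed = time.perf_counter() - start

# The four 0.2 s waits overlap, so the total is close to 0.2 s, not 0.8 s.
print(results, round(elapsed, 2))
```

For CPU-bound functions the same pattern would show no speedup, which is exactly the GIL limitation described above.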
Python: Multicore processing?
Yes, it's possible to do this summation over several processes, very much like doing it with multiple threads:
from multiprocessing import Process, Queue

def do_sum(q, l):
    q.put(sum(l))

def main():
    my_list = range(1000000)
    q = Queue()
    # Each process sums half the list and puts its partial result on the queue.
    p1 = Process(target=do_sum, args=(q, my_list[:500000]))
    p2 = Process(target=do_sum, args=(q, my_list[500000:]))
    p1.start()
    p2.start()
    r1 = q.get()
    r2 = q.get()
    print(r1 + r2)

if __name__ == '__main__':
    main()
However, doing this with multiple processes is likely slower than doing it in a single process, as copying the data back and forth is more expensive than summing it right away.
Python - How to make use of multiple CPU cores
In general, you're right: one Python process will use one CPU core. However, there are several ways to use more than one core. Have a look at the official Python docs on multiprocessing.
This is an example, which will stress your CPU on all its cores:
from multiprocessing import Pool, cpu_count

def random_calculation(x):
    while True:
        x * x  # busy-loop forever to keep the core occupied

if __name__ == '__main__':
    p = Pool(processes=cpu_count())
    p.map(random_calculation, range(cpu_count()))
Running Python on multiple cores
To answer your second question first: "Finished" is printed to the terminal because a = input("Finished") is outside of your if __name__ == '__main__': block. It is a module-level statement, so it executes whenever the module is loaded, including when each spawned worker process re-imports the module, before any of the guarded code runs.
To answer the first question: you only created one process, which you run and then wait on to complete before continuing. This gives you none of the benefits of multiprocessing while still incurring the overhead of creating a new process.
Because you want to create several processes, you need to collect them in some container (e.g. a Python list) and then start all of the processes.
In practice, you need to be concerned with more than the number of processors (such as the amount of available memory, the ability to restart workers that crash, etc.). However, here is a simple example that completes your task above.
import datetime as dt
from multiprocessing import Process, current_process
import sys

def f(name):
    print('{}: hello {} from {}'.format(
        dt.datetime.now(), name, current_process().name))
    sys.stdout.flush()

if __name__ == '__main__':
    worker_count = 8
    worker_pool = []
    for _ in range(worker_count):
        p = Process(target=f, args=('bob',))
        p.start()
        worker_pool.append(p)
    for p in worker_pool:
        p.join()  # Wait for all of the workers to finish.

    # Allow time to view results before program terminates.
    a = input("Finished")  # raw_input(...) in Python 2.
Also note that if you join workers immediately after starting them, you are waiting for each worker to complete its task before starting the next worker. This is generally undesirable unless the ordering of the tasks must be sequential.
Typically Wrong
worker_1.start()
worker_1.join()
worker_2.start() # Must wait for worker_1 to complete before starting worker_2.
worker_2.join()
Usually Desired
worker_1.start()
worker_2.start() # Start all workers.
worker_1.join()
worker_2.join() # Wait for all workers to finish.
For more information, please refer to the following links:
- https://docs.python.org/3/library/multiprocessing.html
- Dead simple example of using Multiprocessing Queue, Pool and Locking
- https://pymotw.com/2/multiprocessing/basics.html
- https://pymotw.com/2/multiprocessing/communication.html
- https://pymotw.com/2/multiprocessing/mapreduce.html