How do I parallelize a simple Python loop?
Using multiple threads on CPython won't give you better performance for pure-Python code because of the global interpreter lock (GIL). I suggest using the multiprocessing module instead:
import multiprocessing
pool = multiprocessing.Pool(4)
out1, out2, out3 = zip(*pool.map(calc_stuff, range(0, 10 * offset, offset)))
Note that this won't work in the interactive interpreter.
To avoid the usual FUD around the GIL: There wouldn't be any advantage to using threads for this example anyway. You want to use processes here, not threads, because they avoid a whole bunch of problems.
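For reference, here is a minimal, self-contained sketch of the same pattern. The body of `calc_stuff`, the helper `run_parallel`, and the value of `offset` are placeholders invented for illustration (the question defines its own); only the `Pool.map` and `zip(*...)` usage mirrors the snippet above.

```python
import multiprocessing


def calc_stuff(parameter):
    # Hypothetical stand-in for the real computation: returns three values.
    return parameter, parameter * 2, parameter * 3


def run_parallel(offset, workers=4):
    # Distribute the ten inputs over a pool of worker processes.
    with multiprocessing.Pool(workers) as pool:
        results = pool.map(calc_stuff, range(0, 10 * offset, offset))
    # results is a list of 3-tuples; zip(*results) transposes it into
    # three tuples, one per returned value.
    return tuple(zip(*results))


if __name__ == "__main__":
    out1, out2, out3 = run_parallel(offset=5)
    print(out1)
    print(out2)
    print(out3)
```

The `if __name__ == "__main__":` guard matters on platforms that start workers by importing the main module again, since it keeps the workers from creating pools of their own.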
parallelize 'for' loop in Python 3
My guess is that you want to work on several files at the same time. The best way to do so (in my opinion) is to use multiprocessing. To use it, you need to define an elementary step, which your code already does.
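import os
import multiprocessing as mp

import numpy as np
import xarray as xray  # the xray package was renamed to xarray


def f(file):
    # Elementary step: build the 1200 x 1200 index matrix for one file.
    mindex = np.zeros((1200, 1200))
    for i in range(1200):
        var1 = xray.open_dataset(file)['variable'][:, i, :].data
        for j in range(1200):
            var2 = var1[:, j]
            ## Mathematical Calculations to find var3[i,j]##
            mindex[i, j] = var3[i, j]
    return (file, mindex)


if __name__ == '__main__':
    N = mp.cpu_count()
    # folder is the directory that holds the input files.
    # Pass entry.path (not entry.name) so each worker can open its file
    # regardless of the current working directory.
    with os.scandir(folder) as files:
        paths = [file.path for file in files]
    with mp.Pool(processes=N) as p:
        results = p.map(f, paths)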
This should return a list, results, in which each element is a tuple containing the file that was passed to f and its mindex matrix. With this, you can work on multiple files at the same time. It is particularly efficient when the computation on each file takes a long time.
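As a rough illustration of how the list of tuples can be consumed, the sketch below runs the same per-file pattern on throwaway text files and turns the results into a dictionary. The functions `count_lines` and `process_folder` and the file contents are made up for the example; they only stand in for the real per-file computation.

```python
import multiprocessing as mp
import os
import tempfile


def count_lines(path):
    # Hypothetical per-file step: count the lines in one file.
    with open(path, "rb") as handle:
        return path, sum(1 for _ in handle)


def process_folder(folder, workers=None):
    # Collect full paths so each worker can open its file regardless of cwd.
    with os.scandir(folder) as entries:
        paths = [entry.path for entry in entries if entry.is_file()]
    with mp.Pool(processes=workers) as pool:
        results = pool.map(count_lines, paths)
    # Turn the list of (path, value) tuples into a lookup table.
    return dict(results)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        for name, text in [("a.txt", "1\n2\n"), ("b.txt", "1\n")]:
            with open(os.path.join(folder, name), "w") as handle:
                handle.write(text)
        for path, n_lines in sorted(process_folder(folder).items()):
            print(os.path.basename(path), n_lines)
```

Returning the path together with the value, as f does above, is what makes it possible to match each result to its file afterwards.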
Implement Parallel for loops in Python
You can also use concurrent.futures in Python 3, which offers a simpler interface than multiprocessing. See this for more details about the differences.
from concurrent import futures

total_error = 0
with futures.ProcessPoolExecutor() as pool:
    for error in pool.map(some_function_call, parameters1, parameters2):
        total_error += error
In this case, parameters1 and parameters2 should each be a list or other iterable with as many items as the number of times you want to run the function (24 times as per your example).
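To make the pairing of the two iterables concrete, here is a small sketch. The function `squared_error`, the wrapper `total_squared_error`, and the sample numbers are invented for illustration and only stand in for `some_function_call` and its real arguments.

```python
from concurrent import futures


def squared_error(prediction, target):
    # Hypothetical stand-in for some_function_call.
    return (prediction - target) ** 2


def total_squared_error(predictions, targets):
    total = 0.0
    with futures.ProcessPoolExecutor() as pool:
        # map pairs the i-th prediction with the i-th target, like zip().
        for error in pool.map(squared_error, predictions, targets):
            total += error
    return total


if __name__ == "__main__":
    print(total_squared_error([1.0, 2.0, 3.0], [1.5, 2.0, 1.0]))
```

Each call receives one item from every iterable, so three items in each list mean three calls.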
If parameters1 and parameters2 are not iterables, but you just want to run the function 24 times, you can submit the function as a job the required number of times and collect each result later with a callback.
class TotalError:
    def __init__(self):
        self.value = 0

    def __call__(self, r):
        # r is the finished Future; add its result to the running total.
        self.value += r.result()

total_error = TotalError()
with futures.ProcessPoolExecutor() as pool:
    for i in range(24):
        future_result = pool.submit(some_function_call, parameters1, parameters2)
        future_result.add_done_callback(total_error)
# Leaving the with block waits for all submitted jobs to finish.
print(total_error.value)
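If you would rather not use a callback object, a different option (not the approach shown above) is to keep the futures in a list and iterate over them with `futures.as_completed`. The sketch below assumes a made-up function `simulate` with fixed arguments in place of `some_function_call`; `total_over_runs` is likewise a hypothetical helper.

```python
from concurrent import futures


def simulate(seed, scale):
    # Hypothetical stand-in for some_function_call with fixed arguments.
    return (seed % 7) * scale


def total_over_runs(runs, seed, scale):
    total = 0
    with futures.ProcessPoolExecutor() as pool:
        # Submit the same job the required number of times.
        pending = [pool.submit(simulate, seed, scale) for _ in range(runs)]
        # as_completed yields each future once it has finished.
        for future in futures.as_completed(pending):
            total += future.result()
    return total


if __name__ == "__main__":
    print(total_over_runs(runs=24, seed=10, scale=2))
```

Calling `result()` in the main loop also re-raises any exception from a worker at that point, which can make failures easier to notice.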
parallelize for loop and merge pandas dataframes
Edit: using multiprocessing instead of threading
After reading your comments, it seems that you want to run your function in different processes (in parallel):
import multiprocessing
import pandas as pd

df = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                   'A': ['A0', 'A1', 'A2', 'A3']})
year_start = 2020
year_stop = 2015
year_range = range(year_start, year_stop, -1)

def make_df(year):
    # Build a one-column DataFrame named after the given year.
    df = pd.DataFrame({str(year): [str(year), str(year+1), str(year+2), str(year+3)]})
    return df

if __name__ == '__main__':
    # The guard keeps worker processes from re-running this block on import.
    pool = multiprocessing.Pool(year_start - year_stop)
    df_list = pool.map(func=make_df, iterable=year_range)
    pool.close()
    pool.join()
    df = df.join(df_list)
    print(df)