What Is the Fastest Way to Stack Numpy Arrays in a Loop

What is the fastest way to stack numpy arrays in a loop?

What @hpaulj was trying to say with "Stick with list append when doing loops." is:

# use a normal list
result_arr = []

for label in labels_set:

    data_transform = pca.fit_transform(data_sub_tfidf)

    # append the data_transform object to that list
    # Note: this is not np.append(), which is slow here
    result_arr.append(data_transform)

# and stack it after the loop
# This avoids reallocating and copying the array on every iteration.
# Only one large chunk of memory is allocated, since
# the final size of the concatenated array is known.

result_arr = np.concatenate(result_arr)

# or (only if all arrays have the same shape; this adds a new leading axis)
result_arr = np.stack(result_arr, axis=0)

# or
result_arr = np.vstack(result_arr)

Your arrays don't really have different dimensions. They differ in one dimension only; the other one is identical. In that case you can always concatenate along the "different" dimension.
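To illustrate the point about the "different" dimension, here is a minimal sketch. The shapes and the name `chunks` are made up for illustration and are not taken from the question.

```python
import numpy as np

# Hypothetical per-label results: same number of columns, different row counts
chunks = [np.ones((3, 2)), np.ones((5, 2)), np.ones((1, 2))]

# Concatenating along axis 0 (the dimension that differs) works
combined = np.concatenate(chunks, axis=0)
print(combined.shape)  # (9, 2)

# np.stack adds a new axis and needs identical shapes, so it fails here
try:
    np.stack(chunks, axis=0)
    stack_failed = False
except ValueError:
    stack_failed = True
print(stack_failed)  # True
```

So for arrays whose row counts differ, np.concatenate or np.vstack is the call to reach for; np.stack only fits when every array has exactly the same shape.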

fastest way to concatenate large numpy arrays

Every time you append a new array, memory for a bigger array is allocated and all the existing data is copied into it. This is very expensive. A better solution is to collect the arrays first and call np.concatenate only once, so that the memory for the result is allocated a single time:

np.concatenate([np.zeros(arraySize) for i in range(100)])
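If the final size is known up front, another option is to allocate the result once and fill it slice by slice. The following is only a sketch: `array_size`, `n_chunks`, and the np.full stand-in data are assumptions made for the example.

```python
import numpy as np

array_size = 1000  # hypothetical length of each chunk
n_chunks = 100     # hypothetical number of chunks

# Allocate the full result once...
out = np.empty(n_chunks * array_size)

# ...then write each chunk into its slice instead of growing an array
for i in range(n_chunks):
    chunk = np.full(array_size, i, dtype=float)  # stand-in for real data
    out[i * array_size:(i + 1) * array_size] = chunk

# Same result as collecting the chunks and concatenating once
expected = np.concatenate(
    [np.full(array_size, i, dtype=float) for i in range(n_chunks)]
)
print(np.array_equal(out, expected))  # True
```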

What is the most efficient way to deal with a loop on NumPy arrays?

You can at least remove the two for loops to save a lot of time by using matrix computation directly:

import time

import numpy as np

def loopingFunction(listOfVector1, listOfVector2):
    resultArray = []

    for vector1 in listOfVector1:
        result = 0

        for vector2 in listOfVector2:
            result += np.dot(vector1, vector2) * vector2[2]

        resultArray.append(result)

    return np.array(resultArray)

def loopingFunction2(listOfVector1, listOfVector2):
    # (n1, n2) matrix of dot products, weighted per column, summed over vector2
    resultArray = np.sum(np.dot(listOfVector1, listOfVector2.T) * listOfVector2[:,2], axis=1)

    return resultArray

listOfVector1x = np.linspace(0,0.33,1000)
listOfVector1y = np.linspace(0.33,0.66,1000)
listOfVector1z = np.linspace(0.66,1,1000)

listOfVector1 = np.column_stack((listOfVector1x, listOfVector1y, listOfVector1z))

listOfVector2x = np.linspace(0.33,0.66,1000)
listOfVector2y = np.linspace(0.66,1,1000)
listOfVector2z = np.linspace(0, 0.33, 1000)

listOfVector2 = np.column_stack((listOfVector2x, listOfVector2y, listOfVector2z))

t0 = time.time()
result = loopingFunction(listOfVector1, listOfVector2)
print('time old version',time.time() - t0)
t0 = time.time()
result2 = loopingFunction2(listOfVector1, listOfVector2)
print('time matrix computation version',time.time() - t0)
print('Are results are the same',np.allclose(result,result2))

Which gives

time old version 1.174513578414917
time matrix computation version 0.011968612670898438
Are results are the same True

Basically, the fewer loops, the better.
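For this particular formula the sum over the second list can also be pulled inside the dot product, which avoids building the full matrix of pairwise dot products. This is a sketch under that algebraic assumption; `looping_function3` and the random test data are hypothetical names and inputs, not part of the original answer.

```python
import numpy as np

def looping_function3(vectors1, vectors2):
    # sum_j (v1 . v2_j) * v2_j[2]  ==  v1 . (sum_j v2_j * v2_j[2])
    weighted_sum = (vectors2 * vectors2[:, 2:3]).sum(axis=0)  # shape (3,)
    return vectors1 @ weighted_sum                            # shape (n1,)

rng = np.random.default_rng(0)
vectors1 = rng.random((1000, 3))
vectors2 = rng.random((1000, 3))

# Reference: the matrix version from above
matrix_version = np.sum(np.dot(vectors1, vectors2.T) * vectors2[:, 2], axis=1)
result3 = looping_function3(vectors1, vectors2)
print(np.allclose(result3, matrix_version))
```

The intermediate here has only 3 entries instead of n1 * n2, so memory use stays small even for long lists of vectors.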

Concatenate numpy array within a for loop

To build the concatenation and work around the error, I initialized the array with None and checked for None inside the loop.
That way you do not have to worry about mismatched dimensions on the first iteration.
I created some arrays for the ones you only described, and ended up with a final shape of (400, 30, 30, 3).
This is as expected, since 20*20 = 400.
Hope this helps with your solution.

import numpy as np

new_one_arr1_list = []
new_one_arr2_list = []
one_arr1 = np.ones((20,30,30,3))
one_arr2 = np.ones((20,30,30,3))
all_arr1 = None
count = 0
for item in one_arr1:  # 20 iterations
    item = np.reshape(item, (1, 30, 30, 3))
    new_one_arr1 = np.repeat(item, 20, axis=0)

    # print(all_arr1.shape, new_one_arr1.shape)
    if all_arr1 is None:
        # first iteration: nothing to concatenate with yet
        all_arr1 = new_one_arr1
    else:
        all_arr1 = np.concatenate(([all_arr1 , new_one_arr1 ]), axis=0)
    ind = np.random.randint(one_arr2.shape[0], size=(20,))
    new_one_arr2 = one_arr2[ind]

    new_one_arr1_list.append(new_one_arr1)
    new_one_arr2_list.append(new_one_arr2)
    count += 1
print(count)  # 20
print(all_arr1.shape)  # (400, 30, 30, 3)
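Since each item is simply repeated 20 times, the same array can probably be built without growing it inside the loop. The sketch below compares a list-then-concatenate version with a single np.repeat call; the np.arange data is an assumption chosen so that the ordering can be checked.

```python
import numpy as np

# Distinct values so that the ordering can be checked (hypothetical data)
one_arr1 = np.arange(20 * 30 * 30 * 3, dtype=float).reshape(20, 30, 30, 3)

# Loop version: repeat each item 20 times, collect in a list, join once
pieces = [np.repeat(item[np.newaxis], 20, axis=0) for item in one_arr1]
looped = np.concatenate(pieces, axis=0)

# Loop-free version: repeat every entry along axis 0 directly
direct = np.repeat(one_arr1, 20, axis=0)

print(direct.shape)                    # (400, 30, 30, 3)
print(np.array_equal(looped, direct))  # True
```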

vertical stack numpy array in for loop

You don't need a for loop. You can use np.vstack instead:

import numpy as np

lst = [np.array([[1, 2], [3, 4]]), np.array([[5, 6]]), np.array([[7, 8], [9, 10]])]
a = np.vstack(lst)
print(a)

# [[ 1  2]
#  [ 3  4]
#  [ 5  6]
#  [ 7  8]
#  [ 9 10]]

If your goal is to construct a dataframe, then you can use itertools.chain with pd.DataFrame.from_records (without even making the v-stacked array):

import numpy as np
import pandas as pd
import itertools

lst = [np.array([[1, 2], [3, 4]]), np.array([[5, 6]]), np.array([[7, 8], [9, 10]])]
df = pd.DataFrame.from_records(itertools.chain.from_iterable(lst))
print(df)

#    0   1
# 0  1   2
# 1  3   4
# 2  5   6
# 3  7   8
# 4  9  10

P.S. Please don't post a screenshot. Make a copy & paste-able minimal example which people can easily work on.

Combining numpy arrays inside a loop

Inside the loop, just append precip_subset to your list:

precip_subsetland2010.append(precip_subset)

Outside the loop, call np.vstack, to vertically stack your data.

output = np.vstack(precip_subsetland2010)

Printing output.shape should give you something like (X, 180, 140) (X being the sum of all rows of the constituent arrays).
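A small self-contained sketch of the two steps above. The `row_counts` list and the np.zeros arrays are stand-ins for the real `precip_subset` data, which is not shown in the question.

```python
import numpy as np

# Hypothetical stand-ins for the per-iteration precip_subset arrays
row_counts = [2, 5, 3]
precip_subsetland2010 = []

for n in row_counts:
    precip_subset = np.zeros((n, 180, 140))
    precip_subsetland2010.append(precip_subset)

# Join once, after the loop; vstack concatenates along the first axis
output = np.vstack(precip_subsetland2010)
print(output.shape)  # (10, 180, 140)
```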


