What is the fastest way to stack numpy arrays in a loop?
What @hpaulj was trying to say with "Stick with list append when doing loops." is:
# use a normal list
result_arr = []
for label in labels_set:
    data_transform = pca.fit_transform(data_sub_tfidf)
    # append the data_transform object to that list
    # (note: this is list.append(), not np.append(), which would be slow here)
    result_arr.append(data_transform)

# and stack it after the loop; this avoids repeated memory
# allocation inside the loop: only one large chunk of memory is
# allocated, since the final size of the concatenated array is known
result_arr = np.concatenate(result_arr)
# or
result_arr = np.stack(result_arr, axis=0)
# or
result_arr = np.vstack(result_arr)
Your arrays don't really have different dimensions: only one dimension differs, and the other is identical. In that case you can always stack along the "different" dimension.
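For example (a minimal sketch with made-up shapes): arrays of shape (2, 5) and (3, 5) differ only along the first axis, so they can always be concatenated along it:

```python
import numpy as np

# Two arrays that differ only in their first dimension
a = np.zeros((2, 5))
b = np.zeros((3, 5))

# Concatenating along the differing axis works fine
c = np.concatenate([a, b], axis=0)
print(c.shape)  # (5, 5)
```
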
fastest way to concatenate large numpy arrays
Every time you append a new array, new memory is allocated to create a bigger one and copy the data into it. This is very expensive. A better solution is to let NumPy allocate the full block of memory once, by recording your data with a single call to np.concatenate:
np.concatenate([np.zeros(arraySize) for i in range(100)])
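A rough timing sketch (array sizes are arbitrary) of why repeated np.append is slow compared with a single np.concatenate:

```python
import timeit
import numpy as np

def repeated_append(n=1000, size=100):
    # Reallocates and copies the whole array on every iteration: O(n^2) copying
    out = np.empty(0)
    for _ in range(n):
        out = np.append(out, np.zeros(size))
    return out

def single_concatenate(n=1000, size=100):
    # Allocates the final array once: O(n) copying
    return np.concatenate([np.zeros(size) for _ in range(n)])

print('repeated np.append:  ', timeit.timeit(repeated_append, number=5))
print('single concatenate:  ', timeit.timeit(single_concatenate, number=5))
```

Both produce the same array; the single concatenate is typically orders of magnitude faster as n grows.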
What is the most efficient way to deal with a loop on NumPy arrays?
You can at least remove the two for loops to save a lot of time, by using matrix computation directly:
import time
import numpy as np

def loopingFunction(listOfVector1, listOfVector2):
    resultArray = []
    for vector1 in listOfVector1:
        result = 0
        for vector2 in listOfVector2:
            result += np.dot(vector1, vector2) * vector2[2]
        resultArray.append(result)
    return np.array(resultArray)

def loopingFunction2(listOfVector1, listOfVector2):
    # all pairwise dot products at once, each scaled by vector2's z component
    resultArray = np.sum(np.dot(listOfVector1, listOfVector2.T) * listOfVector2[:, 2], axis=1)
    return resultArray

listOfVector1x = np.linspace(0, 0.33, 1000)
listOfVector1y = np.linspace(0.33, 0.66, 1000)
listOfVector1z = np.linspace(0.66, 1, 1000)
listOfVector1 = np.column_stack((listOfVector1x, listOfVector1y, listOfVector1z))
listOfVector2x = np.linspace(0.33, 0.66, 1000)
listOfVector2y = np.linspace(0.66, 1, 1000)
listOfVector2z = np.linspace(0, 0.33, 1000)
listOfVector2 = np.column_stack((listOfVector2x, listOfVector2y, listOfVector2z))

t0 = time.time()
result = loopingFunction(listOfVector1, listOfVector2)
print('time old version', time.time() - t0)
t0 = time.time()
result2 = loopingFunction2(listOfVector1, listOfVector2)
print('time matrix computation version', time.time() - t0)
print('Are the results the same', np.allclose(result, result2))
Which gives
time old version 1.174513578414917
time matrix computation version 0.011968612670898438
Are the results the same True
Basically, the fewer loops, the better.
Concatenate numpy array within a for loop
In order to create the concatenation and work around the error, I initialized the array with None and test whether it is still None inside the loop. That way you do not have to worry about mismatched dimensions on the first iteration. However, I created some arrays for the ones you only described, and ended up with a final shape of (400, 30, 30, 3). That fits, since 20 * 20 = 400. Hope this helps with your solution.
import numpy as np

new_one_arr1_list = []
new_one_arr2_list = []
one_arr1 = np.ones((20, 30, 30, 3))
one_arr2 = np.ones((20, 30, 30, 3))
all_arr1 = None
count = 0
for item in one_arr1:  # 20 iterations
    item = np.reshape(item, (1, 30, 30, 3))
    new_one_arr1 = np.repeat(item, 20, axis=0)
    if all_arr1 is None:
        all_arr1 = new_one_arr1
    else:
        all_arr1 = np.concatenate((all_arr1, new_one_arr1), axis=0)
    ind = np.random.randint(one_arr2.shape[0], size=(20,))
    new_one_arr2 = one_arr2[ind]
    new_one_arr1_list.append(new_one_arr1)
    new_one_arr2_list.append(new_one_arr2)
    count += 1
print(count)
print(all_arr1.shape)  # (400, 30, 30, 3)
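Note that the same (400, 30, 30, 3) result can be built more cheaply by collecting the pieces in a list and concatenating once after the loop, as recommended in the first answer above. A sketch with the same shapes:

```python
import numpy as np

one_arr1 = np.ones((20, 30, 30, 3))

# Collect each repeated block in a list, then concatenate once at the end
pieces = [np.repeat(item[np.newaxis], 20, axis=0) for item in one_arr1]
all_arr1 = np.concatenate(pieces, axis=0)
print(all_arr1.shape)  # (400, 30, 30, 3)
```

This avoids re-copying the growing all_arr1 on every iteration and needs no None sentinel.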
vertical stack numpy array in for loop
You don't need a for loop. You can use np.vstack instead:
import numpy as np
lst = [np.array([[1, 2], [3, 4]]), np.array([[5, 6]]), np.array([[7, 8], [9, 10]])]
a = np.vstack(lst)
print(a)
# [[ 1 2]
# [ 3 4]
# [ 5 6]
# [ 7 8]
# [ 9 10]]
If your goal is to construct a dataframe, then you can use itertools.chain with pd.DataFrame.from_records (without even making the v-stacked array):
import numpy as np
import pandas as pd
import itertools
lst = [np.array([[1, 2], [3, 4]]), np.array([[5, 6]]), np.array([[7, 8], [9, 10]])]
df = pd.DataFrame.from_records(itertools.chain.from_iterable(lst))
print(df)
# 0 1
# 0 1 2
# 1 3 4
# 2 5 6
# 3 7 8
# 4 9 10
P.S. Please don't post a screenshot. Make a copy & paste-able minimal example which people can easily work on.
Combining numpy arrays inside a loop
Inside the loop, just append precip_subset to your list:
precip_subsetland2010.append(precip_subset)
Outside the loop, call np.vstack to vertically stack your data.
output = np.vstack(precip_subsetland2010)
Printing output.shape should give you something like (X, 180, 140), where X is the sum of the row counts of the constituent arrays.
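Put together, the pattern looks like this (the loop body and shapes are placeholders, assuming each precip_subset is a (rows, 180, 140) slice):

```python
import numpy as np

precip_subsetland2010 = []
for i in range(3):
    # placeholder for however each precip_subset is actually produced
    precip_subset = np.zeros((10, 180, 140))
    precip_subsetland2010.append(precip_subset)

# one allocation after the loop instead of one per iteration
output = np.vstack(precip_subsetland2010)
print(output.shape)  # (30, 180, 140)
```
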