Numpy sum elements in array based on its value
We can use np.bincount
which is supposedly pretty efficient for such accumulative weighted counting, so here's one with that -
counts = np.bincount(i,v)
d[:counts.size] = counts
Alternatively, using minlength
input argument and for a generic case when d
could be any array and we want to add into it -
d += np.bincount(i,v,minlength=d.size).astype(d.dtype, copy=False)
Runtime tests
This section compares np.add.at
based approach listed in the other post
with the np.bincount
based one listed earlier in this post.
In [61]: def bincount_based(d,i,v):
...: counts = np.bincount(i,v)
...: d[:counts.size] = counts
...:
...: def add_at_based(d,i,v):
...: np.add.at(d, i, v)
...:
In [62]: # Inputs (random numbers)
...: N = 10000
...: i = np.random.randint(0,1000,(N))
...: v = np.random.randint(0,1000,(N))
...:
...: # Setup output arrays for two approaches
...: M = 12000
...: d1 = np.zeros(M)
...: d2 = np.zeros(M)
...:
In [63]: bincount_based(d1,i,v) # Run approaches
...: add_at_based(d2,i,v)
...:
In [64]: np.allclose(d1,d2) # Verify outputs
Out[64]: True
In [67]: # Setup output arrays for two approaches again for timing
...: M = 12000
...: d1 = np.zeros(M)
...: d2 = np.zeros(M)
...:
In [68]: %timeit add_at_based(d2,i,v)
1000 loops, best of 3: 1.83 ms per loop
In [69]: %timeit bincount_based(d1,i,v)
10000 loops, best of 3: 52.7 µs per loop
How to get sum of values in a numpy array based on another array with repetitive indices
Please checkout stackoverflow question guidelines in order to ask better questions, as well as properly format them.
Options
Original code
This is what you want to optimize for large values of N
(I took the liberty of editing your code so that it does not have hardcoded values and fixed a typo, data_values
instead of data
):
data_values = np.random.rand(N)
data_ind = np.random.randint(0, N, N)
xsize = data_ind.max() + 1
nodal_values = np.zeros(xsize, dtype=np.float32)
for nodes in range(xsize):
nodal_values[nodes] = np.sum(data_values[np.where(data_ind == nodes)[0]])
Slightly better version (for readability)
I created the following version which improves readability and takes away the use of np.where
:
idx = np.arange(xsize)[:, None] == data_ind
nodal_values = [np.sum(data_values[idx[i]]) for i in range(xsize)] # Python list
Much better version
I implemented the accepted answer in here (be sure to check it out to understand it better) by @Divakar to your case:
_, idx, _ = np.unique(data_ind, return_counts=True, return_inverse=True)
nodal_values = np.bincount(idx, data_values) # Same shape and type as your version
Comparison
Using your original values:
data_values = np.array([0.81444589, 0.57734696, 0.54130794, 0.22339518, 0.916973, 0.14956333, 0.74504583, 0.36218693, 0.17958372, 0.47195214])
data_ind = np.array([7, 5, 2, 2, 0, 6, 6, 1, 4, 3])
I got the following performance using timeit
module (mean ± std. dev. of 7 runs, 10000000 loops each
):
Original code: 49.2 +- 11.1 ns
Much better version: 45.2 +- 4.98 ns
Slightly better version: 36.4 +- 2.81 ns
For really small values of N
, i.e, 1 to 10, there is no significant difference. However, for big ones, there is no question as to which one to use; both versions with for-loops take too long, while the vectorized implementation does it extremely fast.
Code to test it out
import numpy as np
import timeit
import matplotlib.pyplot as plt
def original_code():
xsize = data_ind.max() + 1
nodal_values = np.zeros(xsize, dtype=np.float32)
for nodes in range(xsize):
nodal_values[nodes] = np.sum(data_values[np.where(data_ind == nodes)[0]])
def much_better():
_, idx, _ = np.unique(data_ind, return_counts=True, return_inverse=True)
nodal_values = np.bincount(idx, data_values)
def slightly_better():
xsize = data_ind.max() + 1
idx = np.arange(xsize)[:, None] == data_ind
nodal_values = [np.sum(data_values[idx[i]]) for i in range(xsize)]
sizes = [i*5 for i in range(1, 7)]
original_code_times = np.zeros((len(sizes),))
slightly_better_times = np.zeros((len(sizes),))
much_better_times = np.zeros((len(sizes),))
for i, N in enumerate(sizes):
print(N)
data_values = np.random.rand(N)
data_ind = np.random.randint(0, N, N)
# Divided by 100 repeats to get average
original_code_times[i] = timeit.timeit(original_code, number=100) / 100
much_better_times[i] = timeit.timeit(much_better, number=100) / 100
slightly_better_times[i] = timeit.timeit(slightly_better, number=100) / 100
# Multiply by 1000 to get everything in ms
original_code_times *= 1000
slightly_better_times *= 1000
much_better_times *= 1000
# %%
plt.figure(dpi=120)
plt.title("Small N's")
plt.plot(sizes, original_code_times, label="Original code")
plt.plot(sizes, slightly_better_times, label="Slightly better")
plt.plot(sizes, much_better_times, label="Much better")
plt.ylabel("Time [ms]")
plt.xlabel("N")
plt.xticks(sizes)
plt.legend()
plt.savefig("small_N.png", dpi=120)
plt.show()
plt.close()
I hope this helps anyone who may stumble upon this.
Numpy Sum when each array has a specific value
You just have to use a single &
and add some parentheses:
np.sum((x1 == 1) & (y == -1))
This gives 3 as a result.
How to sum across n elements of numpy array
np.empty
creates an array containing uninitialized data. In your code, you initialize an array output
of length 24 but assign only 22 values to it. The last 2 values contain arbitrary values (i.e. garbage). Unless performance is of importance, np.zeros
is usually the better choice for initializing arrays since all values will have a consistent value of 0.
You can solve this without a for loop by padding the input array with zeros, then computing a vectorized sum.
import numpy as np
load = np.array([10, 12, 9, 13, 17, 23, 25, 28, 26, 24, 22, 20, 18, 20, 22, 24, 26, 28, 23, 24, 21, 18, 16, 13])
tmp = np.pad(load, [0, 2])
output = load + tmp[1:-1] + tmp[2:]
print(output)
Output
[31 34 39 53 65 76 79 78 72 66 60 58 60 66 72 78 77 75 68 63 55 47 29 13]
Sum values from numpy array if condition on value in another array is met
You can get a lot more performance by writing your first version completely in numpy. Replace pythons sum
with np.sum
. Instead of the for i in positions
list comprehension, simply pass the positions
mask you are creating anyways.
Indeed, the np.where
is not necessary and my best version looks like:
#my version 0
t0=time.time()
result = np.empty(len(z_distances))
for index_dist, val_dist in enumerate(z_distances):
positions = pos_part[:, 2] < val_dist
result[index_dist] = np.sum(prop_part[positions])
print("v0 :",time.time()-t0)
# out: v0 : 0.06322097778320312
You can get a bit faster if z_distances is very long by using numba
.
Running calc
for the first time usually creates some overhead which we can get rid of by running the function for some small set of `z_distances.
The below code achieves roughly a factor of two speedup over pure numpy on my laptop.
import numba as nb
@nb.njit(parallel=True)
def calc(result, z_distances):
n = z_distances.shape[0]
for ii in nb.prange(n):
pos = pos_part[:, 2] < z_distances[ii]
result[ii] = np.sum(prop_part[pos])
return result
result4 = np.zeros_like(result)
# _t = time.time()
# calc(result4, z_distances[:10])
# print(time.time()-_t)
t3 = time.time()
result4 = calc(result4, z_distances)
print("v3 :", time.time()-t3)
plt.plot(z_distances, result4)
Summing data from array based on other array in Numpy
Here's a vectorized approach to get the counts for ID
and ID-based
summed values for value
with a combination of np.unique
and np.bincount
-
unqID,idx,IDsums = np.unique(ID,return_counts=True,return_inverse=True)
value_sums = np.bincount(idx,value.ravel())
To get the final output as a dictionary, you can use loop-comprehension to gather the summed values, like so -
{i:(IDsums[itr],value_sums[itr]) for itr,i in enumerate(unqID)}
Sample run -
In [86]: ID
Out[86]:
array([[1, 1, 1, 2, 2],
[1, 1, 2, 2, 5],
[1, 1, 2, 5, 5],
[1, 2, 2, 5, 5],
[2, 2, 5, 5, 5]])
In [87]: value
Out[87]:
array([[ 14.8, 17. , 74.3, 40.3, 90.2],
[ 25.2, 75.9, 5.6, 40. , 33.7],
[ 78.9, 39.3, 11.3, 63.6, 56.7],
[ 11.4, 75.7, 78.4, 88.7, 58.6],
[ 79.6, 32.3, 35.3, 52.5, 13.3]])
In [88]: unqID,idx,IDsums = np.unique(ID,return_counts=True,return_inverse=True)
...: value_sums = np.bincount(idx,value.ravel())
...:
In [89]: {i:(IDsums[itr],value_sums[itr]) for itr,i in enumerate(unqID)}
Out[89]:
{1: (8, 336.80000000000001),
2: (9, 453.40000000000003),
5: (8, 402.40000000000003)}
Summing values of numpy array based on indices in other array
import numpy as np
N = 8
M = 4
b = np.array([0, 1, 2, 0, 2, 3, 1, 1])
c = np.array([ 0.15517108, 0.84717734, 0.86019899, 0.62413489, 0.24357903, 0.86015187, 0.85813481, 0.7071174 ])
a = ((np.mgrid[:M,:N] == b)[0] * c).sum(axis=1)
returns
array([ 0.77930597, 2.41242955, 1.10377802, 0.86015187])
Numpy sum elements in a multi-dimensional array according to indices
Ignoring the -1
complication for the moment, the straight forward indexing and summation is:
In [58]: arr = np.array([[ 1, 2, 3, 4, 5], [5, 6, 7, 8, 9]])
In [59]: idx = np.array([[[0, 1, 1], [2, 4, 6]],
...: [[5, 1, 3], [1, -1, -1]]])
In [60]: arr.flat[idx]
Out[60]:
array([[[1, 2, 2],
[3, 5, 6]],
[[5, 2, 4],
[2, 9, 9]]])
In [61]: _.sum(axis=-1)
Out[61]:
array([[ 5, 14],
[11, 20]])
One way (not necessarily fast or memory efficient) of dealing with the -1 is with a masked array:
In [62]: mask = idx<0
In [63]: mask
Out[63]:
array([[[False, False, False],
[False, False, False]],
[[False, False, False],
[False, True, True]]])
In [65]: ma = np.ma.masked_array(Out[60],mask)
In [67]: ma
Out[67]:
masked_array(
data=[[[1, 2, 2],
[3, 5, 6]],
[[5, 2, 4],
[2, --, --]]],
mask=[[[False, False, False],
[False, False, False]],
[[False, False, False],
[False, True, True]]],
fill_value=999999)
In [68]: ma.sum(axis=-1)
Out[68]:
masked_array(
data=[[5, 14],
[11, 2]],
mask=[[False, False],
[False, False]],
fill_value=999999)
Masked arrays deal with operations like the sum by replacing the masked values with something neutral, such as 0 for the case of sums.
(I may revisit this in the morning).
sum with matrix product
In [72]: np.einsum('ijk,ijk->ij',Out[60],~mask)
Out[72]:
array([[ 5, 14],
[11, 2]])
This is more direct, and faster, than the masked array approach.
You haven't elaborated on constructing the index_tensor
so I won't try to compare it.
Another possibility is to pad the array with a 0, and adjust indexing:
In [83]: arr1 = np.hstack((0,arr.ravel()))
In [84]: arr1
Out[84]: array([0, 1, 2, 3, 4, 5, 5, 6, 7, 8, 9])
In [85]: arr1[idx+1]
Out[85]:
array([[[1, 2, 2],
[3, 5, 6]],
[[5, 2, 4],
[2, 0, 0]]])
In [86]: arr1[idx+1].sum(axis=-1)
Out[86]:
array([[ 5, 14],
[11, 2]])
sparse
A first stab at using sparse matrix:
Reshape idx
to 2d:
In [141]: idx1 = np.reshape(idx,(4,3))
make a sparse tensor from that. For a start I'll go the iterative lil
approach, though usually constructing coo
(or even csr
) inputs directly is faster:
In [142]: M = sparse.lil_matrix((4,10),dtype=int)
...: for i in range(4):
...: for j in range(3):
...: v = idx1[i,j]
...: if v>=0:
...: M[i,v] = 1
...:
In [143]: M
Out[143]:
<4x10 sparse matrix of type '<class 'numpy.int64'>'
with 9 stored elements in List of Lists format>
In [144]: M.A
Out[144]:
array([[1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 1, 0, 1, 0, 0, 0],
[0, 1, 0, 1, 0, 1, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0, 0, 0]])
This can then be used for a sum of products:
In [145]: M@arr.ravel()
Out[145]: array([ 3, 14, 11, 2])
Using M.A@arr.ravel()
is essentially what you do. While M
is sparse, arr
is not. For this case M.A@
is faster than M@
.
Related Topics
Naturally Sorting Pandas Dataframe
Log into Gmail Using Selenium in Python
Django Db Settings 'Improperly Configured' Error
Underscore _ as Variable Name in Python
How to Draw Intersecting Planes
Add Pygame Module in Pycharm Ide
Convert to Binary and Keep Leading Zeros
C and Python - Different Behaviour of the Modulo (%) Operation
Can a Lambda Function Call Itself Recursively in Python
What Is the G Object in This Flask Code
Pyinstaller and --Onefile: How to Include an Image in the Exe File
How to Plot Multiple Functions on the Same Figure, in Matplotlib
Escape String Python for MySQL
Databaseerror: Current Transaction Is Aborted, Commands Ignored Until End of Transaction Block