Numpy Sum Elements in Array Based on Its Value

Numpy sum elements in array based on its value

We can use np.bincount which is supposedly pretty efficient for such accumulative weighted counting, so here's one with that -

counts = np.bincount(i,v)
d[:counts.size] = counts

Alternatively, using minlength input argument and for a generic case when d could be any array and we want to add into it -

d += np.bincount(i,v,minlength=d.size).astype(d.dtype, copy=False)

Runtime tests

This section compares np.add.at based approach listed in the other post with the np.bincount based one listed earlier in this post.

In [61]: def bincount_based(d,i,v):
    ...:     counts = np.bincount(i,v)
    ...:     d[:counts.size] = counts
    ...: 
    ...: def add_at_based(d,i,v):
    ...:     np.add.at(d, i, v)
    ...:     

In [62]: # Inputs (random numbers)
    ...: N = 10000
    ...: i = np.random.randint(0,1000,(N))
    ...: v = np.random.randint(0,1000,(N))
    ...: 
    ...: # Setup output arrays for two approaches
    ...: M = 12000
    ...: d1 = np.zeros(M)
    ...: d2 = np.zeros(M)
    ...: 

In [63]: bincount_based(d1,i,v) # Run approaches
    ...: add_at_based(d2,i,v)
    ...: 

In [64]: np.allclose(d1,d2)  # Verify outputs
Out[64]: True

In [67]: # Setup output arrays for two approaches again for timing
    ...: M = 12000
    ...: d1 = np.zeros(M)
    ...: d2 = np.zeros(M)
    ...: 

In [68]: %timeit add_at_based(d2,i,v)
1000 loops, best of 3: 1.83 ms per loop

In [69]: %timeit bincount_based(d1,i,v)
10000 loops, best of 3: 52.7 µs per loop

How to get sum of values in a numpy array based on another array with repetitive indices

Please checkout stackoverflow question guidelines in order to ask better questions, as well as properly format them.

Options

Original code

This is what you want to optimize for large values of N (I took the liberty of editing your code so that it does not have hardcoded values and fixed a typo, data_values instead of data):

data_values = np.random.rand(N) 
data_ind = np.random.randint(0, N, N)

xsize = data_ind.max() + 1
nodal_values = np.zeros(xsize, dtype=np.float32)  
for nodes in range(xsize):  
    nodal_values[nodes] = np.sum(data_values[np.where(data_ind == nodes)[0]])

Slightly better version (for readability)

I created the following version which improves readability and takes away the use of np.where:

idx = np.arange(xsize)[:, None] == data_ind
nodal_values = [np.sum(data_values[idx[i]]) for i in range(xsize)] # Python list

Much better version

I implemented the accepted answer in here (be sure to check it out to understand it better) by @Divakar to your case:

_, idx, _ = np.unique(data_ind, return_counts=True, return_inverse=True)
nodal_values = np.bincount(idx, data_values) # Same shape and type as your version

Comparison

Using your original values:

data_values = np.array([0.81444589, 0.57734696, 0.54130794, 0.22339518, 0.916973, 0.14956333, 0.74504583, 0.36218693, 0.17958372, 0.47195214])
data_ind = np.array([7, 5, 2, 2, 0, 6, 6, 1, 4, 3])

I got the following performance using timeit module (mean ± std. dev. of 7 runs, 10000000 loops each):

Original code: 49.2 +- 11.1 ns
Much better version: 45.2 +- 4.98 ns
Slightly better version: 36.4 +- 2.81 ns

For really small values of N, i.e, 1 to 10, there is no significant difference. However, for big ones, there is no question as to which one to use; both versions with for-loops take too long, while the vectorized implementation does it extremely fast.

Small N comparison
Big N comparison

Code to test it out

import numpy as np
import timeit
import matplotlib.pyplot as plt

def original_code():
    xsize = data_ind.max() + 1
    nodal_values = np.zeros(xsize, dtype=np.float32)
    for nodes in range(xsize):
        nodal_values[nodes] = np.sum(data_values[np.where(data_ind == nodes)[0]])

def much_better():
    _, idx, _ = np.unique(data_ind, return_counts=True, return_inverse=True)
    nodal_values = np.bincount(idx, data_values)

def slightly_better():
    xsize = data_ind.max() + 1
    idx = np.arange(xsize)[:, None] == data_ind
    nodal_values = [np.sum(data_values[idx[i]]) for i in range(xsize)]

sizes = [i*5 for i in range(1, 7)]
original_code_times = np.zeros((len(sizes),))
slightly_better_times = np.zeros((len(sizes),))
much_better_times = np.zeros((len(sizes),))
for i, N in enumerate(sizes):
    print(N)
    data_values = np.random.rand(N)
    data_ind = np.random.randint(0, N, N)

    # Divided by 100 repeats to get average
    original_code_times[i] = timeit.timeit(original_code, number=100) / 100
    much_better_times[i] = timeit.timeit(much_better, number=100) / 100
    slightly_better_times[i] = timeit.timeit(slightly_better, number=100) / 100

# Multiply by 1000 to get everything in ms
original_code_times *= 1000
slightly_better_times *= 1000
much_better_times *= 1000

# %%
plt.figure(dpi=120)
plt.title("Small N's")
plt.plot(sizes, original_code_times, label="Original code")
plt.plot(sizes, slightly_better_times, label="Slightly better")
plt.plot(sizes, much_better_times, label="Much better")
plt.ylabel("Time [ms]")
plt.xlabel("N")
plt.xticks(sizes)
plt.legend()
plt.savefig("small_N.png", dpi=120)
plt.show()
plt.close()

I hope this helps anyone who may stumble upon this.

Numpy Sum when each array has a specific value

You just have to use a single & and add some parentheses:

np.sum((x1 == 1) & (y == -1))

This gives 3 as a result.

How to sum across n elements of numpy array

np.empty creates an array containing uninitialized data. In your code, you initialize an array output of length 24 but assign only 22 values to it. The last 2 values contain arbitrary values (i.e. garbage). Unless performance is of importance, np.zeros is usually the better choice for initializing arrays since all values will have a consistent value of 0.

You can solve this without a for loop by padding the input array with zeros, then computing a vectorized sum.

import numpy as np

load = np.array([10, 12, 9, 13, 17, 23, 25, 28, 26, 24, 22, 20, 18, 20, 22, 24, 26, 28, 23, 24, 21, 18, 16, 13])
tmp = np.pad(load, [0, 2])
output = load + tmp[1:-1] + tmp[2:]

print(output)

Output

[31 34 39 53 65 76 79 78 72 66 60 58 60 66 72 78 77 75 68 63 55 47 29 13]

Sum values from numpy array if condition on value in another array is met

You can get a lot more performance by writing your first version completely in numpy. Replace pythons sum with np.sum. Instead of the for i in positions list comprehension, simply pass the positions mask you are creating anyways.
Indeed, the np.where is not necessary and my best version looks like:

#my version 0
t0=time.time()
result = np.empty(len(z_distances))
for index_dist, val_dist in enumerate(z_distances):
    positions = pos_part[:, 2] < val_dist
    result[index_dist] = np.sum(prop_part[positions])
print("v0 :",time.time()-t0)
# out: v0 : 0.06322097778320312

You can get a bit faster if z_distances is very long by using numba.
Running calc for the first time usually creates some overhead which we can get rid of by running the function for some small set of `z_distances.
The below code achieves roughly a factor of two speedup over pure numpy on my laptop.

import numba as nb
@nb.njit(parallel=True)
def calc(result, z_distances):
    n = z_distances.shape[0]
    for ii in nb.prange(n):
        pos = pos_part[:, 2] < z_distances[ii]
        result[ii] = np.sum(prop_part[pos])
    return result

result4 = np.zeros_like(result)
# _t = time.time()
# calc(result4, z_distances[:10])
# print(time.time()-_t)
t3 = time.time()
result4 = calc(result4, z_distances)
print("v3 :", time.time()-t3)
plt.plot(z_distances, result4)

Summing data from array based on other array in Numpy

Here's a vectorized approach to get the counts for ID and ID-based summed values for value with a combination of np.unique and np.bincount -

unqID,idx,IDsums = np.unique(ID,return_counts=True,return_inverse=True)

value_sums = np.bincount(idx,value.ravel())

To get the final output as a dictionary, you can use loop-comprehension to gather the summed values, like so -

{i:(IDsums[itr],value_sums[itr]) for itr,i in enumerate(unqID)}

Sample run -

In [86]: ID
Out[86]: 
array([[1, 1, 1, 2, 2],
       [1, 1, 2, 2, 5],
       [1, 1, 2, 5, 5],
       [1, 2, 2, 5, 5],
       [2, 2, 5, 5, 5]])

In [87]: value
Out[87]: 
array([[ 14.8,  17. ,  74.3,  40.3,  90.2],
       [ 25.2,  75.9,   5.6,  40. ,  33.7],
       [ 78.9,  39.3,  11.3,  63.6,  56.7],
       [ 11.4,  75.7,  78.4,  88.7,  58.6],
       [ 79.6,  32.3,  35.3,  52.5,  13.3]])

In [88]: unqID,idx,IDsums = np.unique(ID,return_counts=True,return_inverse=True)
    ...: value_sums = np.bincount(idx,value.ravel())
    ...: 

In [89]: {i:(IDsums[itr],value_sums[itr]) for itr,i in enumerate(unqID)}
Out[89]: 
{1: (8, 336.80000000000001),
 2: (9, 453.40000000000003),
 5: (8, 402.40000000000003)}

Summing values of numpy array based on indices in other array

import numpy as np

N = 8
M = 4
b = np.array([0, 1, 2, 0, 2, 3, 1, 1])
c = np.array([ 0.15517108,  0.84717734,  0.86019899,  0.62413489,  0.24357903, 0.86015187,  0.85813481,  0.7071174 ])

a = ((np.mgrid[:M,:N] == b)[0] * c).sum(axis=1)

returns

array([ 0.77930597,  2.41242955,  1.10377802,  0.86015187])

Numpy sum elements in a multi-dimensional array according to indices

Ignoring the -1 complication for the moment, the straight forward indexing and summation is:

In [58]: arr = np.array([[ 1, 2, 3, 4, 5], [5, 6, 7, 8, 9]])
In [59]: idx = np.array([[[0, 1, 1], [2, 4, 6]],
    ...:                 [[5, 1, 3], [1, -1, -1]]])
In [60]: arr.flat[idx]
Out[60]: 
array([[[1, 2, 2],
        [3, 5, 6]],

       [[5, 2, 4],
        [2, 9, 9]]])
In [61]: _.sum(axis=-1)
Out[61]: 
array([[ 5, 14],
       [11, 20]])

One way (not necessarily fast or memory efficient) of dealing with the -1 is with a masked array:

In [62]: mask = idx<0
In [63]: mask
Out[63]: 
array([[[False, False, False],
        [False, False, False]],

       [[False, False, False],
        [False,  True,  True]]])

In [65]: ma = np.ma.masked_array(Out[60],mask)
In [67]: ma
Out[67]: 
masked_array(
  data=[[[1, 2, 2],
         [3, 5, 6]],

        [[5, 2, 4],
         [2, --, --]]],
  mask=[[[False, False, False],
         [False, False, False]],

        [[False, False, False],
         [False,  True,  True]]],
  fill_value=999999)
In [68]: ma.sum(axis=-1)
Out[68]: 
masked_array(
  data=[[5, 14],
        [11, 2]],
  mask=[[False, False],
        [False, False]],
  fill_value=999999)

Masked arrays deal with operations like the sum by replacing the masked values with something neutral, such as 0 for the case of sums.

(I may revisit this in the morning).

sum with matrix product

In [72]: np.einsum('ijk,ijk->ij',Out[60],~mask)
Out[72]: 
array([[ 5, 14],
       [11,  2]])

This is more direct, and faster, than the masked array approach.

You haven't elaborated on constructing the index_tensor so I won't try to compare it.

Another possibility is to pad the array with a 0, and adjust indexing:

In [83]: arr1 = np.hstack((0,arr.ravel()))
In [84]: arr1
Out[84]: array([0, 1, 2, 3, 4, 5, 5, 6, 7, 8, 9])
In [85]: arr1[idx+1]
Out[85]: 
array([[[1, 2, 2],
        [3, 5, 6]],

       [[5, 2, 4],
        [2, 0, 0]]])
In [86]: arr1[idx+1].sum(axis=-1)
Out[86]: 
array([[ 5, 14],
       [11,  2]])

sparse

A first stab at using sparse matrix:

Reshape idx to 2d:

In [141]: idx1 = np.reshape(idx,(4,3))

make a sparse tensor from that. For a start I'll go the iterative lil approach, though usually constructing coo (or even csr) inputs directly is faster:

In [142]: M = sparse.lil_matrix((4,10),dtype=int)
     ...: for i in range(4):
     ...:     for j in range(3):
     ...:         v = idx1[i,j]
     ...:         if v>=0:
     ...:            M[i,v] = 1
     ...: 
In [143]: M
Out[143]: 
<4x10 sparse matrix of type '<class 'numpy.int64'>'
    with 9 stored elements in List of Lists format>
In [144]: M.A
Out[144]: 
array([[1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 1, 0, 1, 0, 1, 0, 0, 0],
       [0, 1, 0, 1, 0, 1, 0, 0, 0, 0],
       [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]])

This can then be used for a sum of products:

In [145]: M@arr.ravel()
Out[145]: array([ 3, 14, 11,  2])

Using M.A@arr.ravel() is essentially what you do. While M is sparse, arr is not. For this case M.A@ is faster than M@.

Numpy Sum Elements in Array Based on Its Value