Increment Numpy Array with Repeated Indices

Increment Numpy array with repeated indices

After you do

bbins=np.bincount(b)

why not do:

a[:len(bbins)] += bbins

(Edited for further simplification.)

numpy.array.iadd and repeated indices

for this numpy 1.8 added the at reduction:

at(a, indices, b=None)

Performs unbuffered in place operation on operand 'a' for elements
specified by 'indices'. For addition ufunc, this method is equivalent
to a[indices] += b, except that results are accumulated for elements
that are indexed more than once. For example, a[[0,0]] += 1 will
only increment the first element once because of buffering, whereas
add.at(a, [0,0], 1) will increment the first element twice.

.. versionadded:: 1.8.0

In [1]: A = np.array([0, 0, 0])
In [2]: B = np.array([1, 1, 1, 1, 1, 1])
In [3]: idx = [0, 0, 1, 1, 2, 2]
In [4]: np.add.at(A, idx, B)
In [5]: A
Out[5]: array([2, 2, 2])

Add 1 to numpy array from a list of indices?

What you are essentially trying to do, is to replace the indexes by their frequencies.

Try np.bincount. Technically that does the same what you are trying to do.

indices = [1, 2, 3, 2, 1]

np.bincount(indices)

array([0, 2, 2, 1])

If you think about what you are doing. You are saying that for index 0, you dont want to count anything. but for index 1, you want 2 counts, .. and so on. Hope that gives you an intuitive sense of why this is the same.

@Stef's solution with np.unique, does exactly the same thing as what np.bincount would do.

Increment Numpy multi-d array with repeated indices

How about this:

import numpy as np
a = np.zeros((2,3,4))
i = np.array([0,0,1])
j = np.array([0,0,1])
k = np.array([0,0,3])

ijk = np.vstack((i,j,k)).T
H,edge = np.histogramdd(ijk,bins=a.shape)
a += H

Update 2D NumPy array indexed with cross product with repeated indices

I don't think this is the best way to do it but here's one way.

import numpy as np

image = np.arange(9).reshape(3, 3)
s = np.zeros((5, 5))
x_idx, y_idx = np.meshgrid([0, 0, 2], [1, 1, 2])
# find unique destinations
idxs = np.stack((x_idx.flatten(), y_idx.flatten())).T
idxs_unique, counts = np.unique(idxs, axis = 0, return_counts = True)
# create mask for the source and sumthe source pixels headed to the same destination
idxs_repeated = idxs[None, :, :].repeat(len(idxs_unique), axis = 0)
image_mask = (idxs_repeated == idxs_unique[:, None, :]).all(-1)
pixel_sum = (image.flatten()[None, :]*image_mask).sum(-1)
# assign summed sources to destination
s[tuple(idxs_unique.T)] += pixel_sum

EDIT 1:

If you run into problems caused by memory constraints you can do the image masking and summation in batches as done in the following implementation. I set the batch size to 10 but that parameter can be set to whatever works on your machine.

import numpy as np

image = np.arange(12).reshape(3, 4)
s = np.zeros((5, 5))
x_idx, y_idx = np.meshgrid([0, 0, 2], [1, 1, 2, 1])
idxs = np.stack((x_idx.flatten(), y_idx.flatten())).T
idxs_unique, counts = np.unique(idxs, axis = 0, return_counts = True)
batch_size = 10
pixel_sum = []
for i in range(len(unique_idxs)//batch_size + ((len(unique_idxs)%batch_size)!=0)):
    batch = idxs_unique[i*batch_size:(i+1)*batch_size, None, :]
    idxs_repeated = idxs[None, :, :].repeat(len(batch), axis = 0)
    image_mask = (idxs_repeated == idxs_unique[i*batch_size:(i+1)*batch_size, None, :]).all(-1)
    pixel_sum.append((image.flatten()[None, :]*image_mask).sum(-1))
pixel_sum = np.concatenate(pixel_sum)
s[tuple(idxs_unique.T)] += pixel_sum

EDIT 2:
OP's method seems to be faster by far if you use numba.

import numpy as np
from numba import jit

@jit(nopython=True)
def get_s(image, grid_size):
    W, H = image.shape
    s = np.zeros((W, H))
    for w in range(W):
        for h in range(H):
            i, j = int(w / grid_size), int(h / grid_size)
            s[i, j] += image[w, h]
    return s

def get_s_vec(image, grid_size, batch_size = 10):
    W, H = image.shape
    s = np.zeros((W, H))
    w_idx, h_idx = np.arange(W), np.arange(H)
    x_idx, y_idx = np.trunc(w_idx / grid_size).astype(int), np.trunc(h_idx / grid_size).astype(int)
    y_idx, x_idx = np.meshgrid(y_idx, x_idx)
    idxs = np.stack((x_idx.flatten(), y_idx.flatten())).T
    idxs_unique, counts = np.unique(idxs, axis = 0, return_counts = True)
    pixel_sum = []
    for i in range(len(unique_idxs)//batch_size + ((len(unique_idxs)%batch_size)!=0)):
        batch = idxs_unique[i*batch_size:(i+1)*batch_size, None, :]
        idxs_repeated = idxs[None, :, :].repeat(len(batch), axis = 0)
        image_mask = (idxs_repeated == idxs_unique[i*batch_size:(i+1)*batch_size, None, :]).all(-1)
        pixel_sum.append((image.flatten()[None, :]*image_mask).sum(-1))
    pixel_sum = np.concatenate(pixel_sum)
    s[tuple(idxs_unique.T)] += pixel_sum
    return s

print(f'loop result = {get_s(image, 2)}')
print(f'vector result = {get_s_vec(image, 2)}')

%timeit get_s(image, 2)
%timeit get_s_vec(image, 2)

output:

loop result = [[10. 18.  0.  0.]
 [17. 21.  0.  0.]
 [ 0.  0.  0.  0.]]
vector result = [[10. 18.  0.  0.]
 [17. 21.  0.  0.]
 [ 0.  0.  0.  0.]]
The slowest run took 15.00 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 5: 751 ns per loop
1000 loops, best of 5: 195 µs per loop

Numpy (sparse) repeated index increment

I don't know that it's documented

It's well known that dense arrays are set just once per unique index due to buffering. We have to use add.at to get unbuffered addition.

In [966]: C=sparse.lil_matrix((3,2),dtype=int)
In [967]: Ca=C.A
In [968]: Ca += 1
In [969]: Ca
Out[969]: 
array([[1, 1],
       [1, 1],
       [1, 1]])

In [970]: Ca=C.A
In [973]: np.add.at(Ca,(a,b),1)
In [974]: Ca
Out[974]: 
array([[2, 2],
       [2, 2],
       [2, 2]])

Your example shows that the lil indexed setting behaves in the buffered sense as well. But I'd have to look at the code to see exactly why.

It is documented that coo style inputs are summed across duplicates.

In [975]: M=sparse.coo_matrix((np.ones_like(a),(a,b)), shape=(3,2))
In [976]: print(M)
  (0, 0)    1
  (1, 1)    1
  (2, 0)    1
  (0, 1)    1
  (1, 0)    1
  (2, 1)    1
  (0, 0)    1
  (1, 1)    1
  (2, 0)    1
  (0, 1)    1
  (1, 0)    1
  (2, 1)    1
In [977]: M.A
Out[977]: 
array([[2, 2],
       [2, 2],
       [2, 2]])
In [978]: M
Out[978]: 
<3x2 sparse matrix of type '<class 'numpy.int32'>'
    with 12 stored elements in COOrdinate format>
In [979]: M.tocsr()
Out[979]: 
<3x2 sparse matrix of type '<class 'numpy.int32'>'
    with 6 stored elements in Compressed Sparse Row format>
In [980]: M.sum_duplicates()
In [981]: M
Out[981]: 
<3x2 sparse matrix of type '<class 'numpy.int32'>'
    with 6 stored elements in COOrdinate format>

points are stored in coo format as entered, but when used for display or calculation (csr format) duplicates are summed.

Incrementing a numpy 3-D array with repeating 2-D positions

Two approaches with np.add.at and np.bincount could be proposed -

def addtoarray_addat(voxel_space, vol_coords, z_index=0):
    shp = voxel_space.shape 
    idx = vol_coords[:,0]*shp[2] + vol_coords[:,1] + z_index*shp[2]*shp[1]
    np.add.at(voxel_space.ravel(),idx,1)
    return voxel_space

def addtoarray_bincount(voxel_space, vol_coords, z_index=0):
    shp = voxel_space.shape
    idx = vol_coords[:,0]*shp[2] + vol_coords[:,1] + z_index*shp[2]*shp[1]
    voxel_space += np.bincount(idx, minlength=np.prod(shp)).reshape(shp)
    return voxel_space

If we are filling into a zeros initialized array, it would be simpler as we feed in the output shape instead -

def addtoarray_bincount_zerosinit(shp, vol_coords, z_index=0):
    # shp is shape of voxel_space, desired output
    idx = vol_coords[:,0]*shp[2] + vol_coords[:,1] + z_index*shp[2]*shp[1]
    voxel_space = np.bincount(idx, minlength=np.prod(shp)).reshape(shp)
    return voxel_space

Increment Numpy Array with Repeated Indices