Increment Numpy array with repeated indices
After you do
bbins = np.bincount(b)
why not do:
a[:len(bbins)] += bbins
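A minimal runnable sketch of this bincount trick (the sample values for a and b are my own, chosen for illustration):

```python
import numpy as np

# target array to increment and the (possibly repeated) indices
a = np.zeros(6, dtype=int)
b = np.array([0, 0, 1, 3, 3, 3])

# count how many times each index occurs, then add all counts in one shot
bbins = np.bincount(b)   # array([2, 1, 0, 3])
a[:len(bbins)] += bbins
print(a)                 # [2 1 0 3 0 0]
```

The slice a[:len(bbins)] is needed because bincount only returns counts up to the largest index that occurs in b.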
numpy.array.__iadd__ and repeated indices
For this, numpy 1.8 added the at reduction:

at(a, indices, b=None)

Performs unbuffered in place operation on operand 'a' for elements specified by 'indices'. For addition ufunc, this method is equivalent to a[indices] += b, except that results are accumulated for elements that are indexed more than once. For example, a[[0,0]] += 1 will only increment the first element once because of buffering, whereas add.at(a, [0,0], 1) will increment the first element twice.

.. versionadded:: 1.8.0
In [1]: A = np.array([0, 0, 0])
In [2]: B = np.array([1, 1, 1, 1, 1, 1])
In [3]: idx = [0, 0, 1, 1, 2, 2]
In [4]: np.add.at(A, idx, B)
In [5]: A
Out[5]: array([2, 2, 2])
Add 1 to numpy array from a list of indices?
What you are essentially trying to do is replace the indexes by their frequencies. Try np.bincount. Technically it does the same as what you are trying to do.

indices = [1, 2, 3, 2, 1]
np.bincount(indices)
array([0, 2, 2, 1])

If you think about what you are doing, you are saying that for index 0 you don't want to count anything, but for index 1 you want 2 counts, and so on. Hope that gives you an intuitive sense of why this is the same. @Stef's solution with np.unique does exactly the same thing as what np.bincount would do.
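To see the equivalence concretely, here is a small sketch comparing the np.bincount route with the np.unique(..., return_counts=True) route (scattering the counts back into a target array a is my own addition for illustration):

```python
import numpy as np

indices = np.array([1, 2, 3, 2, 1])
a = np.zeros(5, dtype=int)

# route 1: bincount counts every index from 0 up to at least minlength - 1
a1 = a + np.bincount(indices, minlength=len(a))

# route 2: unique returns only the indices that actually occur, plus counts
uniq, counts = np.unique(indices, return_counts=True)
a2 = a.copy()
a2[uniq] += counts   # uniq has no repeats, so plain fancy indexing is safe

print(a1)  # [0 2 2 1 0]
print(a2)  # [0 2 2 1 0]
```

The key point is that uniq contains each index only once, so the buffered a2[uniq] += counts assignment is safe here, unlike assigning through the raw repeated indices.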
Increment Numpy multi-d array with repeated indices
How about this:
import numpy as np
a = np.zeros((2,3,4))
i = np.array([0,0,1])
j = np.array([0,0,1])
k = np.array([0,0,3])
ijk = np.vstack((i,j,k)).T
H, edge = np.histogramdd(ijk, bins=a.shape, range=[(0, n) for n in a.shape])  # pin bin edges to integer positions
a += H
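As a sanity check (my own addition, not part of the answer), np.add.at with the index arrays passed as a tuple gives the same result as the histogram route; note the range argument, which pins the bin edges to integer positions so each (i, j, k) triple lands in its own bin:

```python
import numpy as np

a = np.zeros((2, 3, 4))
i = np.array([0, 0, 1])
j = np.array([0, 0, 1])
k = np.array([0, 0, 3])

# histogram route: count occurrences of each (i, j, k) triple
ijk = np.vstack((i, j, k)).T
H, edges = np.histogramdd(ijk, bins=a.shape, range=[(0, n) for n in a.shape])
a += H

# unbuffered scatter-add route
b = np.zeros((2, 3, 4))
np.add.at(b, (i, j, k), 1)

print(np.array_equal(a, b))  # True
```

Without range, histogramdd stretches its bins over the data's min and max per axis, which can shift counts into the wrong cells.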
Update 2D NumPy array indexed with cross product with repeated indices
I don't think this is the best way to do it but here's one way.
import numpy as np
image = np.arange(9).reshape(3, 3)
s = np.zeros((5, 5))
x_idx, y_idx = np.meshgrid([0, 0, 2], [1, 1, 2])
# find unique destinations
idxs = np.stack((x_idx.flatten(), y_idx.flatten())).T
idxs_unique, counts = np.unique(idxs, axis = 0, return_counts = True)
# create mask for the source and sum the source pixels headed to the same destination
idxs_repeated = idxs[None, :, :].repeat(len(idxs_unique), axis = 0)
image_mask = (idxs_repeated == idxs_unique[:, None, :]).all(-1)
pixel_sum = (image.flatten()[None, :]*image_mask).sum(-1)
# assign summed sources to destination
s[tuple(idxs_unique.T)] += pixel_sum
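For comparison (my own sketch, not part of the answer), np.add.at collapses the whole mask-and-sum dance into a single call; the array and index names mirror the answer above:

```python
import numpy as np

image = np.arange(9).reshape(3, 3)
s = np.zeros((5, 5))
x_idx, y_idx = np.meshgrid([0, 0, 2], [1, 1, 2])

# unbuffered scatter-add: repeated (x, y) destinations accumulate
np.add.at(s, (x_idx.flatten(), y_idx.flatten()), image.flatten())
print(s[0, 1], s[2, 1], s[0, 2], s[2, 2])  # 8.0 7.0 13.0 8.0
```

The masking approach can still win when you need the intermediate per-destination groupings, but for a plain accumulate np.add.at is far less code.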
EDIT 1: If you run into problems caused by memory constraints, you can do the image masking and summation in batches, as in the following implementation. I set the batch size to 10, but that parameter can be set to whatever works on your machine.
import numpy as np
image = np.arange(12).reshape(3, 4)
s = np.zeros((5, 5))
x_idx, y_idx = np.meshgrid([0, 0, 2], [1, 1, 2, 1])
idxs = np.stack((x_idx.flatten(), y_idx.flatten())).T
idxs_unique, counts = np.unique(idxs, axis = 0, return_counts = True)
batch_size = 10
pixel_sum = []
for i in range(len(idxs_unique)//batch_size + ((len(idxs_unique)%batch_size)!=0)):
    batch = idxs_unique[i*batch_size:(i+1)*batch_size, None, :]
    idxs_repeated = idxs[None, :, :].repeat(len(batch), axis = 0)
    image_mask = (idxs_repeated == batch).all(-1)
    pixel_sum.append((image.flatten()[None, :]*image_mask).sum(-1))
pixel_sum = np.concatenate(pixel_sum)
s[tuple(idxs_unique.T)] += pixel_sum
EDIT 2: OP's method seems to be faster by far if you use numba.
import numpy as np
from numba import jit

@jit(nopython=True)
def get_s(image, grid_size):
    W, H = image.shape
    s = np.zeros((W, H))
    for w in range(W):
        for h in range(H):
            i, j = int(w / grid_size), int(h / grid_size)
            s[i, j] += image[w, h]
    return s

def get_s_vec(image, grid_size, batch_size = 10):
    W, H = image.shape
    s = np.zeros((W, H))
    w_idx, h_idx = np.arange(W), np.arange(H)
    x_idx, y_idx = np.trunc(w_idx / grid_size).astype(int), np.trunc(h_idx / grid_size).astype(int)
    y_idx, x_idx = np.meshgrid(y_idx, x_idx)
    idxs = np.stack((x_idx.flatten(), y_idx.flatten())).T
    idxs_unique, counts = np.unique(idxs, axis = 0, return_counts = True)
    pixel_sum = []
    for i in range(len(idxs_unique)//batch_size + ((len(idxs_unique)%batch_size)!=0)):
        batch = idxs_unique[i*batch_size:(i+1)*batch_size, None, :]
        idxs_repeated = idxs[None, :, :].repeat(len(batch), axis = 0)
        image_mask = (idxs_repeated == batch).all(-1)
        pixel_sum.append((image.flatten()[None, :]*image_mask).sum(-1))
    pixel_sum = np.concatenate(pixel_sum)
    s[tuple(idxs_unique.T)] += pixel_sum
    return s

print(f'loop result = {get_s(image, 2)}')
print(f'vector result = {get_s_vec(image, 2)}')
%timeit get_s(image, 2)
%timeit get_s_vec(image, 2)
output:
loop result = [[10. 18. 0. 0.]
[17. 21. 0. 0.]
[ 0. 0. 0. 0.]]
vector result = [[10. 18. 0. 0.]
[17. 21. 0. 0.]
[ 0. 0. 0. 0.]]
The slowest run took 15.00 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 5: 751 ns per loop
1000 loops, best of 5: 195 µs per loop
Numpy (sparse) repeated index increment
I don't know that it's documented, but it's well known that with dense arrays repeated indices are set just once due to buffering. We have to use add.at to get unbuffered addition.
In [966]: C=sparse.lil_matrix((3,2),dtype=int)
In [967]: Ca=C.A
In [968]: Ca += 1
In [969]: Ca
Out[969]:
array([[1, 1],
[1, 1],
[1, 1]])
In [970]: Ca=C.A
In [973]: np.add.at(Ca,(a,b),1)
In [974]: Ca
Out[974]:
array([[2, 2],
[2, 2],
[2, 2]])
Your example shows that lil indexed setting behaves in the buffered sense as well, but I'd have to look at the code to see exactly why. It is documented that coo style inputs are summed across duplicates.
In [975]: M=sparse.coo_matrix((np.ones_like(a),(a,b)), shape=(3,2))
In [976]: print(M)
(0, 0) 1
(1, 1) 1
(2, 0) 1
(0, 1) 1
(1, 0) 1
(2, 1) 1
(0, 0) 1
(1, 1) 1
(2, 0) 1
(0, 1) 1
(1, 0) 1
(2, 1) 1
In [977]: M.A
Out[977]:
array([[2, 2],
[2, 2],
[2, 2]])
In [978]: M
Out[978]:
<3x2 sparse matrix of type '<class 'numpy.int32'>'
with 12 stored elements in COOrdinate format>
In [979]: M.tocsr()
Out[979]:
<3x2 sparse matrix of type '<class 'numpy.int32'>'
with 6 stored elements in Compressed Sparse Row format>
In [980]: M.sum_duplicates()
In [981]: M
Out[981]:
<3x2 sparse matrix of type '<class 'numpy.int32'>'
with 6 stored elements in COOrdinate format>
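A self-contained version of the coo behaviour shown above (the duplicated index arrays a and b here are my reconstruction of the transcript's pairs):

```python
import numpy as np
from scipy import sparse

# each (row, col) pair appears exactly twice, so every cell should sum to 2
a = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2])
b = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1])
M = sparse.coo_matrix((np.ones_like(a), (a, b)), shape=(3, 2))

print(M.nnz)        # 12 stored entries before consolidation
M.sum_duplicates()
print(M.nnz)        # 6 after duplicates are summed in place
print(M.toarray())  # every cell holds 2
```

This duplicate-summing behaviour is what makes coo_matrix a convenient accumulator for repeated-index increments before converting to csr for computation.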
Points are stored in coo format as entered, but when used for display or calculation (csr format) duplicates are summed.
Incrementing a numpy 3-D array with repeating 2-D positions
Two approaches with np.add.at and np.bincount could be proposed -
def addtoarray_addat(voxel_space, vol_coords, z_index=0):
    shp = voxel_space.shape
    idx = vol_coords[:,0]*shp[2] + vol_coords[:,1] + z_index*shp[2]*shp[1]
    np.add.at(voxel_space.ravel(), idx, 1)  # ravel() is a view for contiguous arrays, so this edits in place
    return voxel_space

def addtoarray_bincount(voxel_space, vol_coords, z_index=0):
    shp = voxel_space.shape
    idx = vol_coords[:,0]*shp[2] + vol_coords[:,1] + z_index*shp[2]*shp[1]
    voxel_space += np.bincount(idx, minlength=np.prod(shp)).reshape(shp)
    return voxel_space

If we are filling into a zeros-initialized array, it would be simpler, as we feed in the output shape instead -

def addtoarray_bincount_zerosinit(shp, vol_coords, z_index=0):
    # shp is shape of voxel_space, desired output
    idx = vol_coords[:,0]*shp[2] + vol_coords[:,1] + z_index*shp[2]*shp[1]
    voxel_space = np.bincount(idx, minlength=np.prod(shp)).reshape(shp)
    return voxel_space
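A quick usage check (the sample voxel_space shape and vol_coords are my own assumptions) showing the two routes agree, with a repeated coordinate counted twice:

```python
import numpy as np

def addtoarray_addat(voxel_space, vol_coords, z_index=0):
    shp = voxel_space.shape
    idx = vol_coords[:, 0]*shp[2] + vol_coords[:, 1] + z_index*shp[2]*shp[1]
    np.add.at(voxel_space.ravel(), idx, 1)  # ravel() is a view here, so this edits in place
    return voxel_space

def addtoarray_bincount(voxel_space, vol_coords, z_index=0):
    shp = voxel_space.shape
    idx = vol_coords[:, 0]*shp[2] + vol_coords[:, 1] + z_index*shp[2]*shp[1]
    voxel_space += np.bincount(idx, minlength=np.prod(shp)).reshape(shp)
    return voxel_space

vol_coords = np.array([[0, 0], [0, 0], [1, 3]])  # (0, 0) appears twice
out1 = addtoarray_addat(np.zeros((2, 3, 4), dtype=int), vol_coords, z_index=1)
out2 = addtoarray_bincount(np.zeros((2, 3, 4), dtype=int), vol_coords, z_index=1)
print(np.array_equal(out1, out2), out1[1, 0, 0])  # True 2
```

Both linearize the (z, row, col) coordinates into flat indices and then accumulate; bincount tends to be faster for large counts since it is a single pass, while add.at avoids allocating the full minlength-sized count array.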