Bin Elements Per Row - Vectorized 2D Bincount for NumPy

That is essentially what np.bincount does with 1D arrays. Here we need to apply it to each row, and the simple approach would be to iterate. To vectorize it, we can offset each row by a multiple of the maximum bin count. The idea is to give each row its own set of bins, so that counts in one row cannot collide with equal values in another row.
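To make the offsetting concrete, here is a tiny worked example with a hypothetical 2-row array:

```python
import numpy as np

# Hypothetical 2-row array; max value is 2, so each row gets N = 3 bins
a = np.array([[0, 1, 1],
              [2, 2, 0]])
N = a.max() + 1
a_offs = a + np.arange(a.shape[0])[:, None]*N   # row 1 shifts into bins 3..5
# a_offs -> [[0, 1, 1], [5, 5, 3]]
counts = np.bincount(a_offs.ravel(), minlength=a.shape[0]*N).reshape(-1, N)
# counts -> [[1, 2, 0], [1, 0, 2]]
```

After the offset, one flat bincount over all rows at once suffices, and reshaping recovers the per-row counts.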

Hence, the implementation would be -

# Vectorized solution
import numpy as np

def bincount2D_vectorized(a):
    N = a.max()+1
    a_offs = a + np.arange(a.shape[0])[:,None]*N
    return np.bincount(a_offs.ravel(), minlength=a.shape[0]*N).reshape(-1,N)

Sample run -

In [189]: a
Out[189]:
array([[1, 1, 0, 4],
       [2, 4, 2, 1],
       [1, 2, 3, 5],
       [4, 4, 4, 1]])

In [190]: bincount2D_vectorized(a)
Out[190]:
array([[1, 2, 0, 0, 1, 0],
       [0, 1, 2, 0, 1, 0],
       [0, 1, 1, 1, 0, 1],
       [0, 1, 0, 0, 3, 0]])

Numba Tweaks

We can bring in numba for further speedups. Numba allows a few tweaks.

  • First off, it allows JIT compilation.

  • It also has an experimental parallel option that automatically parallelizes operations in the function known to have parallel semantics.

  • The final tweak is to use prange as a substitute for range. The docs state that prange runs loops in parallel, similar to OpenMP parallel for loops and Cython's prange. prange pays off with larger datasets, probably because of the overhead needed to set up the parallel work.

So, with these two new tweaks along with njit for no-Python mode, we would have three variants -

# Numba solutions
import numpy as np
from numba import njit, prange

def bincount2D_numba(a, use_parallel=False, use_prange=False):
    N = a.max()+1
    m,n = a.shape
    out = np.zeros((m,N),dtype=int)

    # Choose function based on args
    func = bincount2D_numba_func0
    if use_parallel:
        if use_prange:
            func = bincount2D_numba_func2
        else:
            func = bincount2D_numba_func1
    # Run chosen function on input data and output
    func(a, out, m, n)
    return out

@njit
def bincount2D_numba_func0(a, out, m, n):
    for i in range(m):
        for j in range(n):
            out[i,a[i,j]] += 1

@njit(parallel=True)
def bincount2D_numba_func1(a, out, m, n):
    for i in range(m):
        for j in range(n):
            out[i,a[i,j]] += 1

@njit(parallel=True)
def bincount2D_numba_func2(a, out, m, n):
    for i in prange(m):
        for j in prange(n):
            out[i,a[i,j]] += 1

For completeness, and for the tests later on, the loopy version would be -

# Loopy solution
def bincount2D_loopy(a):
    N = a.max()+1
    m,n = a.shape
    out = np.zeros((m,N),dtype=int)
    for i in range(m):
        out[i] = np.bincount(a[i], minlength=N)
    return out
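As a quick correctness check (repeating the two NumPy definitions so the snippet is self-contained), the loopy and vectorized versions should agree on random data:

```python
import numpy as np

def bincount2D_loopy(a):
    N = a.max()+1
    m, n = a.shape
    out = np.zeros((m, N), dtype=int)
    for i in range(m):
        out[i] = np.bincount(a[i], minlength=N)
    return out

def bincount2D_vectorized(a):
    N = a.max()+1
    a_offs = a + np.arange(a.shape[0])[:,None]*N
    return np.bincount(a_offs.ravel(), minlength=a.shape[0]*N).reshape(-1,N)

rng = np.random.default_rng(0)
a = rng.integers(0, 50, size=(20, 30))
assert np.array_equal(bincount2D_loopy(a), bincount2D_vectorized(a))
```

Each row of the result also sums to the number of columns, since every element lands in exactly one bin.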

Runtime test

Case #1 :

In [312]: a = np.random.randint(0,100,(100,100))

In [313]: %timeit bincount2D_loopy(a)
...: %timeit bincount2D_vectorized(a)
...: %timeit bincount2D_numba(a, use_parallel=False, use_prange=False)
...: %timeit bincount2D_numba(a, use_parallel=True, use_prange=False)
...: %timeit bincount2D_numba(a, use_parallel=True, use_prange=True)
10000 loops, best of 3: 115 µs per loop
10000 loops, best of 3: 36.7 µs per loop
10000 loops, best of 3: 22.6 µs per loop
10000 loops, best of 3: 22.7 µs per loop
10000 loops, best of 3: 39.9 µs per loop

Case #2 :

In [316]: a = np.random.randint(0,100,(1000,1000))

In [317]: %timeit bincount2D_loopy(a)
...: %timeit bincount2D_vectorized(a)
...: %timeit bincount2D_numba(a, use_parallel=False, use_prange=False)
...: %timeit bincount2D_numba(a, use_parallel=True, use_prange=False)
...: %timeit bincount2D_numba(a, use_parallel=True, use_prange=True)
100 loops, best of 3: 2.97 ms per loop
100 loops, best of 3: 3.54 ms per loop
1000 loops, best of 3: 1.83 ms per loop
100 loops, best of 3: 1.78 ms per loop
1000 loops, best of 3: 1.4 ms per loop

Case #3 :

In [318]: a = np.random.randint(0,1000,(1000,1000))

In [319]: %timeit bincount2D_loopy(a)
...: %timeit bincount2D_vectorized(a)
...: %timeit bincount2D_numba(a, use_parallel=False, use_prange=False)
...: %timeit bincount2D_numba(a, use_parallel=True, use_prange=False)
...: %timeit bincount2D_numba(a, use_parallel=True, use_prange=True)
100 loops, best of 3: 4.01 ms per loop
100 loops, best of 3: 4.86 ms per loop
100 loops, best of 3: 3.21 ms per loop
100 loops, best of 3: 3.18 ms per loop
100 loops, best of 3: 2.45 ms per loop

Seems like the numba variants perform very well. Choosing one of the three variants would depend on the input array's shape and, to some extent, on the number of unique elements in it.

weighted numpy bincount for 2D IDs array and 1D weights

Inspired by this post -

def bincount2D(id_ar_2D, weights_1D):
    # Inputs : 2D id array, 1D weights array

    # Extent of bins per row
    n = id_ar_2D.max()+1

    N = len(id_ar_2D)
    id_ar_2D_offsetted = id_ar_2D + n*np.arange(N)[:,None]

    # Finally use bincount with the flattened offsetted 2D bins and the
    # tiled weights. Reshape to recover the per-row counts.
    ids = id_ar_2D_offsetted.ravel()
    W = np.tile(weights_1D,N)
    return np.bincount(ids, W, minlength=n*N).reshape(-1,n)

out = bincount2D(index_tri, weights)
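Here index_tri and weights come from the original question and are not shown; as a self-contained sketch with made-up stand-ins:

```python
import numpy as np

def bincount2D(id_ar_2D, weights_1D):
    n = id_ar_2D.max()+1                  # bins per row
    N = len(id_ar_2D)                     # number of rows
    ids = (id_ar_2D + n*np.arange(N)[:,None]).ravel()
    W = np.tile(weights_1D, N)            # same weights reused for every row
    return np.bincount(ids, W, minlength=n*N).reshape(-1,n)

# Hypothetical stand-ins for the question's index_tri and weights
ids_demo = np.array([[0, 1, 1],
                     [2, 0, 2]])
w_demo = np.array([0.5, 1.0, 2.0])        # one weight per column
res = bincount2D(ids_demo, w_demo)
# res -> [[0.5, 3.0, 0.0], [1.0, 0.0, 2.5]]
```

For example, in the second row the id 2 appears in columns 0 and 2, so its bin accumulates 0.5 + 2.0 = 2.5.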

2D Vectorization of unique values per row with condition

np.unique sorts the array, which makes it less efficient for your purpose. Use np.bincount instead; that way you will most likely save some time (depending on your array's shape and the values in it). You also won't need np.any anymore:

def grpCountSize(arr, grpCount, grpSize):
    count = [np.bincount(row) for row in arr]
    valid = [np.count_nonzero(row == grpSize) == grpCount for row in count]
    return valid

Another way that might save even more time is to use the same number of bins for all rows and build one array:

def grpCountSize(arr, grpCount, grpSize):
    m = arr.max()
    count = np.stack([np.bincount(row, minlength=m+1) for row in arr])
    return (count == grpSize).sum(1)==grpCount

Yet another upgrade is to use the vectorized 2D bincount from this post. (Note that the Numba solutions tested in the post above are faster; the NumPy solution is given here as an example. You can replace the function with any of the ones suggested in the linked post.)

def grpCountSize(arr, grpCount, grpSize):
    count = bincount2D_vectorized(arr)
    return (count == grpSize).sum(1)==grpCount

# from the post above
def bincount2D_vectorized(a):
    N = a.max()+1
    a_offs = a + np.arange(a.shape[0])[:,None]*N
    return np.bincount(a_offs.ravel(), minlength=a.shape[0]*N).reshape(-1,N)

Output of all the solutions above:

a[grpCountSize(a, 1, 2)]
#array([[2, 2, 5, 6, 2, 5],
#       [3, 3, 7, 7, 3, 3]])
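The question's array a is not reproduced here, so here is a self-contained run with a made-up array (grpCount=1, grpSize=2 keeps rows in which exactly one value occurs exactly twice):

```python
import numpy as np

def bincount2D_vectorized(a):
    N = a.max()+1
    a_offs = a + np.arange(a.shape[0])[:,None]*N
    return np.bincount(a_offs.ravel(), minlength=a.shape[0]*N).reshape(-1,N)

def grpCountSize(arr, grpCount, grpSize):
    count = bincount2D_vectorized(arr)
    return (count == grpSize).sum(1) == grpCount

arr = np.array([[1, 1, 2, 3],    # value 1 occurs exactly twice -> selected
                [2, 2, 2, 3],    # value 2 occurs three times   -> rejected
                [1, 2, 3, 4]])   # every value occurs once      -> rejected
filtered = arr[grpCountSize(arr, 1, 2)]
# filtered -> [[1, 1, 2, 3]]
```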

Can numpy bincount work with 2D arrays?

The problem is that bincount isn't always returning the same shaped objects, in particular when values are missing. For example:

>>> m = np.array([[0,0,1],[1,1,0],[1,1,1]])
>>> np.apply_along_axis(np.bincount, 1, m)
array([[2, 1],
       [1, 2],
       [0, 3]])
>>> [np.bincount(m[i]) for i in range(m.shape[0])]
[array([2, 1]), array([1, 2]), array([0, 3])]

works, but:

>>> m = np.array([[0,0,0],[1,1,0],[1,1,0]])
>>> m
array([[0, 0, 0],
       [1, 1, 0],
       [1, 1, 0]])
>>> [np.bincount(m[i]) for i in range(m.shape[0])]
[array([3]), array([1, 2]), array([1, 2])]
>>> np.apply_along_axis(np.bincount, 1, m)
Traceback (most recent call last):
  File "<ipython-input-49-72e06e26a718>", line 1, in <module>
    np.apply_along_axis(np.bincount, 1, m)
  File "/usr/local/lib/python2.7/dist-packages/numpy/lib/shape_base.py", line 117, in apply_along_axis
    outarr[tuple(i.tolist())] = res
ValueError: could not broadcast input array from shape (2) into shape (1)

won't.

You could use the minlength parameter and pass it using a lambda or partial or something:

>>> np.apply_along_axis(lambda x: np.bincount(x, minlength=2), axis=1, arr=m)
array([[3, 0],
       [1, 2],
       [1, 2]])
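The same thing with functools.partial instead of a lambda:

```python
import numpy as np
from functools import partial

m = np.array([[0, 0, 0],
              [1, 1, 0],
              [1, 1, 0]])
counts = np.apply_along_axis(partial(np.bincount, minlength=2), axis=1, arr=m)
# counts -> [[3, 0], [1, 2], [1, 2]]
```

Fixing minlength guarantees every row's bincount has the same length, which is what apply_along_axis needs.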

np.bincount for 1 line, vectorized multidimensional averaging

One way is to convert the 3 "indices" to a linear index and then apply bincount. Numpy's ravel_multi_index is essentially the same as MATLAB's sub2ind. So the ported code could be something like:

shape = (Lmax+1, Lmax+1, maxvalue+1)
posvec = np.arange(1, Lmax+1)

ind_len = np.tile(Lvec[:,None], [1, Lmax])
ind_pos = np.tile(posvec, [n, 1])
ind_val = data
Y_copied = np.tile(Y[:,None], [1, Lmax])

mask = posvec <= Lvec[:,None] # fill-value independent
lin_idx = np.ravel_multi_index((ind_len[mask], ind_pos[mask], ind_val[mask]), shape)
Y_avg = np.bincount(lin_idx, weights=Y_copied[mask], minlength=np.prod(shape)) / n
Y_avg.shape = shape

This assumes data has shape (n, Lmax), Lvec is a NumPy array, etc. You may need to adapt the code a little to get rid of off-by-one errors.
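To see what np.ravel_multi_index is doing here, a tiny standalone example (unrelated to the question's data): it maps tuples of per-axis indices to flat C-order indices, over which np.bincount can then accumulate weights:

```python
import numpy as np

shape = (2, 3, 4)
i = np.array([0, 1])
j = np.array([2, 0])
k = np.array([3, 1])
lin = np.ravel_multi_index((i, j, k), shape)   # i*3*4 + j*4 + k in C order
# lin -> [11, 13]

# Sum weights into the 3D bins, exactly as in the averaging code above
acc = np.bincount(lin, weights=[10., 20.], minlength=np.prod(shape)).reshape(shape)
# acc[0, 2, 3] -> 10.0 and acc[1, 0, 1] -> 20.0
```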

One could argue that the tile operations are not very efficient and not very "numpythonic". Something with broadcast_arrays could be nice, but I think I prefer this way:

shape = (Lmax+1, Lmax+1, maxvalue+1)
posvec = np.arange(1, Lmax+1)
mask = posvec <= Lvec[:,None] # fill-value independent; define before use below

len_idx = np.repeat(Lvec, Lvec)
pos_idx = np.broadcast_to(posvec, data.shape)[mask]
val_idx = data[mask]
Y_copied = np.repeat(Y, Lvec)

lin_idx = np.ravel_multi_index((len_idx, pos_idx, val_idx), shape)
Y_avg = np.bincount(lin_idx, weights=Y_copied, minlength=np.prod(shape)) / n
Y_avg.shape = shape

Note broadcast_to was added in Numpy 1.10.0.

Numpy - find most common item per row

Supposing m is the name of your matrix:

most_f = np.array([np.bincount(row).argmax() for row in m])
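For instance, with a made-up matrix:

```python
import numpy as np

m = np.array([[1, 2, 2, 3],
              [0, 0, 4, 4],
              [5, 5, 5, 1]])
most_f = np.array([np.bincount(row).argmax() for row in m])
# most_f -> [2, 0, 5]
```

Note this assumes non-negative integer entries (np.bincount rejects negatives), and argmax breaks ties in favour of the smaller value, as in the row [0, 0, 4, 4].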

I hope this solves your question

Splitting a number and assigning to elements in a row in a numpy array

Broadcasting again!

def split_digits(a):
    N = int(np.log10(np.max(a))+1) # No. of digits
    r = 10**np.arange(N,-1,-1)     # Array of powers of 10
    return (np.asarray(a)[:,None]%r[:-1])//r[1:]

Sample runs -

In [224]: a = range(0,1001)

In [225]: split_digits(a)
Out[225]:
array([[0, 0, 0, 0],
       [0, 0, 0, 1],
       [0, 0, 0, 2],
       ...,
       [0, 9, 9, 8],
       [0, 9, 9, 9],
       [1, 0, 0, 0]])

In [229]: a = np.random.randint(0,1000000,(7))

In [230]: a
Out[230]: array([431921, 871855, 636144, 541186, 410562, 89356, 476258])

In [231]: split_digits(a)
Out[231]:
array([[4, 3, 1, 9, 2, 1],
       [8, 7, 1, 8, 5, 5],
       [6, 3, 6, 1, 4, 4],
       [5, 4, 1, 1, 8, 6],
       [4, 1, 0, 5, 6, 2],
       [0, 8, 9, 3, 5, 6],
       [4, 7, 6, 2, 5, 8]])

