How to Find First Non-Zero Value in Every Column of a Numpy Array

How to find first non-zero value in every column of a numpy array?

Indices of first occurrences

Use np.argmax along that axis (zeroth axis for columns here) on the mask of non-zeros to get the indices of first matches (True values) -

(arr!=0).argmax(axis=0)

Extending to cover generic axis specifier and for cases where no non-zeros are found along that axis for an element, we would have an implementation like so -

def first_nonzero(arr, axis, invalid_val=-1):
    mask = arr!=0
    return np.where(mask.any(axis=axis), mask.argmax(axis=axis), invalid_val)

Note that since argmax() on all False values returns 0, so if the invalid_val needed is 0, we would have the final output directly with mask.argmax(axis=axis).

Sample runs -

In [296]: arr    # Different from given sample for variety
Out[296]: 
array([[1, 0, 0],
       [1, 1, 0],
       [0, 1, 0],
       [0, 0, 0]])

In [297]: first_nonzero(arr, axis=0, invalid_val=-1)
Out[297]: array([ 0,  1, -1])

In [298]: first_nonzero(arr, axis=1, invalid_val=-1)
Out[298]: array([ 0,  0,  1, -1])

Extending to cover all comparison operations

To find the first zeros, simply use arr==0 as mask for use in the function. For first ones equal to a certain value val, use arr == val and so on for all cases of comparisons possible here.

Indices of last occurrences

To find the last ones matching a certain comparison criteria, we need to flip along that axis and use the same idea of using argmax and then compensate for the flipping by offsetting from the axis length, as shown below -

def last_nonzero(arr, axis, invalid_val=-1):
    mask = arr!=0
    val = arr.shape[axis] - np.flip(mask, axis=axis).argmax(axis=axis) - 1
    return np.where(mask.any(axis=axis), val, invalid_val)

Sample runs -

In [320]: arr
Out[320]: 
array([[1, 0, 0],
       [1, 1, 0],
       [0, 1, 0],
       [0, 0, 0]])

In [321]: last_nonzero(arr, axis=0, invalid_val=-1)
Out[321]: array([ 1,  2, -1])

In [322]: last_nonzero(arr, axis=1, invalid_val=-1)
Out[322]: array([ 0,  1,  1, -1])

Again, all cases of comparisons possible here are covered by using the corresponding comparator to get mask and then using within the listed function.

Find the index of first non-zero element to the right of given elements in python

If your query array is sufficiently dense, you can reverse the computation: find an array of the same size as matrix that gives the index of the next nonzero element in the same row for each location. Then your problem becomes one of just one of applying query to this index array, which numpy supports directly.

It is actually much easier to find the left index, so let's start with that. We can transform matrix into an array of indices like this:

r, c = np.nonzero(matrix)
left_ind = np.zeros(matrix.shape, dtype=int)
left_ind[r, c] = c

Now you can find the indices of the preceding nonzero element by using np.maximum similarly to how it is done in this answer: https://stackoverflow.com/a/48252024/2988730:

np.maximum.accumulate(left_ind, axis=1, out=left_ind)

Now you can index directly into ind to get the previous nonzero column index:

left_ind[query[:, 0], query[:, 1]]

left_ind[tuple(query.T)]

Now to do the same thing with the right index, you need to reverse the array. But then your indices are no longer ascending, and you risk overwriting any zeros you have in the first column. To solve that, in addition to just reversing the array, you need to reverse the order of the indices:

right_ind = np.zeros(matrix.shape, dtype=int)
right_ind[r, c] = matrix.shape[1] - c

You can use any number larger than matrix.shape[1] as your constant as well. The important thing is that the reversed indices all come out greater than zero so np.maximum.accumulate overwrites the zeros. Now you can use np.maximum.accumulate in the same way on the reversed array:

right_ind = matrix.shape[1] - np.maximum.accumulate(right_ind[:, ::-1], axis=1)[:, ::-1]

In this case, I would recommend against using out=right_ind, since right_ind[:, ::-1] is a view into the same buffer. The operation is buffered, but if your line size is big enough, you may overwrite data unintentionally.

Now you can index the array in the same way as before:

right_ind[(*query.T,)]

In both cases, you need to stack with the first column of query, since that's the row key:

>>> row, col = query.T
>>> np.stack((row, left_ind[row, col]), -1)
array([[0, 0],
       [2, 0],
       [1, 1],
       [0, 0]])
>>> np.stack((row, right_ind[row, col]), -1)
array([[0, 3],
       [2, 4],
       [1, 4],
       [0, 3]])
>>> np.stack((row, left_ind[row, col], right_ind[row, col]), -1)
array([[0, 0, 3],
       [2, 0, 4],
       [1, 1, 4],
       [0, 0, 3]])

If you plan on sampling most of the rows in the array, either at once, or throughout your program, this will help you speed things up. If, on the other hand, you only need to access a small subset, you can apply this technique only to the rows you need.

Finding first non-zero value along axis of a sorted two dimensional numpy array

It is reasonably fast to use np.where:

>>> a
array([[0, 0, 0, 1, 1, 1, 1],
       [0, 0, 0, 1, 1, 1, 1],
       [0, 0, 0, 0, 1, 1, 1],
       [0, 0, 0, 0, 0, 0, 1],
       [0, 0, 0, 0, 0, 0, 1],
       [0, 0, 0, 0, 0, 0, 1],
       [0, 0, 0, 0, 0, 0, 0]])
>>> np.where(a>0)
(array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 4, 5]), array([3, 4, 5, 6, 3, 4, 5, 6, 4, 5, 6, 6, 6, 6]))

That delivers tuples with to coordinates of the values greater than 0.

You can also use np.where to test each sub array:

def first_true1(a):
    """ return a dict of row: index with value in row > 0 """
    di={}
    for i in range(len(a)):
        idx=np.where(a[i]>0)
        try:
            di[i]=idx[0][0]
        except IndexError:
            di[i]=None    

    return di

Prints:

{0: 3, 1: 3, 2: 4, 3: 6, 4: 6, 5: 6, 6: None}

ie, row 0: index 3>0; row 4: index 4>0; row 6: no index greater than 0

As you suspect, argmax may be faster:

def first_true2():
    di={}
    for i in range(len(a)):
        idx=np.argmax(a[i])
        if idx>0:
            di[i]=idx
        else:
            di[i]=None    

    return di       
    # same dict is returned...

If you can deal with the logic of not having a None for rows of all naughts, this is faster still:

def first_true3():
    di={}
    for i, j in zip(*np.where(a>0)):
        if i in di:
            continue
        else:
            di[i]=j

    return di

And here is a version that uses axis in argmax (as suggested in your comments):

def first_true4():
    di={}
    for i, ele in enumerate(np.argmax(a,axis=1)):
        if ele==0 and a[i][0]==0:
            di[i]=None
        else:
            di[i]=ele

    return di

For speed comparisons (on your example array), I get this:

            rate/sec usec/pass first_true1 first_true2 first_true3 first_true4
first_true1   23,818    41.986          --      -34.5%      -63.1%      -70.0%
first_true2   36,377    27.490       52.7%          --      -43.6%      -54.1%
first_true3   64,528    15.497      170.9%       77.4%          --      -18.6%
first_true4   79,287    12.612      232.9%      118.0%       22.9%          --

If I scale that to a 2000 X 2000 np array, here is what I get:

            rate/sec  usec/pass first_true3 first_true1 first_true2 first_true4
first_true3        3 354380.107          --       -0.3%      -74.7%      -87.8%
first_true1        3 353327.084        0.3%          --      -74.6%      -87.7%
first_true2       11  89754.200      294.8%      293.7%          --      -51.7%
first_true4       23  43306.494      718.3%      715.9%      107.3%          --

The first non-zero element in each row

If I understood correctly, the break should be inside the if statement

if a[i,j]!=0:
      print(i+1,'-th row,',j+1,'-th column','\nthe 1st non-zero element:',a[i,j],'\n---')
      break

How to extract non-zero values of a numpy array

Let's try:

mask = arr != 0

# mask the 0 with infinity and sort
new_arr = np.sort(np.where(mask, arr, np.inf), axis=0)

# replace back:
arr[:] = np.where(mask, new_arr[::-1], 0)

# extract the result
result = arr[np.arange(arr.shape[0]),mask.argmax(axis=1)]

Identify the first & last non-zero elements/indices within a group in numpy

group = np.array([0,0,0,0,1,1,1,1,1,1,2,2,2,2])  
array = np.array([1,2,3,0,0,2,0,3,4,0,0,0,0,1])  
targt = np.array([1,1,1,0,0,2,2,2,2,0,0,0,0,1])

You can do the following steps:

STEP 1. Find indices of nonzero items of array and mark the startings of new groups

nonzero_idx -> [*0,1,2,/,*/,5,/,7,8,/,*/,/,/,13] (cross out slashes)
marker_idx -> [0, 4, 10]

STEP 2. Find starting and ending indices for each group, use np.ufunc.reduceat
```
starts -> [ 0,  5, 13]
ends -> [ 2,  8, 13]
```
STEP 3. Think of an out array such that np.cumsum(out) collapses into target array. Like so:
```
[1,0,0,-1,0,2,0,0,0,-2,0,0,0,1] -> [1,1,1,0,0,2,2,2,2,0,0,0,0,1]
```

Now, code:

#STEP 1
nonzero = (array != 0)
_, marker_idx = np.unique(group[nonzero], return_index=True)
nonzero_idx = np.arange(len(array))[nonzero]
#STEP 2
starts = np.minimum.reduceat(nonzero_idx, marker_idx)
ends = np.maximum.reduceat(nonzero_idx, marker_idx)
#STEP 3
values = array[starts]
out = np.zeros_like(array)
out[starts] = values
#check the case we can't insert the last negative value
if ends[-1]+1==len(array): 
    out[ends[:-1]+1] = -values[:-1]
else:
    out[ends+1] = -values
>>> np.cumsum(out)
array([1, 1, 1, 0, 0, 2, 2, 2, 2, 0, 0, 0, 0, 1], dtype=int32)

No loops needed!

How to Find First Non-Zero Value in Every Column of a Numpy Array