How to find first non-zero value in every column of a numpy array?
Indices of first occurrences
Use np.argmax
along that axis (zeroth axis for columns here) on the mask of non-zeros to get the indices of first matches
(True values) -
(arr!=0).argmax(axis=0)
Extending to cover generic axis specifier and for cases where no non-zeros are found along that axis for an element, we would have an implementation like so -
def first_nonzero(arr, axis, invalid_val=-1):
mask = arr!=0
return np.where(mask.any(axis=axis), mask.argmax(axis=axis), invalid_val)
Note that since argmax()
on all False
values returns 0
, so if the invalid_val
needed is 0
, we would have the final output directly with mask.argmax(axis=axis)
.
Sample runs -
In [296]: arr # Different from given sample for variety
Out[296]:
array([[1, 0, 0],
[1, 1, 0],
[0, 1, 0],
[0, 0, 0]])
In [297]: first_nonzero(arr, axis=0, invalid_val=-1)
Out[297]: array([ 0, 1, -1])
In [298]: first_nonzero(arr, axis=1, invalid_val=-1)
Out[298]: array([ 0, 0, 1, -1])
Extending to cover all comparison operations
To find the first zeros
, simply use arr==0
as mask
for use in the function. For first ones equal to a certain value val
, use arr == val
and so on for all cases of comparisons
possible here.
Indices of last occurrences
To find the last ones matching a certain comparison criteria, we need to flip along that axis and use the same idea of using argmax
and then compensate for the flipping by offsetting from the axis length, as shown below -
def last_nonzero(arr, axis, invalid_val=-1):
mask = arr!=0
val = arr.shape[axis] - np.flip(mask, axis=axis).argmax(axis=axis) - 1
return np.where(mask.any(axis=axis), val, invalid_val)
Sample runs -
In [320]: arr
Out[320]:
array([[1, 0, 0],
[1, 1, 0],
[0, 1, 0],
[0, 0, 0]])
In [321]: last_nonzero(arr, axis=0, invalid_val=-1)
Out[321]: array([ 1, 2, -1])
In [322]: last_nonzero(arr, axis=1, invalid_val=-1)
Out[322]: array([ 0, 1, 1, -1])
Again, all cases of comparisons
possible here are covered by using the corresponding comparator to get mask
and then using within the listed function.
Find the index of first non-zero element to the right of given elements in python
If your query array is sufficiently dense, you can reverse the computation: find an array of the same size as matrix
that gives the index of the next nonzero element in the same row for each location. Then your problem becomes one of just one of applying query
to this index array, which numpy supports directly.
It is actually much easier to find the left index, so let's start with that. We can transform matrix
into an array of indices like this:
r, c = np.nonzero(matrix)
left_ind = np.zeros(matrix.shape, dtype=int)
left_ind[r, c] = c
Now you can find the indices of the preceding nonzero element by using np.maximum
similarly to how it is done in this answer: https://stackoverflow.com/a/48252024/2988730:
np.maximum.accumulate(left_ind, axis=1, out=left_ind)
Now you can index directly into ind
to get the previous nonzero column index:
left_ind[query[:, 0], query[:, 1]]
or
left_ind[tuple(query.T)]
Now to do the same thing with the right index, you need to reverse the array. But then your indices are no longer ascending, and you risk overwriting any zeros you have in the first column. To solve that, in addition to just reversing the array, you need to reverse the order of the indices:
right_ind = np.zeros(matrix.shape, dtype=int)
right_ind[r, c] = matrix.shape[1] - c
You can use any number larger than matrix.shape[1]
as your constant as well. The important thing is that the reversed indices all come out greater than zero so np.maximum.accumulate
overwrites the zeros. Now you can use np.maximum.accumulate
in the same way on the reversed array:
right_ind = matrix.shape[1] - np.maximum.accumulate(right_ind[:, ::-1], axis=1)[:, ::-1]
In this case, I would recommend against using out=right_ind
, since right_ind[:, ::-1]
is a view into the same buffer. The operation is buffered, but if your line size is big enough, you may overwrite data unintentionally.
Now you can index the array in the same way as before:
right_ind[(*query.T,)]
In both cases, you need to stack with the first column of query
, since that's the row key:
>>> row, col = query.T
>>> np.stack((row, left_ind[row, col]), -1)
array([[0, 0],
[2, 0],
[1, 1],
[0, 0]])
>>> np.stack((row, right_ind[row, col]), -1)
array([[0, 3],
[2, 4],
[1, 4],
[0, 3]])
>>> np.stack((row, left_ind[row, col], right_ind[row, col]), -1)
array([[0, 0, 3],
[2, 0, 4],
[1, 1, 4],
[0, 0, 3]])
If you plan on sampling most of the rows in the array, either at once, or throughout your program, this will help you speed things up. If, on the other hand, you only need to access a small subset, you can apply this technique only to the rows you need.
Finding first non-zero value along axis of a sorted two dimensional numpy array
It is reasonably fast to use np.where:
>>> a
array([[0, 0, 0, 1, 1, 1, 1],
[0, 0, 0, 1, 1, 1, 1],
[0, 0, 0, 0, 1, 1, 1],
[0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 0]])
>>> np.where(a>0)
(array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 4, 5]), array([3, 4, 5, 6, 3, 4, 5, 6, 4, 5, 6, 6, 6, 6]))
That delivers tuples with to coordinates of the values greater than 0.
You can also use np.where to test each sub array:
def first_true1(a):
""" return a dict of row: index with value in row > 0 """
di={}
for i in range(len(a)):
idx=np.where(a[i]>0)
try:
di[i]=idx[0][0]
except IndexError:
di[i]=None
return di
Prints:
{0: 3, 1: 3, 2: 4, 3: 6, 4: 6, 5: 6, 6: None}
ie, row 0: index 3>0; row 4: index 4>0; row 6: no index greater than 0
As you suspect, argmax may be faster:
def first_true2():
di={}
for i in range(len(a)):
idx=np.argmax(a[i])
if idx>0:
di[i]=idx
else:
di[i]=None
return di
# same dict is returned...
If you can deal with the logic of not having a None
for rows of all naughts, this is faster still:
def first_true3():
di={}
for i, j in zip(*np.where(a>0)):
if i in di:
continue
else:
di[i]=j
return di
And here is a version that uses axis in argmax (as suggested in your comments):
def first_true4():
di={}
for i, ele in enumerate(np.argmax(a,axis=1)):
if ele==0 and a[i][0]==0:
di[i]=None
else:
di[i]=ele
return di
For speed comparisons (on your example array), I get this:
rate/sec usec/pass first_true1 first_true2 first_true3 first_true4
first_true1 23,818 41.986 -- -34.5% -63.1% -70.0%
first_true2 36,377 27.490 52.7% -- -43.6% -54.1%
first_true3 64,528 15.497 170.9% 77.4% -- -18.6%
first_true4 79,287 12.612 232.9% 118.0% 22.9% --
If I scale that to a 2000 X 2000 np array, here is what I get:
rate/sec usec/pass first_true3 first_true1 first_true2 first_true4
first_true3 3 354380.107 -- -0.3% -74.7% -87.8%
first_true1 3 353327.084 0.3% -- -74.6% -87.7%
first_true2 11 89754.200 294.8% 293.7% -- -51.7%
first_true4 23 43306.494 718.3% 715.9% 107.3% --
The first non-zero element in each row
If I understood correctly, the break should be inside the if statement
if a[i,j]!=0:
print(i+1,'-th row,',j+1,'-th column','\nthe 1st non-zero element:',a[i,j],'\n---')
break
How to extract non-zero values of a numpy array
Let's try:
mask = arr != 0
# mask the 0 with infinity and sort
new_arr = np.sort(np.where(mask, arr, np.inf), axis=0)
# replace back:
arr[:] = np.where(mask, new_arr[::-1], 0)
# extract the result
result = arr[np.arange(arr.shape[0]),mask.argmax(axis=1)]
Identify the first & last non-zero elements/indices within a group in numpy
group = np.array([0,0,0,0,1,1,1,1,1,1,2,2,2,2])
array = np.array([1,2,3,0,0,2,0,3,4,0,0,0,0,1])
targt = np.array([1,1,1,0,0,2,2,2,2,0,0,0,0,1])
You can do the following steps:
STEP 1. Find indices of nonzero items of
array
and mark the startings of new groupsnonzero_idx -> [*0,1,2,/,*/,5,/,7,8,/,*/,/,/,13] (cross out slashes)
marker_idx -> [0, 4, 10]STEP 2. Find starting and ending indices for each group, use
np.ufunc.reduceat
starts -> [ 0, 5, 13]
ends -> [ 2, 8, 13]STEP 3. Think of an
out
array such thatnp.cumsum(out)
collapses intotarget
array. Like so:[1,0,0,-1,0,2,0,0,0,-2,0,0,0,1] -> [1,1,1,0,0,2,2,2,2,0,0,0,0,1]
Now, code:
#STEP 1
nonzero = (array != 0)
_, marker_idx = np.unique(group[nonzero], return_index=True)
nonzero_idx = np.arange(len(array))[nonzero]
#STEP 2
starts = np.minimum.reduceat(nonzero_idx, marker_idx)
ends = np.maximum.reduceat(nonzero_idx, marker_idx)
#STEP 3
values = array[starts]
out = np.zeros_like(array)
out[starts] = values
#check the case we can't insert the last negative value
if ends[-1]+1==len(array):
out[ends[:-1]+1] = -values[:-1]
else:
out[ends+1] = -values
>>> np.cumsum(out)
array([1, 1, 1, 0, 0, 2, 2, 2, 2, 0, 0, 0, 0, 1], dtype=int32)
No loops needed!
Related Topics
Is It Pythonic to Import Inside Functions
How to Call Function That Takes an Argument in a Django Template
Shipping Python Modules in Pyspark to Other Nodes
Datetime Timezone Conversion Using Pytz
Web Scraping Program Cannot Find Element Which I Can See in the Browser
Reduce Left and Right Margins in Matplotlib Plot
Access Memory Address in Python
Python Pandas: Group Datetime Column into Hour and Minute Aggregations
Differencebetween 'Transform' and 'Fit_Transform' in Sklearn
Convert Timedelta64[Ns] Column to Seconds in Python Pandas Dataframe
Python 2.X - Write Binary Output to Stdout
Python Regular Expression Re.Match, Why This Code Does Not Work
How to Use Win32 API with Python
How to Match Any String from a List of Strings in Regular Expressions in Python
Matplotlib: Plotting Numerous Disconnected Line Segments with Different Colors