Finding the Consecutive Zeros in a Numpy Array

Finding the consecutive zeros in a numpy array

Here's a fairly compact vectorized implementation. I've changed the requirements a bit, so the return value is a bit more "numpythonic": it creates an array with shape (m, 2), where m is the number of "runs" of zeros. The first column is the index of the first 0 in each run, and the second is the index of the first nonzero element after the run. (This indexing pattern matches, for example, how slicing works and how the range function works.)

import numpy as np

def zero_runs(a):
# Create an array that is 1 where a is 0, and pad each end with an extra 0.
iszero = np.concatenate(([0], np.equal(a, 0).view(np.int8), [0]))
absdiff = np.abs(np.diff(iszero))
# Runs start and end where absdiff is 1.
ranges = np.where(absdiff == 1)[0].reshape(-1, 2)
return ranges

For example:

In [236]: a = [1, 2, 3, 0, 0, 0, 0, 0, 0, 4, 5, 6, 0, 0, 0, 0, 9, 8, 7, 0, 10, 11]

In [237]: runs = zero_runs(a)

In [238]: runs
Out[238]:
array([[ 3, 9],
[12, 16],
[19, 20]])

With this format, it is simple to get the number of zeros in each run:

In [239]: runs[:,1] - runs[:,0]
Out[239]: array([6, 4, 1])

It's always a good idea to check the edge cases:

In [240]: zero_runs([0,1,2])
Out[240]: array([[0, 1]])

In [241]: zero_runs([1,2,0])
Out[241]: array([[2, 3]])

In [242]: zero_runs([1,2,3])
Out[242]: array([], shape=(0, 2), dtype=int64)

In [243]: zero_runs([0,0,0])
Out[243]: array([[0, 3]])

Find longest consecutive zeros in 3D-array along axis

To perform your task, define the following function:

def longestZeroSeqLength(a):
# Changes in "isZero" for consecutive elements
chg = np.abs(np.diff(np.equal(a, 0).view(np.int8), prepend=[0], append=[0]))
# Ranges of "isZero" elements
rng = np.where(chg == 1)[0]
if rng.size == 0: return 0 # All non-zero elements
rng = rng.reshape(-1, 2)
# Compute length of each range and return the biggest
return np.subtract(rng[:,1], rng[:,0]).max()

Then apply it to your array:

result = np.apply_along_axis(longestZeroSeqLength, 0, a)

To test it, I created the following (smaller) array:

siz = (3, 4, 5)
np.random.seed(1)
a = np.random.randint(2, size=siz)

After running my code I got:

array([[1, 0, 2, 2, 0],
[1, 2, 1, 0, 3],
[3, 1, 1, 1, 0],
[1, 1, 2, 2, 3]], dtype=int64)

To easier assess what contains each slice and what is each partial
result, you can run:

for j in range(a.shape[1]):
for k in range(a.shape[2]):
b = a[:, j, k]
res = longestZeroSeqLength(b)
print(f'{j}, {k}: {b}, {res}')

How to find consecutive positive, negative and zeroes in a numpy array?

Here is a numpy approach:

# create example
arr = np.random.randint(-2,3,(10))

# split into negative, zero, positive
*nzp, = map(np.flatnonzero,(arr<0,arr==0,arr>0))
# find block boundaries
*bb, = (np.flatnonzero(np.diff(x,prepend=-2,append=-2)-1) for x in nzp)
# compute block sizes
*bs, = map(np.diff,bb)

# show all
for data in (arr,nzp,bb,bs): print(data)
# [-1 1 -1 1 0 0 2 -1 -2 1]
# [array([0, 2, 7, 8]), array([4, 5]), array([1, 3, 6, 9])]
# [array([0, 1, 2, 4]), array([0, 2]), array([0, 1, 2, 3, 4])]
# [array([1, 1, 2]), array([2]), array([1, 1, 1, 1])]

How to find indices for consecutive zeros at the end of the list?

With that pattern of trailing zeros confirmed, we can do use an array and NumPy tools based method, like this -

len(test)-np.equal(test,0)[::-1].argmin()

If you need all zeros indices until that index -

In [100]: mask = np.equal(test,0)

In [101]: idx = len(test)-mask[::-1].argmin()

In [102]: np.flatnonzero(mask[:idx])
Out[102]: array([5, 6, 7])

To explain the index getting part, let's break it down into steps -

# Mask of zeros
In [100]: mask = np.equal(test,0)

In [101]: mask
Out[103]:
array([False, False, False, False, False, True, True, True, False,
True, True, True])

# Flip it
In [104]: mask[::-1]
Out[104]:
array([ True, True, True, False, True, True, True, False, False,
False, False, False])

# Get the first index of False ones, which would be the last non-zero
# value from original array. Note that this is on flipped version of input
In [105]: mask[::-1].argmin()
Out[105]: 3

# Get the original index position by subtracting from the length of it
In [106]: len(test)-mask[::-1].argmin()
Out[106]: 9

Find consecutive ones in numpy array

Here's a vectorized approach based on differentiation -

import numpy as np
import pandas as pd

# Append zeros columns at either sides of counts
append1 = np.zeros((counts.shape[0],1),dtype=int)
counts_ext = np.column_stack((append1,counts,append1))

# Get start and stop indices with 1s as triggers
diffs = np.diff((counts_ext==1).astype(int),axis=1)
starts = np.argwhere(diffs == 1)
stops = np.argwhere(diffs == -1)

# Get intervals using differences between start and stop indices
start_stop = np.column_stack((starts[:,0], stops[:,1] - starts[:,1]))

# Get indices corresponding to max. interval lens and thus lens themselves
SS_df = pd.DataFrame(start_stop)
out = start_stop[SS_df.groupby([0],sort=False)[1].idxmax(),1]

Sample input, output -

Original sample case :

In [574]: counts
Out[574]:
array([[0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0],
[0, 0, 1, 0, 0, 1, 2, 0, 0, 1, 1, 1],
[0, 0, 0, 4, 1, 0, 0, 0, 0, 1, 1, 0]])

In [575]: out
Out[575]: array([2, 3, 2], dtype=int64)

Modified case :

In [577]: counts
Out[577]:
array([[0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0],
[0, 0, 1, 0, 0, 1, 2, 0, 1, 1, 1, 1],
[0, 0, 0, 4, 1, 1, 1, 1, 1, 0, 1, 0]])

In [578]: out
Out[578]: array([2, 4, 5], dtype=int64)

Here's a Pure NumPy version that is identical to the previous until we have start, stop. Here's the full implementation -

# Append zeros columns at either sides of counts
append1 = np.zeros((counts.shape[0],1),dtype=int)
counts_ext = np.column_stack((append1,counts,append1))

# Get start and stop indices with 1s as triggers
diffs = np.diff((counts_ext==1).astype(int),axis=1)
starts = np.argwhere(diffs == 1)
stops = np.argwhere(diffs == -1)

# Get intervals using differences between start and stop indices
intvs = stops[:,1] - starts[:,1]

# Store intervals as a 2D array for further vectorized ops to make.
c = np.bincount(starts[:,0])
mask = np.arange(c.max()) < c[:,None]
intvs2D = mask.astype(float)
intvs2D[mask] = intvs

# Get max along each row as final output
out = intvs2D.max(1)

Is there a better way to remove parts with consecutive zeros that have length equal or above threshold?

It's also possible to find differences of nonzero items, fix the ones that exceeed threshold and reconstruct a sequence in a correct way.

def numpy_fix(a):
# STEP 1. find indices of nonzero items: [0 1 3 8 9 13 19]
idx = np.flatnonzero(a)
# STEP 2. Find differences along these indices (also insert a leading zero): [0 1 2 5 1 4 6]
df = np.diff(idx, prepend=0)
# STEP 3. Fix differences of indices larger than THRESHOLD: [0 1 2 1 1 4 1]
df[df>THRESHOLD] = 1
# STEP 4. Given differences on indices, reconstruct indices themselves: [0 1 3 4 5 9 10]
cs = np.cumsum(df)
z = np.zeros(cs[-1]+1, dtype=int) # create a list of zeros
z[cs] = 1 #pad it with ones within indices found
return z
>>> numpy_fix(a)
array([1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1])

(Note that it's correct only if a has no leading or trailing zeros)

%timeit numpy_fix(np.tile(a, (1, 50000)))
39.3 ms ± 865 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Counting consecutive 1's in NumPy array

Here's a vectorized approach -

def island_cumsum_vectorized(a):
a_ext = np.concatenate(( [0], a, [0] ))
idx = np.flatnonzero(a_ext[1:] != a_ext[:-1])
a_ext[1:][idx[1::2]] = idx[::2] - idx[1::2]
return a_ext.cumsum()[1:-1]

Sample run -

In [91]: a = np.array([1, 1, 1, 0, 0, 0, 1, 1, 0, 0])

In [92]: island_cumsum_vectorized(a)
Out[92]: array([1, 2, 3, 0, 0, 0, 1, 2, 0, 0])

In [93]: a = np.array([0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1])

In [94]: island_cumsum_vectorized(a)
Out[94]: array([0, 1, 2, 3, 4, 0, 0, 0, 1, 2, 0, 0, 1])

Runtime test

For the timings , I would use OP's sample input array and repeat/tile it and hopefully this should be a less opportunistic benchmark -

Small case :

In [16]: a = np.array([1, 1, 1, 0, 0, 0, 1, 1, 0, 0])

In [17]: a = np.tile(a,10) # Repeat OP's data 10 times

# @Paul Panzer's solution
In [18]: %timeit np.concatenate([np.cumsum(c) if c[0] == 1 else c for c in np.split(a, 1 + np.where(np.diff(a))[0])])
10000 loops, best of 3: 73.4 µs per loop

In [19]: %timeit island_cumsum_vectorized(a)
100000 loops, best of 3: 8.65 µs per loop

Bigger case :

In [20]: a = np.array([1, 1, 1, 0, 0, 0, 1, 1, 0, 0])

In [21]: a = np.tile(a,1000) # Repeat OP's data 1000 times

# @Paul Panzer's solution
In [22]: %timeit np.concatenate([np.cumsum(c) if c[0] == 1 else c for c in np.split(a, 1 + np.where(np.diff(a))[0])])
100 loops, best of 3: 6.52 ms per loop

In [23]: %timeit island_cumsum_vectorized(a)
10000 loops, best of 3: 49.7 µs per loop

Nah, I want really huge case :

In [24]: a = np.array([1, 1, 1, 0, 0, 0, 1, 1, 0, 0])

In [25]: a = np.tile(a,100000) # Repeat OP's data 100000 times

# @Paul Panzer's solution
In [26]: %timeit np.concatenate([np.cumsum(c) if c[0] == 1 else c for c in np.split(a, 1 + np.where(np.diff(a))[0])])
1 loops, best of 3: 725 ms per loop

In [27]: %timeit island_cumsum_vectorized(a)
100 loops, best of 3: 7.28 ms per loop

Vary the elements in a numpy array

This should do the trick, it roughly works by 1) finding all the consecutive zeros and counting them, 2) computing the size of the output array and initializing it with zeros, 3) placing the counts from part 1 in the correct places.

def cz(a):
a = np.asarray(a, int)

# Find where sequences of zeros start and end
wz = np.zeros(len(a) + 2, dtype=bool)
wz[1:-1] = a == 0
change = wz[1:] != wz[:-1]
edges = np.where(change)[0]
# Take the difference to get the number of zeros in each sequence
consecutive_zeros = edges[1::2] - edges[::2]

# Figure out where to put consecutive_zeros
idx = a.cumsum()
n = idx[-1] if len(idx) > 0 else 0
idx = idx[edges[::2]]
idx += np.arange(len(idx))

# Create output array and populate with values for consecutive_zeros
out = np.zeros(len(consecutive_zeros) + n)
out[idx] = consecutive_zeros
return out


Related Topics



Leave a reply



Submit