How to Find the Groups of Consecutive Elements in a Numpy Array

Here's a small function that might help:

def group_consecutives(vals, step=1):
    """Return list of consecutive lists of numbers from vals (number list)."""
    run = []
    result = [run]
    expect = None
    for v in vals:
        if (v == expect) or (expect is None):
            run.append(v)
        else:
            run = [v]
            result.append(run)
        expect = v + step
    return result

>>> a = [0, 47, 48, 49, 50, 97, 98, 99]
>>> group_consecutives(a)
[[0], [47, 48, 49, 50], [97, 98, 99]]
>>> group_consecutives(a, step=47)
[[0, 47], [48], [49], [50, 97], [98], [99]]

P.S. This is pure Python. For a NumPy solution, see unutbu's answer.
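
For reference, here is a minimal sketch of one common NumPy approach (np.diff plus np.split). It is not necessarily identical to the answer referred to above, and the name group_consecutives_np is only an illustrative choice. It assumes a one-dimensional numeric input.

```python
import numpy as np

def group_consecutives_np(vals, step=1):
    """Split vals into arrays of elements that each differ by `step` from the previous one."""
    vals = np.asarray(vals)
    if vals.size == 0:
        return []
    # Positions where the difference to the previous element is not `step`
    breaks = np.flatnonzero(np.diff(vals) != step) + 1
    return np.split(vals, breaks)

a = np.array([0, 47, 48, 49, 50, 97, 98, 99])
for group in group_consecutives_np(a):
    print(group)
```

Unlike the pure Python version, this sketch returns an empty list for empty input rather than a list containing one empty list.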

Get groups of consecutive elements of a NumPy array based on condition

While measuring the performance of my other answer, I noticed that although it was faster than Austin's solution (for arrays of length < 15000), its complexity was not linear.

Based on this answer I came up with the following solution using np.split, which is more efficient than both previously added answers here:

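import numpy as np

array = np.append(a, -np.inf)  # padding so we don't lose the last element
mask = array >= 6  # True for values to be removed
split_indices = np.where(mask)[0]
# Split right after each removed value, so every piece ends with one
for subarray in np.split(array, split_indices + 1):
    if len(subarray) > 2:  # at least two kept values plus the trailing one
        print(subarray[:-1])  # drop the trailing removed value (or the padding)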

gives:

[1. 4. 2.]
[4. 4.]
[3. 4. 4. 5.]

Performance (measured with perfplot): [plot image not included]
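
As a self-contained illustration, the same grouping can be sketched without padding by splitting wherever the "below threshold" state flips. The function name runs_below, the min_len parameter, and the sample array are assumptions made for this example. The sample data was chosen so that it reproduces the groups shown above, and it is not taken from the original question.

```python
import numpy as np

def runs_below(values, threshold, min_len=2):
    """Return runs of at least min_len consecutive values below threshold."""
    values = np.asarray(values)
    if values.size == 0:
        return []
    keep = values < threshold
    # Indices where the keep/remove state flips; each starts a new piece
    boundaries = np.flatnonzero(keep[1:] != keep[:-1]) + 1
    pieces = np.split(values, boundaries)
    # Every piece is uniformly kept or removed, so checking its first element is enough
    return [p for p in pieces if p[0] < threshold and len(p) >= min_len]

# Assumed sample data (hypothetical)
a = np.array([1, 4, 2, 6, 4, 4, 6, 2, 7, 6, 2, 8, 9, 3, 6, 3, 4, 4, 5, 8])
for run in runs_below(a, 6):
    print(run)
```

Because nothing is appended to the array, the runs keep the dtype of the input instead of being promoted to float.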

Get groups of consecutive elements of a NumPy array based on multiple conditions

Note that in your previous question when you looked for the elements in array that are less than the threshold, your mask was defined not as mask = array < threshold but as an inverse of it: mask = array >= threshold. This is because it was used later to get elements that would be removed.

So, in your new example, you also have to get the inverse of your mask. Instead of mask = (a1 < c) & (a2 < d) you need mask = ~((a1 < c) & (a2 < d)):

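a1 = np.append(a, -np.inf)
a2 = np.append(b, -np.inf)
mask = ~((a1 < c) & (a2 < d))  # True for values to be removed
split_indices = np.where(mask)[0]
# Split the padded array a1 (not a), so that dropping the last element of each
# piece never discards a real value from the final group
for subarray in np.split(a1, split_indices + 1):
    if len(subarray) > 2:
        print(subarray[:-1].astype(a.dtype))  # cast back: the -inf padding made a1 float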

gives:

[3 4 4]

which are the 15th to 17th elements of a.
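
To make the role of the inverted mask concrete, here is a hedged sketch that wraps the padding approach in a helper taking the "rejected" mask directly. The helper name split_on_rejected, the min_len parameter, and the values of a, b, c and d are all assumptions for illustration, and the original data is not shown in this thread.

```python
import numpy as np

def split_on_rejected(values, rejected, min_len=2):
    """Return runs of at least min_len consecutive values that are not rejected."""
    values = np.asarray(values)
    rejected = np.asarray(rejected, dtype=bool)
    # Pad with one non-rejected sentinel so that every piece, including the
    # last one, carries exactly one trailing element to drop
    padded = np.append(values.astype(float), -np.inf)
    split_indices = np.flatnonzero(np.append(rejected, False))
    runs = []
    for piece in np.split(padded, split_indices + 1):
        if len(piece) > min_len:
            runs.append(piece[:-1].astype(values.dtype))
    return runs

# Hypothetical sample data and thresholds
a = np.array([1, 4, 2, 6, 4, 4, 6, 2, 7, 6, 2, 8, 9, 3, 6, 3, 4, 4, 5, 8])
b = np.array([2, 5, 2, 2, 2, 5, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 5, 2])
c, d = 6, 3

keep = (a < c) & (b < d)             # elements we want
runs = split_on_rejected(a, ~keep)   # the helper expects the inverse
for run in runs:
    print(run)
```

Passing keep instead of ~keep would split after every wanted element, which is the mistake described above.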

Find large number of consecutive values fulfilling condition in a numpy array

Here's a numpy-based solution.

I think (?) it should be faster than the other options. Hopefully it's fairly clear.

However, it does require twice as much memory as the various generator-based solutions. As long as you can hold a single temporary copy of your data in memory (for the diff), and a boolean array of the same length as your data (1 byte per element), it should be pretty efficient...

import numpy as np

def main():
    # Generate some random data
    x = np.cumsum(np.random.random(1000) - 0.5)
    condition = np.abs(x) < 1
    
    # Print the start and stop indices of each region where the absolute 
    # values of x are below 1, and the min and max of each of these regions
    for start, stop in contiguous_regions(condition):
        segment = x[start:stop]
        print(start, stop)
        print(segment.min(), segment.max())

def contiguous_regions(condition):
    """Finds contiguous True regions of the boolean array "condition". Returns
    a 2D array where the first column is the start index of the region and the
    second column is the end index."""

    # Find the indices of changes in "condition"
    d = np.diff(condition)
    idx, = d.nonzero() 

    # We need to start things after the change in "condition". Therefore, 
    # we'll shift the index by 1 to the right.
    idx += 1

    if condition[0]:
        # If the start of condition is True prepend a 0
        idx = np.r_[0, idx]

    if condition[-1]:
        # If the end of condition is True, append the length of the array
        idx = np.r_[idx, condition.size]

    # Reshape the result into two columns
    return idx.reshape(-1, 2)

main()
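
Since the question asks for a large number of consecutive values, a natural follow-up is to keep only regions of some minimum length. The sketch below is one possible way to do that. The function name long_regions and the min_length parameter are assumptions, and it pads the condition with False on both sides instead of special-casing the ends.

```python
import numpy as np

def long_regions(condition, min_length):
    """Return [start, stop) index pairs of True runs with at least min_length elements."""
    condition = np.asarray(condition, dtype=bool)
    # Padding with False guarantees that every run has both a start and a stop edge
    padded = np.concatenate(([False], condition, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1]).reshape(-1, 2)
    lengths = edges[:, 1] - edges[:, 0]
    return edges[lengths >= min_length]

condition = np.array([True, True, False, True, True, True, True, False, True])
print(long_regions(condition, 3))
```

Each returned row can be used directly as a slice, for example x[start:stop].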

Numpy: efficient way to extract range of consecutive numbers

You can compare consecutive indicators and then use where or flatnonzero:

>>> x
array([[0, 0, 0, 0, 1, 1, 1],
       [1, 1, 1, 0, 0, 1, 1],
       [0, 0, 0, 1, 1, 1, 0]])
>>> 
# find switches 0->1 and 1->0
>>> d = np.empty((np.arange(2) + x.shape), bool)
>>> d[:, 0] = x[:, 0]   # a 1 in the first
>>> d[:, -1] = x[:, -1] # or last column counts as a switch
>>> d[:, 1:-1] = x[:, 1:] != x[:, :-1]
>>> 
# find switch indices (of flattened array)
>>> b = np.flatnonzero(d)
# create helper array of row offsets
>>> o = np.arange(0, d.size, d.shape[1])
# split into rows, subtract row offsets and reshape into start, end pairs
>>> result = [(x-y).reshape(-1, 2) for x, y in zip(np.split(b, b.searchsorted(o[1:])), o)]
>>> 
>>> result
[array([[4, 7]]), array([[0, 3],
       [5, 7]]), array([[3, 6]])]

This uses the Python convention, i.e. the right end is excluded. If you want the right end included, use result = [(x-y).reshape(-1, 2) - np.arange(2) for x, y in zip(np.split(b, b.searchsorted(o[1:])), o)] instead.
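
The subtraction of np.arange(2) relies on broadcasting. A tiny sketch with hypothetical start/end pairs shows what it does:

```python
import numpy as np

# Hypothetical start/end pairs in the half-open (right end excluded) convention
pairs = np.array([[0, 3], [5, 7]])

# np.arange(2) is [0, 1]; broadcasting subtracts 0 from each start
# and 1 from each end, turning exclusive ends into inclusive ones
inclusive = pairs - np.arange(2)
print(inclusive.tolist())  # [[0, 2], [5, 6]]
```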

Count consecutive occurrences of values varying in length in a numpy array

Here's a solution using itertools (it's probably not the fastest solution):

import itertools
condition = [True,True,True,False,False,True,True,False,True]
[ sum( 1 for _ in group ) for key, group in itertools.groupby( condition ) if key ]

Out:
[3, 2, 1]
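
If the lengths of the False runs are needed as well, the same groupby idea extends to a full run-length encoding. This is only a sketch, and the function name run_lengths is an assumption:

```python
from itertools import groupby

def run_lengths(values):
    """Return (value, count) pairs for each run of equal consecutive values."""
    return [(key, sum(1 for _ in group)) for key, group in groupby(values)]

condition = [True, True, True, False, False, True, True, False, True]
encoded = run_lengths(condition)
print(encoded)  # [(True, 3), (False, 2), (True, 2), (False, 1), (True, 1)]

# Keeping only the True runs gives the same counts as the one-liner above
true_counts = [count for key, count in encoded if key]
print(true_counts)  # [3, 2, 1]
```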

Finding the consecutive zeros in a numpy array

Here's a fairly compact vectorized implementation. I've changed the requirements a bit, so the return value is a bit more "numpythonic": it creates an array with shape (m, 2), where m is the number of "runs" of zeros. The first column is the index of the first 0 in each run, and the second is the index of the first nonzero element after the run. (This indexing pattern matches, for example, how slicing works and how the range function works.)

import numpy as np

def zero_runs(a):
    # Create an array that is 1 where a is 0, and pad each end with an extra 0.
    iszero = np.concatenate(([0], np.equal(a, 0).view(np.int8), [0]))
    absdiff = np.abs(np.diff(iszero))
    # Runs start and end where absdiff is 1.
    ranges = np.where(absdiff == 1)[0].reshape(-1, 2)
    return ranges

For example:

In [236]: a = [1, 2, 3, 0, 0, 0, 0, 0, 0, 4, 5, 6, 0, 0, 0, 0, 9, 8, 7, 0, 10, 11]

In [237]: runs = zero_runs(a)

In [238]: runs
Out[238]: 
array([[ 3,  9],
       [12, 16],
       [19, 20]])

With this format, it is simple to get the number of zeros in each run:

In [239]: runs[:,1] - runs[:,0]
Out[239]: array([6, 4, 1])

It's always a good idea to check the edge cases:

In [240]: zero_runs([0,1,2])
Out[240]: array([[0, 1]])

In [241]: zero_runs([1,2,0])
Out[241]: array([[2, 3]])

In [242]: zero_runs([1,2,3])
Out[242]: array([], shape=(0, 2), dtype=int64)

In [243]: zero_runs([0,0,0])
Out[243]: array([[0, 3]])
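
As a usage sketch, the (m, 2) result makes it easy to pick out the longest run. The helper longest_zero_run below is an assumed name for illustration. zero_runs is repeated from above only so that the snippet runs on its own.

```python
import numpy as np

def zero_runs(a):
    # Same function as above, repeated so this snippet is self-contained
    iszero = np.concatenate(([0], np.equal(a, 0).view(np.int8), [0]))
    absdiff = np.abs(np.diff(iszero))
    return np.where(absdiff == 1)[0].reshape(-1, 2)

def longest_zero_run(a):
    """Return (start, stop) of the longest run of zeros, or None if there is none."""
    runs = zero_runs(a)
    if len(runs) == 0:
        return None
    lengths = runs[:, 1] - runs[:, 0]
    start, stop = runs[np.argmax(lengths)]  # first run wins on ties
    return int(start), int(stop)

a = [1, 2, 3, 0, 0, 0, 0, 0, 0, 4, 5, 6, 0, 0, 0, 0, 9, 8, 7, 0, 10, 11]
print(longest_zero_run(a))  # (3, 9)
```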

Replace consecutive identical elements in the beginning of an array with 0

You can use argmax on a boolean array to get the index of the first changing value.

Then slice and replace:

n = (x!=x[0]).argmax()  # 4
x[:n] = 0

output:

array([0, 0, 0, 0, 2, 3, 1, 2, 3, 2, 2, 2, 3, 3, 3, 1, 1, 2, 2])

intermediate array:

(x!=x[0])

#                            n=4
# [False False False False  True  True  True  True  True  True  True  True
#  True  True  True  True  True  True  True]
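
One edge case is worth noting: if every element equals x[0], the boolean array is all False and argmax returns 0, so nothing gets replaced. A minimal sketch that handles this, assuming the whole array should then be zeroed, could look as follows. The function name zero_leading_run and the sample input are hypothetical.

```python
import numpy as np

def zero_leading_run(x):
    """Return a copy of x with the leading run of identical values set to 0."""
    x = np.array(x)  # np.array copies, so the caller's data is left untouched
    if x.size == 0:
        return x
    changed = x != x[0]
    # argmax gives the first True; if there is none, the run spans the whole array
    n = changed.argmax() if changed.any() else x.size
    x[:n] = 0
    return x

print(zero_leading_run([1, 1, 1, 1, 2, 3, 1, 2]))  # [0 0 0 0 2 3 1 2]
print(zero_leading_run([5, 5, 5]))                 # [0 0 0]
```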

How to group consecutive data in 2d array in python

Here you go ...

Steps are:

1. Get a list xUnique of unique column 1 values with sort order preserved.
2. Build a list xRanges of items of the form [col1_value, [col2_min, col2_max]] holding the column 2 ranges for each column 1 value.
3. Build a list xGroups of items of the form [[col1_min, col1_max], [col2_min, col2_max]] where the [col1_min, col1_max] part is created by merging the col1_value part of consecutive items in xRanges if they differ by 1 and have identical [col2_min, col2_max] value ranges for column 2.
4. Turn the ranges in each item of xGroups into strings and print with the required row and column headings.
5. Also package and print as a numpy.array to match the form of the input.

import numpy as np
data = np.array([
    [1, 1],
    [1, 2],    
    [2, 1],    
    [2, 2],
    [3, 1],
    [5, 1],
    [5, 2]])
# dict.fromkeys keeps first-seen order; a set does not guarantee any order
xUnique = list(dict.fromkeys(pair[0] for pair in data))
xRanges = list(zip(xUnique, [[0, 0] for _ in range(len(xUnique))]))
rows, cols = data.shape
iRange = -1
for i in range(rows):
    if i == 0 or data[i, 0] > data[i - 1, 0]:
        iRange += 1
        xRanges[iRange][1][0] = data[i, 1]
    xRanges[iRange][1][1] = data[i, 1]
xGroups = []
for i in range(len(xRanges)):
    if i and xRanges[i][0] - xRanges[i - 1][0] == 1 and xRanges[i][1] == xRanges[i - 1][1]:
        xGroups[-1][0][1] = xRanges[i][0]
    else:
        xGroups += [[[xRanges[i][0], xRanges[i][0]], xRanges[i][1]]]

xGroupStrs = [ [f'{a}-{b}' for a, b in row] for row in xGroups]

groupArray = np.array(xGroupStrs)
print(groupArray)

print()
print(f'{"":<10}{"Col1":<8}{"Col2":<8}')
for i, (col1, col2) in enumerate(xGroupStrs):
    print(f'{"group " + str(i) + ":":<10}{col1:<8}{col2:<8}')

Output:

[['1-2' '1-2']
 ['3-3' '1-1']
 ['5-5' '1-2']]

          Col1    Col2
group 0:  1-2     1-2
group 1:  3-3     1-1
group 2:  5-5     1-2
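
For comparison, steps 1 to 3 can also be sketched with itertools.groupby. This is an alternative formulation and not the code above. It assumes the rows are already sorted by column 1, it takes the true min and max of column 2 per group (the code above uses the first and last value), and the name group_rows is hypothetical.

```python
from itertools import groupby
import numpy as np

def group_rows(data):
    """Merge consecutive column-1 values that share the same column-2 range."""
    # Steps 1 and 2: for each column-1 value, the (min, max) of column 2
    ranges = []
    for key, rows in groupby(data.tolist(), key=lambda row: row[0]):
        col2 = [row[1] for row in rows]
        ranges.append((key, (min(col2), max(col2))))
    # Step 3: extend the previous group when the key is adjacent and the range matches
    groups = []
    for key, span in ranges:
        if groups and key - groups[-1][0][1] == 1 and span == groups[-1][1]:
            groups[-1][0][1] = key
        else:
            groups.append([[key, key], span])
    return groups

data = np.array([[1, 1], [1, 2], [2, 1], [2, 2], [3, 1], [5, 1], [5, 2]])
for (lo1, hi1), (lo2, hi2) in group_rows(data):
    print(f'{lo1}-{hi1}  {lo2}-{hi2}')
```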
