How to find the groups of consecutive elements in a NumPy array
Here's a small function that might help:
def group_consecutives(vals, step=1):
    """Return list of consecutive lists of numbers from vals (number list)."""
    run = []
    result = [run]
    expect = None
    for v in vals:
        if (v == expect) or (expect is None):
            run.append(v)
        else:
            run = [v]
            result.append(run)
        expect = v + step
    return result
>>> group_consecutives(a)
[[0], [47, 48, 49, 50], [97, 98, 99]]
>>> group_consecutives(a, step=47)
[[0, 47], [48], [49], [50, 97], [98], [99]]
P.S. This is pure Python. For a NumPy solution, see unutbu's answer.
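For comparison, here is a minimal NumPy sketch of the same idea. It is not necessarily what the referenced answer does; it simply splits the array wherever the difference between neighbours is not equal to step. The function name group_consecutives_np and the sample array a are assumptions, with a chosen so that it reproduces the outputs shown above.

```python
import numpy as np

def group_consecutives_np(vals, step=1):
    """Split vals into runs whose neighbouring elements differ by exactly step."""
    vals = np.asarray(vals)
    # A new run starts right after every position where the gap is not `step`.
    breaks = np.flatnonzero(np.diff(vals) != step) + 1
    return np.split(vals, breaks)

# Hypothetical input, chosen to match the outputs shown above.
a = np.array([0, 47, 48, 49, 50, 97, 98, 99])
runs = group_consecutives_np(a)
print([run.tolist() for run in runs])
# [[0], [47, 48, 49, 50], [97, 98, 99]]
```

Unlike the pure-Python version, this returns a list of arrays rather than a list of lists.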
Get groups of consecutive elements of a NumPy array based on condition
While measuring performance of my other answer I noticed that while it was faster than Austin's solution (for arrays of length <15000), its complexity was not linear.
Based on this answer, I came up with the following solution using np.split, which is more efficient than both previously added answers here:
array = np.append(a, -np.inf)  # padding so we don't lose last element
mask = array >= 6  # values to be removed
split_indices = np.where(mask)[0]
for subarray in np.split(array, split_indices + 1):
    if len(subarray) > 2:
        print(subarray[:-1])
gives:
[1. 4. 2.]
[4. 4.]
[3. 4. 4. 5.]
Performance (measured with perfplot): [plot not shown]
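The same technique can be wrapped in a reusable function. This is only a sketch: the name groups_below, the min_len parameter, and the sample values are assumptions, with the values picked so that the result matches the output shown above.

```python
import numpy as np

def groups_below(values, threshold, min_len=2):
    """Return runs of consecutive elements < threshold with at least min_len items."""
    # Pad with -inf so the final chunk also ends in one element we can drop.
    padded = np.append(values, -np.inf)
    split_indices = np.flatnonzero(padded >= threshold)
    groups = []
    for chunk in np.split(padded, split_indices + 1):
        group = chunk[:-1]  # drop the removed element (or the padding)
        if len(group) >= min_len:
            groups.append(group)
    return groups

# Hypothetical input data.
values = np.array([1, 4, 2, 7, 4, 4, 8, 5, 9, 3, 4, 4, 5])
for group in groups_below(values, threshold=6):
    print(group)
```

Because of the -inf padding, the returned groups are float arrays even when the input holds integers.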
Get groups of consecutive elements of a NumPy array based on multiple conditions
Note that in your previous question, when you looked for the elements in array that are less than the threshold, your mask was defined not as mask = array < threshold but as its inverse: mask = array >= threshold. This is because it was used later to get elements that would be removed.
So, in your new example, you also have to get the inverse of your mask. Instead of mask = (a1 < c) & (a2 < d) you need mask = ~((a1 < c) & (a2 < d)):
a1 = np.append(a, -np.inf)
a2 = np.append(b, -np.inf)
mask = ~((a1 < c) & (a2 < d))
split_indices = np.where(mask)[0]
for subarray in np.split(a, split_indices + 1):
    if len(subarray) > 2:
        print(subarray[:-1])
gives:
[3 4 4]
which are the 15th-17th elements of a.
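One detail that may be worth checking: the loop above splits a rather than the padded a1, so a qualifying group at the very end of a could lose its last element. The sketch below pads the array and the mask by one slot each, which also keeps the integer dtype. The function name groups_where_both_below, the min_len parameter, and the sample data are all hypothetical.

```python
import numpy as np

def groups_where_both_below(a, b, c, d, min_len=2):
    """Runs of consecutive positions where a < c and b < d, taken from a."""
    a = np.asarray(a)
    b = np.asarray(b)
    keep = (a < c) & (b < d)
    # One padding slot marked "keep", so every chunk ends in a droppable element.
    padded_a = np.append(a, 0)
    padded_keep = np.append(keep, True)
    removed = np.flatnonzero(~padded_keep)
    groups = []
    for chunk in np.split(padded_a, removed + 1):
        group = chunk[:-1]  # drop the removed element (or the padding)
        if len(group) >= min_len:
            groups.append(group)
    return groups

# Hypothetical input data.
a = np.array([1, 7, 3, 4, 4, 9, 2, 5])
b = np.array([0, 0, 0, 0, 0, 0, 9, 0])
print(groups_where_both_below(a, b, c=6, d=5))
```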
Find large number of consecutive values fulfilling condition in a numpy array
Here's a numpy-based solution.
I think it should be faster than the other options, and hopefully it's fairly clear.
However, it does require twice as much memory as the various generator-based solutions. As long as you can hold a single temporary copy of your data in memory (for the diff), and a boolean array of the same length as your data (NumPy stores booleans as 1 byte per element), it should be pretty efficient.
import numpy as np

def main():
    # Generate some random data
    x = np.cumsum(np.random.random(1000) - 0.5)
    condition = np.abs(x) < 1
    # Print the start and stop indices of each region where the absolute
    # values of x are below 1, and the min and max of each of these regions
    for start, stop in contiguous_regions(condition):
        segment = x[start:stop]
        print(start, stop)
        print(segment.min(), segment.max())

def contiguous_regions(condition):
    """Finds contiguous True regions of the boolean array "condition". Returns
    a 2D array where the first column is the start index of the region and the
    second column is the end index (exclusive)."""
    # Find the indices of changes in "condition"
    d = np.diff(condition)
    idx, = d.nonzero()
    # We need to start things after the change in "condition". Therefore,
    # we'll shift the index by 1 to the right.
    idx += 1
    if condition[0]:
        # If the start of condition is True prepend a 0
        idx = np.r_[0, idx]
    if condition[-1]:
        # If the end of condition is True, append the length of the array
        idx = np.r_[idx, condition.size]
    # Reshape the result into two columns
    idx = idx.reshape(-1, 2)
    return idx

main()
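To keep only the long regions, the start and stop pairs can be filtered by length. The following is a self-contained sketch of that step; it uses a padded comparison instead of np.diff, and the name long_runs and the min_length parameter are assumptions.

```python
import numpy as np

def long_runs(condition, min_length):
    """Start/stop pairs (stop exclusive) of True runs at least min_length long."""
    condition = np.asarray(condition, dtype=bool)
    # Pad with False on both sides so every run has a start and an end edge.
    padded = np.concatenate(([False], condition, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    regions = edges.reshape(-1, 2)
    lengths = regions[:, 1] - regions[:, 0]
    return regions[lengths >= min_length]

condition = np.array([True, True, False, True, True, True, True, False, True])
print(long_runs(condition, min_length=3))
# [[3 7]]
```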
Numpy: efficient way to extract range of consecutive numbers
You can compare consecutive indicators and then use where or flatnonzero:
>>> x
array([[0, 0, 0, 0, 1, 1, 1],
       [1, 1, 1, 0, 0, 1, 1],
       [0, 0, 0, 1, 1, 1, 0]])
>>>
# find switches 0->1 and 1->0
>>> d = np.empty((np.arange(2) + x.shape), bool)
>>> d[:, 0] = x[:, 0] # a 1 in the first
>>> d[:, -1] = x[:, -1] # or last column counts as a switch
>>> d[:, 1:-1] = x[:, 1:] != x[:, :-1]
>>>
# find switch indices (of flattened array)
>>> b = np.flatnonzero(d)
# create helper array of row offsets
>>> o = np.arange(0, d.size, d.shape[1])
# split into rows, subtract row offsets and reshape into start, end pairs
>>> result = [(x-y).reshape(-1, 2) for x, y in zip(np.split(b, b.searchsorted(o[1:])), o)]
>>>
>>> result
[array([[4, 7]]), array([[0, 3],
       [5, 7]]), array([[3, 6]])]
This uses the Python convention, i.e. the right end is excluded. If you want the right end included, use result = [(x-y).reshape(-1, 2) - np.arange(2) for x, y in zip(np.split(b, b.searchsorted(o[1:])), o)] instead.
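As a cross-check, a simpler per-row variant can be sketched by padding each row with a zero on both sides and locating the switches row by row. It is likely slower for many rows; the function name row_runs is an assumption.

```python
import numpy as np

def row_runs(x):
    """For each row, return (start, end) pairs of runs of ones, end excluded."""
    x = np.asarray(x, dtype=bool)
    pad = np.zeros((x.shape[0], 1), dtype=bool)
    padded = np.hstack([pad, x, pad])
    # True wherever a row switches 0->1 or 1->0.
    switches = padded[:, 1:] != padded[:, :-1]
    return [np.flatnonzero(row).reshape(-1, 2) for row in switches]

x = np.array([[0, 0, 0, 0, 1, 1, 1],
              [1, 1, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 0]])
print([r.tolist() for r in row_runs(x)])
# [[[4, 7]], [[0, 3], [5, 7]], [[3, 6]]]
```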
Count consecutive occurrences of values varying in length in a numpy array
Here's a solution using itertools (it's probably not the fastest solution):
import itertools
condition = [True,True,True,False,False,True,True,False,True]
[ sum( 1 for _ in group ) for key, group in itertools.groupby( condition ) if key ]
Out:
[3, 2, 1]
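If speed matters, a vectorized alternative is possible. The sketch below pads the condition with zeros and reads run starts and ends off the differences; the function name true_run_lengths is an assumption.

```python
import numpy as np

def true_run_lengths(condition):
    """Lengths of the consecutive runs of True values in condition."""
    flags = np.asarray(condition, dtype=np.int8)
    # +1 marks the start of a run, -1 marks the position just past its end.
    steps = np.diff(np.concatenate(([0], flags, [0])))
    starts = np.flatnonzero(steps == 1)
    ends = np.flatnonzero(steps == -1)
    return ends - starts

condition = [True, True, True, False, False, True, True, False, True]
print(true_run_lengths(condition))
# [3 2 1]
```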
Finding the consecutive zeros in a numpy array
Here's a fairly compact vectorized implementation. I've changed the requirements a bit, so the return value is a bit more "numpythonic": it creates an array with shape (m, 2), where m is the number of "runs" of zeros. The first column is the index of the first 0 in each run, and the second is the index of the first nonzero element after the run. (This indexing pattern matches, for example, how slicing works and how the range function works.)
import numpy as np

def zero_runs(a):
    # Create an array that is 1 where a is 0, and pad each end with an extra 0.
    iszero = np.concatenate(([0], np.equal(a, 0).view(np.int8), [0]))
    absdiff = np.abs(np.diff(iszero))
    # Runs start and end where absdiff is 1.
    ranges = np.where(absdiff == 1)[0].reshape(-1, 2)
    return ranges
For example:
In [236]: a = [1, 2, 3, 0, 0, 0, 0, 0, 0, 4, 5, 6, 0, 0, 0, 0, 9, 8, 7, 0, 10, 11]
In [237]: runs = zero_runs(a)
In [238]: runs
Out[238]:
array([[ 3,  9],
       [12, 16],
       [19, 20]])
With this format, it is simple to get the number of zeros in each run:
In [239]: runs[:,1] - runs[:,0]
Out[239]: array([6, 4, 1])
It's always a good idea to check the edge cases:
In [240]: zero_runs([0,1,2])
Out[240]: array([[0, 1]])
In [241]: zero_runs([1,2,0])
Out[241]: array([[2, 3]])
In [242]: zero_runs([1,2,3])
Out[242]: array([], shape=(0, 2), dtype=int64)
In [243]: zero_runs([0,0,0])
Out[243]: array([[0, 3]])
Replace consecutive identical elements in the beginning of an array with 0
You can use argmax on a boolean array to get the index of the first changing value. Then slice and replace:
n = (x!=x[0]).argmax() # 4
x[:n] = 0
output:
array([0, 0, 0, 0, 2, 3, 1, 2, 3, 2, 2, 2, 3, 3, 3, 1, 1, 2, 2])
intermediate array:
(x!=x[0])
# n=4
# [False False False False True True True True True True True True
# True True True True True True True]
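One edge case to keep in mind: if every element equals x[0], the boolean array is all False and argmax returns 0, so nothing is replaced. Whether that is the desired behaviour depends on the use case. The sketch below assumes the whole array should be zeroed in that situation; the function name zero_leading_run and the sample inputs are hypothetical.

```python
import numpy as np

def zero_leading_run(x):
    """Return a copy of x with its leading run of identical values set to 0."""
    x = np.array(x)  # work on a copy
    if x.size == 0:
        return x
    changed = x != x[0]
    # argmax gives 0 when nothing changes, so handle that case explicitly.
    n = changed.argmax() if changed.any() else x.size
    x[:n] = 0
    return x

print(zero_leading_run([1, 1, 1, 1, 2, 3, 1]))  # [0 0 0 0 2 3 1]
print(zero_leading_run([5, 5, 5]))              # [0 0 0]
```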
How to group consecutive data in 2d array in python
Here you go ...
Steps are:
- Get a list xUnique of unique column 1 values with sort order preserved.
- Build a list xRanges of items of the form [col1_value, [col2_min, col2_max]] holding the column 2 ranges for each column 1 value.
- Build a list xGroups of items of the form [[col1_min, col1_max], [col2_min, col2_max]] where the [col1_min, col1_max] part is created by merging the col1_value part of consecutive items in xRanges if they differ by 1 and have identical [col2_min, col2_max] value ranges for column 2.
- Turn the ranges in each item of xGroups into strings and print with the required row and column headings.
- Also package and print as a numpy.array to match the form of the input.
import numpy as np
data = np.array([
    [1, 1],
    [1, 2],
    [2, 1],
    [2, 2],
    [3, 1],
    [5, 1],
    [5, 2]])
# dict.fromkeys keeps first-seen order; a set would not guarantee it
xUnique = list(dict.fromkeys(pair[0] for pair in data))
xRanges = list(zip(xUnique, [[0, 0] for _ in range(len(xUnique))]))
rows, cols = data.shape
iRange = -1
for i in range(rows):
    # A larger column 1 value starts a new range: record its column 2 minimum
    if i == 0 or data[i, 0] > data[i - 1, 0]:
        iRange += 1
        xRanges[iRange][1][0] = data[i, 1]
    # The latest row always holds the current column 2 maximum
    xRanges[iRange][1][1] = data[i, 1]
xGroups = []
for i in range(len(xRanges)):
    # Merge with the previous group if column 1 is adjacent and column 2 ranges match
    if i and xRanges[i][0] - xRanges[i - 1][0] == 1 and xRanges[i][1] == xRanges[i - 1][1]:
        xGroups[-1][0][1] = xRanges[i][0]
    else:
        xGroups += [[[xRanges[i][0], xRanges[i][0]], xRanges[i][1]]]
xGroupStrs = [[f'{a}-{b}' for a, b in row] for row in xGroups]
groupArray = np.array(xGroupStrs)
print(groupArray)
print()
print(f'{"":<10}{"Col1":<8}{"Col2":<8}')
for i, (col1, col2) in enumerate(xGroupStrs):
    print(f'{"group " + str(i) + ":":<10}{col1:<8}{col2:<8}')
Output:
[['1-2' '1-2']
 ['3-3' '1-1']
 ['5-5' '1-2']]
          Col1    Col2
group 0:  1-2     1-2
group 1:  3-3     1-1
group 2:  5-5     1-2