Find Length of Sequences of Identical Values in a Numpy Array (Run Length Encoding)


While not numpy primitives, itertools functions are often very fast, so do give this one a try (and measure times for various solutions including this one, of course):

import itertools

def runs_of_ones(bits):
    for bit, group in itertools.groupby(bits):
        if bit:  # for 0/1 input, the sum of a run of ones is its length
            yield sum(group)
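
To see what groupby actually hands to the loop, here is a small sketch on a made-up bit list. It also shows why `sum(group)` works as a length: the values in each run are exactly 1. For arbitrary truthy values you would count with `sum(1 for _ in group)` instead.

```python
import itertools

bits = [1, 1, 0, 1, 0, 0, 1, 1, 1]  # hypothetical sample input

# groupby splits the input into runs of equal consecutive values
runs = [(bit, list(group)) for bit, group in itertools.groupby(bits)]
# [(1, [1, 1]), (0, [0]), (1, [1]), (0, [0, 0]), (1, [1, 1, 1])]

# summing each run of 1s gives its length; runs of 0s are skipped
lengths = [sum(group) for bit, group in runs if bit]
print(lengths)  # [2, 1, 3]
```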

If you do need the values in a list, you can just use list(runs_of_ones(bits)), of course; but a list comprehension may be marginally faster still:

import itertools

def runs_of_ones_list(bits):
    return [sum(g) for b, g in itertools.groupby(bits) if b]

Moving to "numpy-native" possibilities, what about:

import numpy

def runs_of_ones_array(bits):
    # make sure all runs of ones are well-bounded
    bounded = numpy.hstack(([0], bits, [0]))
    # get 1 at run starts and -1 at run ends
    difs = numpy.diff(bounded)
    run_starts, = numpy.where(difs > 0)
    run_ends, = numpy.where(difs < 0)
    return run_ends - run_starts
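
Since run_starts is computed anyway, the same trick can also report where each run begins. A minimal sketch, assuming `bits` holds only 0s and 1s (or booleans); any other value would produce extra jumps in the differences. The helper name `runs_of_ones_with_starts` is hypothetical.

```python
import numpy as np

def runs_of_ones_with_starts(bits):
    # hypothetical helper: same padding/diff idea as runs_of_ones_array
    bounded = np.hstack(([0], bits, [0]))
    difs = np.diff(bounded)
    run_starts = np.flatnonzero(difs > 0)  # index of the first 1 in each run
    run_ends = np.flatnonzero(difs < 0)    # one past the last 1 (exclusive)
    return run_starts, run_ends - run_starts

starts, lengths = runs_of_ones_with_starts(np.array([1, 1, 0, 1, 0, 0, 1, 1, 1]))
print(starts.tolist(), lengths.tolist())  # [0, 3, 6] [2, 1, 3]
```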

Again: be sure to benchmark the solutions against each other on data that is realistic for your use case!
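
One way to run such a benchmark is with the standard timeit module. This is a sketch, not a verdict: random 0/1 data is only a stand-in for your real input, and the list is passed to the groupby version because iterating a numpy array element by element yields numpy scalars, which is typically slower than iterating a plain Python list.

```python
import itertools
import timeit
import numpy as np

def runs_of_ones_list(bits):
    return [sum(g) for b, g in itertools.groupby(bits) if b]

def runs_of_ones_array(bits):
    bounded = np.hstack(([0], bits, [0]))
    difs = np.diff(bounded)
    run_starts, = np.where(difs > 0)
    run_ends, = np.where(difs < 0)
    return run_ends - run_starts

# assumption: random 0/1 data; substitute data that looks like yours
rng = np.random.default_rng(0)
bits_array = rng.integers(0, 2, size=10_000)
bits_list = bits_array.tolist()

# sanity check first: timing only means something if both versions agree
assert runs_of_ones_list(bits_list) == runs_of_ones_array(bits_array).tolist()

for name, stmt in [
    ("groupby on list", lambda: runs_of_ones_list(bits_list)),
    ("numpy diff on array", lambda: runs_of_ones_array(bits_array)),
]:
    best = min(timeit.repeat(stmt, number=10, repeat=3))
    print(f"{name}: {best / 10 * 1e3:.3f} ms per call")
```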

How to compute run-length encoding in Python

You can do this with itertools.groupby:

from itertools import groupby
ar = [2,2,2,1,1,2,2,3,3,3,3]
print([(k, sum(1 for i in g)) for k,g in groupby(ar)])
# [(2, 3), (1, 2), (2, 2), (3, 4)]
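
Because each (value, count) pair keeps everything about its run, the encoding is lossless. As a quick sketch, decoding just expands each pair back out with itertools.repeat and flattens the result with itertools.chain:

```python
from itertools import chain, groupby, repeat

ar = [2, 2, 2, 1, 1, 2, 2, 3, 3, 3, 3]
encoded = [(k, sum(1 for _ in g)) for k, g in groupby(ar)]

# decoding: turn each (value, count) pair back into `count` copies of `value`
decoded = list(chain.from_iterable(repeat(k, n) for k, n in encoded))
print(decoded == ar)  # True
```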

getting ranges of sequences of identical entries with minimum length in a numpy array

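One approach using np.diff and np.where. It assumes `a` holds only the values 1 and -1: after padding with -1 at both ends, every run of 1s starts with a jump of +2 and ends with a jump of -2 in the differences -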

import numpy as np

# example input (assumed): only 1s and -1s
a = np.array([1, 1, 1, -1, -1, 1, 1, -1, 1, 1, 1, 1, -1])

# Append -1s at either end and take the differences
dfa = np.diff(np.hstack((-1, a, -1)))

# Get the positions of starts and (exclusive) stops of runs of 1s in `a`
starts = np.where(dfa == 2)[0]
stops = np.where(dfa == -2)[0]

# Keep only the (start, stop) pairs whose run is at least 3 long
valid_mask = (stops - starts) >= 3

# Finally collect the valid pairs as the output
out = np.column_stack((starts, stops))[valid_mask].tolist()
# out == [[0, 3], [8, 12]]
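
If `a` holds 0s and 1s instead, the same idea works with 0 as the padding value, looking for jumps of +1 and -1. A minimal sketch; the helper name `ranges_of_ones` and the sample array are hypothetical:

```python
import numpy as np

def ranges_of_ones(a, min_len):
    # hypothetical helper for 0/1 input: pad with 0 so every run of 1s is bounded
    dfa = np.diff(np.hstack((0, a, 0)))
    starts = np.flatnonzero(dfa == 1)   # index of the first 1 in each run
    stops = np.flatnonzero(dfa == -1)   # one past the last 1 (exclusive)
    keep = (stops - starts) >= min_len
    return np.column_stack((starts, stops))[keep].tolist()

a = np.array([1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0])
print(ranges_of_ones(a, 3))  # [[0, 3], [8, 12]]
```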

Find Consecutive Repeats of Specific Length in NumPy

Approach #1

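We could leverage 1D convolution for a vectorized solution. The mask m marks where an element equals its successor, so a run of n identical values shows up as N = n-1 consecutive Trues in m; convolving with a window of N ones finds where such a block ends -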

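import numpy as np

def consec_repeat_starts(a, n):
    N = n-1  # a run of n equal values gives N consecutive Trues in m (n >= 2)
    m = a[:-1]==a[1:]
    # full convolution: entry k sums m[k-N+1:k+1]; a sum of N means all True
    return np.flatnonzero(np.convolve(m,np.ones(N, dtype=int))==N)-N+1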
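
A plain-loop reference is handy for checking the vectorized versions and for pinning down the semantics: overlapping windows count, so the three 9s at indices 11 to 13 yield both 11 and 12 for n=2. This is a sketch for testing, not a fast solution; the name `consec_repeat_starts_reference` is hypothetical.

```python
import numpy as np

def consec_repeat_starts_reference(a, n):
    # index i qualifies if the n elements a[i:i+n] all equal a[i]
    a = np.asarray(a)
    return np.array(
        [i for i in range(len(a) - n + 1) if np.all(a[i:i + n] == a[i])],
        dtype=int,
    )

a = np.array([0, 1, 2, 2, 3, 4, 5, 5, 6, 7, 8, 9, 9, 9, 10, 11, 12,
              13, 13, 13, 14, 15])
print(consec_repeat_starts_reference(a, 2).tolist())  # [2, 6, 11, 12, 17, 18]
print(consec_repeat_starts_reference(a, 3).tolist())  # [11, 17]
```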

Sample runs -

In [286]: a
Out[286]: 
array([ 0,  1,  2,  2,  3,  4,  5,  5,  6,  7,  8,  9,  9,  9, 10, 11, 12,
       13, 13, 13, 14, 15])

In [287]: consec_repeat_starts(a, 2)
Out[287]: array([ 2,  6, 11, 12, 17, 18])

In [288]: consec_repeat_starts(a, 3)
Out[288]: array([11, 17])

In [289]: consec_repeat_starts(a, 4)
Out[289]: array([], dtype=int64)

Approach #2

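We could also make use of binary erosion from SciPy -

import numpy as np
from scipy.ndimage import binary_erosion

def consec_repeat_starts_v2(a, n):
    N = n-1
    m = a[:-1]==a[1:]
    # erosion keeps a position only if the length-N window of m placed around it
    # is all True; subtracting N//2 shifts each hit back to the start of that window
    return np.flatnonzero(binary_erosion(m,[1]*N))-(N//2)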
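
If SciPy isn't available, the same "N consecutive Trues in m" test can be done with a cumulative sum. This is a numpy-only sketch of a different technique from erosion; the name `consec_repeat_starts_cumsum` is hypothetical and, like the versions above, it assumes n >= 2.

```python
import numpy as np

def consec_repeat_starts_cumsum(a, n):
    # hypothetical helper; assumes n >= 2
    a = np.asarray(a)
    N = n - 1
    m = a[:-1] == a[1:]  # True where an element equals its successor
    c = np.concatenate(([0], np.cumsum(m)))
    # c[i + N] - c[i] counts the Trues in m[i:i + N]; N of them means n equal values
    window_sums = c[N:] - c[:-N]
    return np.flatnonzero(window_sums == N)

a = np.array([0, 1, 2, 2, 3, 4, 5, 5, 6, 7, 8, 9, 9, 9, 10, 11, 12,
              13, 13, 13, 14, 15])
print(consec_repeat_starts_cumsum(a, 2).tolist())  # [2, 6, 11, 12, 17, 18]
print(consec_repeat_starts_cumsum(a, 3).tolist())  # [11, 17]
```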

How do I find the length of a run of numbers in a list? (Is there a faster way than what I'm doing?)

I might use itertools.groupby for this one:

from itertools import groupby
from operator import itemgetter

lst = [1,1,1,1,1,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0]

# group (index, value) pairs by value; keep only the groups of 1s
for k, v in groupby(enumerate(lst), key=itemgetter(1)):
    if k:
        v = list(v)
        print(v[0][0], v[-1][0])

This prints the start and end indices (both inclusive) of each run of 1s; for the list above that is 0 4 and 12 15.
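
If you want the ranges as data rather than printed output, the same loop can collect them. A minimal sketch; the helper name `one_run_bounds` is hypothetical, and the run length is simply end - start + 1:

```python
from itertools import groupby
from operator import itemgetter

def one_run_bounds(lst):
    # hypothetical helper: inclusive (start, end) pairs for each run of truthy values
    bounds = []
    for k, v in groupby(enumerate(lst), key=itemgetter(1)):
        if k:
            v = list(v)
            bounds.append((v[0][0], v[-1][0]))
    return bounds

lst = [1,1,1,1,1,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0]
print(one_run_bounds(lst))  # [(0, 4), (12, 15)]
```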

Count number of repeated elements in a row in a numpy array

You can use itertools.groupby to perform the operation without invoking numpy.

import itertools

X = [1,1,1,2,2,2,2,2,3,3,1,1,0,0,0,5]

Y = [(x, len(list(y))) for x, y in itertools.groupby(X)]

print(Y)
# [(1, 3), (2, 5), (3, 2), (1, 2), (0, 3), (5, 1)]
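
For large numpy arrays, a vectorized version avoids the Python-level loop: mark where each new run begins, then take differences of those start positions. This is a sketch; the name `rle` is hypothetical. One caveat: since NaN != NaN, every NaN in a float array becomes its own run of length 1.

```python
import numpy as np

def rle(x):
    # sketch: vectorized run-length encoding of a 1-D array
    x = np.asarray(x)
    if x.size == 0:
        return x, np.array([], dtype=int)
    # a new run begins at index 0 and wherever the value changes
    starts = np.flatnonzero(np.concatenate(([True], x[1:] != x[:-1])))
    lengths = np.diff(np.append(starts, x.size))
    return x[starts], lengths

values, counts = rle([1,1,1,2,2,2,2,2,3,3,1,1,0,0,0,5])
print(list(zip(values.tolist(), counts.tolist())))
# [(1, 3), (2, 5), (3, 2), (1, 2), (0, 3), (5, 1)]
```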

Match lengths of multiple Numpy arrays of unequal length

First, find the minimum length:

mlen = min(map(len, [a, b, c]))

Then slice each array down to that length:

newl = [x[:mlen] for x in [a, b, c]]
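
The same two steps wrapped up for any number of arrays, as a sketch; the helper name `truncate_to_shortest` and the sample arrays are hypothetical. Once the lengths match, the arrays can be stacked into one 2-D array.

```python
import numpy as np

def truncate_to_shortest(*arrays):
    # hypothetical helper: cut every array to the length of the shortest one
    mlen = min(len(x) for x in arrays)
    return [x[:mlen] for x in arrays]

a = np.arange(10)
b = np.arange(8)
c = np.arange(12)
a2, b2, c2 = truncate_to_shortest(a, b, c)
print(len(a2), len(b2), len(c2))      # 8 8 8
print(np.vstack((a2, b2, c2)).shape)  # (3, 8)
```

Basic slicing of a numpy array returns a view, so writing into `a2` also changes `a`; add `.copy()` inside the comprehension if that matters.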
