Get Start and End Index of Runs of Values

A solution from base R.

a <- c(1,1,0,0,1,2,0,0)

# Get run length encoding
b <- rle(a)

# Create a data frame
dt <- data.frame(number = b$values, lengths = b$lengths)
# Get the end
dt$end <- cumsum(dt$lengths)
# Get the start
dt$start <- dt$end - dt$lengths + 1

# Select columns
dt <- dt[, c("number", "start", "end")]
# Sort rows
dt <- dt[order(dt$number), ]

dt
#   number start end
# 2      0     3   4
# 5      0     7   8
# 1      1     1   2
# 3      1     5   5
# 4      2     6   6

Update

Here is a solution that uses with() to make the code more concise.

with(rle(a), data.frame(number = values,
                        start = cumsum(lengths) - lengths + 1,
                        end = cumsum(lengths))[order(values), ])
#   number start end
# 2      0     3   4
# 5      0     7   8
# 1      1     1   2
# 3      1     5   5
# 4      2     6   6

Find start and end positions/indices of runs/consecutive values

Core logic:

# Example vector and rle object
x = rev(rep(6:10, 1:5))
rle_x = rle(x)

# Compute endpoints of run
end = cumsum(rle_x$lengths)
# Base R: drop the last end and add 1 (stats::lag() does not shift a plain vector)
start = c(1, head(end, -1) + 1)

# Display results
data.frame(start, end)
#   start end
# 1     1   5
# 2     6   9
# 3    10  12
# 4    13  14
# 5    15  15

Tidyverse/dplyr way (data frame-centric):

library(dplyr)

rle(x) %>%
  unclass() %>%
  as.data.frame() %>%
  mutate(end = cumsum(lengths),
         start = c(1, dplyr::lag(end)[-1] + 1)) %>%
  magrittr::extract(c(1, 2, 4, 3)) # Reorder so start comes before end for display

Because the start and end vectors are the same length as the values component of the rle object, solving the related problem of identifying endpoints for runs meeting some condition is straightforward: filter or subset the start and end vectors using the condition on the run values.
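For readers working in Python rather than R, the same idea can be sketched with the standard library alone. This is only an illustration and not part of the answer above: the function name `run_bounds` is hypothetical, and the 1-based inclusive indices are an assumption chosen to match the R output.

```python
from itertools import accumulate, groupby

def run_bounds(values):
    """Return (value, start, end) for each run, using 1-based inclusive indices."""
    # Run-length encode: one (value, length) pair per run of equal neighbours.
    runs = [(key, sum(1 for _ in group)) for key, group in groupby(values)]
    lengths = [length for _, length in runs]
    # The cumulative sum of the lengths gives each run's end position.
    ends = list(accumulate(lengths))
    # Each run starts `length - 1` positions before its end.
    starts = [end - length + 1 for end, length in zip(ends, lengths)]
    return [(value, start, end) for (value, _), start, end in zip(runs, starts, ends)]

# Same data as the R example: rev(rep(6:10, 1:5))
x = [10] * 5 + [9] * 4 + [8] * 3 + [7] * 2 + [6]
bounds = run_bounds(x)

# Keep only the runs whose value meets a condition, here value >= 9.
selected = [(start, end) for value, start, end in bounds if value >= 9]
```

Because every run carries its value alongside its endpoints, the filtering step is a plain list comprehension over the run values.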

Find start and end index of bouts of consecutive equal values

Use the shifting cumsum trick to mark consecutive groups, then use groupby to get indices and filter by your conditions.

v = (df['A'] != df['A'].shift()).cumsum()
u = df.groupby(v)['A'].agg(['all', 'count'])
m = u['all'] & u['count'].ge(3)

df.groupby(v).apply(lambda x: (x.index[0], x.index[-1]))[m]

A
3     (3, 5)
7    (9, 11)
dtype: object
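The answer above does not show `df`, so here is a self-contained sketch of the same shift-and-cumsum trick. The sample data is invented for illustration, and the column names `value`, `start`, `end`, and `length` are assumptions. It also avoids `apply` by aggregating the index directly, which is a design choice of this sketch and not the answer's method.

```python
import pandas as pd

# Invented sample data: a boolean column with runs of varying length.
df = pd.DataFrame({"A": [True, False, False, True, True, True,
                         False, True, False, True, True, True]})

# A new group id starts wherever the value differs from the previous row.
group_id = (df["A"] != df["A"].shift()).cumsum()

# Summarise each run: its value, first and last index label, and length.
positions = df.index.to_series()
runs = pd.DataFrame({
    "value": df["A"].groupby(group_id).first(),
    "start": positions.groupby(group_id).first(),
    "end": positions.groupby(group_id).last(),
    "length": df["A"].groupby(group_id).size(),
})

# Keep runs of True that are at least 3 rows long.
long_true_runs = runs[runs["value"].astype(bool) & (runs["length"] >= 3)]
```

With this data, the rows that survive the filter are the runs labelled 3 and 7, spanning index labels 3 to 5 and 9 to 11.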

Python dataframe get index start and end of successive values

Given

>>> df
   0
0  1
1  1
2  1
3  2
4  2
5  3
6  3
7  1
8  1

Solution:

starts_bool = df.diff().ne(0)[0]
starts = df.index[starts_bool]
ends = df.index[starts_bool.shift(-1, fill_value=True)]

result = (df.loc[starts]
            .reset_index(drop=True)
            .assign(Start=starts, End=ends)
            .rename({0: 'Value'}, axis='columns')
         )

Result:

>>> result
   Value  Start  End
0      1      0    2
1      2      3    4
2      3      5    6
3      1      7    8

Finding starting and ending index of consecutive numbers in python

This code prints the start and end index of each group of repeating integers in a list:

#!/usr/bin/python3

A = [1,2,2,2,2,2,2,2,2,2,3,5,5,5,5,5,5,5,6,7]
B = [1,2,2,2,2,2,2,2,2,2,3,5,5,5,5,5,5,5,6,7,2,2,2]

def repeatingNumbers(numList):
    i = 0

    # Visit every index so that a final single-element run is reported too
    while i < len(numList):
        n = numList[i]
        startIndex = i
        # Advance while the next element repeats the current one
        while i < len(numList) - 1 and numList[i] == numList[i + 1]:
            i = i + 1

        endIndex = i

        print("{0} >> {1}".format(n, [startIndex, endIndex]))
        i = i + 1

repeatingNumbers(B)
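As an alternative sketch, the same start and end indices can be collected into a list instead of printed, using `itertools.groupby` from the standard library. The function name `run_indices` is hypothetical and not part of the answer above.

```python
from itertools import groupby

def run_indices(values):
    """Return (value, start, end) per run, with 0-based inclusive indices."""
    result = []
    start = 0
    for value, group in groupby(values):
        length = sum(1 for _ in group)  # number of items in this run
        result.append((value, start, start + length - 1))
        start += length  # the next run begins right after this one
    return result

runs = run_indices([1, 2, 2, 2, 3, 5, 5, 6, 7, 2, 2])
```

Returning tuples makes it easy to filter afterwards, for example keeping only runs where `end > start` to drop single-element groups.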

Fast way to find length and start index of repeated elements in array

Here is a straightforward attempt that solves the problem directly.

We prepend and append a zero to A to get a vector ZA, then detect the alternating islands of ones and zeros in ZA by comparing the shifted versions ZA[1:] and ZA[:-1]. (From the resulting arrays we take the even positions, which correspond to the runs of ones in A.)

import numpy as np

def structure(A):
    ZA = np.concatenate(([0], A, [0]))
    indices = np.flatnonzero( ZA[1:] != ZA[:-1] )
    counts = indices[1:] - indices[:-1]
    return indices[::2], counts[::2]

Some sample runs:

In [71]: structure(np.array( [0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0] ))
Out[71]: (array([ 2, 6, 10]), array([3, 2, 1]))

In [72]: structure(np.array( [1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1] ))
Out[72]: (array([ 0, 5, 9, 13, 15]), array([3, 3, 2, 1, 1]))

In [73]: structure(np.array( [1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0] ))
Out[73]: (array([0, 5, 9]), array([3, 3, 2]))

In [74]: structure(np.array( [1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1] ))
Out[74]: (array([ 0, 2, 5, 7, 11, 14]), array([1, 2, 1, 3, 2, 3]))
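The same neighbour comparison extends beyond 0/1 arrays. As a hedged sketch (the function name `run_starts_lengths` is hypothetical and not from the answer above), the following reports every run of equal values in a 1-D array, and the runs of ones can then be picked out with a mask.

```python
import numpy as np

def run_starts_lengths(a):
    """Return (values, starts, lengths) for every run of equal values in 1-D `a`."""
    a = np.asarray(a)
    if a.size == 0:
        empty = np.array([], dtype=int)
        return a, empty, empty
    # An element that differs from its predecessor begins a new run.
    change = np.flatnonzero(a[1:] != a[:-1]) + 1
    starts = np.concatenate(([0], change))
    # Each run ends where the next one begins; the last run ends at len(a).
    lengths = np.diff(np.concatenate((starts, [a.size])))
    return a[starts], starts, lengths

values, starts, lengths = run_starts_lengths([0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0])
ones = values == 1  # mask selecting the runs of ones
```

Here `starts[ones]` and `lengths[ones]` give the same start indices and counts as the first sample run of `structure`.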

Python: Find starting, ending index of sub-text column from another text column

Try:

df['answer_start'] = df.apply(lambda x: x['context'].find(x['answer']), axis=1)
df['answer_end'] = df['answer_start'] + df['answer'].str.len()
>>> df[['answer_start', 'answer_end']]
   answer_start  answer_end
0           113         149
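One caveat: `str.find` returns -1 when the substring is absent, so a row whose answer does not occur in its context would get a misleading start and end. A minimal sketch of a helper that makes this case explicit follows; the name `find_span` and the sample strings are assumptions for illustration only.

```python
def find_span(context, answer):
    """Return (start, end) of the first occurrence of `answer` in `context`, or None.

    `end` is exclusive, so context[start:end] == answer.
    """
    start = context.find(answer)
    if start == -1:  # str.find signals "not found" with -1
        return None
    return start, start + len(answer)

context = "Runs of equal values are called bouts in some fields."
span = find_span(context, "bouts")
missing = find_span(context, "streaks")
```

Such a helper could be passed to `df.apply(..., axis=1)` in the same way as the lambda above, leaving `None` where no match exists.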

Find Start / Stop Index Range For Values in NumPy Array Greater Than N

Here's another solution (which I believe can be improved):

import numpy as np
from numpy.lib.stride_tricks import as_strided

x = np.array([2, 3, 4, 0, 0, 1, 1, 4, 6, 5, 8, 9, 9, 4, 2, 0, 3])

# array of unique values of x bigger than 1
a = np.unique(x[x>=2])

step = len(a) # if you encounter memory problems, try a smaller step
result = []
for i in range(0, len(a), step):
    ai = a[i:i + step]
    c = np.argwhere(x >= ai[:, None])
    c[:,0] = ai[c[:,0]]
    c = np.pad(c, ((1,1), (0,0)), 'symmetric')

    d = np.where(np.diff(c[:,1]) !=1)[0]

    e = as_strided(d, shape=(len(d)-1, 2), strides=d.strides*2).copy()
    # e = e[(np.diff(e, axis=1) > 1).flatten()]
    e[:,0] = e[:,0] + 1

    result.append(np.hstack([c[:,0][e[:,0, None]], c[:,1][e]]))

result = np.concatenate(result)

# array([[ 2, 0, 2],
# [ 2, 7, 14],
# [ 2, 16, 16],
# [ 3, 1, 2],
# [ 3, 7, 13],
# [ 3, 16, 16],
# [ 4, 2, 2],
# [ 4, 7, 13],
# [ 5, 8, 12],
# [ 6, 8, 8],
# [ 6, 10, 12],
# [ 8, 10, 12],
# [ 9, 11, 12]])

Sorry for not commenting on what each step does. If I find time later, I will fix that.
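For a single threshold, a simpler approach may be easier to follow. The sketch below is not the method above; it is a swapped-in technique that pads a boolean mask and looks at its differences. The function name `ranges_at_least` is hypothetical.

```python
import numpy as np

def ranges_at_least(x, n):
    """Return an array of inclusive [start, stop] index pairs where x >= n."""
    mask = np.asarray(x) >= n
    # Pad with False on both sides so runs touching either edge are detected.
    padded = np.concatenate(([False], mask, [False]))
    # +1 marks the start of a run, -1 marks the position just past its end.
    edges = np.diff(padded.astype(int))
    starts = np.flatnonzero(edges == 1)
    stops = np.flatnonzero(edges == -1) - 1
    return np.column_stack((starts, stops))

x = np.array([2, 3, 4, 0, 0, 1, 1, 4, 6, 5, 8, 9, 9, 4, 2, 0, 3])
ranges_2 = ranges_at_least(x, 2)
ranges_9 = ranges_at_least(x, 9)
```

Calling the function once per unique threshold reproduces the rows of the result above for that threshold, for example `[0, 2]`, `[7, 14]`, `[16, 16]` for a threshold of 2.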


