Get start and end index of runs of values
A solution from base R.
a <- c(1,1,0,0,1,2,0,0)
# Get run length encoding
b <- rle(a)
# Create a data frame
dt <- data.frame(number = b$values, lengths = b$lengths)
# Get the end
dt$end <- cumsum(dt$lengths)
# Get the start
dt$start <- dt$end - dt$lengths + 1
# Select columns
dt <- dt[, c("number", "start", "end")]
# Sort rows
dt <- dt[order(dt$number), ]
dt
# number start end
#2 0 3 4
#5 0 7 8
#1 1 1 2
#3 1 5 5
#4 2 6 6
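For comparison, here is a minimal Python sketch of the same base R steps, assuming a plain list as input. itertools.groupby plays the role of rle(), itertools.accumulate plays the role of cumsum(), and each start is computed as end - length + 1. Indices stay 1-based so the rows match the R output above. run_bounds is a hypothetical name.

```python
from itertools import accumulate, groupby


def run_bounds(a):
    """Return (value, start, end) per run, 1-based and inclusive, sorted by value."""
    # Run-length encode: one (value, length) pair per run, like R's rle()
    pairs = [(value, len(list(group))) for value, group in groupby(a)]
    values = [v for v, _ in pairs]
    lengths = [n for _, n in pairs]
    # end = cumsum(lengths); start = end - length + 1
    ends = list(accumulate(lengths))
    starts = [e - n + 1 for e, n in zip(ends, lengths)]
    rows = list(zip(values, starts, ends))
    # sorted() is stable, so runs with equal values keep their original order,
    # like order(values) in the R version
    return sorted(rows, key=lambda row: row[0])


print(run_bounds([1, 1, 0, 0, 1, 2, 0, 0]))
```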
Update
Here is a solution using with() to make the code more concise.
with(rle(a), data.frame(number = values,
start = cumsum(lengths) - lengths + 1,
end = cumsum(lengths))[order(values),])
# number start end
#2 0 3 4
#5 0 7 8
#1 1 1 2
#3 1 5 5
#4 2 6 6
Find start and end positions/indices of runs/consecutive values
Core logic:
# Example vector and rle object
x = rev(rep(6:10, 1:5))
rle_x = rle(x)
# Compute endpoints of run
end = cumsum(rle_x$lengths)
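# Each run starts one past the previous run's end (base R; stats::lag() does not shift a plain vector)
start = c(1, head(end, -1) + 1)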
# Display results
data.frame(start, end)
# start end
# 1 1 5
# 2 6 9
# 3 10 12
# 4 13 14
# 5 15 15
Tidyverse/dplyr way (data frame-centric):
library(dplyr)
rle(x) %>%
unclass() %>%
as.data.frame() %>%
mutate(end = cumsum(lengths),
start = c(1, dplyr::lag(end)[-1] + 1)) %>%
magrittr::extract(c(1,2,4,3)) # To re-order start before end for display
Because the start and end vectors are the same length as the values component of the rle object, solving the related problem of identifying endpoints for runs meeting some condition is straightforward: filter or subset the start and end vectors using the condition on the run values.
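A minimal NumPy sketch of the same idea, assuming a non-empty 1-D array. Here the run starts come from comparing neighbouring elements rather than from an rle object, but the result is again three parallel arrays (values, starts, ends), so a condition on the values subsets starts and ends directly. Indices are 0-based; rle_bounds and the sample data are hypothetical.

```python
import numpy as np


def rle_bounds(x):
    """Return (values, starts, ends) of the runs in a non-empty 1-D array."""
    x = np.asarray(x)
    # Index i starts a new run when x[i] differs from x[i - 1]; index 0 always does
    starts = np.flatnonzero(np.r_[True, x[1:] != x[:-1]])
    # Each run ends just before the next one starts; the last run ends at len(x) - 1
    ends = np.r_[starts[1:] - 1, len(x) - 1]
    values = x[starts]
    return values, starts, ends


x = np.array([10, 10, 10, 9, 9, 8, 10, 10])
values, starts, ends = rle_bounds(x)
keep = values == 10  # condition on the run values
print(starts[keep], ends[keep])
```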
find start end index of bouts of consecutive equal values
Use the shifting cumsum trick to mark consecutive groups, then use groupby to get indices and filter by your conditions.
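import pandas as pd  # df is assumed to have a boolean column 'A'
v = (df['A'] != df['A'].shift()).cumsum()  # new group id at every change in 'A'
u = df.groupby(v)['A'].agg(['all', 'count'])  # per group: all True? row count
m = u['all'] & u['count'].ge(3)  # runs of True at least 3 rows long
df.groupby(v).apply(lambda x: (x.index[0], x.index[-1]))[m]  # (first, last) index per run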
A
3 (3, 5)
7 (9, 11)
dtype: object
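A self-contained sketch of the same shifting-cumsum grouping, assuming a small made-up boolean column A. Instead of apply, it collects each group's first and last index label and its length with groupby aggregations, which keeps the run table in one DataFrame before filtering.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"A": [False, True, True, False, True, True, True, False]})

# A new group id starts wherever 'A' differs from the previous row
grp = (df["A"] != df["A"].shift()).cumsum()
idx = df.index.to_series()  # index labels as values, aligned with df

runs = pd.DataFrame({
    "value": df["A"].groupby(grp).first(),
    "start": idx.groupby(grp).first(),
    "end": idx.groupby(grp).last(),
    "count": idx.groupby(grp).size(),
})

# Runs of True that are at least 3 rows long
selected = runs[runs["value"] & runs["count"].ge(3)]
print(selected)
```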
Python dataframe get index start and end of successive values
Given
>>> df
0
0 1
1 1
2 1
3 2
4 2
5 3
6 3
7 1
8 1
Solution:
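import pandas as pd
# A row starts a new run when it differs from the row above
# (the first row's diff is NaN, and NaN != 0, so it counts as a start too)
starts_bool = df.diff().ne(0)[0]
starts = df.index[starts_bool]
# A row ends a run when the next row starts one; the last row always ends a run
ends = df.index[starts_bool.shift(-1, fill_value=True)]
result = (df.loc[starts]
          .reset_index(drop=True)
          .assign(Start=starts, End=ends)
          .rename({0: 'Value'}, axis='columns')
          )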
Result:
>>> result
Value Start End
0 1 0 2
1 2 3 4
2 3 5 6
3 1 7 8
Finding starting and ending index of consecutive numbers in python
This code prints the start and end index of each group of repeating integers in a list:
#! /usr/bin/python3
A=[1,2,2,2,2,2,2,2,2,2,3,5,5,5,5,5,5,5,6,7]
B = [1,2,2,2,2,2,2,2,2,2,3,5,5,5,5,5,5,5,6,7,2,2,2]
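def repeatingNumbers(numList):
    i = 0
    # Scan up to and including the last element, so a final single-element run is reported too
    while i < len(numList):
        n = numList[i]
        startIndex = i
        # Advance while the next element repeats the current one
        while i < len(numList) - 1 and numList[i] == numList[i + 1]:
            i = i + 1
        endIndex = i
        print("{0} >> {1}".format(n, [startIndex, endIndex]))
        i = i + 1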
repeatingNumbers(B)
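If the indices are needed for further processing rather than printed, the same scan can yield tuples instead. A minimal sketch; repeating_runs is a hypothetical name, and the indices are 0-based and inclusive like the printed version.

```python
def repeating_runs(num_list):
    """Yield (value, start_index, end_index) for each run of equal neighbours."""
    i = 0
    while i < len(num_list):
        start = i
        # Advance while the next element repeats the current one
        while i + 1 < len(num_list) and num_list[i] == num_list[i + 1]:
            i += 1
        yield num_list[start], start, i
        i += 1


B = [1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 5, 5, 5, 5, 5, 5, 5, 6, 7, 2, 2, 2]
for value, start, end in repeating_runs(B):
    print("{0} >> {1}".format(value, [start, end]))
```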
Fast way to find length and start index of repeated elements in array
Here is a pedestrian try, solving the problem by programming it directly. We prepend and append a zero to A, getting a vector ZA, then detect the islands of 1s and 0s, which alternate in ZA, by comparing the shifted versions ZA[1:] and ZA[:-1]. (In the constructed arrays we take the even places, which correspond to the runs of ones in A.)
import numpy as np
def structure(A):
    ZA = np.concatenate(([0], A, [0]))
    indices = np.flatnonzero( ZA[1:] != ZA[:-1] )
    counts = indices[1:] - indices[:-1]
    return indices[::2], counts[::2]
Some sample runs:
In [71]: structure(np.array( [0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0] ))
Out[71]: (array([ 2, 6, 10]), array([3, 2, 1]))
In [72]: structure(np.array( [1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1] ))
Out[72]: (array([ 0, 5, 9, 13, 15]), array([3, 3, 2, 1, 1]))
In [73]: structure(np.array( [1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0] ))
Out[73]: (array([0, 5, 9]), array([3, 3, 2]))
In [74]: structure(np.array( [1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1] ))
Out[74]: (array([ 0, 2, 5, 7, 11, 14]), array([1, 2, 1, 3, 2, 3]))
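Because ZA starts and ends with 0, the transition positions alternate between the start of a run of ones and the position one past its end. A minimal sketch, assuming A contains only 0s and 1s, that pairs them up with reshape instead of slicing (one_runs is a hypothetical name):

```python
import numpy as np


def one_runs(A):
    """Return one [start, stop) row per run of ones in a 0/1 array."""
    ZA = np.concatenate(([0], np.asarray(A), [0]))
    indices = np.flatnonzero(ZA[1:] != ZA[:-1])
    # Transitions alternate: rise (start of a run), fall (one past its end)
    return indices.reshape(-1, 2)


bounds = one_runs([0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0])
print(bounds)
print(bounds[:, 1] - bounds[:, 0])  # run lengths
```

Each row is a half-open [start, stop) range, so stop - start gives the same counts that structure() returns.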
Python: Find starting, ending index of sub-text column from another text column
Try:
df['answer_start'] = df.apply(lambda x: x['context'].find(x['answer']), axis=1)
df['answer_end'] = df['answer_start'] + df['answer'].str.len()
>>> df[['answer_start', 'answer_end']]
answer_start answer_end
0 113 149
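str.find returns -1 when the answer does not occur in the context, which would silently give answer_start = -1 and a wrong answer_end. A minimal sketch with made-up data that turns those rows into NaN instead. answer_end is exclusive, so context[answer_start:answer_end] recovers the answer.

```python
import pandas as pd

# Hypothetical sample data; the second answer does not occur in its context
df = pd.DataFrame({
    "context": ["the cat sat on the mat", "a dog ran home"],
    "answer": ["cat sat", "bird"],
})

start = df.apply(lambda row: row["context"].find(row["answer"]), axis=1)
# -1 means "not found"; mask it as NaN so answer_end is NaN too
df["answer_start"] = start.where(start >= 0)
df["answer_end"] = df["answer_start"] + df["answer"].str.len()
print(df[["answer_start", "answer_end"]])
```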
Find Start / Stop Index Range For Values in NumPy Array Greater Than N
Here's another solution (which I believe can be improved):
import numpy as np
from numpy.lib.stride_tricks import as_strided
x = np.array([2, 3, 4, 0, 0, 1, 1, 4, 6, 5, 8, 9, 9, 4, 2, 0, 3])
# array of unique values of x bigger than 1
a = np.unique(x[x>=2])
step = len(a) # if you encounter memory problems, try a smaller step
result = []
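for i in range(0, len(a), step):
    ai = a[i:i + step]
    # (row in ai, index in x) pairs for every x >= each threshold in ai
    c = np.argwhere(x >= ai[:, None])
    # replace the row number with the threshold value itself
    c[:,0] = ai[c[:,0]]
    # repeat the first and last rows so the first and last runs get boundaries
    c = np.pad(c, ((1,1), (0,0)), 'symmetric')
    # rows after which the x-index does not advance by exactly 1 (run boundaries)
    d = np.where(np.diff(c[:,1]) !=1)[0]
    # pair consecutive boundaries; after the +1 below, each pair is the [first, last] row of a run
    e = as_strided(d, shape=(len(d)-1, 2), strides=d.strides*2).copy()
    # e = e[(np.diff(e, axis=1) > 1).flatten()]
    e[:,0] = e[:,0] + 1
    # rows of [threshold, start index, end index]
    result.append(np.hstack([c[:,0][e[:,0, None]], c[:,1][e]]))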
result = np.concatenate(result)
# array([[ 2, 0, 2],
# [ 2, 7, 14],
# [ 2, 16, 16],
# [ 3, 1, 2],
# [ 3, 7, 13],
# [ 3, 16, 16],
# [ 4, 2, 2],
# [ 4, 7, 13],
# [ 5, 8, 12],
# [ 6, 8, 8],
# [ 6, 10, 12],
# [ 8, 10, 12],
# [ 9, 11, 12]])
Sorry for not explaining each step in detail; if I find time later, I will add that.
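For a single threshold N the problem is simpler: pad the boolean mask x >= N with False on both sides and look for the positions where it changes. A minimal sketch (runs_at_least is a hypothetical name) that returns inclusive [start, end] rows; for thresholds 2 and 3 it reproduces the corresponding rows of the result above.

```python
import numpy as np


def runs_at_least(x, n):
    """Return inclusive [start, end] index pairs of the runs where x >= n."""
    mask = np.asarray(x) >= n
    padded = np.concatenate(([False], mask, [False]))
    # Position k marks a change between x[k - 1] and x[k]
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    starts = edges[::2]       # False -> True: first index of a run
    ends = edges[1::2] - 1    # True -> False: the previous index was the last one
    return np.column_stack((starts, ends))


x = np.array([2, 3, 4, 0, 0, 1, 1, 4, 6, 5, 8, 9, 9, 4, 2, 0, 3])
print(runs_at_least(x, 2))
```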