Create counter within consecutive runs of certain values
Here's a way, building on Joshua's rle approach (edited to use seq_len and lapply, as per Marek's suggestion):
> (!x) * unlist(lapply(rle(x)$lengths, seq_len))
[1] 0 1 0 1 2 3 0 0 1 2
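To see why the one-liner works, it can help to look at the intermediate values. The input x is not shown in the answer, so the vector below is an assumption, chosen so that it reproduces the printed output.

```r
# Assumed input (not given in the answer), chosen to reproduce the output above
x <- c(1, 0, 1, 0, 0, 0, 1, 1, 0, 0)

runs <- rle(x)$lengths                  # length of each run: 1 1 1 3 2 2
idx  <- unlist(lapply(runs, seq_len))   # position within each run, restarting at 1
res  <- (!x) * idx                      # zero out positions where x is non-zero

print(res)
```

Multiplying by !x is what restricts the counter to the runs of zeros; without it, every run would be numbered.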
UPDATE. Just for kicks, here's another way to do it, around 5 times faster:
cumul_zeros <- function(x) {
x <- !x
rl <- rle(x)
len <- rl$lengths
v <- rl$values
cumLen <- cumsum(len)
z <- x
# replace the 0 at the end of each zero-block in z by the
# negative of the length of the preceding 1-block....
iDrops <- c(0, diff(v)) < 0
z[ cumLen[ iDrops ] ] <- -len[ c(iDrops[-1],FALSE) ]
# ... to ensure that the cumsum below does the right thing.
# We zap the cumsum with x so only the cumsums for the 1-blocks survive:
x*cumsum(z)
}
Try an example:
> cumul_zeros(c(1,1,1,0,0,0,0,0,1,1,1,0,0,1,1))
[1] 0 0 0 1 2 3 4 5 0 0 0 1 2 0 0
Now compare times on a million-length vector:
> x <- sample(0:1, 1000000, replace = TRUE)
> system.time( z <- cumul_zeros(x))
user system elapsed
0.15 0.00 0.14
> system.time( z <- (!x) * unlist( lapply( rle(x)$lengths, seq_len)))
user system elapsed
0.75 0.00 0.75
Moral of the story: one-liners are nicer and easier to understand, but not always the fastest!
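Before trusting a timing comparison, it is worth checking that both versions return the same values. A minimal sketch, assuming cumul_zeros is defined exactly as above (it is repeated here only so the snippet runs on its own); the wrapper name one_liner, the seed and the vector length are arbitrary choices.

```r
cumul_zeros <- function(x) {
  x <- !x
  rl <- rle(x)
  len <- rl$lengths
  v <- rl$values
  cumLen <- cumsum(len)
  z <- x
  iDrops <- c(0, diff(v)) < 0
  z[cumLen[iDrops]] <- -len[c(iDrops[-1], FALSE)]
  x * cumsum(z)
}

# Hypothetical wrapper around the one-liner, for easy comparison
one_liner <- function(x) (!x) * unlist(lapply(rle(x)$lengths, seq_len))

set.seed(1)  # arbitrary seed, for reproducibility only
x <- sample(0:1, 1000, replace = TRUE)

# all.equal ignores the integer/double difference between the two results
same <- isTRUE(all.equal(cumul_zeros(x), one_liner(x)))
print(same)
```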
Create counter within consecutive runs of values
You need to use sequence and rle:
> sequence(rle(as.character(dataset$input))$lengths)
[1] 1 1 2 1 2 1 1 2 3 4 1 1
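The as.character() call matters when the column is a factor, because rle() only accepts atomic vectors. A short sketch of that point; the dataset below is an assumption (twelve values consistent with the printed output), stored as a factor on purpose.

```r
# Assumed data, consistent with the output above; deliberately a factor
dataset <- data.frame(
  input = factor(c("a","b","b","a","a","c","a","a","a","a","b","c"))
)

# rle() rejects factors, so calling it directly signals an error
direct <- tryCatch(rle(dataset$input), error = function(e) "error")

# Converting to character first gives the within-run counter
counter <- sequence(rle(as.character(dataset$input))$lengths)
print(counter)
```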
Create counter of consecutive runs of a certain value
SOG <- c(4,4,0,0,0,3,4,5,0,0,1,2,0,0,0)
#run length encoding:
tmp <- rle(SOG)
#turn values into logicals
tmp$values <- tmp$values == 0
#cumulative sum of TRUE values
tmp$values[tmp$values] <- cumsum(tmp$values[tmp$values])
#inverse the run length encoding
inverse.rle(tmp)
#[1] 0 0 1 1 1 0 0 0 2 2 0 0 3 3 3
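The same steps can be wrapped in a small function so that the value being tracked is a parameter. This is only a sketch: the name run_number and its target argument are hypothetical, and it assumes x contains no NA values.

```r
# Hypothetical helper: number the runs of `target` in `x`, 0 elsewhere
run_number <- function(x, target = 0) {
  r <- rle(x)
  hit <- r$values == target                  # which runs consist of the target value
  r$values <- ifelse(hit, cumsum(hit), 0L)   # k-th target run gets k, others get 0
  inverse.rle(r)                             # expand back to the original length
}

SOG <- c(4,4,0,0,0,3,4,5,0,0,1,2,0,0,0)
print(run_number(SOG))
```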
Count and Assign Consecutive Occurrences of Variable
You can repeat each run's length that many times, using the lengths component returned by rle:
with(rle(dataset$input), rep(lengths, lengths))
#[1] 1 2 2 2 2 1 4 4 4 4 1 1
Using dplyr, we can use lag to create groups and then count the number of rows in each group.
library(dplyr)
dataset %>%
group_by(gr = cumsum(input != lag(input, default = first(input)))) %>%
mutate(count = n())
And with data.table:
library(data.table)
setDT(dataset)[, count:= .N, rleid(input)]
Data. Make sure the input column is character and not factor.
dataset <- data.frame(input = c("a","b","b","a","a","c","a","a","a","a","b","c"),
stringsAsFactors = FALSE)
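The lag-based grouping used in the dplyr version can also be sketched in base R, which makes the mechanism explicit: flag every position where the value changes, take the cumulative sum to get a run id, then count the members of each run. The variable names is_start, run_id and count are illustrative, and the sketch assumes a non-empty vector.

```r
input <- c("a","b","b","a","a","c","a","a","a","a","b","c")

# TRUE wherever a value differs from its predecessor (the first element always starts a run)
is_start <- c(TRUE, input[-1] != input[-length(input)])
run_id   <- cumsum(is_start)            # 1 2 2 3 3 4 5 5 5 5 6 7

# Size of the run each element belongs to
count <- ave(seq_along(input), run_id, FUN = length)
print(count)
```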
Count consecutive occurrences of a specific value in every row of a data frame in R
You've identified the two forms the longest run can take: (1) somewhere in the middle of a row, or (2) split between the end and the beginning of a row. Hence you want to calculate both and take the max, like so:
df<-cbind(
Winter=c(0,0,3),
Spring=c(0,2,4),
Summer=c(0,2,7),
Autumn=c(3,0,4))
#> Winter Spring Summer Autumn
#> [1,] 0 0 0 3
#> [2,] 0 2 2 0
#> [3,] 3 4 7 4
# calculate the number of consecutive zeros at the start and end
startZeros <- apply(df,1,function(x)which.min(x==0)-1)
#> [1] 3 1 0
endZeros <- apply(df,1,function(x)which.min(rev(x==0))-1)
#> [1] 0 1 0
# calculate the longest run of zeros
longestRun <- apply(df,1,function(x){
y = rle(x);
max(y$lengths[y$values==0],0)})
#> [1] 3 1 0
# take the max of the two values
pmax(longestRun,startZeros +endZeros )
#> [1] 3 2 0
Of course an even easier solution is:
longestRun <- apply(cbind(df,df),# tricky way to wrap the zeros from the start to the end
1,# the margin over which to apply the summary function
function(x){# the summary function
y = rle(x);
max(y$lengths[y$values==0],
0)# include zero in case there are no zeros in y$values
})
Note that the above solution works because my df does not include the location field (column).
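One caveat with the doubled-row trick: if a row is entirely zero, the doubled row yields a run twice as long as the row itself. A minimal sketch that caps the result at the row length; the fourth, all-zero row and the helper name longest_wrapped are additions for illustration and are not part of the original data.

```r
df <- cbind(
  Winter = c(0, 0, 3, 0),
  Spring = c(0, 2, 4, 0),
  Summer = c(0, 2, 7, 0),
  Autumn = c(3, 0, 4, 0))   # 4th row (all zeros) added for illustration

longest_wrapped <- function(x) {
  y <- rle(c(x, x))                            # doubling lets a run wrap around
  longest <- max(y$lengths[y$values == 0], 0)  # 0 covers rows without zeros
  min(longest, length(x))                      # a run cannot exceed the row length
}

res <- apply(df, 1, longest_wrapped)
print(res)
```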
R: count consecutive occurrences of values in a single column and by group
Use rleid (from the data.table package) to get a grouping variable and then use ave to apply seq_along within common values of that grouping:
library(data.table)
transform(dataset, Counter = ave(YesNO, rleid(ID, YesNO), FUN = seq_along))
giving:
ID YesNO Counter
1 a 1 1
2 a 1 2
3 a 0 1
4 a 0 2
5 a 0 3
6 a 1 1
7 a 1 2
8 b 1 1
9 b 1 2
10 b 1 3
11 b 0 1
12 b 0 2
13 b 0 3
14 b 0 4
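If data.table is not available, a similar counter can be sketched in base R by running rle over a combined key. The dataset below is reconstructed from the output shown above; pasting the two columns together is an assumption that is only safe when the pasted keys cannot collide.

```r
# Reconstructed from the printed result above
dataset <- data.frame(
  ID    = rep(c("a", "b"), each = 7),
  YesNO = c(1,1,0,0,0,1,1, 1,1,1,0,0,0,0)
)

# A run ends whenever either ID or YesNO changes
key <- paste(dataset$ID, dataset$YesNO)
dataset$Counter <- sequence(rle(key)$lengths)
print(dataset)
```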
Count consecutive occurrences of an element in string
To put the pieces together: here's a combination of my comment on your previous question and (parts of) my answer here: Count consecutive TRUE values within each block separately. The convenience functions rleid and rowid from the data.table package are used.
Toy data with two strings of different length:
s <- c("a > a > b > b > b > a > b > b", "c > c > b > b > b > c > c")
library(data.table)
lapply(strsplit(s, " > "), function(x) paste0(x, rowid(rleid(x)), collapse = " > "))
# [[1]]
# [1] "a1 > a2 > b1 > b2 > b3 > a1 > b1 > b2"
#
# [[2]]
# [1] "c1 > c2 > b1 > b2 > b3 > c1 > c2"
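For comparison, the within-run index can also be sketched in base R with sequence and rle, which avoids the data.table dependency for this particular task. The helper name label_runs is hypothetical.

```r
s <- c("a > a > b > b > b > a > b > b", "c > c > b > b > b > c > c")

# Hypothetical helper: append each element's position within its run
label_runs <- function(x) {
  n <- sequence(rle(x)$lengths)
  paste0(x, n, collapse = " > ")
}

res <- lapply(strsplit(s, " > ", fixed = TRUE), label_runs)
print(res)
```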
Calculate maximum length of consecutive values in row over a set number
# condition is that x should be larger or equal to 3
condition <- function(x) x >= 3
# example row
row = c(2,4,3,3,4,5,1,0,5,1)
# we can use condition on row:
condition(row)
# and we can employ rle on that:
rle(condition(row))
# we need to filter those rle results for TRUE:
r <- rle(condition(row))
r$lengths[r$values == TRUE]
# The answer is the max of the latter
max(r$lengths[r$values])
Or, for your data frame example:
# condition is that x should be larger or equal to 3
condition <- \(x) x >= 3
number <- function(row, condition){
r <- row |>
condition() |>
rle()
max(r$lengths[r$values])
}
df <- replicate(10, sample(0:5, 10, replace = TRUE))
apply(df, 1, number, condition)
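When no value in a row satisfies the condition, max() is called on an empty vector and returns -Inf with a warning. One possible guard is to include 0 among the arguments to max(). A small sketch; the name longest_run is hypothetical.

```r
condition <- function(x) x >= 3

# Hypothetical variant that returns 0 when no value meets the condition
longest_run <- function(row, condition) {
  r <- rle(condition(row))
  max(r$lengths[r$values], 0)   # the extra 0 covers rows with no TRUE run
}

a <- longest_run(c(2,4,3,3,4,5,1,0,5,1), condition)  # longest TRUE run: 4 3 3 4 5
b <- longest_run(c(0, 1, 2), condition)              # nothing is >= 3
print(c(a, b))
```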
Count the rows in a data table where a condition has been met consecutively
Counting consecutive occurrences (i.e. run length) of b for each ID through specified update_date
DT[order(ID, update_date), occurence := 1:.N, by = list(ID, rleid(b))]
DT
#> update_date ID b occurence
#> 1: 2022-01-01 aapl U1 1
#> 2: 2022-01-02 aapl U1 2
#> 3: 2022-01-03 aapl U1 3
#> 4: 2022-01-04 aapl U2 1
#> 5: 2022-01-05 aapl U2 2
#> 6: 2022-01-06 aapl U2 3
#> 7: 2022-01-01 ibm D1 1
#> 8: 2022-01-02 ibm D2 1
#> 9: 2022-01-03 ibm D1 1
#> 10: 2022-01-04 ibm D3 1
#> 11: 2022-01-05 ibm D2 1
#> 12: 2022-01-06 ibm D3 1
Counting occurrences of b for each ID through specified update_date
This includes occurrences that are non-consecutive.
# Count of occurrences through present row
DT[order(ID, b, update_date), occurence := 1:.N, by = list(ID, b)]
DT
#> update_date ID b occurence
#> 1: 2022-01-01 aapl U1 1
#> 2: 2022-01-02 aapl U1 2
#> 3: 2022-01-03 aapl U1 3
#> 4: 2022-01-04 aapl U2 1
#> 5: 2022-01-05 aapl U2 2
#> 6: 2022-01-06 aapl U2 3
#> 7: 2022-01-01 ibm D1 1
#> 8: 2022-01-02 ibm D2 1
#> 9: 2022-01-03 ibm D1 2
#> 10: 2022-01-04 ibm D3 1
#> 11: 2022-01-05 ibm D2 2
#> 12: 2022-01-06 ibm D3 2
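For readers without data.table, both counters can be approximated in base R. The data frame below is reconstructed from the printed output (DT itself is not defined above) and is assumed to be already sorted by ID and update_date; the column names consecutive and cumulative are illustrative.

```r
# Reconstructed from the printed tables above; already sorted by ID, update_date
df <- data.frame(
  update_date = rep(seq(as.Date("2022-01-01"), by = "day", length.out = 6), 2),
  ID = rep(c("aapl", "ibm"), each = 6),
  b  = c("U1","U1","U1","U2","U2","U2", "D1","D2","D1","D3","D2","D3")
)

# Position within each run of identical (ID, b) pairs
df$consecutive <- sequence(rle(paste(df$ID, df$b))$lengths)

# Running count of each (ID, b) pair, consecutive or not
df$cumulative <- ave(seq_along(df$b), df$ID, df$b, FUN = seq_along)
print(df)
```

The two columns differ only for ibm, where values of b recur without being adjacent.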