Create Counter of Consecutive Runs of a Certain Value

Create counter within consecutive runs of certain values

Here's a way, building on Joshua's rle approach (edited to use seq_len and lapply, as per Marek's suggestion):

> (!x) * unlist(lapply(rle(x)$lengths, seq_len))
[1] 0 1 0 1 2 3 0 0 1 2
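The input x is not shown in this excerpt. As a minimal sketch, assuming an x that is consistent with the printed result, the one-liner can be broken into its steps like this (the variable names runs, within_run and counter are illustrative only):

```r
# Hypothetical input, chosen so that it reproduces the output shown above
x <- c(1, 0, 1, 0, 0, 0, 1, 1, 0, 0)

runs <- rle(x)$lengths                       # length of each run of equal values
within_run <- unlist(lapply(runs, seq_len))  # 1, 2, 3, ... restarted in every run
counter <- (!x) * within_run                 # keep the count only where x is 0
counter
```

Multiplying by (!x) zeroes out the positions where x is non-zero, so only the runs of zeros keep their counter.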

UPDATE. Just for kicks, here's another way to do it, around 5 times faster:

cumul_zeros <- function(x)  {
  x <- !x                 # TRUE (1) where the original value is 0
  rl <- rle(x)
  len <- rl$lengths
  v <- rl$values
  cumLen <- cumsum(len)
  z <- x
  # replace the 0 at the end of each zero-block in z by the
  # negative of the length of the preceding 1-block....
  iDrops <- c(0, diff(v)) < 0
  z[ cumLen[ iDrops ] ] <- -len[ c(iDrops[-1],FALSE) ]
  # ... to ensure that the cumsum below does the right thing.
  # We zap the cumsum with x so only the cumsums for the 1-blocks survive:
  x*cumsum(z)
}

Try an example:

> cumul_zeros(c(1,1,1,0,0,0,0,0,1,1,1,0,0,1,1))
[1] 0 0 0 1 2 3 4 5 0 0 0 1 2 0 0

Now compare times on a million-length vector:

> x <- sample(0:1, 1000000, TRUE)
> system.time( z <- cumul_zeros(x))
user system elapsed
0.15 0.00 0.14
> system.time( z <- (!x) * unlist( lapply( rle(x)$lengths, seq_len)))
user system elapsed
0.75 0.00 0.75

Moral of the story: one-liners are nicer and easier to understand, but not always the fastest!

Create counter within consecutive runs of values

You need to use sequence and rle:

> sequence(rle(as.character(dataset$input))$lengths)
[1] 1 1 2 1 2 1 1 2 3 4 1 1
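The dataset itself is not defined at this point. A self-contained sketch, assuming the input column holds the twelve letters used in the data section further down, shows what each piece returns:

```r
# Assumed input column; rle() needs an atomic vector, which is why the
# original call wraps the column in as.character()
input <- c("a", "b", "b", "a", "a", "c", "a", "a", "a", "a", "b", "c")

runs <- rle(input)
runs$lengths                        # run lengths: 1 2 2 1 4 1 1

# sequence() expands each length n into 1:n and concatenates the pieces
counter <- sequence(runs$lengths)
counter                             # 1 1 2 1 2 1 1 2 3 4 1 1
```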

Create counter of consecutive runs of a certain value


SOG <- c(4,4,0,0,0,3,4,5,0,0,1,2,0,0,0)
#run length encoding:
tmp <- rle(SOG)
#turn values into logicals
tmp$values <- tmp$values == 0
#cumulative sum of TRUE values
tmp$values[tmp$values] <- cumsum(tmp$values[tmp$values])
#inverse the run length encoding
inverse.rle(tmp)
#[1] 0 0 1 1 1 0 0 0 2 2 0 0 3 3 3

Count and Assign Consecutive Occurrences of Variable

You can repeat each run length returned by rle as many times as the run is long, so that every element carries the length of its own run:

with(rle(dataset$input), rep(lengths, lengths))
#[1] 1 2 2 2 2 1 4 4 4 4 1 1

Using dplyr, we can use lag to create groups and then count the number of rows in each group.

library(dplyr)

dataset %>%
  group_by(gr = cumsum(input != lag(input, default = first(input)))) %>%
  mutate(count = n())

and with data.table

library(data.table)
setDT(dataset)[, count:= .N, rleid(input)]
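Both the dplyr and the data.table versions rest on the same idea: build a run identifier that increases by one whenever the value changes, then count rows per identifier. A base R sketch of that idea, using an assumed input vector and illustrative names (changed, run_id, count):

```r
# Assumed input, matching the data section of this answer
input <- c("a", "b", "b", "a", "a", "c", "a", "a", "a", "a", "b", "c")

# TRUE at the first element and wherever the value differs from the previous one
changed <- c(TRUE, input[-1] != input[-length(input)])

# Running total of changes gives one id per run: 1 2 2 3 3 4 5 5 5 5 6 7
run_id <- cumsum(changed)

# Number of elements in each run, repeated for every element of that run
count <- tabulate(run_id)[run_id]
count                               # 1 2 2 2 2 1 4 4 4 4 1 1
```

Unlike the dplyr grouping column, this identifier starts at 1 rather than 0, which makes no difference to the counts.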

data

Make sure the input column is character and not factor.

dataset <- data.frame(input = c("a","b","b","a","a","c","a","a","a","a","b","c"),
stringsAsFactors = FALSE)

Count consecutive occurrences of a specific value in every row of a data frame in R

You've identified the two forms the longest run can take: (1) somewhere in the middle of a row, or (2) split between the end and the beginning of a row. Hence you want to calculate each case and take the max, like so:

df<-cbind(
Winter=c(0,0,3),
Spring=c(0,2,4),
Summer=c(0,2,7),
Autumn=c(3,0,4))

#> Winter Spring Summer Autumn
#> [1,] 0 0 0 3
#> [2,] 0 2 2 0
#> [3,] 3 4 7 4


# calculate the number of consecutive zeros at the start and end
startZeros <- apply(df,1,function(x)which.min(x==0)-1)
#> [1] 3 1 0
endZeros <- apply(df,1,function(x)which.min(rev(x==0))-1)
#> [1] 0 1 0

# calculate the longest run of zeros
longestRun <- apply(df,1,function(x){
  y = rle(x);
  max(y$lengths[y$values==0],0)})
#> [1] 3 1 0

# take the max of the two values
pmax(longestRun,startZeros +endZeros )
#> [1] 3 2 0

Of course an even easier solution is:

longestRun <- apply(cbind(df, df), # duplicate the columns so a run can wrap from the end of a row to its start
                    1,             # the margin over which to apply the summary function
                    function(x) {  # the summary function
                      y <- rle(x)
                      max(y$lengths[y$values == 0],
                          0) # include zero in case there are no zeros in y$values
                    })

Note that the above solution works because my df does not include the location field (column).
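One edge case may be worth checking with the duplicated-columns trick: a row that is zero everywhere is counted twice. The sketch below adds a hypothetical fourth, all-zero row to the example matrix and caps the result at the number of columns; the names wrapped_run and capped_run are illustrative and not part of the original answer.

```r
# Example matrix from above plus one assumed all-zero row
df <- cbind(Winter = c(0, 0, 3, 0),
            Spring = c(0, 2, 4, 0),
            Summer = c(0, 2, 7, 0),
            Autumn = c(3, 0, 4, 0))

# Longest run of zeros per row, allowing the run to wrap around the row end
wrapped_run <- apply(cbind(df, df), 1, function(x) {
  y <- rle(x)
  max(y$lengths[y$values == 0], 0)
})
wrapped_run                          # 3 2 0 8

# A run cannot be longer than the row itself
capped_run <- pmin(wrapped_run, ncol(df))
capped_run                           # 3 2 0 4
```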

R: count consecutive occurrences of values in a single column and by group

Use rleid (from the data.table package) to get a grouping variable and then use ave to apply seq_along within common values of that grouping:

library(data.table)
transform(dataset, Counter = ave(YesNO, rleid(ID, YesNO), FUN = seq_along))

giving:

   ID YesNO Counter
1   a     1       1
2   a     1       2
3   a     0       1
4   a     0       2
5   a     0       3
6   a     1       1
7   a     1       2
8   b     1       1
9   b     1       2
10  b     1       3
11  b     0       1
12  b     0       2
13  b     0       3
14  b     0       4
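The input data frame is not shown here. As a sketch, assuming a dataset reconstructed from the printed result, the same counter can also be produced in base R without rleid, by pasting the two columns into one key and numbering within its runs:

```r
# Data reconstructed from the printed output above (an assumption)
dataset <- data.frame(ID    = rep(c("a", "b"), each = 7),
                      YesNO = c(1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0))

# A new run starts whenever either ID or YesNO changes
key <- paste(dataset$ID, dataset$YesNO)

# Number the elements within each run of identical keys
dataset$Counter <- sequence(rle(key)$lengths)
dataset
```

The paste step plays the role of rleid(ID, YesNO): two adjacent rows belong to the same run only if both columns are unchanged.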

Count consecutive occurrences of an element in string

To put the pieces together: here's a combination of my comment on your previous question and (parts of) my answer here: Count consecutive TRUE values within each block separately. The convenience functions rleid and rowid from the data.table package are used.

Toy data with two strings of different length:

s <- c("a > a > b > b > b > a > b > b", "c > c > b > b > b > c > c")

library(data.table)
lapply(strsplit(s, " > "), function(x) paste0(x, rowid(rleid(x)), collapse = " > "))
# [[1]]
# [1] "a1 > a2 > b1 > b2 > b3 > a1 > b1 > b2"
#
# [[2]]
# [1] "c1 > c2 > b1 > b2 > b3 > c1 > c2"
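For readers without data.table, rowid(rleid(x)) can be sketched in base R as sequence(rle(x)$lengths), which numbers the elements within each run. The example below reuses the toy strings; the name labelled is illustrative.

```r
s <- c("a > a > b > b > b > a > b > b", "c > c > b > b > b > c > c")

labelled <- lapply(strsplit(s, " > ", fixed = TRUE), function(x) {
  # position of each element within its run of identical neighbours
  pos <- sequence(rle(x)$lengths)
  paste0(x, pos, collapse = " > ")
})
labelled
```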

Calculate maximum length of consecutive values in row over a set number


# condition is that x should be larger or equal to 3
condition <- function(x) x >= 3

# example row
row = c(2,4,3,3,4,5,1,0,5,1)

# we can use condition on row:
condition(row)

# and we can employ rle on that:
rle(condition(row))

# we need to filter those rle results for TRUE:
r <- rle(condition(row))
r$lengths[r$values == TRUE]

# The answer is the max of the latter
max(r$lengths[r$values])

or for your dataframe example

# condition is that x should be larger or equal to 3
condition <- \(x) x >= 3


number <- function(row, condition){
  r <- row |>
    condition() |>
    rle()
  max(r$lengths[r$values])
}

df <- replicate(10, sample(0:5, 10, replace = TRUE))
apply(df, 1, number, condition)
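If a row never satisfies the condition, max() is called on an empty vector and returns -Inf with a warning. A possible guard, sketched here with a hypothetical helper longest_run and a small fixed matrix m instead of random data, is to add 0 as a fallback value:

```r
# Hypothetical variant of the function above that returns 0 when no run exists
longest_run <- function(row, condition) {
  r <- rle(condition(row))
  max(r$lengths[r$values], 0L)
}

# Fixed example rows so the result is reproducible
m <- rbind(c(2, 4, 3, 3, 4, 5, 1, 0, 5, 1),   # longest run of values >= 3 is 5
           c(0, 1, 2, 0, 1, 2, 0, 1, 2, 0),   # condition never holds
           c(3, 3, 0, 4, 4, 4, 0, 5, 5, 5))   # runs of 2, 3 and 3

res <- apply(m, 1, longest_run, condition = function(x) x >= 3)
res                                           # 5 0 3
```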

Count the rows in a data table where a condition has been met consecutively


Counting consecutive occurrences (i.e. run length) of b for each ID through specified update_date

DT[order(ID, update_date), occurence := 1:.N, by = list(ID, rleid(b))]
DT
#>     update_date   ID  b occurence
#>  1:  2022-01-01 aapl U1         1
#>  2:  2022-01-02 aapl U1         2
#>  3:  2022-01-03 aapl U1         3
#>  4:  2022-01-04 aapl U2         1
#>  5:  2022-01-05 aapl U2         2
#>  6:  2022-01-06 aapl U2         3
#>  7:  2022-01-01  ibm D1         1
#>  8:  2022-01-02  ibm D2         1
#>  9:  2022-01-03  ibm D1         1
#> 10:  2022-01-04  ibm D3         1
#> 11:  2022-01-05  ibm D2         1
#> 12:  2022-01-06  ibm D3         1

Counting occurrences of b for each ID through specified update_date

This includes occurrences that are non-consecutive.

#  Count of occurrences through present row
DT[order(ID, b, update_date), occurence := 1:.N, by = list(ID, b)]
DT
#>     update_date   ID  b occurence
#>  1:  2022-01-01 aapl U1         1
#>  2:  2022-01-02 aapl U1         2
#>  3:  2022-01-03 aapl U1         3
#>  4:  2022-01-04 aapl U2         1
#>  5:  2022-01-05 aapl U2         2
#>  6:  2022-01-06 aapl U2         3
#>  7:  2022-01-01  ibm D1         1
#>  8:  2022-01-02  ibm D2         1
#>  9:  2022-01-03  ibm D1         2
#> 10:  2022-01-04  ibm D3         1
#> 11:  2022-01-05  ibm D2         2
#> 12:  2022-01-06  ibm D3         2
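The table DT is not defined in this excerpt. As a rough base R sketch, assuming data reconstructed from the printed output, the two counters can be compared side by side; the column names consecutive and cumulative are illustrative only.

```r
# Data reconstructed from the printed tables above (an assumption)
df <- data.frame(update_date = rep(as.Date("2022-01-01") + 0:5, 2),
                 ID = rep(c("aapl", "ibm"), each = 6),
                 b  = c("U1", "U1", "U1", "U2", "U2", "U2",
                        "D1", "D2", "D1", "D3", "D2", "D3"))
df <- df[order(df$ID, df$update_date), ]

# Run-length counter: restarts whenever ID or b changes from the previous row
df$consecutive <- sequence(rle(paste(df$ID, df$b))$lengths)

# Running count per (ID, b) pair, regardless of whether the rows are adjacent
df$cumulative <- ave(seq_along(df$b), df$ID, df$b, FUN = seq_along)
df
```

For ibm the two columns differ: D1, D2 and D3 never repeat on adjacent dates, so the consecutive counter stays at 1 while the cumulative counter reaches 2.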

