Create Counter Within Consecutive Runs of Certain Values

Create counter within consecutive runs of certain values

Here's a way, building on Joshua's rle approach: (EDITED to use seq_len and lapply as per Marek's suggestion)

> (!x) * unlist(lapply(rle(x)$lengths, seq_len))
[1] 0 1 0 1 2 3 0 0 1 2
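
The vector x is not shown in the answer. The printed result is consistent with x = c(1, 0, 1, 0, 0, 0, 1, 1, 0, 0), so that input is assumed here purely for illustration. A minimal sketch of what each piece of the one-liner contributes (the intermediate variable names are mine):

```r
# Assumed input, inferred from the printed result (not given in the original answer)
x <- c(1, 0, 1, 0, 0, 0, 1, 1, 0, 0)

# Length of each run of identical values
run_lengths <- rle(x)$lengths

# Position within each run, restarting at 1 whenever the value changes
within_run <- unlist(lapply(run_lengths, seq_len))

# !x is TRUE only where x == 0, so positions inside non-zero runs are zeroed out
counter <- (!x) * within_run
counter
```

Multiplying by the logical mask is what restricts the counter to the zero runs; without it you get a counter for every run.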

UPDATE. Just for kicks, here's another way to do it, around 5 times faster:

cumul_zeros <- function(x) {
  x <- !x
  rl <- rle(x)
  len <- rl$lengths
  v <- rl$values
  cumLen <- cumsum(len)
  z <- x
  # replace the 0 at the end of each zero-block in z by the
  # negative of the length of the preceding 1-block....
  iDrops <- c(0, diff(v)) < 0
  z[ cumLen[ iDrops ] ] <- -len[ c(iDrops[-1],FALSE) ]
  # ... to ensure that the cumsum below does the right thing.
  # We zap the cumsum with x so only the cumsums for the 1-blocks survive:
  x*cumsum(z)
}

Try an example:

> cumul_zeros(c(1,1,1,0,0,0,0,0,1,1,1,0,0,1,1))
[1] 0 0 0 1 2 3 4 5 0 0 0 1 2 0 0

Now compare times on a million-length vector:

> x <- sample(0:1, 1000000, replace = TRUE)
> system.time( z <- cumul_zeros(x))
   user  system elapsed
   0.15    0.00    0.14
> system.time( z <- (!x) * unlist( lapply( rle(x)$lengths, seq_len)))
   user  system elapsed
   0.75    0.00    0.75

Moral of the story: one-liners are nicer and easier to understand, but not always the fastest!

Create counter within consecutive runs of values

You need to use sequence and rle:

> sequence(rle(as.character(dataset$input))$lengths)
[1] 1 1 2 1 2 1 1 2 3 4 1 1
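
The printed result is consistent with the sample vector defined in the data section further down this page; that it is the same data is an assumption on my part. A small sketch of the two steps: rle() splits the vector into runs, and sequence() expands each run length n into 1..n.

```r
# Assumed to be the same values as the `input` column defined later on this page
input <- c("a", "b", "b", "a", "a", "c", "a", "a", "a", "a", "b", "c")

runs <- rle(input)
runs$values    # one entry per run
runs$lengths   # how long each run is

# sequence(c(n1, n2, ...)) is c(seq_len(n1), seq_len(n2), ...)
counter <- sequence(runs$lengths)
counter
```

The as.character() call in the answer matters only when the column is a factor, since rle() expects an atomic vector.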

Create counter of consecutive runs of a certain value

SOG <- c(4,4,0,0,0,3,4,5,0,0,1,2,0,0,0)
#run length encoding:
tmp <- rle(SOG)
#turn values into logicals
tmp$values <- tmp$values == 0
#cumulative sum of TRUE values
tmp$values[tmp$values] <- cumsum(tmp$values[tmp$values])
#invert the run length encoding
inverse.rle(tmp)
#[1] 0 0 1 1 1 0 0 0 2 2 0 0 3 3 3
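
To see why this numbers the zero runs, it can help to look at the intermediate vectors. The sketch below is an equivalent variant that uses ifelse instead of the logical-index assignment; the names is_zero_run, run_number and run_id are mine, not from the answer.

```r
SOG <- c(4, 4, 0, 0, 0, 3, 4, 5, 0, 0, 1, 2, 0, 0, 0)

tmp <- rle(SOG)                     # 9 runs: values 4 0 3 4 5 0 1 2 0
is_zero_run <- tmp$values == 0      # TRUE for the 2nd, 6th and 9th run
run_number  <- cumsum(is_zero_run)  # goes up by one at every zero run

# Keep the running number for zero runs, 0 for every other run
tmp$values <- ifelse(is_zero_run, run_number, 0)

# Expand back to the original length
run_id <- inverse.rle(tmp)
run_id
```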

R: count consecutive occurrences of values in a single column and by group

Use rleid (from the data.table package) to get a grouping variable and then use ave to apply seq_along within common values of that grouping:

library(data.table)
transform(dataset, Counter = ave(YesNO, rleid(ID, YesNO), FUN = seq_along))

giving:

   ID YesNO Counter
1   a     1       1
2   a     1       2
3   a     0       1
4   a     0       2
5   a     0       3
6   a     1       1
7   a     1       2
8   b     1       1
9   b     1       2
10  b     1       3
11  b     0       1
12  b     0       2
13  b     0       3
14  b     0       4
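
If data.table is not available, the grouping that rleid(ID, YesNO) provides can be approximated in base R. This is a sketch of a swapped-in technique, not the answer's method: the data frame is reconstructed from the printed result, and changed and run_id are hypothetical names.

```r
# Data reconstructed from the printed result above
dataset <- data.frame(
  ID    = rep(c("a", "b"), each = 7),
  YesNO = c(1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0)
)

# TRUE on the first row and wherever ID or YesNO differs from the previous row
n <- nrow(dataset)
changed <- c(TRUE,
             dataset$ID[-1]    != dataset$ID[-n] |
             dataset$YesNO[-1] != dataset$YesNO[-n])

# Running count of changes: plays the role of rleid(ID, YesNO)
run_id <- cumsum(changed)

dataset$Counter <- ave(dataset$YesNO, run_id, FUN = seq_along)
dataset
```

Including ID in the change test is what makes the counter restart at row 8, where YesNO stays 1 but the ID switches from a to b.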

Count and Assign Consecutive Occurrences of Variable

You can take the lengths component of rle and repeat each run length that many times, so every element is labelled with the size of its run:

with(rle(dataset$input), rep(lengths, lengths))
#[1] 1 2 2 2 2 1 4 4 4 4 1 1

Using dplyr, we can use lag to create groups and then count the number of rows in each group.

library(dplyr)

dataset %>%
  group_by(gr = cumsum(input != lag(input, default = first(input)))) %>%
  mutate(count = n())

and with data.table

library(data.table)
setDT(dataset)[, count:= .N, rleid(input)]

data

Make sure the input column is character and not factor.

dataset <- data.frame(input = c("a","b","b","a","a","c","a","a","a","a","b","c"),
                      stringsAsFactors = FALSE)
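
The grouping key used in the dplyr version can be inspected on its own. Below is a base R sketch of the same idea, where c(input[1], head(input, -1)) stands in for lag(input, default = first(input)); the names previous, gr and count are only for illustration.

```r
input <- c("a", "b", "b", "a", "a", "c", "a", "a", "a", "a", "b", "c")

# Shift by one position and pad the front with the first value,
# so the first element never counts as a change
previous <- c(input[1], head(input, -1))

# The group id goes up by one wherever the value differs from its predecessor
gr <- cumsum(input != previous)
gr

# Size of each group, repeated for every element of that group
count <- ave(seq_along(input), gr, FUN = length)
count
```

Because the first element is compared with itself, the group ids start at 0 rather than 1, which is harmless since they are only used as labels.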

rstudio - Create counter in dataframe that gets reset based on changes in value or new ID

We can adjust the counter values based on the first value of each group:

library(dplyr)

df %>%
  group_by(ID, grp = cumsum(response == 1L)) %>%
  mutate(counter = if(first(response) == 1L) row_number() - 1
                   else row_number()) %>%
  ungroup() %>%
  dplyr::select(-grp)

# A tibble: 24 x 3
#    ID    response counter
#    <chr>    <dbl>   <dbl>
#  1 1            0       1
#  2 1            0       2
#  3 1            0       3
#  4 1            1       0
#  5 1            0       1
#  6 1            0       2
#  7 2            1       0
#  8 2            0       1
#  9 2            0       2
# 10 2            0       3
# … with 14 more rows
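
The original df is not shown, so here is a base R sketch of the same logic on hypothetical data that matches only the ten printed rows. The names grp_key and the anonymous function are mine; this is an illustration of the grouping idea, not the answer's code.

```r
# Hypothetical data matching the ten printed rows (the full 24-row df is not shown)
df <- data.frame(
  ID       = rep(c("1", "2"), times = c(6, 4)),
  response = c(0, 0, 0, 1, 0, 0, 1, 0, 0, 0)
)

# A new block starts at every response == 1; combine with ID into one key
grp     <- cumsum(df$response == 1)
grp_key <- paste(df$ID, grp)

# Count rows within each block; shift down by one when the block opens with a 1
df$counter <- ave(df$response, grp_key, FUN = function(r) {
  seq_along(r) - (r[1] == 1)
})
df
```

Blocks that open with a 1 start at 0, while the leading block of an ID that has not seen a 1 yet starts at 1, which is the adjustment the if/else in the dplyr version makes.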

Count consecutive occurrences of a specific value in every row of a data frame in R

You've identified the two forms the longest run can take: (1) somewhere in the middle of a row, or (2) split between the end and the beginning of a row. Hence you want to calculate each case and take the max, like so:

df <- cbind(
  Winter=c(0,0,3),
  Spring=c(0,2,4),
  Summer=c(0,2,7),
  Autumn=c(3,0,4))

#>      Winter Spring Summer Autumn
#> [1,]      0      0      0      3
#> [2,]      0      2      2      0
#> [3,]      3      4      7      4


# calculate the number of consecutive zeros at the start and end
startZeros <- apply(df,1,function(x)which.min(x==0)-1)
#> [1] 3 1 0
endZeros <- apply(df,1,function(x)which.min(rev(x==0))-1)
#> [1] 0 1 0

# calculate the longest run of zeros
longestRun <- apply(df,1,function(x){
  y = rle(x);
  max(y$lengths[y$values==0],0)})
#> [1] 3 1 0

# take the max of the two values
pmax(longestRun,startZeros +endZeros )
#> [1] 3 2 0

Of course an even easier solution is:

longestRun <- apply(cbind(df,df), # tricky way to wrap the zeros from the start to the end
                    1,            # the margin over which to apply the summary function
                    function(x){  # the summary function
                      y = rle(x);
                      max(y$lengths[y$values==0],
                          0)      # include zero in case there are no zeros in y$values
                    })

Note that the above solution works because my df does not include the location field (column).
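
One edge case worth keeping in mind with the doubling trick: a row that is all zeros would be counted twice. A hedged sketch of a wrapper that caps the result at the row length (longest_zero_run is a hypothetical helper name, and the cap is my addition, not part of the answer):

```r
df <- cbind(
  Winter = c(0, 0, 3),
  Spring = c(0, 2, 4),
  Summer = c(0, 2, 7),
  Autumn = c(3, 0, 4))

# Longest run of zeros in one row, allowing the run to wrap from end to start
longest_zero_run <- function(x) {
  runs <- rle(c(x, x))                            # doubling joins the end to the start
  best <- max(runs$lengths[runs$values == 0], 0)  # 0 when the row has no zeros
  min(best, length(x))                            # an all-zero row would otherwise count twice
}

wrapped <- apply(df, 1, longest_zero_run)
wrapped
```

For the three rows above this gives the same answer as the pmax approach, and a row such as c(0, 0, 0, 0) yields 4 rather than 8.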

Add column with ascending numbers starting and ending based on certain values in other column

You can use purrr::accumulate() here.
First create a logical vector with snow_depth != 0, then call accumulate with if_else.

library(purrr)
library(dplyr)

df%>%mutate(consecutive_days=accumulate(snow_depth!=0, ~if_else(.y!=0, .x+1, 0)))

   snow_depth new_column consecutive_days
1           0          0                0
2           0          0                0
3           5          1                1
4           7          2                2
5           8          3                3
6           4          4                4
7           0          0                0
8           0          0                0
9           6          1                1
10          5          2                2
11          8          3                3
12          9          4                4
13          5          5                5
14          6          6                6
15          0          0                0
16          8          1                1
17          6          2                2

data

df <- data.frame(snow_depth=c(0, 0, 5, 7, 8, 4, 0, 0, 6, 5, 8, 9, 5, 6, 0, 8, 6),
                 new_column=c(0, 0, 1, 2, 3, 4, 0, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2))
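
For comparison, the same running counter can be sketched in base R with Reduce(accumulate = TRUE). This is a swapped-in technique rather than the purrr call from the answer, and step is a hypothetical helper name. Starting from an explicit init of 0 and dropping it afterwards means the first element is handled like every other one.

```r
snow_depth <- c(0, 0, 5, 7, 8, 4, 0, 0, 6, 5, 8, 9, 5, 6, 0, 8, 6)

# acc is the counter so far, depth is the current snow_depth value:
# extend the counter on a snow day, reset it to 0 otherwise
step <- function(acc, depth) if (depth != 0) acc + 1 else 0

# accumulate = TRUE keeps every intermediate value; [-1] drops the initial 0
consecutive_days <- Reduce(step, snow_depth, init = 0, accumulate = TRUE)[-1]
consecutive_days
```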

Count consecutive occurrences of an element in string

To put the pieces together: here's a combination of my comment on your previous question and (parts of) my answer here: Count consecutive TRUE values within each block separately. The convenience functions rleid and rowid from the data.table package are used.

Toy data with two strings of different length:

s <- c("a > a > b > b > b > a > b > b", "c > c > b > b > b > c > c")

library(data.table)
lapply(strsplit(s, " > "), function(x) paste0(x, rowid(rleid(x)), collapse = " > "))
# [[1]]
# [1] "a1 > a2 > b1 > b2 > b3 > a1 > b1 > b2"
#
# [[2]]
# [1] "c1 > c2 > b1 > b2 > b3 > c1 > c2"
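
Without data.table, the within-run counter that rowid(rleid(x)) produces can be approximated with sequence(rle(x)$lengths). A base R sketch under that assumption (label_runs is a hypothetical helper name):

```r
s <- c("a > a > b > b > b > a > b > b", "c > c > b > b > b > c > c")

label_runs <- function(x) {
  # sequence(rle(x)$lengths) restarts at 1 for each run of identical elements
  paste0(x, sequence(rle(x)$lengths), collapse = " > ")
}

# One labelled string per input string
labelled <- vapply(strsplit(s, " > ", fixed = TRUE), label_runs, character(1))
labelled
```

vapply returns a character vector here, whereas the lapply version above returns a list; either works, depending on what you need downstream.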

