Create Counter Within Consecutive Runs of Values

You need to use sequence and rle:

> sequence(rle(as.character(dataset$input))$lengths)
[1] 1 1 2 1 2 1 1 2 3 4 1 1
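
To see what each piece contributes, here is a minimal step-by-step sketch. It assumes the `dataset` given in the data section further down this page; the names `runs` and `counter` are only illustrative.

```r
# Assumed input: the `dataset` from the data section of this page
dataset <- data.frame(input = c("a","b","b","a","a","c","a","a","a","a","b","c"),
                      stringsAsFactors = FALSE)

runs <- rle(dataset$input)   # run-length encoding of the column
runs$lengths                 # length of each consecutive run
# [1] 1 2 2 1 4 1 1

# sequence() expands each length n into 1..n and concatenates the pieces
counter <- sequence(runs$lengths)
counter
# [1] 1 1 2 1 2 1 1 2 3 4 1 1
```

The as.character() call in the answer only matters when the column is a factor, since rle() expects an atomic vector.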

Create counter within consecutive runs of certain values

Here's a way, building on Joshua's rle approach (edited to use seq_len and lapply, as per Marek's suggestion):

> (!x) * unlist(lapply(rle(x)$lengths, seq_len))
[1] 0 1 0 1 2 3 0 0 1 2
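
The input x is not shown in the answer. As an assumption, the vector below was chosen because it reproduces the printed output; the intermediate name `within_run` is illustrative.

```r
# Hypothetical input, chosen so that the result matches the output shown above
x <- c(1, 0, 1, 0, 0, 0, 1, 1, 0, 0)

# counter inside every run, regardless of the run's value
within_run <- unlist(lapply(rle(x)$lengths, seq_len))
within_run
# [1] 1 1 1 1 2 3 1 2 1 2

# (!x) is TRUE only where x == 0, so the product zeroes out the non-zero runs
zero_counter <- (!x) * within_run
zero_counter
# [1] 0 1 0 1 2 3 0 0 1 2
```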

UPDATE. Just for kicks, here's another way to do it, around 5 times faster:

cumul_zeros <- function(x) {
  x <- !x
  rl <- rle(x)
  len <- rl$lengths
  v <- rl$values
  cumLen <- cumsum(len)
  z <- x
  # replace the 0 at the end of each zero-block in z by the
  # negative of the length of the preceding 1-block....
  iDrops <- c(0, diff(v)) < 0
  z[ cumLen[ iDrops ] ] <- -len[ c(iDrops[-1], FALSE) ]
  # ... to ensure that the cumsum below does the right thing.
  # We zap the cumsum with x so only the cumsums for the 1-blocks survive:
  x * cumsum(z)
}

Try an example:

> cumul_zeros(c(1,1,1,0,0,0,0,0,1,1,1,0,0,1,1))
[1] 0 0 0 1 2 3 4 5 0 0 0 1 2 0 0

Now compare times on a million-length vector:

> x <- sample(0:1, 1000000, replace = TRUE)
> system.time( z <- cumul_zeros(x))
   user  system elapsed
   0.15    0.00    0.14
> system.time( z <- (!x) * unlist( lapply( rle(x)$lengths, seq_len)))
   user  system elapsed
   0.75    0.00    0.75

Moral of the story: one-liners are nicer and easier to understand, but not always the fastest!
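
A possible middle ground, not part of the original answer: keep the one-liner but let sequence() build the counters instead of lapply() and seq_len(). The sketch below only checks that both forms agree on a random vector. It makes no timing claim, since relative speed depends on the R version.

```r
set.seed(1)                                # arbitrary seed, for reproducibility only
x <- sample(0:1, 1000, replace = TRUE)

one_liner <- (!x) * unlist(lapply(rle(x)$lengths, seq_len))
with_seq  <- (!x) * sequence(rle(x)$lengths)

# TRUE when both versions give the same counters at every position
agree <- all(one_liner == with_seq)
agree
```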

Create counter of consecutive runs of a certain value

SOG <- c(4,4,0,0,0,3,4,5,0,0,1,2,0,0,0)
# run-length encoding:
tmp <- rle(SOG)
# turn the run values into logicals (TRUE for runs of zeros)
tmp$values <- tmp$values == 0
# number the TRUE runs 1, 2, 3, ...; FALSE runs become 0
tmp$values[tmp$values] <- cumsum(tmp$values[tmp$values])
# invert the run-length encoding
inverse.rle(tmp)
#[1] 0 0 1 1 1 0 0 0 2 2 0 0 3 3 3
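
The same idea can be wrapped in a small function so that the value of interest becomes a parameter. This is only a sketch: the name `run_number` and its `target` argument are hypothetical, and the input is assumed to contain no NA values.

```r
# Hypothetical helper: label each run of `target` with its ordinal number, 0 elsewhere
run_number <- function(x, target = 0) {
  r <- rle(x == target)                   # runs of TRUE (== target) and FALSE
  r$values <- r$values * cumsum(r$values) # TRUE runs become 1, 2, 3, ...; FALSE runs 0
  inverse.rle(r)                          # expand back to the original length
}

SOG <- c(4,4,0,0,0,3,4,5,0,0,1,2,0,0,0)
run_number(SOG)
# [1] 0 0 1 1 1 0 0 0 2 2 0 0 3 3 3
run_number(SOG, target = 4)
# [1] 1 1 0 0 0 0 2 0 0 0 0 0 0 0 0
```

Encoding `x == target` directly, rather than the raw values, merges neighbouring non-target runs, which keeps the encoding short.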

R: count consecutive occurrences of values in a single column and by group

Use rleid (from the data.table package) to get a grouping variable and then use ave to apply seq_along within common values of that grouping:

library(data.table)
transform(dataset, Counter = ave(YesNO, rleid(ID, YesNO), FUN = seq_along))

giving:

   ID YesNO Counter
1   a     1       1
2   a     1       2
3   a     0       1
4   a     0       2
5   a     0       3
6   a     1       1
7   a     1       2
8   b     1       1
9   b     1       2
10  b     1       3
11  b     0       1
12  b     0       2
13  b     0       3
14  b     0       4
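
If data.table is not available, the grouping variable can be built in base R. In the sketch below, `dataset` is reconstructed from the ID and YesNO columns of the output above (an assumption, since the original input is not shown), and `run_id` is an illustrative stand-in for rleid(ID, YesNO), not the data.table function itself.

```r
# Assumed input, reconstructed from the ID and YesNO columns of the output above
dataset <- data.frame(ID    = rep(c("a", "b"), each = 7),
                      YesNO = c(1,1,0,0,0,1,1, 1,1,1,0,0,0,0))

# Base-R stand-in for rleid(ID, YesNO): the id increases whenever either column changes
key    <- paste(dataset$ID, dataset$YesNO)
run_id <- cumsum(c(TRUE, key[-1] != key[-length(key)]))
run_id
# [1] 1 1 2 2 2 3 3 4 4 4 5 5 5 5

# count rows within each run
dataset$Counter <- ave(dataset$YesNO, run_id, FUN = seq_along)
dataset$Counter
# [1] 1 2 1 2 3 1 2 1 2 3 1 2 3 4
```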

Count and Assign Consecutive Occurrences of Variable

You can take the run lengths returned by rle and repeat each one as many times as its own value:

with(rle(dataset$input), rep(lengths, lengths))
#[1] 1 2 2 2 2 1 4 4 4 4 1 1

Using dplyr, we can use lag to create groups and then count the number of rows in each group.

library(dplyr)

dataset %>%
  group_by(gr = cumsum(input != lag(input, default = first(input)))) %>%
  mutate(count = n())

And with data.table:

library(data.table)
setDT(dataset)[, count:= .N, rleid(input)]

data

Make sure the input column is character and not factor.

dataset <- data.frame(input = c("a","b","b","a","a","c","a","a","a","a","b","c"),
                      stringsAsFactors = FALSE)
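
For comparison, the lag-based grouping used in the dplyr version can be written in base R. This is a sketch on the same `dataset`; the names `changed` and `gr` are illustrative.

```r
dataset <- data.frame(input = c("a","b","b","a","a","c","a","a","a","a","b","c"),
                      stringsAsFactors = FALSE)
x <- dataset$input

# TRUE wherever the value differs from the previous row (never for the first row)
changed <- c(FALSE, x[-1] != x[-length(x)])

# cumulative sum of the change flags gives one id per consecutive run
gr <- cumsum(changed)
gr
# [1] 0 1 1 2 2 3 4 4 4 4 5 6

# size of the run each row belongs to
dataset$count <- ave(seq_along(x), gr, FUN = length)
dataset$count
# [1] 1 2 2 2 2 1 4 4 4 4 1 1
```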

Count cumulative and sequential values of the same sign in R

You can try:

library(dplyr)

df %>%
  mutate(z = with(rle(sign(x)), sequence(lengths) * rep(values, lengths)))

     x  z
1  0.5  1
2  1.0  2
3  6.5  3
4 -2.0 -1
5  3.0  1
6 -0.2 -1
7 -1.0 -2

You may want to consider how zeros should be treated. With the code above, any zero in your vector gets z = 0, because sign(0) is 0 and the counter is multiplied by it. If runs of zeros should instead be counted like positive runs, perhaps:

df %>%
  mutate(z = with(rle(sign(x)), sequence(lengths) * rep(values^(values != 0), lengths)))
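
A small base-R sketch makes the difference between the two variants visible. The vector `x` below is hypothetical and was picked only because it contains a run of zeros.

```r
x <- c(0.5, 1, 0, 0, -2, -1)   # hypothetical vector containing zeros
r <- rle(sign(x))              # run values are 1, 0, -1, each of length 2

# first variant: the zero run is multiplied by sign 0 and collapses to 0
z_plain <- sequence(r$lengths) * rep(r$values, r$lengths)
z_plain
# [1]  1  2  0  0 -1 -2

# second variant: 0^FALSE is 0^0 = 1, so the zero run is counted as positive
z_zero_pos <- sequence(r$lengths) * rep(r$values^(r$values != 0), r$lengths)
z_zero_pos
# [1]  1  2  1  2 -1 -2
```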

Edit, addressing the OP's comment: the version below also adds a running average of x within each run of same-signed values:

df %>%
  mutate(z = with(tmp <- rle(sign(x)), sequence(lengths) * rep(values, lengths)),
         id = with(tmp, rep(seq_along(lengths), lengths))) %>%
  group_by(id) %>%
  mutate(avg = cumsum(x)/row_number()) %>%
  ungroup() %>%
  select(-id)

# A tibble: 7 x 3
      x     z   avg
  <dbl> <dbl> <dbl>
1   0.5     1  0.5
2   1       2  0.75
3   6.5     3  2.67
4  -2      -1 -2
5   3       1  3
6  -0.2    -1 -0.2
7  -1      -2 -0.6
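
The running average per run can also be sketched without dplyr. Here `df` is reconstructed from the x column printed above (an assumption), and `id` plays the same role as the temporary id column in the pipeline.

```r
# Assumed input, taken from the x column of the output above
df <- data.frame(x = c(0.5, 1, 6.5, -2, 3, -0.2, -1))

r  <- rle(sign(df$x))
id <- rep(seq_along(r$lengths), r$lengths)   # one id per run of equal sign
id
# [1] 1 1 1 2 3 4 4

# cumulative mean within each run: running total divided by position in the run
df$avg <- ave(df$x, id, FUN = function(v) cumsum(v) / seq_along(v))
round(df$avg, 2)
# [1]  0.50  0.75  2.67 -2.00  3.00 -0.20 -0.60
```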

Count consecutive occurrences of a specific value in every row of a data frame in R

You've identified the two forms the longest run can take: (1) somewhere in the middle of a row, or (2) split between the end and the beginning of a row. Hence you want to calculate each case and take the max, like so:

df <- cbind(
  Winter = c(0, 0, 3),
  Spring = c(0, 2, 4),
  Summer = c(0, 2, 7),
  Autumn = c(3, 0, 4))

#>      Winter Spring Summer Autumn
#> [1,]      0      0      0      3
#> [2,]      0      2      2      0
#> [3,]      3      4      7      4


# calculate the number of consecutive zeros at the start and end
startZeros <- apply(df, 1, function(x) which.min(x == 0) - 1)
#> [1] 3 1 0
endZeros <- apply(df, 1, function(x) which.min(rev(x == 0)) - 1)
#> [1] 0 1 0

# calculate the longest run of zeros
longestRun <- apply(df, 1, function(x) {
  y <- rle(x)
  max(y$lengths[y$values == 0], 0)
})
#> [1] 3 1 0

# take the max of the two values
pmax(longestRun, startZeros + endZeros)
#> [1] 3 2 0

Of course an even easier solution is:

longestRun <- apply(cbind(df, df),  # tricky way to wrap the zeros from the start to the end
                    1,              # the margin over which to apply the summary function
                    function(x) {   # the summary function
                      y <- rle(x)
                      # include zero in case there are no zeros in y$values
                      max(y$lengths[y$values == 0], 0)
                    })

Note that the above solution works because my df does not include the location field (column).
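
One edge case of the doubling trick is a row that consists only of zeros: after cbind(df, df) its run is counted twice. The sketch below caps the result at the row length. The helper name `longest_wrapped_zero_run` and the extra all-zero fourth row are assumptions added for illustration.

```r
# Hypothetical helper: longest run of zeros in one row, allowing wrap-around
longest_wrapped_zero_run <- function(x) {
  r <- rle(c(x, x) == 0)                   # doubled row, encoded as zero / non-zero
  longest <- max(r$lengths[r$values], 0)   # 0 when the row contains no zeros
  min(longest, length(x))                  # an all-zero row would otherwise count twice
}

# Same matrix as above, plus a hypothetical all-zero fourth row
df <- cbind(Winter = c(0, 0, 3, 0),
            Spring = c(0, 2, 4, 0),
            Summer = c(0, 2, 7, 0),
            Autumn = c(3, 0, 4, 0))

wrapped <- apply(df, 1, longest_wrapped_zero_run)
wrapped
# [1] 3 2 0 4
```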

Create counter in dataframe that gets reset based on changes in value or new ID

We can adjust the counter values based on the first value of each group:

library(dplyr)

df %>%
  group_by(ID, grp = cumsum(response == 1L)) %>%
  mutate(counter = if (first(response) == 1L) row_number() - 1
                   else row_number()) %>%
  ungroup() %>%
  dplyr::select(-grp)

# A tibble: 24 x 3
# ID response counter
# <chr> <dbl> <dbl>
# 1 1 0 1
# 2 1 0 2
# 3 1 0 3
# 4 1 1 0
# 5 1 0 1
# 6 1 0 2
# 7 2 1 0
# 8 2 0 1
# 9 2 0 2
#10 2 0 3
# … with 14 more rows
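
The same logic can be sketched in base R. The input below is an assumption: it contains only the first ten rows of the output above, because the full data is not shown. The names `key`, `pos` and `starts_at_1` are illustrative.

```r
# Assumed input: the first ten rows of the output above (the full data is not shown)
df <- data.frame(ID       = rep(c("1", "2"), times = c(6, 4)),
                 response = c(0, 0, 0, 1, 0, 0, 1, 0, 0, 0))

grp <- cumsum(df$response == 1)   # a new block starts at every 1 ...
key <- paste(df$ID, grp)          # ... and at every new ID

# row number within each block
pos <- ave(seq_along(key), key, FUN = seq_along)

# TRUE for blocks that open with a 1
starts_at_1 <- ave(df$response, key, FUN = function(v) v[1]) == 1

# shift by one in those blocks so that the 1 itself gets counter 0
df$counter <- pos - starts_at_1
df$counter
# [1] 1 2 3 0 1 2 0 1 2 3
```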

Count consecutive occurrences of an element in string

To put the pieces together: here's a combination of my comment on your previous question and (parts of) my answer here: Count consecutive TRUE values within each block separately. The convenience functions rleid and rowid from the data.table package are used.

Toy data with two strings of different length:

s <- c("a > a > b > b > b > a > b > b", "c > c > b > b > b > c > c")

library(data.table)
lapply(strsplit(s, " > "), function(x) paste0(x, rowid(rleid(x)), collapse = " > "))
# [[1]]
# [1] "a1 > a2 > b1 > b2 > b3 > a1 > b1 > b2"
#
# [[2]]
# [1] "c1 > c2 > b1 > b2 > b3 > c1 > c2"
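
Without data.table, the within-run counter that rowid(rleid(x)) provides can be obtained from sequence() and rle(). This is a base-R sketch of the same result, not the data.table implementation; the helper name `label_runs` is hypothetical.

```r
s <- c("a > a > b > b > b > a > b > b", "c > c > b > b > b > c > c")

# Hypothetical helper: append the position within each consecutive run to every element
label_runs <- function(x) {
  paste0(x, sequence(rle(x)$lengths), collapse = " > ")
}

res <- vapply(strsplit(s, " > ", fixed = TRUE), label_runs, character(1))
res
# [1] "a1 > a2 > b1 > b2 > b3 > a1 > b1 > b2"
# [2] "c1 > c2 > b1 > b2 > b3 > c1 > c2"
```

vapply() returns a character vector here, whereas the lapply() call above returns a list.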

