Create counter within consecutive runs of values
You need to use sequence and rle:
> sequence(rle(as.character(dataset$input))$lengths)
[1] 1 1 2 1 2 1 1 2 3 4 1 1
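To see why this works, it can help to look at the intermediate pieces. The following is a minimal sketch that copies the sample input from the "data" snippet further down this page; the names runs and counter are only illustrative.

```r
# Sample input, copied from the "data" snippet further down this page
input <- c("a", "b", "b", "a", "a", "c", "a", "a", "a", "a", "b", "c")

runs <- rle(input)   # run-length encoding of the vector
runs$values          # one entry per run:   "a" "b" "a" "c" "a" "b" "c"
runs$lengths         # length of each run:   1   2   2   1   4   1   1

# sequence() expands every length n into 1..n and concatenates the pieces,
# which is exactly a counter that restarts at the beginning of each run
counter <- sequence(runs$lengths)
counter
# [1] 1 1 2 1 2 1 1 2 3 4 1 1
```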
Create counter within consecutive runs of certain values
Here's a way, building on Joshua's rle approach (edited to use seq_len and lapply, as per Marek's suggestion):
> (!x) * unlist(lapply(rle(x)$lengths, seq_len))
[1] 0 1 0 1 2 3 0 0 1 2
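The one-liner can be unpacked step by step. The sketch below assumes an input x that is not given in the answer; it was reconstructed so that it reproduces the output shown above. The intermediate names are illustrative only.

```r
# Assumed input (not in the original answer), chosen to match the output above
x <- c(1, 0, 1, 0, 0, 0, 1, 1, 0, 0)

# Position of every element within its own run, for zeros and ones alike
within_run <- unlist(lapply(rle(x)$lengths, seq_len))
within_run
# [1] 1 1 1 1 2 3 1 2 1 2

# !x is TRUE exactly where x is 0; multiplying by it blanks out the non-zero runs
is_zero <- !x
zero_counter <- is_zero * within_run
zero_counter
# [1] 0 1 0 1 2 3 0 0 1 2
```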
UPDATE. Just for kicks, here's another way to do it, around 5 times faster:
cumul_zeros <- function(x) {
x <- !x
rl <- rle(x)
len <- rl$lengths
v <- rl$values
cumLen <- cumsum(len)
z <- x
# replace the 0 at the end of each zero-block in z by the
# negative of the length of the preceding 1-block....
iDrops <- c(0, diff(v)) < 0
z[ cumLen[ iDrops ] ] <- -len[ c(iDrops[-1],FALSE) ]
# ... to ensure that the cumsum below does the right thing.
# We zap the cumsum with x so only the cumsums for the 1-blocks survive:
x*cumsum(z)
}
Try an example:
> cumul_zeros(c(1,1,1,0,0,0,0,0,1,1,1,0,0,1,1))
[1] 0 0 0 1 2 3 4 5 0 0 0 1 2 0 0
Now compare times on a million-length vector:
> x <- sample(0:1, 1000000, replace = TRUE)
> system.time( z <- cumul_zeros(x))
user system elapsed
0.15 0.00 0.14
> system.time( z <- (!x) * unlist( lapply( rle(x)$lengths, seq_len)))
user system elapsed
0.75 0.00 0.75
Moral of the story: one-liners are nicer and easier to understand, but not always the fastest!
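When trading readability for speed like this, it is worth checking the faster code against something that is obviously correct. Here is a minimal sketch of such a check: count_zeros_loop is a hypothetical reference implementation (not part of the answer above) that is slow but easy to verify by eye, compared here with the one-liner on random input.

```r
# Hypothetical reference implementation: a plain loop that restarts the
# counter whenever a non-zero value is seen
count_zeros_loop <- function(x) {
  out <- integer(length(x))
  run <- 0L
  for (i in seq_along(x)) {
    run <- if (x[i] == 0) run + 1L else 0L
    out[i] <- run
  }
  out
}

set.seed(42)  # arbitrary seed, only for reproducibility
x <- sample(0:1, 1000, replace = TRUE)

one_liner <- (!x) * unlist(lapply(rle(x)$lengths, seq_len))

# Stops with an error if the two approaches ever disagree
stopifnot(all(one_liner == count_zeros_loop(x)))
```

The same stopifnot() line can be pointed at cumul_zeros(x) to check the faster version.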
Create counter of consecutive runs of a certain value
SOG <- c(4,4,0,0,0,3,4,5,0,0,1,2,0,0,0)
#run length encoding:
tmp <- rle(SOG)
#turn values into logicals
tmp$values <- tmp$values == 0
#cumulative sum of TRUE values
tmp$values[tmp$values] <- cumsum(tmp$values[tmp$values])
#inverse the run length encoding
inverse.rle(tmp)
#[1] 0 0 1 1 1 0 0 0 2 2 0 0 3 3 3
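If this is needed more than once, the steps can be wrapped in a small helper. This is a sketch only: the function name run_id_of and its value argument are made up for illustration, and it encodes x == value directly rather than encoding x first, which gives the same result for this data.

```r
# Hypothetical helper: number the runs of `value` 1, 2, 3, ... and give 0 elsewhere
run_id_of <- function(x, value = 0) {
  r <- rle(x == value)                     # runs of TRUE (match) / FALSE (no match)
  # cumsum() counts the TRUE runs seen so far; non-matching runs are set to 0
  r$values <- ifelse(r$values, cumsum(r$values), 0L)
  inverse.rle(r)                           # expand back to the original length
}

SOG <- c(4, 4, 0, 0, 0, 3, 4, 5, 0, 0, 1, 2, 0, 0, 0)
run_id_of(SOG)
# [1] 0 0 1 1 1 0 0 0 2 2 0 0 3 3 3
```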
R: count consecutive occurrences of values in a single column and by group
Use rleid (from the data.table package) to get a grouping variable and then use ave to apply seq_along within common values of that grouping:
library(data.table)
transform(dataset, Counter = ave(YesNO, rleid(ID, YesNO), FUN = seq_along))
giving:
   ID YesNO Counter
1   a     1       1
2   a     1       2
3   a     0       1
4   a     0       2
5   a     0       3
6   a     1       1
7   a     1       2
8   b     1       1
9   b     1       2
10  b     1       3
11  b     0       1
12  b     0       2
13  b     0       3
14  b     0       4
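If data.table is not available, the grouping variable that rleid produces can be approximated in base R. The following is a sketch under the assumption that the data look like the output above (the data frame is reconstructed from that table); run_id and changed are illustrative names.

```r
# Data reconstructed from the printed result above
dataset <- data.frame(
  ID    = rep(c("a", "b"), each = 7),
  YesNO = c(1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0),
  stringsAsFactors = FALSE
)

n <- nrow(dataset)

# A new run starts on the first row and wherever ID or YesNO differs
# from the previous row
changed <- c(TRUE,
             dataset$ID[-1]    != dataset$ID[-n] |
             dataset$YesNO[-1] != dataset$YesNO[-n])

run_id <- cumsum(changed)   # plays the role of rleid(ID, YesNO)

dataset$Counter <- ave(dataset$YesNO, run_id, FUN = seq_along)
dataset$Counter
# [1] 1 2 1 2 3 1 2 1 2 3 1 2 3 4
```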
Count and Assign Consecutive Occurrences of Variable
You can repeat each run length as many times as the run is long, using the lengths component returned by rle:
with(rle(dataset$input), rep(lengths, lengths))
#[1] 1 2 2 2 2 1 4 4 4 4 1 1
Using dplyr, we can use lag to create groups and then count the number of rows in each group.
library(dplyr)
dataset %>%
group_by(gr = cumsum(input != lag(input, default = first(input)))) %>%
mutate(count = n())
And with data.table:
library(data.table)
setDT(dataset)[, count:= .N, rleid(input)]
data
Make sure the input column is character and not factor.
dataset <- data.frame(input = c("a","b","b","a","a","c","a","a","a","a","b","c"),
stringsAsFactors = FALSE)
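The grouping trick in the dplyr version (compare each value with the previous one, then take a cumulative sum of the changes) also works in base R. This is a minimal sketch of that idea, not one of the answers above; is_new_run, gr and count are illustrative names.

```r
input <- c("a", "b", "b", "a", "a", "c", "a", "a", "a", "a", "b", "c")

# TRUE wherever the value differs from the one before it (first element: FALSE,
# mirroring lag(input, default = first(input)))
is_new_run <- c(FALSE, input[-1] != input[-length(input)])

# Cumulative sum of the changes gives one id per run
gr <- cumsum(is_new_run)
gr
# [1] 0 1 1 2 2 3 4 4 4 4 5 6

# Size of each group, repeated for every member of the group
count <- ave(seq_along(input), gr, FUN = length)
count
# [1] 1 2 2 2 2 1 4 4 4 4 1 1
```

The result agrees with the rep(lengths, lengths) approach.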
Count cumulative and sequential values of the same sign in R
You can try:
library(dplyr)
df %>%
mutate(z = with(rle(sign(x)), sequence(lengths) * rep(values, lengths)))
     x  z
1  0.5  1
2  1.0  2
3  6.5  3
4 -2.0 -1
5  3.0  1
6 -0.2 -1
7 -1.0 -2
You may want to consider how zeroes should be treated as the above may need a modification if zeroes exist in your vector. Perhaps:
df %>%
mutate(z = with(rle(sign(x)), sequence(lengths) * rep(values^(values != 0), lengths)))
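To make the difference concrete, here is a small base R sketch on a made-up vector that contains zeros (the vector is an assumption for illustration, not the OP's data). Since sign(0) is 0, the first version multiplies the counter for a run of zeros by 0; the second uses the fact that 0^0 evaluates to 1 in R, so runs of zeros are counted with a positive sign.

```r
# Made-up example vector with a run of zeros in the middle
x <- c(0.5, 1, 0, 0, -2, -1)

r <- rle(sign(x))   # values: 1, 0, -1   lengths: 2, 2, 2

# Original version: runs of zeros collapse to 0
plain <- sequence(r$lengths) * rep(r$values, r$lengths)
plain
# [1]  1  2  0  0 -1 -2

# Modified version: values^(values != 0) turns 0 into 0^0 = 1 and leaves
# 1 and -1 unchanged, so zeros get their own positive counter
zero_aware <- sequence(r$lengths) * rep(r$values^(r$values != 0), r$lengths)
zero_aware
# [1]  1  2  1  2 -1 -2
```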
Edit addressing OP comment below:
df %>%
mutate(z = with(tmp <- rle(sign(x)), sequence(lengths) * rep(values, lengths)),
id = with(tmp, rep(seq_along(lengths), lengths))) %>%
group_by(id) %>%
mutate(avg = cumsum(x)/row_number()) %>%
ungroup() %>%
select(-id)
# A tibble: 7 x 3
      x     z   avg
  <dbl> <dbl> <dbl>
1   0.5     1  0.5
2   1       2  0.75
3   6.5     3  2.67
4  -2      -1 -2
5   3       1  3
6  -0.2    -1 -0.2
7  -1      -2 -0.6
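The running average within each run does not need dplyr. As a rough base R sketch (using the x values from the output above; id and avg are illustrative names), the run id can be built from rle and passed to ave:

```r
x <- c(0.5, 1, 6.5, -2, 3, -0.2, -1)

r  <- rle(sign(x))
id <- rep(seq_along(r$lengths), r$lengths)   # run id: 1 1 1 2 3 4 4

# Within each run: cumulative sum divided by position = running mean
avg <- ave(x, id, FUN = function(v) cumsum(v) / seq_along(v))
round(avg, 2)
# [1]  0.50  0.75  2.67 -2.00  3.00 -0.20 -0.60
```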
Count consecutive occurrences of a specific value in every row of a data frame in R
You've identified the two cases that the longest run can take: (1) somewhere in the middle or (2) split between the end and beginning of each row. Hence you want to calculate each condition and take the max like so:
df<-cbind(
Winter=c(0,0,3),
Spring=c(0,2,4),
Summer=c(0,2,7),
Autumn=c(3,0,4))
#> Winter Spring Summer Autumn
#> [1,] 0 0 0 3
#> [2,] 0 2 2 0
#> [3,] 3 4 7 4
# calculate the number of consecutive zeros at the start and end
startZeros <- apply(df,1,function(x)which.min(x==0)-1)
#> [1] 3 1 0
endZeros <- apply(df,1,function(x)which.min(rev(x==0))-1)
#> [1] 0 1 0
# calculate the longest run of zeros
longestRun <- apply(df,1,function(x){
y = rle(x);
max(y$lengths[y$values==0],0)})
#> [1] 3 1 0
# take the max of the two values
pmax(longestRun,startZeros +endZeros )
#> [1] 3 2 0
Of course an even easier solution is:
longestRun <- apply(cbind(df,df),# tricky way to wrap the zeros from the start to the end
1,# the margin over which to apply the summary function
function(x){# the summary function
y = rle(x);
max(y$lengths[y$values==0],
0)# include zero in case there are no zeros in y$values
})
Note that the above solution works because my df does not include the location field (column).
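One thing to watch with the cbind(df, df) trick is a row that consists entirely of zeros: doubling it makes the run twice as long as the row. The sketch below wraps the idea in a hypothetical helper, longest_wrapped_run, and caps the result at the number of columns. The fourth, all-zero row is an assumption added only to show that case.

```r
# Same seasons as above, plus an assumed all-zero fourth row
seasons <- cbind(
  Winter = c(0, 0, 3, 0),
  Spring = c(0, 2, 4, 0),
  Summer = c(0, 2, 7, 0),
  Autumn = c(3, 0, 4, 0)
)

# Hypothetical helper: longest run of `value` per row, allowing wrap-around
longest_wrapped_run <- function(m, value = 0) {
  doubled <- cbind(m, m)                 # lets a run continue from end to start
  runs <- apply(doubled, 1, function(row) {
    r <- rle(row == value)
    max(r$lengths[r$values], 0)          # 0 if the value never occurs
  })
  pmin(runs, ncol(m))                    # a run cannot exceed the row length
}

longest_wrapped_run(seasons)
# [1] 3 2 0 4
```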
rstudio - Create counter in dataframe that gets reset based on changes in value or new ID
We can adjust the counter values based on the first value of each group:
library(dplyr)
df %>%
group_by(ID, grp = cumsum(response == 1L)) %>%
mutate(counter = if(first(response) == 1L) row_number() - 1
else row_number()) %>%
ungroup() %>%
dplyr::select(-grp)
# A tibble: 24 x 3
# ID response counter
# <chr> <dbl> <dbl>
# 1 1 0 1
# 2 1 0 2
# 3 1 0 3
# 4 1 1 0
# 5 1 0 1
# 6 1 0 2
# 7 2 1 0
# 8 2 0 1
# 9 2 0 2
#10 2 0 3
# … with 14 more rows
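The same logic can be sketched in base R with ave. The data frame below is an assumption, reconstructed from the first ten rows of the output above; grp is an illustrative name. A new group starts at every response == 1 and at every change of ID, and the counter is shifted down by one when the group opens with a 1.

```r
# First ten rows reconstructed from the printed output above
df <- data.frame(
  ID       = rep(c("1", "2"), times = c(6, 4)),
  response = c(0, 0, 0, 1, 0, 0, 1, 0, 0, 0),
  stringsAsFactors = FALSE
)

# One group per (ID, number of 1s seen so far); drop unused combinations
grp <- interaction(df$ID, cumsum(df$response == 1), drop = TRUE)

# Row number within the group, minus 1 if the group starts with a 1
df$counter <- ave(df$response, grp,
                  FUN = function(r) seq_along(r) - (r[1] == 1))
df$counter
# [1] 1 2 3 0 1 2 0 1 2 3
```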
Count consecutive occurrences of an element in string
To put the pieces together: here's a combination of my comment on your previous question and (parts of) my answer here: Count consecutive TRUE values within each block separately. The convenience functions rleid and rowid from the data.table package are used.
Toy data with two strings of different length:
s <- c("a > a > b > b > b > a > b > b", "c > c > b > b > b > c > c")
library(data.table)
lapply(strsplit(s, " > "), function(x) paste0(x, rowid(rleid(x)), collapse = " > "))
# [[1]]
# [1] "a1 > a2 > b1 > b2 > b3 > a1 > b1 > b2"
#
# [[2]]
# [1] "c1 > c2 > b1 > b2 > b3 > c1 > c2"
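For comparison, rowid(rleid(x)) can be replaced by sequence(rle(x)$lengths) if you prefer to stay in base R. This is a sketch only; label_runs is a hypothetical helper name, not part of data.table or of the answer above.

```r
s <- c("a > a > b > b > b > a > b > b", "c > c > b > b > b > c > c")

# Hypothetical helper: append the within-run position to every element
label_runs <- function(path, sep = " > ") {
  parts <- strsplit(path, sep, fixed = TRUE)[[1]]
  paste0(parts, sequence(rle(parts)$lengths), collapse = sep)
}

labelled <- vapply(s, label_runs, character(1), USE.NAMES = FALSE)
labelled
# [1] "a1 > a2 > b1 > b2 > b3 > a1 > b1 > b2"
# [2] "c1 > c2 > b1 > b2 > b3 > c1 > c2"
```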