Create counter within consecutive runs of certain values
Here's a way, building on Joshua's rle approach (edited to use seq_len and lapply, as per Marek's suggestion):
> (!x) * unlist(lapply(rle(x)$lengths, seq_len))
[1] 0 1 0 1 2 3 0 0 1 2
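The answer does not show the input vector, so the following is only a sketch that breaks the one-liner into steps. The vector `x` here is an assumption, chosen because it reproduces the printed output; the intermediate names `runs` and `within_run` are illustrative.

```r
# Assumed input (not given in the original answer)
x <- c(1, 0, 1, 0, 0, 0, 1, 1, 0, 0)

# Run-length encode: lengths of consecutive identical values
runs <- rle(x)

# Count 1, 2, 3, ... inside every run, whatever its value
within_run <- unlist(lapply(runs$lengths, seq_len))

# Keep the count only where x is zero; !x is TRUE for zeros
counter <- (!x) * within_run
print(counter)
```

Multiplying by `!x` is what zeroes out the counts belonging to runs of non-zero values.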
UPDATE. Just for kicks, here's another way to do it, around 5 times faster:
cumul_zeros <- function(x) {
  x <- !x
  rl <- rle(x)
  len <- rl$lengths
  v <- rl$values
  cumLen <- cumsum(len)
  z <- x
  # replace the 0 at the end of each zero-block in z by the
  # negative of the length of the preceding 1-block....
  iDrops <- c(0, diff(v)) < 0
  z[ cumLen[ iDrops ] ] <- -len[ c(iDrops[-1],FALSE) ]
  # ... to ensure that the cumsum below does the right thing.
  # We zap the cumsum with x so only the cumsums for the 1-blocks survive:
  x*cumsum(z)
}
Try an example:
> cumul_zeros(c(1,1,1,0,0,0,0,0,1,1,1,0,0,1,1))
[1] 0 0 0 1 2 3 4 5 0 0 0 1 2 0 0
Now compare times on a million-length vector:
> x <- sample(0:1, 1000000, TRUE)
> system.time( z <- cumul_zeros(x))
user system elapsed
0.15 0.00 0.14
> system.time( z <- (!x) * unlist( lapply( rle(x)$lengths, seq_len)))
user system elapsed
0.75 0.00 0.75
Moral of the story: one-liners are nicer and easier to understand, but not always the fastest!
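As a further point of comparison, here is a different vectorised technique (not the one used in either answer above): subtract from each position the position of the most recent non-zero element, found with cummax. The function name `zero_run_counter` is hypothetical, and the sketch assumes `x` has no NA values. No timing claim is made for it.

```r
# Hypothetical helper: counter within runs of zeros via cummax
zero_run_counter <- function(x) {
  i <- seq_along(x)
  # i * (x != 0) is the position of each non-zero element, 0 elsewhere;
  # cummax carries the last such position forward
  i - cummax(i * (x != 0))
}

x <- c(1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1)
print(zero_run_counter(x))

# Cross-check against the rle-based one-liner on random data
set.seed(1)
y <- sample(0:1, 1000, TRUE)
same <- all(zero_run_counter(y) ==
            (!y) * unlist(lapply(rle(y)$lengths, seq_len)))
print(same)
```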
Create counter within consecutive runs of values
You need to use sequence and rle:
> sequence(rle(as.character(dataset$input))$lengths)
[1] 1 1 2 1 2 1 1 2 3 4 1 1
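For a self-contained version, the sketch below builds a small data frame first. The column values are the sample `input` vector that appears in the data section of a later answer on this page; the column name `counter` is an illustrative choice.

```r
# Sample data: character column with consecutive repeats
dataset <- data.frame(
  input = c("a", "b", "b", "a", "a", "c", "a", "a", "a", "a", "b", "c"),
  stringsAsFactors = FALSE
)

# rle() gives the length of every run; sequence() expands each
# length n into 1:n and concatenates the pieces
runs <- rle(as.character(dataset$input))
dataset$counter <- sequence(runs$lengths)
print(dataset)
```

The `as.character()` call matters when the column is a factor, since rle works on atomic vectors.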
Create counter of consecutive runs of a certain value
SOG <- c(4,4,0,0,0,3,4,5,0,0,1,2,0,0,0)
#run length encoding:
tmp <- rle(SOG)
#turn values into logicals
tmp$values <- tmp$values == 0
#cumulative sum of TRUE values
tmp$values[tmp$values] <- cumsum(tmp$values[tmp$values])
#inverse the run length encoding
inverse.rle(tmp)
#[1] 0 0 1 1 1 0 0 0 2 2 0 0 3 3 3
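The same steps can be wrapped in a small function so the target value is a parameter. This is only a sketch: the name `number_runs_of` and its `value` argument are hypothetical, and NA values in `x` are not handled.

```r
# Hypothetical helper: label the 1st, 2nd, 3rd ... run of `value`, 0 elsewhere
number_runs_of <- function(x, value = 0) {
  r <- rle(x == value)                    # runs of TRUE/FALSE
  # number the TRUE runs in order of appearance, leave FALSE runs at 0
  r$values <- ifelse(r$values, cumsum(r$values), 0L)
  inverse.rle(r)                          # expand back to full length
}

SOG <- c(4, 4, 0, 0, 0, 3, 4, 5, 0, 0, 1, 2, 0, 0, 0)
run_labels <- number_runs_of(SOG)
print(run_labels)
```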
R: count consecutive occurrences of values in a single column and by group
Use rleid (from the data.table package) to get a grouping variable and then use ave to apply seq_along within common values of that grouping:
library(data.table)
transform(dataset, Counter = ave(YesNO, rleid(ID, YesNO), FUN = seq_along))
giving:
ID YesNO Counter
1 a 1 1
2 a 1 2
3 a 0 1
4 a 0 2
5 a 0 3
6 a 1 1
7 a 1 2
8 b 1 1
9 b 1 2
10 b 1 3
11 b 0 1
12 b 0 2
13 b 0 3
14 b 0 4
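If data.table is not available, a run identifier can be approximated in base R. The sketch below is an assumption-laden stand-in, not data.table's implementation: `run_id` is a hypothetical helper that pastes the columns into one key and starts a new run whenever the key changes. The data frame is reconstructed from the printed output above.

```r
# Hypothetical base-R stand-in for a run identifier over several columns
run_id <- function(...) {
  key <- paste(..., sep = "\r")           # one key per row
  # TRUE where the key differs from the previous row (first row starts run 1)
  cumsum(c(TRUE, key[-1] != key[-length(key)]))
}

# Data reconstructed from the printed result
dataset <- data.frame(
  ID    = rep(c("a", "b"), each = 7),
  YesNO = c(1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0)
)

dataset$Counter <- ave(dataset$YesNO, run_id(dataset$ID, dataset$YesNO),
                       FUN = seq_along)
print(dataset)
```

Including `ID` in the key is what forces the counter to restart at row 8, where `YesNO` stays 1 but the group changes.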
Count and Assign Consecutive Occurrences of Variable
You can repeat each run length as many times as the run is long, using the lengths component returned by rle as both arguments to rep:
with(rle(dataset$input), rep(lengths, lengths))
#[1] 1 2 2 2 2 1 4 4 4 4 1 1
Using dplyr, we can use lag to create groups and then count the number of rows in each group.
library(dplyr)
dataset %>%
group_by(gr = cumsum(input != lag(input, default = first(input)))) %>%
mutate(count = n())
and with data.table
library(data.table)
setDT(dataset)[, count:= .N, rleid(input)]
data
Make sure the input column is character and not factor.
dataset <- data.frame(input = c("a","b","b","a","a","c","a","a","a","a","b","c"),
stringsAsFactors = FALSE)
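The grouping idea behind the dplyr and data.table versions can also be sketched in base R: mark each row where the value changes, take the cumulative sum to get a run id, then count rows per run. The names `change`, `run_id` and `count` are illustrative, and the sketch assumes `input` contains no NA values.

```r
dataset <- data.frame(
  input = c("a", "b", "b", "a", "a", "c", "a", "a", "a", "a", "b", "c"),
  stringsAsFactors = FALSE
)

# TRUE wherever the value differs from the previous row
change <- c(TRUE, dataset$input[-1] != dataset$input[-nrow(dataset)])
run_id <- cumsum(change)                  # 1, 2, 2, 3, 3, 4, ...

# Size of the run each row belongs to
dataset$count <- ave(seq_along(run_id), run_id, FUN = length)
print(dataset)
```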
rstudio - Create counter in dataframe that gets reset based on changes in value or new ID
We can adjust the counter values based on the first value of each group:
library(dplyr)
df %>%
group_by(ID, grp = cumsum(response == 1L)) %>%
mutate(counter = if(first(response) == 1L) row_number() - 1
else row_number()) %>%
ungroup() %>%
dplyr::select(-grp)
# A tibble: 24 x 3
# ID response counter
# <chr> <dbl> <dbl>
# 1 1 0 1
# 2 1 0 2
# 3 1 0 3
# 4 1 1 0
# 5 1 0 1
# 6 1 0 2
# 7 2 1 0
# 8 2 0 1
# 9 2 0 2
#10 2 0 3
# … with 14 more rows
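The answer does not include its input data, so the sketch below uses a small made-up `df` (an assumption, shaped to match the first ten printed rows) and shows the same logic in base R with ave. A new group starts at every `response == 1`; within a group the counter starts at 0 if the group opens with a 1, otherwise at 1.

```r
# Hypothetical data matching the first ten rows of the printed output
df <- data.frame(
  ID       = c("1", "1", "1", "1", "1", "1", "2", "2", "2", "2"),
  response = c(0, 0, 0, 1, 0, 0, 1, 0, 0, 0)
)

# Group id increases by one at each response == 1
grp <- cumsum(df$response == 1)

# Within each ID/grp block: shift the row number down by one
# when the block begins with a 1
df$counter <- ave(df$response, df$ID, grp,
                  FUN = function(r) seq_along(r) - (r[1] == 1))
print(df)
```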
Count consecutive occurrences of a specific value in every row of a data frame in R
You've identified the two cases that the longest run can take: (1) somewhere in the middle or (2) split between the end and beginning of each row. Hence you want to calculate each case and take the max, like so:
df<-cbind(
Winter=c(0,0,3),
Spring=c(0,2,4),
Summer=c(0,2,7),
Autumn=c(3,0,4))
#> Winter Spring Summer Autumn
#> [1,] 0 0 0 3
#> [2,] 0 2 2 0
#> [3,] 3 4 7 4
# calculate the number of consecutive zeros at the start and end
startZeros <- apply(df,1,function(x)which.min(x==0)-1)
#> [1] 3 1 0
endZeros <- apply(df,1,function(x)which.min(rev(x==0))-1)
#> [1] 0 1 0
# calculate the longest run of zeros
longestRun <- apply(df,1,function(x){
  y = rle(x);
  max(y$lengths[y$values==0],0)})
#> [1] 3 1 0
# take the max of the two values
pmax(longestRun,startZeros +endZeros )
#> [1] 3 2 0
Of course an even easier solution is:
longestRun <- apply(cbind(df,df),# tricky way to wrap the zeros from the start to the end
                    1,# the margin over which to apply the summary function
                    function(x){# the summary function
                      y = rle(x);
                      max(y$lengths[y$values==0],
                          0)# include zero in case there are no zeros in y$values
                    })
Note that the above solution works because my df does not include the location field (column).
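The doubled-row idea can be written as a per-row function. In this sketch the name `longest_wrapped_zero_run` and the matrix name `seasons` are hypothetical. One addition that is my assumption about the desired result: a row made up entirely of zeros would be counted twice after doubling, so the result is capped at the row length.

```r
# Hypothetical helper: longest run of zeros, allowing wrap-around
longest_wrapped_zero_run <- function(x) {
  r <- rle(c(x, x) == 0)                  # doubling joins end to start
  longest <- max(r$lengths[r$values], 0)  # 0 if there are no zeros
  min(longest, length(x))                 # cap for all-zero rows
}

seasons <- cbind(
  Winter = c(0, 0, 3),
  Spring = c(0, 2, 4),
  Summer = c(0, 2, 7),
  Autumn = c(3, 0, 4)
)

res <- apply(seasons, 1, longest_wrapped_zero_run)
print(res)
```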
Add column with ascending numbers starting and ending based on certain values in other column
You can use purrr::accumulate() here.
First create a logical vector with snow_depth != 0, then call accumulate with if_else.
library(purrr)
library(dplyr)
df%>%mutate(consecutive_days=accumulate(snow_depth!=0, ~if_else(.y!=0, .x+1, 0)))
snow_depth new_column consecutive_days
1 0 0 0
2 0 0 0
3 5 1 1
4 7 2 2
5 8 3 3
6 4 4 4
7 0 0 0
8 0 0 0
9 6 1 1
10 5 2 2
11 8 3 3
12 9 4 4
13 5 5 5
14 6 6 6
15 0 0 0
16 8 1 1
17 6 2 2
data
df<-data.frame(snow_depth=c(0, 0, 5, 7, 8, 4, 0, 0, 6, 5, 8, 9, 5, 6, 0, 8, 6),
new_column=c(0, 0, 1, 2, 3, 4, 0, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2))
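For comparison, the same column can be produced without purrr by reusing the rle pattern from the earlier answers. This is an alternative sketch, not the method of this answer; the name `consecutive_days_rle` is illustrative.

```r
df <- data.frame(
  snow_depth = c(0, 0, 5, 7, 8, 4, 0, 0, 6, 5, 8, 9, 5, 6, 0, 8, 6),
  new_column = c(0, 0, 1, 2, 3, 4, 0, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2)
)

has_snow <- df$snow_depth != 0

# Count within every run, then zero out the runs without snow
df$consecutive_days_rle <- has_snow * sequence(rle(has_snow)$lengths)
print(df)
```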
Count consecutive occurrences of an element in string
To put the pieces together: here's a combination of my comment on your previous question and (parts of) my answer here: Count consecutive TRUE values within each block separately. The convenience functions rleid and rowid from the data.table package are used.
Toy data with two strings of different length:
s <- c("a > a > b > b > b > a > b > b", "c > c > b > b > b > c > c")
library(data.table)
lapply(strsplit(s, " > "), function(x) paste0(x, rowid(rleid(x)), collapse = " > "))
# [[1]]
# [1] "a1 > a2 > b1 > b2 > b3 > a1 > b1 > b2"
#
# [[2]]
# [1] "c1 > c2 > b1 > b2 > b3 > c1 > c2"
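A base-R sketch of the same result, for cases where data.table is not loaded: the within-run counter that rowid(rleid(x)) produces can be obtained with sequence and rle. The helper name `label_runs` is hypothetical, and vapply returns a character vector rather than a list.

```r
s <- c("a > a > b > b > b > a > b > b", "c > c > b > b > b > c > c")

# Hypothetical helper: append the within-run position to each element
label_runs <- function(x) {
  paste0(x, sequence(rle(x)$lengths), collapse = " > ")
}

res <- vapply(strsplit(s, " > ", fixed = TRUE), label_runs, character(1))
print(res)
```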