R: Selecting first of n consecutive rows above a certain threshold value
The easiest way is to use the zoo library in conjunction with dplyr. The zoo package provides a function called rollapply, which applies a function over a rolling window of values.
In this example, we can use it to calculate the minimum of the current value and the next two values within each group, and then keep the first row where that minimum meets the threshold.
library(dplyr)
library(zoo)
df %>% group_by(MRN) %>%
  mutate(ANC=rollapply(ANC, width=3, min, align="left", fill=NA, na.rm=TRUE)) %>%
  filter(ANC >= 0.5) %>%
  filter(row_number() == 1)
# MRN Collected_Date ANC
# 1 001 2015-01-03 0.532
# 2 004 2014-01-03 0.500
In the code above we have used rollapply to calculate the minimum of the next 3 items. To see how this works, compare the following:
rollapply(1:6, width=3, min, align="left", fill=NA) # [1] 1 2 3 4 NA NA
rollapply(1:6, width=3, min, align="center", fill=NA) # [1] NA 1 2 3 4 NA
rollapply(1:6, width=3, min, align="right", fill=NA) # [1] NA NA 1 2 3 4
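Because our example aligns the window to the left, each window starts at the current row and looks forward to the next 2 values.
Lastly, we filter for rows whose rolling minimum is at least 0.5 and take the first such row in each group. Note that mutate overwrites ANC with the rolling minimum, so the ANC shown in the output is the window minimum rather than the original reading.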
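To make the left-aligned window concrete without relying on zoo, here is a minimal base R sketch. The helper roll_min_left and the sample values in anc are hypothetical, made up for illustration; the helper is only meant to mimic what rollapply(x, width, min, align="left", fill=NA) computes for a plain numeric vector.

```r
# Hypothetical helper: left-aligned rolling minimum, padded with NA at the end.
roll_min_left <- function(x, width) {
  n <- length(x)
  out <- rep(NA_real_, n)
  if (n < width) return(out)
  for (i in seq_len(n - width + 1)) {
    # Window covers the current value and the next (width - 1) values.
    out[i] <- min(x[i:(i + width - 1)])
  }
  out
}

# Made-up readings for a single patient.
anc <- c(0.1, 0.6, 0.2, 0.5, 0.7, 0.9, 0.3)
roll <- roll_min_left(anc, 3)
roll
# [1] 0.1 0.2 0.2 0.5 0.3  NA  NA

# First position where three consecutive values are all >= 0.5.
first_idx <- which(roll >= 0.5)[1]
first_idx
# [1] 4
```

which() ignores the trailing NA values, so incomplete windows at the end of the vector can never be selected.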
How to count consecutive occurrence below a threshold in R
Using rle:
with(rle(s1$V2 < 1), sum(lengths[values] >= 3))
#[1] 1
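The one-liner above is easier to follow when the run-length encoding is inspected step by step. The vector v2 below is made up for illustration (the original s1 data is not shown), so treat this as a sketch of the idea rather than a reproduction of the original result.

```r
# Made-up values standing in for s1$V2.
v2 <- c(2, 0.5, 0.2, 0.9, 3, 0.1, 0.4, 5)

# rle() collapses consecutive equal values into runs.
r <- rle(v2 < 1)
r$values
# [1] FALSE  TRUE FALSE  TRUE FALSE
r$lengths
# [1] 1 3 1 2 1

# Keep only the runs that are below the threshold, then count
# how many of those runs are at least 3 long.
n_runs <- sum(r$lengths[r$values] >= 3)
n_runs
# [1] 1
```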
How to search for a specific number of consecutive values above 0 in r?
We can use base R with ave and rle:
subset(df, V2 >0 & ave(V2, with(rle(V2 > 0),
rep(seq_along(values), lengths)), FUN = length) >= 5 )
Output:
# V1 V2
#4 A 0.500
#5 N 1.175
#6 S 3.100
#7 I 4.250
#8 I 2.250
#9 V 0.250
Data:
df <- structure(list(V1 = c("M", "A", "T", "A", "N", "S", "I", "I",
"V", "L"), V2 = c(1.2, -0.15, -0.8, 0.5, 1.175, 3.1, 4.25, 2.25,
0.25, -1.675)), class = "data.frame", row.names = c(NA, -10L))
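The nested call in the subset above can be unpacked into intermediate steps. The sketch below uses the same V2 values as the data; the variable names run_id, run_len and keep are introduced here only for illustration.

```r
v2 <- c(1.2, -0.15, -0.8, 0.5, 1.175, 3.1, 4.25, 2.25, 0.25, -1.675)

# Step 1: run-length encode the condition.
r <- rle(v2 > 0)

# Step 2: give every element the id of the run it belongs to.
run_id <- rep(seq_along(r$values), r$lengths)
run_id
# [1] 1 2 2 3 3 3 3 3 3 4

# Step 3: attach the length of its run to every element.
run_len <- ave(v2, run_id, FUN = length)
run_len
# [1] 1 2 2 6 6 6 6 6 6 1

# Step 4: keep positive values that sit in a run of at least 5.
keep <- v2 > 0 & run_len >= 5
which(keep)
# [1] 4 5 6 7 8 9
```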
Find three consecutive numbers greater than threshold group-wise in R
If a value is above the threshold and it is the third such value in a row, capture its index in ends. Select the first index in ends and add one to get the index of the return time. (There may be more than one such run of 3, and therefore more than one element of ends; in that case, the first one is used.)
Note: In your example, the speed at return time is always above the threshold. This code does not check that condition at all; it simply gives the first time after three rows with speeds above the threshold (regardless of whether the speed at that time is still above the threshold).
library(data.table)
setDT(df)
speed_thresh <- 35
df[, {above <- Speed > speed_thresh
ends <- which(above & rowid(rleid(above)) == 3)
.(Return_Time = Time[ends[1] + 1])}
, Group]
# Group Return_Time
# 1: 1 35
# 2: 2 25
# 3: 3 NA
Data used:
df <- fread('
Group Time Speed
1 5 25
1 10 23
1 15 21
1 20 40
1 25 42
1 30 52
1 35 48
1 40 45
2 5 22
2 10 36
2 15 38
2 20 46
2 25 53
3 5 45
3 10 58
')
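For readers without data.table, the same per-group logic can be sketched in base R. The function return_time is a hypothetical helper written for this illustration; sequence(rle(above)$lengths) is used here as a stand-in for rowid(rleid(above)), and the inputs are the three groups from the data above.

```r
# Hypothetical helper: time of the row following the first run of
# n consecutive speeds above thresh (NA if there is no such row).
return_time <- function(time, speed, thresh, n = 3) {
  above <- speed > thresh
  # Position of each element within its run of equal TRUE/FALSE values.
  pos_in_run <- sequence(rle(above)$lengths)
  # Indices where a run of above-threshold values reaches length n.
  ends <- which(above & pos_in_run == n)
  time[ends[1] + 1]
}

g1 <- return_time(seq(5, 40, 5), c(25, 23, 21, 40, 42, 52, 48, 45), 35)
g2 <- return_time(seq(5, 25, 5), c(22, 36, 38, 46, 53), 35)
g3 <- return_time(c(5, 10), c(45, 58), 35)
c(g1, g2, g3)
# [1] 35 25 NA
```

Group 3 returns NA because it never accumulates three rows above the threshold, so ends is empty and ends[1] is NA.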
Find the first set of consecutive integers in a vector
This does it:
with(rle(df$DV), values[which(lengths >= 5)[1]])
If there are no consecutive chunks with a length >= 5, you get NA.
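A small worked example shows both outcomes. The vectors dv and dv_short are made up for illustration and stand in for df$DV.

```r
# Made-up vector: runs of length 2, 5 and 6.
dv <- c(1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3)
r <- rle(dv)
r$lengths
# [1] 2 5 6

# Value of the first run that is at least 5 long.
first_val <- r$values[which(r$lengths >= 5)[1]]
first_val
# [1] 2

# With no qualifying run, which() is empty, so indexing with [1] gives NA.
dv_short <- c(1, 2, 3)
no_val <- with(rle(dv_short), values[which(lengths >= 5)[1]])
no_val
# [1] NA
```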
Remove if the consecutive value appears at end in the group id
I suggest the following:
library(dplyr)
data<-data.frame(id=c(1,1,1, 1,2,2,2,3,3,3, 3,4,4,4), a=c(1,1,1,1,1,2,1,1,2,2,1,1,1,2),
b=c("yes", "yes","no","no","no", "yes", "yes","no","yes","yes","no", "yes","yes","yes"))
data %>%
group_by(id) %>%
# create indicators for two consecutive 'yes'
mutate(prev_b = lag(b, 1),
two_yes = b == 'yes' & prev_b == 'yes') %>%
# create indicators for starting 'no'
mutate(ones = 1,
position = cumsum(ones),
prev_no = cumsum(ifelse(b == 'no', 1, 0)),
leading_no = position == prev_no) %>%
# create indicator for final record
mutate(next_b = lead(b, 1),
last_record = is.na(next_b)) %>%
# combine indicators at group level
mutate(group_end_two_yes = any(two_yes & last_record),
group_leading_no = any(leading_no)) %>%
# drop
mutate(drop_group = group_end_two_yes & group_leading_no) %>%
filter(!drop_group,
!leading_no) %>%
# select initial columns
select(id, a, b)
Extract first value after a specific observation
You can do:
library(dplyr)
df %>%
mutate(grp = cumsum(threshold != 'over')) %>%
filter(lag(threshold) == 'over' & lag(grp) != grp)
# values other.values threshold grp
#1 7 9 7 2
#2 4 5 4 6
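The same idea, picking the first row that follows a run of 'over' rows, can be sketched in base R by comparing each element with its predecessor. The threshold vector below is made up for illustration and is not the original data.

```r
# Made-up indicator column.
threshold <- c("under", "over", "over", "under", "under", "over", "under")

# Previous element is "over" and the current one is not:
# compare the vector without its last element to the vector without its first.
prev_over <- head(threshold, -1) == "over"
curr_not_over <- tail(threshold, -1) != "over"

# + 1 shifts from the "previous" position to the current row.
idx <- which(prev_over & curr_not_over) + 1
idx
# [1] 4 7
```

The resulting indices can then be used to subset the data frame, for example df[idx, ].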
Selecting all values above a threshold and then a random sample of the values below the threshold
I think you still need to rbind the results but you can do it in one line of code. I've used the sample_n function from dplyr for the sampling:
library(dplyr)
rbind(sample_n(cars[cars$speed<12,], 5), cars[cars$speed>=12,])
dplyr also has the bind_rows function (the replacement for the deprecated rbind_list) if you need something faster than rbind.
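If dplyr is not available, the same split-sample-combine approach can be sketched in base R with sample(). The set.seed call and the names below, above, picked and result are assumptions added for reproducibility and readability.

```r
set.seed(1)  # assumption: fixed seed so the sample is reproducible

# Split the built-in cars data at the threshold.
below <- cars[cars$speed < 12, ]
above <- cars[cars$speed >= 12, ]

# Draw 5 random rows from the below-threshold part (without replacement).
picked <- below[sample(nrow(below), 5), ]

# Combine the sample with every row at or above the threshold.
result <- rbind(picked, above)
```

sample() fails if fewer than 5 rows fall below the threshold, so a real script may want to cap the sample size with min(5, nrow(below)).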
Count consecutive prior dates per group
One way would be:
library(dplyr)
df %>%
group_by(customer, idx = cumsum(as.integer(c(0, diff(as.Date(date, '%d/%m/%y')))) != -1)) %>%
mutate(n_consecutive_days = rev(sequence(n()))) %>% ungroup() %>%
group_by(customer) %>%
mutate(n_consecutive_days = replace(n_consecutive_days, row_number() == n(), NA), idx = NULL)
Output:
# A tibble: 7 x 3
# Groups: customer [2]
customer date n_consecutive_days
<int> <fct> <int>
1 1 10/1/20 2
2 1 9/1/20 1
3 1 6/1/20 NA
4 2 10/1/20 1
5 2 8/1/20 3
6 2 7/1/20 2
7 2 6/1/20 NA
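The grouping trick in the pipeline above is the cumsum over "the gap to the previous date is not exactly -1 day". A minimal base R sketch for a single customer, using the three dates of customer 1 from the output, shows the intermediate values; the variable names step, idx and n_cons are introduced here for illustration.

```r
# Dates of customer 1, newest first, parsed as day/month/year.
d <- as.Date(c("10/1/20", "9/1/20", "6/1/20"), "%d/%m/%y")

# Gap in days to the previous row (0 for the first row).
step <- c(0, as.integer(diff(d)))
step
# [1]  0 -1 -3

# A new block starts whenever the gap is not exactly one day back.
idx <- cumsum(step != -1)
idx
# [1] 1 1 2

# Count down within each block, then blank out the last row.
n_cons <- ave(seq_along(d), idx, FUN = function(i) rev(seq_along(i)))
n_cons[length(n_cons)] <- NA
n_cons
# [1]  2  1 NA
```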