R: Selecting First of N Consecutive Rows Above a Certain Threshold Value

R: Selecting first of n consecutive rows above a certain threshold value

The easiest way is to use the zoo library in conjunction with dplyr. Within the zoo package there is a function called rollapply, we can use this to calculate a function value for a window of time.

In this example, we could apply the window to calculate the minimum of the next three values, and then apply the logic specified.

df %>% group_by(MRN) %>%
mutate(ANC=rollapply(ANC, width=3, min, align="left", fill=NA, na.rm=TRUE)) %>%
filter(ANC >= 0.5) %>%
filter(row_number() == 1)

# MRN Collected_Date ANC
# 1 001 2015-01-03 0.532
# 2 004 2014-01-03 0.500

In the code above we have used rollapply to calculate the minimum of the next 3 items. To see how this works compare the following:

rollapply(1:6, width=3, min, align="left", fill=NA) # [1]  1  2  3  4 NA NA
rollapply(1:6, width=3, min, align="center", fill=NA) # [1] NA 1 2 3 4 NA
rollapply(1:6, width=3, min, align="right", fill=NA) # [1] NA NA 1 2 3 4

So in our example, we have aligned from the left, so it starts from the current location and looks forward to the next 2 values.

Lastly we filter by the appropriate values, and take the first observation of each group.

How to count consecutive occurrence below a threshold in R

Using rle :

with(rle(s1$V2 < 1), sum(lengths[values] >= 3))
#[1] 1

How to search for a specific number of consecutive values above 0 in r?

We can use base R with ave and rle

subset(df, V2 >0 & ave(V2, with(rle(V2 > 0),
rep(seq_along(values), lengths)), FUN = length) >= 5 )

-output

#   V1    V2
#4 A 0.500
#5 N 1.175
#6 S 3.100
#7 I 4.250
#8 I 2.250
#9 V 0.250

data

df <- structure(list(V1 = c("M", "A", "T", "A", "N", "S", "I", "I", 
"V", "L"), V2 = c(1.2, -0.15, -0.8, 0.5, 1.175, 3.1, 4.25, 2.25,
0.25, -1.675)), class = "data.frame", row.names = c(NA, -10L))

Find three consecutive numbers greater than threshold group-wise in R

If it's above the threshold and it's the third such value in a row, capture the index in ends. Select the first index in ends and add one to get the index of the return time. (There may be more than 1 such group of 3 and therefore more than one element of ends. In this case, the first end needs to be used.)

Note: In your example, the speed at return time is always above the threshold. This code does not check that as a condition at all, but simply gives the first time after three rows with speeds above threshold (regardless of whether the speed at that time is still above the threshold).

library(data.table)
setDT(df)

speed_thresh <- 35

df[, {above <- Speed > speed_thresh
ends <- which(above & rowid(rleid(above)) == 3)
.(Return_Time = Time[ends[1] + 1])}
, Group]

# Group Return_Time
# 1: 1 35
# 2: 2 25
# 3: 3 NA

Data used:

df <- fread('
Group Time Speed
1 5 25
1 10 23
1 15 21
1 20 40
1 25 42
1 30 52
1 35 48
1 40 45
2 5 22
2 10 36
2 15 38
2 20 46
2 25 53
3 5 45
3 10 58
')

Find the first set of consecutive integers in a vector

This does it

with(rle(df$DV), values[which(lengths >= 5)[1]])

If there is no consecutive chunks with a length >= 5, you get NA.

Remove if the consecutive value appears at end in the group id

I suggest the following:

library(dplyr)

data<-data.frame(id=c(1,1,1, 1,2,2,2,3,3,3, 3,4,4,4), a=c(1,1,1,1,1,2,1,1,2,2,1,1,1,2),
b=c("yes", "yes","no","no","no", "yes", "yes","no","yes","yes","no", "yes","yes","yes"))

data %>%
group_by(id) %>%
# create indicators for two consecutive 'yes'
mutate(prev_b = lag(b, 1),
two_yes = b == 'yes' & prev_b == 'yes') %>%
# create indicators for starting 'no'
mutate(ones = 1,
position = cumsum(ones),
prev_no = cumsum(ifelse(b == 'no', 1, 0)),
leading_no = position == prev_no) %>%
# create indicator for final record
mutate(next_b = lead(b, 1),
last_record = is.na(next_b)) %>%
# combine indicators at group level
mutate(group_end_two_yes = any(two_yes & last_record),
group_leading_no = any(leading_no)) %>%
# drop
mutate(drop_group = group_end_two_yes & group_leading_no) %>%
filter(!drop_group,
!leading_no) %>%
# select initial columns
select(id, a, b)

Extract first value after a specific observation

You can do :

library(dplyr)

df %>%
mutate(grp = cumsum(threshold != 'over')) %>%
filter(lag(threshold) == 'over' & lag(grp) != grp)

# values other.values threshold grp
#1 7 9 7 2
#2 4 5 4 6

Selecting all values above a threshold and then a random sample of the values below the threshold

I think you still need to rbind the results but you can do it in one line of code. I've used the sample_n function from dplyr for the sampling:

library(dplyr)

rbind(sample_n(cars[cars$speed<12,], 5), cars[cars$speed>=12,])

dplyr also has the rbind_list function if you need something faster than rbind.

Count consecutive prior dates per group

One way would be:

library(dplyr)

df %>%
group_by(customer, idx = cumsum(as.integer(c(0, diff(as.Date(date, '%d/%m/%y')))) != -1)) %>%
mutate(n_consecutive_days = rev(sequence(n()))) %>% ungroup() %>%
group_by(customer) %>%
mutate(n_consecutive_days = replace(n_consecutive_days, row_number() == n(), NA), idx = NULL)

Output:

# A tibble: 7 x 3
# Groups: customer [2]
customer date n_consecutive_days
<int> <fct> <int>
1 1 10/1/20 2
2 1 9/1/20 1
3 1 6/1/20 NA
4 2 10/1/20 1
5 2 8/1/20 3
6 2 7/1/20 2
7 2 6/1/20 NA


Related Topics



Leave a reply



Submit