Find consecutive sequence of zeros in R
Using data.table, which your question suggests you want to use, the following does what you want as far as I can see:
library(data.table)
DT <- data.table(myOriginalDf)
# add the original order, so you can't lose it
DT[, orig := .I]
# rle by id, saving the run length as a new variable
DT[, rleLength := {rr <- rle(value); rep(rr$lengths, rr$lengths)}, by = 'id']
# key by value and length to subset
setkey(DT, value, rleLength)
# which rows are value = 0 and length > 2
DT[list(0, unique(rleLength[rleLength>2])),nomatch=0]
## value rleLength id orig
## 1: 0 3 x 6
## 2: 0 3 x 7
## 3: 0 3 x 8
## 4: 0 4 y 10
## 5: 0 4 y 11
## 6: 0 4 y 12
## 7: 0 4 y 13
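For readers without data.table, the same idea can be sketched in base R. The data frame below is hypothetical, built only so that the selected rows line up with the output shown above, and ave() is a swapped-in stand-in for the grouped rle step rather than the answer's own code.

```r
# Hypothetical data: id "x" has zero runs of length 2 and 3, id "y" one of length 4
df <- data.frame(
  id    = rep(c("x", "y"), times = c(8, 6)),
  value = c(1, 0, 0, 1, 1, 0, 0, 0,   2, 0, 0, 0, 0, 3)
)
df$orig <- seq_len(nrow(df))  # remember the original row order

# Per id, repeat each run length so every row carries the length of its own run
df$rleLength <- ave(df$value, df$id, FUN = function(v) {
  rr <- rle(v)
  rep(rr$lengths, rr$lengths)
})

# Keep rows whose value is 0 and whose run is longer than 2
hits <- df[df$value == 0 & df$rleLength > 2, ]
print(hits)
```

Under these assumptions the selected rows are 6 to 8 for id x and 10 to 13 for id y.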
How to find the indices where there are n consecutive zeroes in a row
Here are two base R approaches:
1) rle. First run rle and then compute ok to pick out the runs of zeros that are more than 3 long. We then compute the starts and ends of all runs, subsetting to the ok ones at the end.
with(rle(x), {
ok <- values == 0 & lengths > 3
ends <- cumsum(lengths)
starts <- ends - lengths + 1
data.frame(starts, ends)[ok, ]
})
giving:
starts ends
1 6 17
2 34 58
3 72 89
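The original x is not shown, so here is a minimal sketch of the same rle steps on a small hypothetical vector, with the intermediate values spelled out. Only the vector is an assumption; the steps mirror the code above.

```r
# Hypothetical input: zero runs of length 5, 2 and 4
x <- c(3, 1, 0, 0, 0, 0, 0, 2, 0, 0, 5, 0, 0, 0, 0, 1)

r <- rle(x)
ends   <- cumsum(r$lengths)       # last index of every run
starts <- ends - r$lengths + 1    # first index of every run
ok     <- r$values == 0 & r$lengths > 3

# Only the zero runs longer than 3 survive the subset
runs <- data.frame(starts, ends)[ok, ]
print(runs)
```

The run of two zeros at positions 9 and 10 is dropped, leaving the runs at 3 to 7 and 12 to 15.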
2) gregexpr. Take the sign of each number (0 or 1, assuming x has no negative values) and concatenate those into one long string. Then use gregexpr to find the locations of runs of at least 4 zeros. The result gives the starts; each end is the start plus the match.length attribute minus 1.
s <- paste(sign(x), collapse = "")
g <- gregexpr("0{4,}", s)[[1]]
data.frame(starts = 0, ends = attr(g, "match.length") - 1) + g
giving:
starts ends
1 6 17
2 34 58
3 72 89
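If x may contain negative values, sign(x) yields "-1" for them, which makes the string longer than x and shifts the match positions. A hedged variant, not part of the original answer, encodes each element as a single character with as.integer(x != 0). The vector is hypothetical.

```r
# Hypothetical input with negative values mixed in
x <- c(3, -1, 0, 0, 0, 0, 0, 2, 0, 0, -5, 0, 0, 0, 0, 1)

# One character per element: "0" for zero, "1" for anything else
s <- paste(as.integer(x != 0), collapse = "")

# Runs of at least 4 zeros; this sketch assumes at least one match exists
# (gregexpr returns -1 when nothing matches)
g <- gregexpr("0{4,}", s)[[1]]
starts <- as.integer(g)
ends   <- starts + attr(g, "match.length") - 1L

print(data.frame(starts, ends))
```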
Find distribution of consecutive zeros
1) We can use rleid from data.table
data.table(x)[, strrep(0, sum(x==0)) ,rleid(x == 0)][V1 != "",.N , V1]
# V1 N
#1: 0 3
#2: 00 2
#3: 000 1
2) or we can use tidyverse
library(tidyverse)
tibble(x) %>%
group_by(grp = cumsum(x != 0)) %>%
filter(x == 0) %>%
count(grp) %>%
ungroup %>%
count(n)
# A tibble: 3 x 2
# n nn
# <int> <int>
#1 1 3
#2 2 2
#3 3 1
3) Or we can use tabulate with rleid
tabulate(tabulate(rleid(x)[x==0]))
#[1] 3 2 1
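To see why the double tabulate works, here is a base R sketch on a hypothetical vector. The run ids are computed by hand with cumsum as a stand-in for data.table's rleid; that substitution and the input are assumptions for illustration.

```r
# Hypothetical input: three zero runs of length 1, two of length 2, one of length 3
x <- c(1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1)

# Run id per element: a new id starts wherever the value changes
run_id <- cumsum(c(TRUE, x[-1] != x[-length(x)]))

# Inner tabulate: number of zeros per run id (0 for the non-zero runs)
per_run <- tabulate(run_id[x == 0])

# Outer tabulate: how many runs have length 1, 2, 3, ...; the 0 entries are ignored
dist <- tabulate(per_run)
print(dist)

# Cross-check with rle
dist_rle <- tabulate(with(rle(x), lengths[values == 0]))
```

Position k of the result is the number of zero runs of length k.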
Benchmarks
Timing with system.time on @SymbolixAU's dataset:
system.time({
tabulate(tabulate(rleid(x2)[x2==0]))
})
# user system elapsed
# 0.03 0.00 0.03
Compared with the Rcpp function, the above is not that bad:
system.time({
m <- zeroPattern(x2)
m[m[,2] > 0, ]
})
# user system elapsed
# 0.01 0.01 0.03
With microbenchmark, the slower methods (based on @SymbolixAU's comparisons) were removed and a new comparison was run. This is still not exactly apples to apples, but it is much closer than the previous comparison, which included the overhead of data.table along with some formatting to replicate the OP's expected output.
microbenchmark(
akrun = {
tabulate(tabulate(rleid(x2)[x2==0]))
},
G = {
with(rle(x2), table(lengths[values == 0]))
},
sym = {
m <- zeroPattern(x2)
m[m[,2] > 0, ]
},
times = 5, unit = "relative"
)
#Unit: relative
# expr min lq mean median uq max neval cld
# akrun 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 5 a
# G 6.049181 8.272782 5.353175 8.106543 7.527412 2.905924 5 b
# sym 1.385976 1.338845 1.661294 1.399635 3.845435 1.211131 5 a
Maximum of a consecutive sequence in a column with zeros
Here is one way to do it:
library(dplyr)
# the data (defined first so the pipeline below can run)
id = 1:16
vals = c(0,1,1,1,0,0,0,0,1,1,1,0,0,0,1,0)
cumsum = c(0, 1, 2, 3, 0, 0, 0, 0, 1, 2, 3, 0, 0, 0, 1, 0)
test = data.frame(id,vals, cumsum)
# one group per run of vals; keep the max of cumsum for non-zero runs only
test %>%
group_by(id = data.table::rleid(vals)) %>%
summarise(max = ifelse(sum(vals) != 0,
list(max(cumsum, na.rm = TRUE)),
list(NULL))
) %>%
pull(max) %>%
unlist
#> [1] 3 3 1
Created on 2021-08-16 by the reprex package (v2.0.1)
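When vals holds only 0 and 1 and the cumsum column counts consecutive ones, as in the data above, the maximum per run is simply the run length. A base R sketch under that assumption (a swapped-in shortcut, not the answer's method):

```r
# vals as given in the data above
vals <- c(0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0)

# Length of every run of ones; equals the max of the running count in that run
run_max <- with(rle(vals), lengths[values == 1])
print(run_max)
```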
How to count consecutive zero in last run?
Reverse a and then compute its cumulative sum. Assuming a has no negative values, the only 0's left are those from the trailing zeros of a, so ! of the result is TRUE for each of them and FALSE for every other element. The sum of that is the desired number.
sum(!cumsum(rev(a)))
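A short walk-through on hypothetical vectors shows each step. The second part is an assumption-labeled variant, not from the original answer: taking the cumulative sum of rev(a) != 0 keeps the count correct when negative values could cancel out.

```r
# Hypothetical input with three trailing zeros
a <- c(1, 0, 2, 0, 0, 0)
rev(a)              # 0 0 0 2 0 1
cumsum(rev(a))      # 0 0 0 2 2 3
n_last <- sum(!cumsum(rev(a)))

# With negatives the running sum can return to 0 and overcount
b <- c(1, -1, 0)                        # one trailing zero
naive  <- sum(!cumsum(rev(b)))          # running sum is 0 -1 0, so this counts 2
robust <- sum(!cumsum(rev(b) != 0))     # counts non-zero entries seen so far
```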
Find consecutive zeroes in a row
#Had to fix Client 4, one number was missing
DF <- read.table(text = 'Clients Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
"Client 1" 123 768 678 452 213 123 55 10 0 0 0 0
"Client 2" 549 542 21 321 31 59 998 0 546 980 0 987
"Client 3" 500 0 500 0 500 0 500 0 500 0 500 0
"Client 4" 126 545 2315 27 268 126 56 0 0 0 0 0
"Client 5" 546 546 0 0 0 328 486 326 0 0 66 0
"Client 6" 0 0 0 25 78 563 698 631 230 53 0 0', header = TRUE)
Loop over the rows, reverse each one, and find the first non-zero entry; its position minus one is the number of trailing zeros. If the client never had a transaction, return length(x):
n <- apply(DF[, -1], 1, function(x) if (any(x)) which.max(rev(x) != 0) - 1 else length(x))
#[1] 4 0 1 5 1 2
DF$Clients[n >= 3]
#[1] Client 1 Client 4
#Levels: Client 1 Client 2 Client 3 Client 4 Client 5 Client 6
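To make the which.max step concrete, here it is on a single row, using Client 4's values from the table above. The all-zero vector at the end is a hypothetical case that illustrates why the any(x) guard is needed.

```r
# Client 4's monthly values, copied from the table above
x <- c(126, 545, 2315, 27, 268, 126, 56, 0, 0, 0, 0, 0)

nonzero_from_end <- rev(x) != 0                 # FALSE for each trailing zero
first_nonzero    <- which.max(nonzero_from_end) # position of the first TRUE
n_trailing       <- first_nonzero - 1

# For an all-zero row there is no TRUE, and which.max returns 1,
# so without the guard the count would wrongly come out as 0
all_zero <- rep(0, 12)
unguarded <- which.max(rev(all_zero) != 0) - 1
```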
Finding the first number after consecutive zeros in data frame
We can use rle to select the first row after the first run of consecutive zeroes in each group (ID).
library(dplyr)
data %>%
group_by(ID) %>%
slice(with(rle(event == 0), sum(lengths[1:which.max(values)])) + 1)
# ID time event
# <int> <int> <dbl>
#1 1 8 1
#2 2 6 1
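The index arithmetic inside slice() can be followed on a single hypothetical event vector (the original data is not shown, so the values here are assumptions):

```r
# Hypothetical events for one ID; the first zero run covers positions 3 to 5
event <- c(1, 1, 0, 0, 0, 2, 0, 3)

r <- rle(event == 0)                 # runs of "is zero" / "is not zero"
first_zero_run <- which.max(r$values) # index of the first TRUE run
# Total length of all runs up to and including that zero run, plus one
idx <- sum(r$lengths[1:first_zero_run]) + 1

event[idx]
```

Here idx points at position 6, the first value after the first run of zeros.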
Count of consecutive zeros in a dataframe
Solution using rle:
getConsecZeroRle <- function(x) {
foo <- rle(x)
foo$lengths[tail(which(foo$values), 1)]
}
result <- apply(df[, -1] == 0, 1, function(x) getConsecZeroRle(x))
df$test <- as.numeric(result)
df$test[is.na(df$test)] <- 0
Explanation:
Use apply to iterate over the rows of the zero indicator matrix (df[, -1] == 0). For each row, rle gives the run lengths and tail picks out the last run of zeros. Rows without any zeros are meant to come out as NA, which is.na(df$test) then replaces with zeros.
Solution using sum:
getConsecZeroSum <- function(x) {
x[1:tail(which(!x), 1)] <- FALSE
sum(x)
}
df$test <- apply(df[, -1] == 0, 1, function(x) getConsecZeroSum(x))
Explanation:
Find the last FALSE (non-zero) position in each row and set everything up to and including it to FALSE (x[1:tail(which(!x), 1)] <- FALSE), then use sum to count the zeros remaining at the end.
Result:
# a 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 test
# 1 row1 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 8
# 2 row2 0 0 0 1 1 1 1 1 1 1 1 1 1 1 0 1
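For a runnable check, the input can be reconstructed from the Result table above. The construction of df is an assumption (the original data frame is not shown); the two functions are copied from the answer.

```r
# Rows rebuilt from the Result table above
m <- rbind(
  row1 = c(0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0),
  row2 = c(0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0)
)
df <- data.frame(a = rownames(m), m)

getConsecZeroRle <- function(x) {
  foo <- rle(x)
  foo$lengths[tail(which(foo$values), 1)]  # length of the last run of TRUE
}
getConsecZeroSum <- function(x) {
  x[1:tail(which(!x), 1)] <- FALSE         # blank out everything up to the last non-zero
  sum(x)
}

is_zero <- df[, -1] == 0                   # logical matrix, one row per data row
res_rle <- as.numeric(apply(is_zero, 1, getConsecZeroRle))
res_sum <- as.numeric(apply(is_zero, 1, getConsecZeroSum))
```

Both rows end in zeros, so the two approaches agree here; they can differ when the last run of zeros is not at the end of the row.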
Change zero to ones in vector if surrounded by less than five consecutive zeros
A possible solution with rle, which does not change short sequences of zeros at the beginning or end of x:
# create the run length encoding
r <- rle(x)
# create an index of which zero's should be changed
i <- r$values == 0 & r$lengths < 5 &
c(tail(r$values, -1) == 1, FALSE) &
c(FALSE, head(r$values, -1) == 1)
# set the appropriate values to 1
r$values[i] <- 1
# use the inverse of rle to recreate the vector
inverse.rle(r)
which gives:
[1] 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 1 1 1 1
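The same steps can be wrapped in a small function. Both the function name fill_short_gaps and the input vector are hypothetical; the input was chosen so that the result coincides with the output shown above, and it is not claimed to be the original x.

```r
# Hypothetical helper: turn runs of fewer than `max_gap` zeros into ones
# when they have a run of ones on both sides
fill_short_gaps <- function(x, max_gap = 5) {
  r <- rle(x)
  i <- r$values == 0 & r$lengths < max_gap &
    c(tail(r$values, -1) == 1, FALSE) &   # next run is ones (the last run has none)
    c(FALSE, head(r$values, -1) == 1)     # previous run is ones (the first run has none)
  r$values[i] <- 1
  inverse.rle(r)
}

# Hypothetical input: leading zeros, a short inner gap, and a long inner gap
x <- c(0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1)
filled <- fill_short_gaps(x)
print(filled)
```

Only the two zeros at positions 6 and 7 are changed; the leading run and the run of six zeros stay as they are.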
tidyverse : consecutive appearance of zeros
One option could be:
tb1 %>%
group_by(rleid = with(rle(a), rep(seq_along(lengths), lengths))) %>%
mutate(b = 1:n() * (a != 1))
a b rleid
<dbl> <int> <int>
1 1 0 1
2 0 1 2
3 0 2 2
4 0 3 2
5 0 4 2
6 1 0 3
7 0 1 4
8 0 2 4
9 0 3 4
10 0 4 4
11 0 5 4
12 1 0 5
13 1 0 5
14 0 1 6
15 0 2 6
16 0 3 6
17 0 4 6
18 1 0 7
19 0 1 8
20 0 2 8
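A base R sketch of the same counter, using sequence() on the run lengths instead of grouping. The vector a is taken from the a column of the output above; the use of sequence() is a swapped-in technique, not the answer's own code.

```r
# Column a, copied from the output above
a <- c(1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0)

# Count 1, 2, 3, ... within every run, then zero out the positions where a is 1
b <- sequence(rle(a)$lengths) * (a != 1)
print(b)
```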