Count Consecutive True Values Within Each Block Separately

Count consecutive occurences of an element in string

To put the pieces together: here's a combination of my comment on your previous question and (parts of) my answer here: Count consecutive TRUE values within each block separately. The convenience functions rleid and rowid from the data.table package are used.

Toy data with two strings of different length:

s <- c("a > a > b > b > b > a > b > b", "c > c > b > b > b > c > c")

library(data.table)
lapply(strsplit(s, " > "), function(x) paste0(x, rowid(rleid(x)), collapse = " > "))
# [[1]]
# [1] "a1 > a2 > b1 > b2 > b3 > a1 > b1 > b2"
#
# [[2]]
# [1] "c1 > c2 > b1 > b2 > b3 > c1 > c2"

Create counter within consecutive runs of certain values

Here's a way, building on Joshua's rle approach: (EDITED to use seq_len and lapply as per Marek's suggestion)

> (!x) * unlist(lapply(rle(x)$lengths, seq_len))
[1] 0 1 0 1 2 3 0 0 1 2

UPDATE. Just for kicks, here's another way to do it, around 5 times faster:

cumul_zeros <- function(x)  {
x <- !x
rl <- rle(x)
len <- rl$lengths
v <- rl$values
cumLen <- cumsum(len)
z <- x
# replace the 0 at the end of each zero-block in z by the
# negative of the length of the preceding 1-block....
iDrops <- c(0, diff(v)) < 0
z[ cumLen[ iDrops ] ] <- -len[ c(iDrops[-1],FALSE) ]
# ... to ensure that the cumsum below does the right thing.
# We zap the cumsum with x so only the cumsums for the 1-blocks survive:
x*cumsum(z)
}

Try an example:

> cumul_zeros(c(1,1,1,0,0,0,0,0,1,1,1,0,0,1,1))
[1] 0 0 0 1 2 3 4 5 0 0 0 1 2 0 0

Now compare times on a million-length vector:

> x <- sample(0:1, 1000000,T)
> system.time( z <- cumul_zeros(x))
user system elapsed
0.15 0.00 0.14
> system.time( z <- (!x) * unlist( lapply( rle(x)$lengths, seq_len)))
user system elapsed
0.75 0.00 0.75

Moral of the story: one-liners are nicer and easier to understand, but not always the fastest!

identify blocks of consecutive True values with tolerance

I believe you can re-use solution with inverting m by ~ and last chain both conditions by or :

N = 3.0
M = 1
b = w.ne(w.shift()).cumsum() *w
m = b[0].map(b[0].mask(b[0] == 0).value_counts()) <= N

w1 = ~m
b1 = w1.ne(w1.shift()).cumsum() * w1
m1 = b1.map(b1.mask(b1 == 0).value_counts()) == M

m = m | m1

print (m)
0 True
1 True
2 True
3 True
4 True
5 False
6 False
7 True
8 True
9 True
10 True
11 True
12 True
Name: 0, dtype: bool

Count Number of Consecutive Occurrence of values in Google sheets

try:

=ARRAYFORMULA(QUERY(REGEXREPLACE(QUERY({TEXT(
VLOOKUP(ROW(B2:B20), QUERY((B2:B20<>B1:B19)*ROW(B2:B20),
"where Col1 <>0"), 1, 1), "000000")&"×"&B2:B20},
"select Col1,count(Col1) group by Col1 label count(Col1)''")&"", "(.+×)", ),
"where Col1 is not null"))

Sample Image

Count consecutive repeated values in pandas

Use cumsum() on the comparison with shift to identify the blocks:

# groupby exact match of values
blocks = df['col'].ne(df['col'].shift()).cumsum()

df['result'] = blocks.groupby(blocks).transform('size') >= 3

Output:

                            col  result
2021-03-12 15:13:24.727074 2.0 False
2021-03-13 15:13:24.727074 3.0 True
2021-03-14 15:13:24.727074 3.0 True
2021-03-15 15:13:24.727074 3.0 True
2021-03-16 15:13:24.727074 2.0 False
2021-03-17 15:13:24.727074 2.0 False
2021-03-18 15:13:24.727074 3.4 False
2021-03-19 15:13:24.727074 3.1 False
2021-03-20 15:13:24.727074 2.7 False
2021-03-21 15:13:24.727074 NaN False
2021-03-22 15:13:24.727074 4.0 True
2021-03-23 15:13:24.727074 4.0 True
2021-03-24 15:13:24.727074 4.0 True
2021-03-25 15:13:24.727074 4.5 False

Note It's not ideal to use == to compare floats. Instead, we can use threshold, something like:

# groupby consecutive rows if the differences are not significant
blocks = df['col'].diff().abs().gt(1e-6).cumsum()

Count Number of Consecutive Occurrence of values in Table

One approach is the difference of row numbers:

select name, count(*) 
from (select t.*,
(row_number() over (order by id) -
row_number() over (partition by name order by id)
) as grp
from t
) t
group by grp, name;

The logic is easiest to understand if you run the subquery and look at the values of each row number separately and then look at the difference.

R: find consecutive occurrence of a number

OP's code does work for me. So, without a specific error message it is impossible to understand why the code is not working for the OP.

However, the sample datasets created by the OP are matrices (before they were coerced to tibble) and I felt challenged to find a way to solve the task in base R without using purrr:

To find the number of consecutive occurences of a particular value val in a vector x we can use the following function:

max_rle <- function(x, val) {
y <- rle(x)
len <- y$lengths[y$value == val]
if (length(len) > 0) max(len) else NA
}

Examples:

max_rle(c(0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1), 1)
[1] 4
max_rle(c(0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1), 0)
[1] 2
# find consecutive occurrences in column batches
lapply(seq_len(ncol(dat1)), function(col_num) {
start <- head(dat1[, col_num], -1L)
end <- tail(dat1[, col_num], -1L) - 1
sapply(seq_along(start), function(range_num) {
max_rle(dat[start[range_num]:end[range_num], col_num], 1)
})
})
[[1]]
[1] 8 4 5

[[2]]
[1] 4 5 2

[[3]]
[1] NA 3 4

[[4]]
[1] 5 5 4

[[5]]
[1] 3 2 3

The first lapply() loops over the columns of dat and dat1, resp. The second sapply() loops over the row ranges stored in dat1 and subsets dat accordingly.

Counter for consecutive negative values in a Data-frame

I am pretty unsure how to write the "Nan" (not very great myself), but here is a code that seems to do what you asked for:

df = pd.DataFrame()
df["data"] = [-1, -2, 4, 12, -22, -12, -7, -5, -22, 2, 2]
def generateOutput(df):
a = [0]
for i in range(len(df) - 1):
if df["data"][i] < 0:
a.append(a[-1] + 1)
else:
a.append(0)
df["output"] = a
return df

print(df)
df = generateOutput(df)
print(df)

And here is my output when launched the program

    data
0 -1
1 -2
2 4
3 12
4 -22
5 -12
6 -7
7 -5
8 -22
9 2
10 2
    data  output
0 -1 0
1 -2 1
2 4 2
3 12 0
4 -22 0
5 -12 1
6 -7 2
7 -5 3
8 -22 4
9 2 5
10 2 0


Related Topics



Leave a reply



Submit