Count consecutive occurences of an element in string
To put the pieces together: here's a combination of my comment on your previous question and (parts of) my answer here: Count consecutive TRUE values within each block separately. The convenience functions rleid
and rowid
from the data.table
package are used.
Toy data with two strings of different length:
s <- c("a > a > b > b > b > a > b > b", "c > c > b > b > b > c > c")
library(data.table)
lapply(strsplit(s, " > "), function(x) paste0(x, rowid(rleid(x)), collapse = " > "))
# [[1]]
# [1] "a1 > a2 > b1 > b2 > b3 > a1 > b1 > b2"
#
# [[2]]
# [1] "c1 > c2 > b1 > b2 > b3 > c1 > c2"
Create counter within consecutive runs of certain values
Here's a way, building on Joshua's rle
approach: (EDITED to use seq_len
and lapply
as per Marek's suggestion)
> (!x) * unlist(lapply(rle(x)$lengths, seq_len))
[1] 0 1 0 1 2 3 0 0 1 2
UPDATE. Just for kicks, here's another way to do it, around 5 times faster:
cumul_zeros <- function(x) {
x <- !x
rl <- rle(x)
len <- rl$lengths
v <- rl$values
cumLen <- cumsum(len)
z <- x
# replace the 0 at the end of each zero-block in z by the
# negative of the length of the preceding 1-block....
iDrops <- c(0, diff(v)) < 0
z[ cumLen[ iDrops ] ] <- -len[ c(iDrops[-1],FALSE) ]
# ... to ensure that the cumsum below does the right thing.
# We zap the cumsum with x so only the cumsums for the 1-blocks survive:
x*cumsum(z)
}
Try an example:
> cumul_zeros(c(1,1,1,0,0,0,0,0,1,1,1,0,0,1,1))
[1] 0 0 0 1 2 3 4 5 0 0 0 1 2 0 0
Now compare times on a million-length vector:
> x <- sample(0:1, 1000000,T)
> system.time( z <- cumul_zeros(x))
user system elapsed
0.15 0.00 0.14
> system.time( z <- (!x) * unlist( lapply( rle(x)$lengths, seq_len)))
user system elapsed
0.75 0.00 0.75
Moral of the story: one-liners are nicer and easier to understand, but not always the fastest!
identify blocks of consecutive True values with tolerance
I believe you can re-use solution with inverting m
by ~
and last chain both conditions by or
:
N = 3.0
M = 1
b = w.ne(w.shift()).cumsum() *w
m = b[0].map(b[0].mask(b[0] == 0).value_counts()) <= N
w1 = ~m
b1 = w1.ne(w1.shift()).cumsum() * w1
m1 = b1.map(b1.mask(b1 == 0).value_counts()) == M
m = m | m1
print (m)
0 True
1 True
2 True
3 True
4 True
5 False
6 False
7 True
8 True
9 True
10 True
11 True
12 True
Name: 0, dtype: bool
Count Number of Consecutive Occurrence of values in Google sheets
try:
=ARRAYFORMULA(QUERY(REGEXREPLACE(QUERY({TEXT(
VLOOKUP(ROW(B2:B20), QUERY((B2:B20<>B1:B19)*ROW(B2:B20),
"where Col1 <>0"), 1, 1), "000000")&"×"&B2:B20},
"select Col1,count(Col1) group by Col1 label count(Col1)''")&"", "(.+×)", ),
"where Col1 is not null"))
Count consecutive repeated values in pandas
Use cumsum()
on the comparison with shift
to identify the blocks:
# groupby exact match of values
blocks = df['col'].ne(df['col'].shift()).cumsum()
df['result'] = blocks.groupby(blocks).transform('size') >= 3
Output:
col result
2021-03-12 15:13:24.727074 2.0 False
2021-03-13 15:13:24.727074 3.0 True
2021-03-14 15:13:24.727074 3.0 True
2021-03-15 15:13:24.727074 3.0 True
2021-03-16 15:13:24.727074 2.0 False
2021-03-17 15:13:24.727074 2.0 False
2021-03-18 15:13:24.727074 3.4 False
2021-03-19 15:13:24.727074 3.1 False
2021-03-20 15:13:24.727074 2.7 False
2021-03-21 15:13:24.727074 NaN False
2021-03-22 15:13:24.727074 4.0 True
2021-03-23 15:13:24.727074 4.0 True
2021-03-24 15:13:24.727074 4.0 True
2021-03-25 15:13:24.727074 4.5 False
Note It's not ideal to use ==
to compare floats. Instead, we can use threshold, something like:
# groupby consecutive rows if the differences are not significant
blocks = df['col'].diff().abs().gt(1e-6).cumsum()
Count Number of Consecutive Occurrence of values in Table
One approach is the difference of row numbers:
select name, count(*)
from (select t.*,
(row_number() over (order by id) -
row_number() over (partition by name order by id)
) as grp
from t
) t
group by grp, name;
The logic is easiest to understand if you run the subquery and look at the values of each row number separately and then look at the difference.
R: find consecutive occurrence of a number
OP's code does work for me. So, without a specific error message it is impossible to understand why the code is not working for the OP.
However, the sample datasets created by the OP are matrices (before they were coerced to tibble
) and I felt challenged to find a way to solve the task in base R without using purrr
:
To find the number of consecutive occurences of a particular value val
in a vector x
we can use the following function:
max_rle <- function(x, val) {
y <- rle(x)
len <- y$lengths[y$value == val]
if (length(len) > 0) max(len) else NA
}
Examples:
max_rle(c(0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1), 1)
[1] 4
max_rle(c(0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1), 0)
[1] 2
# find consecutive occurrences in column batches
lapply(seq_len(ncol(dat1)), function(col_num) {
start <- head(dat1[, col_num], -1L)
end <- tail(dat1[, col_num], -1L) - 1
sapply(seq_along(start), function(range_num) {
max_rle(dat[start[range_num]:end[range_num], col_num], 1)
})
})
[[1]]
[1] 8 4 5
[[2]]
[1] 4 5 2
[[3]]
[1] NA 3 4
[[4]]
[1] 5 5 4
[[5]]
[1] 3 2 3
The first lapply()
loops over the columns of dat
and dat1
, resp. The second sapply()
loops over the row ranges stored in dat1
and subsets dat
accordingly.
Counter for consecutive negative values in a Data-frame
I am pretty unsure how to write the "Nan" (not very great myself), but here is a code that seems to do what you asked for:
df = pd.DataFrame()
df["data"] = [-1, -2, 4, 12, -22, -12, -7, -5, -22, 2, 2]
def generateOutput(df):
a = [0]
for i in range(len(df) - 1):
if df["data"][i] < 0:
a.append(a[-1] + 1)
else:
a.append(0)
df["output"] = a
return df
print(df)
df = generateOutput(df)
print(df)
And here is my output when launched the program
data
0 -1
1 -2
2 4
3 12
4 -22
5 -12
6 -7
7 -5
8 -22
9 2
10 2
data output
0 -1 0
1 -2 1
2 4 2
3 12 0
4 -22 0
5 -12 1
6 -7 2
7 -5 3
8 -22 4
9 2 5
10 2 0
Related Topics
Assign Names to Vector Entries Without Assigning the Vector a Variable Name
Remove Zombie Processes Using Parallel Package
How Does Settimelimit Work in R
R Equivalent of Stata Local or Global MACros
Ggplot Inserting Space Before Degree Symbol on Axis Label
Prevent Automatic Conversion of Single Column to Vector
Knitr: Object Cannot Be Found When Converting Markdown File into HTML
Importing S3 Method from Another Package
Why Doesn't "+" Operate on Characters in R
How to Plot a Stacked Bar with Ggplot
Get Continent Name from Country Name in R
Format Axis Tick Labels to Percentage in Plotly
Predict.Svm Does Not Predict New Data
How Fill Part of a Circle Using Ggplot2