Cumulative Minimum Value by Group

Cumulative minimum value by group

We could use cummin function by group

data$output <- with(data, ave(Target, Group, FUN = cummin))

data
# Group Target output
#1 A 1 1
#2 A 0 0
#3 A 5 0
#4 A 0 0
#5 B 3 3
#6 B 5 3
#7 B 1 1
#8 B 3 1

whose dplyr and data.table equivalents are

library(dplyr)
data %>%
group_by(Group) %>%
mutate(output = cummin(Target))

library(data.table)
setDT(data)[, output := cummin(Target), by = (Group)]

Grouping by sequence and then finding minimum value in a column

We create the grouping variable by taking the cumulative sum where sequence is 1, then filter only the groups with 9 elements, and slice the rows where the 'low' is minimum after arrangeing the 'date' in descending order to take care of cases where there are ties for the 'low'est value

df %>% 
group_by(group = cumsum(sequence == 1)) %>%
filter(n() == 9) %>%
select(date, low) %>%
arrange(desc(date)) %>%
slice(which.min(low)) %>%
ungroup %>%
select(-group)
# A tibble: 3 x 2
# date low
# <date> <dbl>
#1 2019-01-04 18
#2 2019-01-14 1
#3 2019-02-03 9

Or similar option with data.table

library(data.table)
setDT(df)[, .SD[.N == 9], .(group = cumsum(sequence == 1))
][order(-date), .SD[which.min(low)], group]

Find the lowest value and its index until certain row within group

You can use cummin to get minimum value until that row and use match to get the index.

library(dplyr)
a %>%
group_by(subject) %>%
mutate(min = cummin(value),
visit_min = match(min, unique(value)))

# subject visit value min visit_min
# <dbl> <dbl> <dbl> <dbl> <int>
# 1 1 1 100 100 1
# 2 1 2 97 97 2
# 3 1 3 120 97 2
# 4 1 4 84 84 4
# 5 1 5 60 60 5
# 6 1 6 150 60 5
# 7 1 7 150 60 5
# 8 2 1 160 160 1
# 9 2 2 100 100 2
#10 2 3 70 70 3
#11 2 4 40 40 4
#12 2 5 120 40 4

How Do I Find the Minimum Value of a Running Total by Group In SQL?

Syntactically, your first query cannot have been compilable in MS Access as the RunningTotal correlated aggregate expressoin should be included in GROUP BY clause. Practically though, correlated subqueries may not be allowed in GROUP BY and you also do not want to group by this rank calculation during aggregation. Also, in MS Access running complex operations on an underlying UNION query does yield performance issues.

Consider the following setup:

  1. Convert underlying union query to temp table:

    SELECT * INTO mytemptable FROM myUnionQuery
  2. Calculate running total (quasi aggregation).

    SELECT t.[ItemCode], t.[WarehouseCode], t.[Date], 
    (SELECT SUM(sub.[Qty]) FROM myTempTable sub
    WHERE sub.[ItemCode] = t.[ItemCode]
    AND sub.[WarehouseCode] = t.[WarehouseCode]
    AND sub.[Date] <= t.[Date]) AS RunningQtyTotal
    FROM myTempTable t;
  3. Aggregate at [ItemCode] and [WarehouseCode] levels:

    SELECT q.[ItemCode], q.[WarehouseCode], MIN(q.[RunningQtyTotal]) AS MinRunQtyTotal
    FROM mySelectQuery q
    GROUP BY q.[ItemCode], q.[WarehouseCode]

How to find the minimum value in between two values using R

You can use cummax to calculate cumulative maximum until that row and create a new group whenever the current row's value is greater than previous cummax value. Within each group you can return the minimum value.

library(dplyr)

df %>%
group_by(group = cumsum(V1 > lag(cummax(V1), default = first(V1)))) %>%
summarise(min_value = min(V1))

# group min_value
# <int> <int>
#1 0 -444
#2 1 -6
#3 2 352
#4 3 472

This considers the last part as another group hence also returns minimum value in that part. You can remove the last row if it is not needed.


To apply for multiple columns, we can write a function and call it with lapply :

apply_fun <- function(data, col) {

col1 <- sym(col)
df %>%
group_by(group = cumsum(!!col1 > lag(cummax(!!col1),
default = first(!!col1)))) %>%
summarise(min_value = min(!!col1))
}

result <- lapply(names(df), apply_fun, data = df)

data

df <- structure(list(V1 = c(0L, -80L, -160L, -160L, -160L, -160L, -160L, 
-347L, -351L, -351L, -444L, -444L, -444L, 43L, 43L, 10L, 10L,
-6L, 20L, 352L, 352L, 352L, 528L, 528L, 511L, 511L, 518L, 472L
)), class = "data.frame", row.names = c(NA, -28L))

Cumulative minimum and maximum by group

[Original answer: dplyr]

You can solve this problem by using the dplyr package:

library(dplyr)
d %>%
group_by(Key) %>%
mutate(Pmin=cummin(PriceEUR),Pmax=cummax(PriceEUR))

# Key DaysToEvent PriceEUR Pmin Pmax
# 1 AAA 120 50 50 50
# 2 AAA 110 40 40 50
# 3 AAA 100 60 40 60
# 4 BBB 100 50 50 50

where d is supposed to be your data set:

d <- data.frame(Key=c('AAA','AAA','AAA','BBB'),DaysToEvent = c(120,110,100,100),PriceEUR = c(50,40,60,50), Pmin = c(50,40,40,30), Pmax = c(50,50,60,70))

[Update: data.table]

Another approach is to use data.table, which has a quite spectacular performance:

library(data.table)
DT <- setDT(d)
DT[,c("Pmin","Pmax") := list(cummin(PriceEUR),cummax(PriceEUR)),by=Key]

DT
# Key DaysToEvent PriceEUR Pmin Pmax
# 1: AAA 120 50 50 50
# 2: AAA 110 40 40 50
# 3: AAA 100 60 40 60
# 4: BBB 100 50 50 50

[Update 2: base R]

Here is another approach in the case that you'd like to use only base R for some reason:

d$Pmin <- unlist(lapply(split(d$PriceEUR,d$Key),cummin))
d$Pmax <- unlist(lapply(split(d$PriceEUR,d$Key),cummax))

Cumulative sum in R by group and start over when sum of values in group larger than maximum value

One purrr approach could be:

cumsum(c(FALSE, diff(accumulate(test, ~ ifelse(.x >= 10, .y, .x + .y))) <= 0))

[1] 0 0 1 1 1 2 2 2 3

Group according to cumulative sums

Two possible one-liners, with purrr::accumulate and with MESS::cumsumbinning:

purrr::accumulate

library(tidyverse)
group_by(input, grp = LETTERS[cumsum(value == accumulate(value, ~ ifelse(.x + .y <= 100, .x + .y, .y)))])

MESS::cumsumbinning

library(dplyr)
group_by(input, grp = LETTERS[MESS::cumsumbinning(value, 100)])

output

# A tibble: 6 x 3
# Groups: grp [3]
id value grp
<int> <dbl> <chr>
1 1 99 A
2 2 1 A
3 3 33 B
4 4 33 B
5 5 33 B
6 6 150 C


Related Topics



Leave a reply



Submit