Cumulative minimum value by group
We could use cummin
function by group
data$output <- with(data, ave(Target, Group, FUN = cummin))
data
# Group Target output
#1 A 1 1
#2 A 0 0
#3 A 5 0
#4 A 0 0
#5 B 3 3
#6 B 5 3
#7 B 1 1
#8 B 3 1
whose dplyr
and data.table
equivalents are
library(dplyr)
data %>%
group_by(Group) %>%
mutate(output = cummin(Target))
library(data.table)
setDT(data)[, output := cummin(Target), by = (Group)]
Grouping by sequence and then finding minimum value in a column
We create the grouping variable by taking the cumulative sum where sequence is 1, then filter
only the groups with 9 elements, and slice
the rows where the 'low' is minimum after arrange
ing the 'date' in desc
ending order to take care of cases where there are ties for the 'low'est value
df %>%
group_by(group = cumsum(sequence == 1)) %>%
filter(n() == 9) %>%
select(date, low) %>%
arrange(desc(date)) %>%
slice(which.min(low)) %>%
ungroup %>%
select(-group)
# A tibble: 3 x 2
# date low
# <date> <dbl>
#1 2019-01-04 18
#2 2019-01-14 1
#3 2019-02-03 9
Or similar option with data.table
library(data.table)
setDT(df)[, .SD[.N == 9], .(group = cumsum(sequence == 1))
][order(-date), .SD[which.min(low)], group]
Find the lowest value and its index until certain row within group
You can use cummin
to get minimum value until that row and use match
to get the index.
library(dplyr)
a %>%
group_by(subject) %>%
mutate(min = cummin(value),
visit_min = match(min, unique(value)))
# subject visit value min visit_min
# <dbl> <dbl> <dbl> <dbl> <int>
# 1 1 1 100 100 1
# 2 1 2 97 97 2
# 3 1 3 120 97 2
# 4 1 4 84 84 4
# 5 1 5 60 60 5
# 6 1 6 150 60 5
# 7 1 7 150 60 5
# 8 2 1 160 160 1
# 9 2 2 100 100 2
#10 2 3 70 70 3
#11 2 4 40 40 4
#12 2 5 120 40 4
How Do I Find the Minimum Value of a Running Total by Group In SQL?
Syntactically, your first query cannot have been compilable in MS Access as the RunningTotal correlated aggregate expressoin should be included in GROUP BY
clause. Practically though, correlated subqueries may not be allowed in GROUP BY
and you also do not want to group by this rank calculation during aggregation. Also, in MS Access running complex operations on an underlying UNION
query does yield performance issues.
Consider the following setup:
Convert underlying union query to temp table:
SELECT * INTO mytemptable FROM myUnionQuery
Calculate running total (quasi aggregation).
SELECT t.[ItemCode], t.[WarehouseCode], t.[Date],
(SELECT SUM(sub.[Qty]) FROM myTempTable sub
WHERE sub.[ItemCode] = t.[ItemCode]
AND sub.[WarehouseCode] = t.[WarehouseCode]
AND sub.[Date] <= t.[Date]) AS RunningQtyTotal
FROM myTempTable t;Aggregate at
[ItemCode]
and[WarehouseCode]
levels:SELECT q.[ItemCode], q.[WarehouseCode], MIN(q.[RunningQtyTotal]) AS MinRunQtyTotal
FROM mySelectQuery q
GROUP BY q.[ItemCode], q.[WarehouseCode]
How to find the minimum value in between two values using R
You can use cummax
to calculate cumulative maximum until that row and create a new group whenever the current row's value is greater than previous cummax
value. Within each group you can return the minimum value.
library(dplyr)
df %>%
group_by(group = cumsum(V1 > lag(cummax(V1), default = first(V1)))) %>%
summarise(min_value = min(V1))
# group min_value
# <int> <int>
#1 0 -444
#2 1 -6
#3 2 352
#4 3 472
This considers the last part as another group hence also returns minimum value in that part. You can remove the last row if it is not needed.
To apply for multiple columns, we can write a function and call it with lapply
:
apply_fun <- function(data, col) {
col1 <- sym(col)
df %>%
group_by(group = cumsum(!!col1 > lag(cummax(!!col1),
default = first(!!col1)))) %>%
summarise(min_value = min(!!col1))
}
result <- lapply(names(df), apply_fun, data = df)
data
df <- structure(list(V1 = c(0L, -80L, -160L, -160L, -160L, -160L, -160L,
-347L, -351L, -351L, -444L, -444L, -444L, 43L, 43L, 10L, 10L,
-6L, 20L, 352L, 352L, 352L, 528L, 528L, 511L, 511L, 518L, 472L
)), class = "data.frame", row.names = c(NA, -28L))
Cumulative minimum and maximum by group
[Original answer: dplyr]
You can solve this problem by using the dplyr
package:
library(dplyr)
d %>%
group_by(Key) %>%
mutate(Pmin=cummin(PriceEUR),Pmax=cummax(PriceEUR))
# Key DaysToEvent PriceEUR Pmin Pmax
# 1 AAA 120 50 50 50
# 2 AAA 110 40 40 50
# 3 AAA 100 60 40 60
# 4 BBB 100 50 50 50
where d
is supposed to be your data set:
d <- data.frame(Key=c('AAA','AAA','AAA','BBB'),DaysToEvent = c(120,110,100,100),PriceEUR = c(50,40,60,50), Pmin = c(50,40,40,30), Pmax = c(50,50,60,70))
[Update: data.table]
Another approach is to use data.table
, which has a quite spectacular performance:
library(data.table)
DT <- setDT(d)
DT[,c("Pmin","Pmax") := list(cummin(PriceEUR),cummax(PriceEUR)),by=Key]
DT
# Key DaysToEvent PriceEUR Pmin Pmax
# 1: AAA 120 50 50 50
# 2: AAA 110 40 40 50
# 3: AAA 100 60 40 60
# 4: BBB 100 50 50 50
[Update 2: base R]
Here is another approach in the case that you'd like to use only base R for some reason:
d$Pmin <- unlist(lapply(split(d$PriceEUR,d$Key),cummin))
d$Pmax <- unlist(lapply(split(d$PriceEUR,d$Key),cummax))
Cumulative sum in R by group and start over when sum of values in group larger than maximum value
One purrr
approach could be:
cumsum(c(FALSE, diff(accumulate(test, ~ ifelse(.x >= 10, .y, .x + .y))) <= 0))
[1] 0 0 1 1 1 2 2 2 3
Group according to cumulative sums
Two possible one-liners, with purrr::accumulate
and with MESS::cumsumbinning
:
purrr::accumulate
library(tidyverse)
group_by(input, grp = LETTERS[cumsum(value == accumulate(value, ~ ifelse(.x + .y <= 100, .x + .y, .y)))])
MESS::cumsumbinning
library(dplyr)
group_by(input, grp = LETTERS[MESS::cumsumbinning(value, 100)])
output
# A tibble: 6 x 3
# Groups: grp [3]
id value grp
<int> <dbl> <chr>
1 1 99 A
2 2 1 A
3 3 33 B
4 4 33 B
5 5 33 B
6 6 150 C
Related Topics
Stats on Every N Rows for Each Column
Sequentially Rename 100+ Columns Having Idiosyncratic Names
Selecting Multiple Parts of a List
Collapse/Concatenate/Aggregate Multiple Columns to a Single Comma Separated String Within Each Group
R: Miscellaneous Errors While Trying to Plot Graphs
Distance Calculation on Large Vectors [Performance]
R - Help in Converting Factor to Date (%M/%D/%Y %H:%M)
Predict.Lm in R Fails to Recognize Newdata
Group Values by Unique Elements
Selecting Multiple Columns in Data Frame Using Partial Column Name
Dplyr Write a Function with Column Names as Inputs
R Split a Column into Multiple Column by Pattern
How to Order a Nominale Variable. E.G Month in R
Filling in the Area Under a Line Graph in Ggplot2: Geom_Area()
How Can One Mix 2 or More Color Palettes to Show a Combined Color Value