How to Sum() Over Column with Reset Condition

Reset cumulative sum column after threshold with groups

Capping a cumulative SUM by using standard SUM() OVER() is not possible due to threshold. One way to achieve such result is recursive CTE:

WITH cte_r AS (
SELECT t.*, ROW_NUMBER() OVER(PARTITION BY GroupNr ORDER BY (SELECT 1)) AS rn
FROM Table1 t
), cte AS (
SELECT GroupNr, Name, [Sum], [CumSum],
CAST([Sum] AS INT) AS ResetCumSum,
rn
FROM cte_r
WHERE rn = 1
UNION ALL
SELECT cte_r.GroupNr, cte_r.Name, cte_r.[Sum], cte_r.[CumSum],
CAST(CASE WHEN cte.ResetCumSum >= 330 THEN 0 ELSE cte.ResetCumSum END + cte_r.[Sum] AS INT)
AS ResetCumSum,
cte_r.rn
FROM cte
JOIN cte_r
ON cte.rn = cte_r.rn-1
AND cte.GroupNr = cte_r.GroupNr
)
SELECT GroupNr, Name, [Sum], [CumSum], ResetCumSum
FROM cte
ORDER BY GroupNr, rn;

Output:

Sample Image

db<>fiddle demo

Warning: Table by design is unordered set so to get stable result a order column is required(like unqiue id, timestamp). Here to emulate insert ROW_NUMBER() OVER(PARTITION BY GroupNr ORDER BY (SELECT 1)) AS rn was used but it is not stable.

Related:

Conditional SUM and the same using MATCH_RECOGNIZE - in my opinion the cleanest way



Extra:

Quirky UPDATE: Running Total until specific condition is true

Disclaimer: "DO NOT USE IT AT PRODUCTION!!!"

-- source table to be extended with id and Resetcumsum  columns
CREATE CLUSTERED INDEX IX_ROW_NUM ON Table1(GroupNr, id);

DECLARE @running_total NUMERIC(14,2) = 0
,@prev_running_total NUMERIC(14,2) = 0
,@prev_GroupNr INT = 0;

UPDATE Table1
SET
@prev_running_total = @running_total
,@running_total = Resetcumsum = IIF(@prev_GroupNr != GroupNr
OR @running_total >= 330, 0, @running_total)
+ [Sum]
,@prev_GroupNr = GroupNr
FROM Table1 WITH(INDEX(IX_ROW_NUM))
OPTION (MAXDOP 1);

SELECT *
FROM Table1
ORDER BY id;

db<>fiddle demo - 2

How do I reset a sum() over () in a SQL Server query?

You could first do a conditional window sum to define the groups: everytime a reset is found, a new group starts. Then you can simply do a window sum of numbers within the groups.

select 
id,
date,
info,
number,
sum(number) over(partition by id, grp order by date) summation
from (
select
t.*,
sum(case when info = 'reset' then 1 else 0 end)
over(partition by id order by date) grp
from mytable t
) t

reset a running total in same partition based on a condition

Do a running total on ResetSum in a derived table and use that as a partition column in the running total on Amount.

select T.PersonID,
T.Amount,
T.PayDate,
sum(T.Amount) over(partition by T.PersonID, T.ResetSum
order by T.PayDate rows unbounded preceding) as SumAmount
from (
select T1.PersonID,
T1.Amount,
T1.PayDate,
sum(case T1.ResetSum
when 1 then 1
else 0
end) over(partition by T1.PersonID
order by T1.PayDate rows unbounded preceding) as ResetSum
from dbo.Table_1 as T1
) as T;

SQL Fiddle

Pyspark - Cumulative sum with reset condition

Create a temporary column (grp) that increments a counter each time column C is equal to 0 (the reset condition) and use this as a partitioning column for your cumulative sum.

import pyspark.sql.functions as f
from pyspark.sql import Window

x.withColumn(
"grp",
f.sum((f.col("C") == 0).cast("int")).over(Window.orderBy("A"))
).withColumn(
"D",
f.sum(f.col("C")).over(Window.partitionBy("grp").orderBy("A"))
).drop("grp").show()
#+---+----+---+---+
#| A| B| C| D|
#+---+----+---+---+
#| 0|null| 1| 1|
#| 1| 3.0| 0| 0|
#| 2| 7.0| 0| 0|
#| 3|null| 1| 1|
#| 4| 4.0| 0| 0|
#| 5| 3.0| 0| 0|
#| 6|null| 1| 1|
#| 7|null| 1| 2|
#| 8|null| 1| 3|
#| 9| 5.0| 0| 0|
#| 10| 2.0| 0| 0|
#| 11|null| 1| 1|
#+---+----+---+---+

cumsum with a condition to restart in R

You may use cumsum to create groups as well.

library(dplyr)

df <- df %>%
group_by(group = cumsum(dplyr::lag(port == 0, default = 0))) %>%
mutate(cumsum_G = cumsum(G)) %>%
ungroup

df

# inv ass port G group cumsum_G
# <chr> <chr> <int> <int> <dbl> <int>
#1 i x 2 1 0 1
#2 i x 2 0 0 1
#3 i x 0 1 0 2
#4 i x 3 0 1 0
#5 i x 3 1 1 1

You may remove the group column from output using %>% select(-group).

data

df <- structure(list(inv = c("i", "i", "i", "i", "i"), ass = c("x", 
"x", "x", "x", "x"), port = c(2L, 2L, 0L, 3L, 3L), G = c(1L,
0L, 1L, 0L, 1L)), class = "data.frame", row.names = c(NA, -5L))

SQL to calculate cumulative sum that resets based on previous value in a column in Hive

My amendment to @GordonLinoff's answer as the OP didn't quite understand what I meant.

SELECT
t.KEY1, t.Date_, t.VAL1,
ROW_NUMBER() OVER (PARTITION BY key1, grp
ORDER BY Date_
)
- 1
AS CUMU_VAL2
FROM
(
SELECT
*,
SUM(
CASE WHEN val1 = 0 THEN 1 ELSE 0 END
)
OVER (
PARTITION BY key1
ORDER BY date_
)
AS grp
FROM
source_table
)
t;


Related Topics



Leave a reply



Submit