How to Sum() Over Column with Reset Condition

Reset cumulative sum column after threshold with groups

Capping a cumulative SUM by using standard SUM() OVER() is not possible due to threshold. One way to achieve such result is recursive CTE:

WITH cte_r AS (
  SELECT t.*, ROW_NUMBER() OVER(PARTITION BY GroupNr ORDER BY (SELECT 1)) AS rn 
  FROM Table1 t
), cte AS (
  SELECT GroupNr, Name, [Sum], [CumSum],
        CAST([Sum] AS INT) AS ResetCumSum,
        rn
  FROM cte_r
  WHERE rn = 1
  UNION ALL
  SELECT cte_r.GroupNr, cte_r.Name, cte_r.[Sum], cte_r.[CumSum],
  CAST(CASE WHEN cte.ResetCumSum >= 330 THEN 0 ELSE cte.ResetCumSum END + cte_r.[Sum] AS INT) 
       AS ResetCumSum,
       cte_r.rn
  FROM cte
  JOIN cte_r
    ON cte.rn = cte_r.rn-1
   AND cte.GroupNr = cte_r.GroupNr
)
SELECT GroupNr, Name, [Sum], [CumSum], ResetCumSum
FROM cte
ORDER BY GroupNr, rn;

Output:

Sample Image

db<>fiddle demo

Warning: Table by design is unordered set so to get stable result a order column is required(like unqiue id, timestamp). Here to emulate insert ROW_NUMBER() OVER(PARTITION BY GroupNr ORDER BY (SELECT 1)) AS rn was used but it is not stable.

Conditional SUM and the same using MATCH_RECOGNIZE - in my opinion the cleanest way

Extra:

Quirky UPDATE: Running Total until specific condition is true

Disclaimer: "DO NOT USE IT AT PRODUCTION!!!"

-- source table to be extended with id and Resetcumsum  columns
CREATE CLUSTERED INDEX IX_ROW_NUM  ON Table1(GroupNr, id);

DECLARE @running_total       NUMERIC(14,2) = 0
        ,@prev_running_total NUMERIC(14,2) = 0
        ,@prev_GroupNr       INT = 0;

UPDATE Table1
SET 
   @prev_running_total = @running_total
  ,@running_total = Resetcumsum = IIF(@prev_GroupNr != GroupNr 
                                      OR @running_total >= 330, 0, @running_total) 
                                    + [Sum]
  ,@prev_GroupNr = GroupNr
FROM Table1 WITH(INDEX(IX_ROW_NUM))
OPTION (MAXDOP 1);

SELECT *
FROM Table1
ORDER BY id;

db<>fiddle demo - 2

How do I reset a sum() over () in a SQL Server query?

You could first do a conditional window sum to define the groups: everytime a reset is found, a new group starts. Then you can simply do a window sum of numbers within the groups.

select 
    id,
    date,
    info,
    number,
    sum(number) over(partition by id, grp order by date) summation
from (
    select
        t.*,
        sum(case when info = 'reset' then 1 else 0 end) 
            over(partition by id order by date) grp
    from mytable t
) t

reset a running total in same partition based on a condition

Do a running total on ResetSum in a derived table and use that as a partition column in the running total on Amount.

select T.PersonID,
       T.Amount,
       T.PayDate,
       sum(T.Amount) over(partition by T.PersonID, T.ResetSum 
                          order by T.PayDate rows unbounded preceding) as SumAmount
from (
     select T1.PersonID,
            T1.Amount,
            T1.PayDate,
            sum(case T1.ResetSum 
                  when 1 then 1 
                  else 0 
                end) over(partition by T1.PersonID 
                          order by T1.PayDate rows unbounded preceding) as ResetSum
     from dbo.Table_1 as T1
     ) as T;

SQL Fiddle

Pyspark - Cumulative sum with reset condition

Create a temporary column (grp) that increments a counter each time column C is equal to 0 (the reset condition) and use this as a partitioning column for your cumulative sum.

import pyspark.sql.functions as f
from pyspark.sql import Window

x.withColumn(
    "grp", 
    f.sum((f.col("C") == 0).cast("int")).over(Window.orderBy("A"))
).withColumn(
    "D",
    f.sum(f.col("C")).over(Window.partitionBy("grp").orderBy("A"))
).drop("grp").show()
#+---+----+---+---+
#|  A|   B|  C|  D|
#+---+----+---+---+
#|  0|null|  1|  1|
#|  1| 3.0|  0|  0|
#|  2| 7.0|  0|  0|
#|  3|null|  1|  1|
#|  4| 4.0|  0|  0|
#|  5| 3.0|  0|  0|
#|  6|null|  1|  1|
#|  7|null|  1|  2|
#|  8|null|  1|  3|
#|  9| 5.0|  0|  0|
#| 10| 2.0|  0|  0|
#| 11|null|  1|  1|
#+---+----+---+---+

cumsum with a condition to restart in R

You may use cumsum to create groups as well.

library(dplyr)

df <- df %>%
  group_by(group = cumsum(dplyr::lag(port == 0, default = 0))) %>%
  mutate(cumsum_G = cumsum(G)) %>%
  ungroup

df

#  inv   ass    port     G group cumsum_G
#  <chr> <chr> <int> <int> <dbl>    <int>
#1 i     x         2     1     0        1
#2 i     x         2     0     0        1
#3 i     x         0     1     0        2
#4 i     x         3     0     1        0
#5 i     x         3     1     1        1

You may remove the group column from output using %>% select(-group).

data

df <- structure(list(inv = c("i", "i", "i", "i", "i"), ass = c("x", 
"x", "x", "x", "x"), port = c(2L, 2L, 0L, 3L, 3L), G = c(1L, 
0L, 1L, 0L, 1L)), class = "data.frame", row.names = c(NA, -5L))

SQL to calculate cumulative sum that resets based on previous value in a column in Hive

My amendment to @GordonLinoff's answer as the OP didn't quite understand what I meant.

SELECT
  t.KEY1, t.Date_, t.VAL1,
  ROW_NUMBER() OVER (PARTITION BY key1, grp
                         ORDER BY Date_
                    )
                    - 1
                         AS CUMU_VAL2
FROM
(
  SELECT
    *,
    SUM(
      CASE WHEN val1 = 0 THEN 1 ELSE 0 END
    )
    OVER (
      PARTITION BY key1
          ORDER BY date_
    )
      AS grp
  FROM
    source_table
)
  t;