How to reset a cumulative sum when it reaches a threshold
Each row's running total depends on whether the previous row's total was reset, so a plain window function cannot express this. You need a recursive CTE, which looks something like this:
with t as (
select t.*, row_number() over (order by pk) as seqnum
from yourtable t
),
cte as (
select seqnum, pk, amount, amount as running_amount
from t
where seqnum = 1
union all
select t.seqnum, t.pk, t.amount,
(case when cte.running_amount + t.amount > 300 then t.amount
else cte.running_amount + t.amount
end)
from cte join
t
on t.seqnum = cte.seqnum + 1
)
select *
from cte;
The exact syntax for recursive CTEs varies by database, but recursive CTEs are part of standard SQL. For example, PostgreSQL and MySQL require WITH RECURSIVE instead of plain WITH.
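To check the reset rule outside the database, here is a minimal Python sketch of the same per-row logic. It assumes the amounts are already sorted by pk and uses the query's threshold of 300. The name reset_running_total is hypothetical. Like the recursive member, it restarts at the current amount whenever adding that amount would push the total above 300.

```python
def reset_running_total(amounts, threshold=300):
    """Mirror the recursive CTE: restart at the current amount if adding it would exceed threshold."""
    running = []
    total = None
    for amount in amounts:
        if total is None or total + amount > threshold:
            # anchor row (seqnum = 1) or a reset: the total starts over at this amount
            total = amount
        else:
            total += amount
        running.append(total)
    return running


print(reset_running_total([100, 150, 80, 200, 50]))  # [100, 250, 80, 280, 50]
```

A row larger than the threshold still starts a new total on its own, just as the CASE expression does.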
Reset cumulative sum column after threshold with groups
A cumulative SUM that resets at a threshold cannot be computed with a standard SUM() OVER(), because each reset depends on the previous running total. One way to achieve this is a recursive CTE:
WITH cte_r AS (
SELECT t.*, ROW_NUMBER() OVER(PARTITION BY GroupNr ORDER BY (SELECT 1)) AS rn
FROM Table1 t
), cte AS (
SELECT GroupNr, Name, [Sum], [CumSum],
CAST([Sum] AS INT) AS ResetCumSum,
rn
FROM cte_r
WHERE rn = 1
UNION ALL
SELECT cte_r.GroupNr, cte_r.Name, cte_r.[Sum], cte_r.[CumSum],
CAST(CASE WHEN cte.ResetCumSum >= 330 THEN 0 ELSE cte.ResetCumSum END + cte_r.[Sum] AS INT)
AS ResetCumSum,
cte_r.rn
FROM cte
JOIN cte_r
ON cte.rn = cte_r.rn-1
AND cte.GroupNr = cte_r.GroupNr
)
SELECT GroupNr, Name, [Sum], [CumSum], ResetCumSum
FROM cte
ORDER BY GroupNr, rn;
Output:
db<>fiddle demo
Warning: A table is by design an unordered set, so a stable result requires an ordering column (such as a unique id or a timestamp). Here, ROW_NUMBER() OVER(PARTITION BY GroupNr ORDER BY (SELECT 1)) AS rn is used to emulate insertion order, but it is not stable.
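The recursive member's CASE expression can be sketched in Python to make the reset rule explicit. Within each GroupNr, a row restarts from 0 when the previous ResetCumSum has already reached 330 (>=), and then adds its own [Sum].

This rule differs from the first answer's. There, a row restarts when adding it would exceed the threshold. Here, the reset happens on the row after the total has reached it.

This is a minimal sketch under two assumptions:
- Rows arrive as (GroupNr, Sum) pairs in a stable order. Following the warning, the list order stands in for a real ordering column.
- The name reset_cumsum_by_group is hypothetical.

```python
def reset_cumsum_by_group(rows, threshold=330):
    """rows: (group_nr, value) pairs, already in a stable order.

    Within each group, the first row starts at its own value (the CTE anchor);
    later rows restart from 0 if the previous total reached the threshold (>=).
    """
    last_total = {}  # group_nr -> ResetCumSum of the previous row in that group
    result = []
    for group_nr, value in rows:
        prev = last_total.get(group_nr)
        if prev is None or prev >= threshold:
            total = value  # first row of the group, or reset after reaching threshold
        else:
            total = prev + value
        last_total[group_nr] = total
        result.append((group_nr, value, total))
    return result


rows = [(1, 200), (1, 150), (1, 10), (2, 330), (2, 5)]
for row in reset_cumsum_by_group(rows):
    print(row)
```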
Related:
Conditional SUM, and the same problem solved with MATCH_RECOGNIZE (in my opinion, the cleanest way)
Extra:
Quirky UPDATE: Running Total until specific condition is true
Disclaimer: "DO NOT USE IT AT PRODUCTION!!!"
-- assumes the source table has been extended with id and Resetcumsum columns
CREATE CLUSTERED INDEX IX_ROW_NUM ON Table1(GroupNr, id);
DECLARE @running_total NUMERIC(14,2) = 0
,@prev_running_total NUMERIC(14,2) = 0
,@prev_GroupNr INT = 0;
UPDATE Table1
SET
@prev_running_total = @running_total
,@running_total = Resetcumsum = IIF(@prev_GroupNr != GroupNr
OR @running_total >= 330, 0, @running_total)
+ [Sum]
,@prev_GroupNr = GroupNr
FROM Table1 WITH(INDEX(IX_ROW_NUM))
OPTION (MAXDOP 1);
SELECT *
FROM Table1
ORDER BY id;
db<>fiddle demo - 2
Reset rolling sum to 0 after reaching the threshold
Here is the way I managed to do it:
SELECT *,
SUM(case when month_disc=1 OR month_ticket=0 then 0 else value end) OVER (PARTITION BY account, flg_sum, band_sum ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_sum
FROM (
SELECT *,
FLOOR(SUM(case when month_disc=1 OR month_ticket=0 then 0 else value end) OVER (PARTITION BY account, flg_sum ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)/50.000001) as band_sum ---- create bands for running total
FROM (
SELECT *,
SUM(tag_flg) OVER (PARTITION BY account ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS flg_sum
FROM (
SELECT *,
CASE WHEN (month_disc=1 OR month_ticket=0) THEN 1 ELSE 0 END AS tag_flg ---- flag to count when the value is reset due to one of the conditions
FROM source_table) x ) y) z
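Here is a minimal Python sketch of what the nested SELECTs compute for a single account, assuming its rows are already sorted by date. The function name and the dict-based row shape are hypothetical.

The sketch follows the query's layers from the inside out:
1. tag_flg marks rows where month_disc = 1 or month_ticket = 0.
2. flg_sum counts those flags, which starts a new segment at each flag.
3. band_sum buckets the segment's cumulative (masked) value into bands of width 50.000001. The divisor keeps a total of exactly 50 in the lower band.
4. running_sum is summed within each (flg_sum, band_sum) partition.

Band boundaries fall at fixed multiples of the segment's cumulative total, so this buckets the running total rather than restarting from each reset point row by row.

```python
import math


def banded_running_sum(rows, band_width=50.000001):
    """rows: dicts with value, month_disc, month_ticket for ONE account, sorted by date."""
    flg_sum = 0
    cum_in_flg = {}  # flg_sum -> cumulative masked value within that segment
    running = {}     # (flg_sum, band_sum) -> running_sum within that partition
    out = []
    for r in rows:
        # innermost SELECT: flag rows that force a reset
        tag_flg = 1 if (r["month_disc"] == 1 or r["month_ticket"] == 0) else 0
        flg_sum += tag_flg  # running count of flags (subquery y)
        masked = 0 if tag_flg else r["value"]  # the CASE inside both SUM()s
        cum = cum_in_flg.get(flg_sum, 0) + masked
        cum_in_flg[flg_sum] = cum
        band_sum = math.floor(cum / band_width)  # subquery z
        key = (flg_sum, band_sum)
        running[key] = running.get(key, 0) + masked  # outer SELECT
        out.append(running[key])
    return out


rows = [
    {"value": 30, "month_disc": 0, "month_ticket": 1},
    {"value": 30, "month_disc": 0, "month_ticket": 1},
    {"value": 30, "month_disc": 0, "month_ticket": 1},
    {"value": 99, "month_disc": 1, "month_ticket": 1},  # flagged: contributes 0
    {"value": 20, "month_disc": 0, "month_ticket": 1},
]
print(banded_running_sum(rows))  # [30, 30, 60, 0, 20]
```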
sum() until a threshold value is reached, summarize it as a single record, then reset and continue the aggregation
Consider the approach below
with recursive temp as (
select *, row_number() over(partition by id order by from_range) pos
from your_table
), result as (
select *, total_amount as total, true as new_group
from temp where pos = 1
union all
select t.*,
if(total + t.total_amount > 10000000, t.total_amount, total + t.total_amount),
if(total + t.total_amount > 10000000, true, false)
from temp t join result r
on t.pos = r.pos + 1 and t.id = r.id
)
select id,
min(from_range) from_range,
max(to_range) to_range,
max(total) as total_amount
from (
select *, countif(new_group) over(partition by id order by pos) grp
from result
)
group by id, grp
Applied to the sample data in the question, this returns one summarized row (from_range, to_range, total_amount) per group.
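The same grouping logic can be sketched in Python for a single id, assuming its rows are (from_range, to_range, total_amount) tuples sorted by from_range. The function name and the small limit in the example are hypothetical; the query uses 10000000.

A row starts a new group (new_group = true) when adding it would push the running total above the limit. Each group then collapses to its min from_range, max to_range and max running total, which is what the countif(...) over(...) plus group by id, grp step does.

```python
def group_until_limit(rows, limit=10_000_000):
    """rows: (from_range, to_range, total_amount) for ONE id, sorted by from_range."""
    groups = []  # each entry: [min from_range, max to_range, max running total]
    total = None
    for from_range, to_range, amount in rows:
        if total is None or total + amount > limit:
            # anchor row or limit exceeded: new_group = true, restart the total
            total = amount
            groups.append([from_range, to_range, total])
        else:
            total += amount
            g = groups[-1]
            g[0] = min(g[0], from_range)
            g[1] = max(g[1], to_range)
            g[2] = max(g[2], total)
    return [tuple(g) for g in groups]


sample = [(1, 10, 60), (11, 20, 30), (21, 30, 50), (31, 40, 40)]
print(group_until_limit(sample, limit=100))  # [(1, 20, 90), (21, 40, 90)]
```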
Resetting Cumulative Sum once a value is reached and setting a flag to 1
An "ordinary" cumsum() is useless here, because it "doesn't know"
where to restart the summation.
You can do it with the following custom function:
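import numpy as np

def myCumSum(x, thr):
    # prev holds the value returned by the previous call
    if myCumSum.prev >= thr:
        myCumSum.prev = 0  # the previous total reached the threshold: restart
    myCumSum.prev += x
    return myCumSum.prev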
This function has "memory" of the previous call (the prev attribute),
so it "knows" where to restart.
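Then wrap it with np.vectorize. np.vectorize is a convenience wrapper (essentially a Python loop), not a performance optimization. Passing otypes matters here: without it, np.vectorize calls the function on the first element to determine the output type, which would update prev an extra time. Use the builtin int, because np.int was removed in NumPy 1.24:
myCumSumV = np.vectorize(myCumSum, otypes=[int], excluded={'thr'})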
Then execute:
threshold = 40
myCumSum.prev = 0  # reset the "previous" value before each run
# Replace column "a" with the resetting cumulative sum
df.a = myCumSumV(df.a.values, thr=threshold)
df['flag'] = df.a.ge(threshold).astype(int) # Compute "flag" column
The result is:
a b flag
0 5 1 0
1 11 1 0
2 41 1 1
3 170 0 1
4 5 1 0
5 15 1 0
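Another option, sketched here as an alternative rather than the answer's method, avoids the function attribute and np.vectorize entirely. It uses itertools.accumulate with a two-argument function, so no state leaks between runs.

The input column a ([5, 6, 30, 170, 5, 10], with b unchanged) is an assumption, chosen to reproduce the result table above.

```python
from itertools import accumulate

import pandas as pd


def reset_cumsum(values, thr):
    # acc is the previous output; once it has reached thr, restart from x
    return list(accumulate(values, lambda acc, x: x if acc >= thr else acc + x))


# assumed input that reproduces the result table
df = pd.DataFrame({"a": [5, 6, 30, 170, 5, 10], "b": [1, 1, 1, 0, 1, 1]})
threshold = 40
df["a"] = reset_cumsum(df["a"], threshold)
df["flag"] = df["a"].ge(threshold).astype(int)  # 1 where the threshold was reached
print(df)
```

This is still a Python-level loop, like np.vectorize, but it needs no global state that must be reset before each call.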