How to Calculate a Running Total in SQL Without Using a Cursor

How do I calculate a running total in SQL without using a cursor?

You might want to take a look at the update to local variable solution here: http://geekswithblogs.net/Rhames/archive/2008/10/28/calculating-running-totals-in-sql-server-2005---the-optimal.aspx

DECLARE @SalesTbl TABLE (DayCount smallint, Sales money, RunningTotal money)

DECLARE @RunningTotal money

SET @RunningTotal = 0

INSERT INTO @SalesTbl
SELECT DayCount, Sales, null
FROM Sales
ORDER BY DayCount

UPDATE @SalesTbl
SET @RunningTotal = RunningTotal = @RunningTotal + Sales
FROM @SalesTbl

SELECT * FROM @SalesTbl

This outperforms all of the other methods tested, but there are doubts about guaranteed row order, since UPDATE does not accept an ORDER BY clause. It does seem to work reliably when the temp table is indexed on the ordering column, though.

  • Nested sub-query 9300 ms
  • Self join 6100 ms
  • Cursor 400 ms
  • Update to local variable 140 ms
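If you do adopt the update-to-local-variable approach, a commonly suggested mitigation for the row-order doubt (still not a documented guarantee) is to cluster the table variable on the ordering column by declaring it the primary key. A sketch, reusing the tables above:

```sql
-- Sketch: same quirky update as above, but with DayCount declared as
-- the table variable's primary key so the update tends to follow the
-- clustered index order. This behavior remains undocumented.
DECLARE @SalesTbl TABLE (DayCount smallint PRIMARY KEY, Sales money, RunningTotal money)

DECLARE @RunningTotal money
SET @RunningTotal = 0

INSERT INTO @SalesTbl (DayCount, Sales, RunningTotal)
SELECT DayCount, Sales, NULL
FROM Sales
ORDER BY DayCount

UPDATE @SalesTbl
SET @RunningTotal = RunningTotal = @RunningTotal + Sales

SELECT DayCount, Sales, RunningTotal FROM @SalesTbl
```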

How could I re-write this cursor to calculate a running total?

An example using ROWS UNBOUNDED PRECEDING:

SELECT TOP 1000 [Dato]
      ,[Department]
      ,[Amount]
      ,runningtotal = SUM([Amount]) OVER (ORDER BY [Dato] ROWS UNBOUNDED PRECEDING)
FROM [LegOgSpass].[dbo].[amounts]

Result

(The result, shown as an image in the original answer, lists each row with its cumulative Amount.)

Calculate running total / running balance

For those not using SQL Server 2012 or above, a cursor is likely the most efficient supported and guaranteed method outside of CLR. There are other approaches, such as the "quirky update", which can be marginally faster but are not guaranteed to keep working in the future; set-based approaches whose performance degrades quadratically as the table grows; and recursive CTE methods that often require direct #tempdb I/O or produce spills with roughly the same impact.



INNER JOIN - do not do this:

The slow, set-based approach is of the form:

SELECT t1.TID, t1.amt, RunningTotal = SUM(t2.amt)
FROM dbo.Transactions AS t1
INNER JOIN dbo.Transactions AS t2
ON t1.TID >= t2.TID
GROUP BY t1.TID, t1.amt
ORDER BY t1.TID;

The reason this is slow? As the table gets larger, each new row requires reading all of the rows before it, so the total work grows quadratically: a table of n rows costs roughly n²/2 row reads (for 10,000 rows, on the order of 50 million). This is bound for failures, timeouts, or just angry users.



Correlated subquery - do not do this either:

The subquery form is similarly painful for similarly painful reasons.

SELECT TID, amt, RunningTotal = amt + COALESCE(
(
    SELECT SUM(amt)
    FROM dbo.Transactions AS i
    WHERE i.TID < o.TID), 0
)
FROM dbo.Transactions AS o
ORDER BY TID;


Quirky update - do this at your own risk:

The "quirky update" method is more efficient than the above, but the behavior is not documented, there are no guarantees about order, and the behavior might work today but could break in the future. I'm including this because it is a popular method and it is efficient, but that doesn't mean I endorse it. The primary reason I even answered this question instead of closing it as a duplicate is because the other question has a quirky update as the accepted answer.

DECLARE @t TABLE
(
TID INT PRIMARY KEY,
amt INT,
RunningTotal INT
);

DECLARE @RunningTotal INT = 0;

INSERT @t(TID, amt, RunningTotal)
SELECT TID, amt, RunningTotal = 0
FROM dbo.Transactions
ORDER BY TID;

UPDATE @t
SET @RunningTotal = RunningTotal = @RunningTotal + amt
FROM @t;

SELECT TID, amt, RunningTotal
FROM @t
ORDER BY TID;


Recursive CTEs

This first one relies on TID being contiguous, with no gaps:

;WITH x AS
(
    SELECT TID, amt, RunningTotal = amt
    FROM dbo.Transactions
    WHERE TID = 1
    UNION ALL
    SELECT y.TID, y.amt, x.RunningTotal + y.amt
    FROM x
    INNER JOIN dbo.Transactions AS y
        ON y.TID = x.TID + 1
)
SELECT TID, amt, RunningTotal
FROM x
ORDER BY TID
OPTION (MAXRECURSION 10000);

If you can't rely on this, then you can use this variation, which simply builds a contiguous sequence using ROW_NUMBER():

;WITH y AS
(
    SELECT TID, amt, rn = ROW_NUMBER() OVER (ORDER BY TID)
    FROM dbo.Transactions
), x AS
(
    SELECT TID, rn, amt, rt = amt
    FROM y
    WHERE rn = 1
    UNION ALL
    SELECT y.TID, y.rn, y.amt, x.rt + y.amt
    FROM x INNER JOIN y
        ON y.rn = x.rn + 1
)
SELECT TID, amt, RunningTotal = rt
FROM x
ORDER BY x.rn
OPTION (MAXRECURSION 10000);

Depending on the size of the data (e.g. columns we don't know about), you may find better overall performance by putting only the relevant columns into a #temp table first, and processing against that instead of the base table:

CREATE TABLE #x
(
rn INT PRIMARY KEY,
TID INT,
amt INT
);

INSERT INTO #x (rn, TID, amt)
SELECT ROW_NUMBER() OVER (ORDER BY TID), TID, amt
FROM dbo.Transactions;

;WITH x AS
(
    SELECT TID, rn, amt, rt = amt
    FROM #x
    WHERE rn = 1
    UNION ALL
    SELECT y.TID, y.rn, y.amt, x.rt + y.amt
    FROM x INNER JOIN #x AS y
        ON y.rn = x.rn + 1
)
SELECT TID, amt, RunningTotal = rt
FROM x
ORDER BY TID
OPTION (MAXRECURSION 10000);

DROP TABLE #x;

Only the first recursive CTE method will provide performance rivaling the quirky update, but it makes a big assumption about the nature of the data (no gaps). The other two methods fall well behind, and in those cases you may as well use a cursor (if you can't use CLR and you're not yet on SQL Server 2012 or above).



Cursor

Everybody is told that cursors are evil, and that they should be avoided at all costs, but this actually beats the performance of most other supported methods, and is safer than the quirky update. The only ones I prefer over the cursor solution are the 2012 and CLR methods (below):

CREATE TABLE #x
(
TID INT PRIMARY KEY,
amt INT,
rt INT
);

INSERT #x(TID, amt)
SELECT TID, amt
FROM dbo.Transactions
ORDER BY TID;

DECLARE @rt INT, @tid INT, @amt INT;
SET @rt = 0;

DECLARE c CURSOR LOCAL STATIC READ_ONLY FORWARD_ONLY
FOR SELECT TID, amt FROM #x ORDER BY TID;

OPEN c;

FETCH c INTO @tid, @amt;

WHILE @@FETCH_STATUS = 0
BEGIN
    SET @rt = @rt + @amt;
    UPDATE #x SET rt = @rt WHERE TID = @tid;
    FETCH c INTO @tid, @amt;
END

CLOSE c; DEALLOCATE c;

SELECT TID, amt, RunningTotal = rt
FROM #x
ORDER BY TID;

DROP TABLE #x;


SQL Server 2012 or above

New window functions introduced in SQL Server 2012 make this task a lot easier (and it performs better than all of the above methods as well):

SELECT TID, amt, 
RunningTotal = SUM(amt) OVER (ORDER BY TID ROWS UNBOUNDED PRECEDING)
FROM dbo.Transactions
ORDER BY TID;

Note that on larger data sets, you'll find that the above performs much better than either of the following two options, since RANGE uses an on-disk spool (and RANGE is the default when you specify ORDER BY without an explicit ROWS clause). However, it is also important to note that the behavior and results can differ when the ordering column contains duplicate values, so be sure both return correct results before choosing between them on performance alone.

SELECT TID, amt, 
RunningTotal = SUM(amt) OVER (ORDER BY TID)
FROM dbo.Transactions
ORDER BY TID;

SELECT TID, amt,
RunningTotal = SUM(amt) OVER (ORDER BY TID RANGE UNBOUNDED PRECEDING)
FROM dbo.Transactions
ORDER BY TID;


CLR

For completeness, I'm offering a link to Pavel Pawlowski's CLR method, which is by far the preferable method on versions prior to SQL Server 2012 (but not on SQL Server 2000, obviously, since SQLCLR requires 2005 or later).

http://www.pawlowski.cz/2010/09/sql-server-and-fastest-running-totals-using-clr/



Conclusion

If you are on SQL Server 2012 or above, the choice is obvious - use the new SUM() OVER() construct (with ROWS rather than RANGE). For earlier versions, you'll want to compare the performance of the alternative approaches on your schema and data and, taking non-performance-related factors into account, determine which approach is right for you. It may very well be the CLR approach. Here are my recommendations, in order of preference:

  1. SUM() OVER() ... ROWS, if on 2012 or above
  2. CLR method, if possible
  3. First recursive CTE method, if possible
  4. Cursor
  5. The other recursive CTE methods
  6. Quirky update
  7. Join and/or correlated subquery

For further information with performance comparisons of these methods, see this question on http://dba.stackexchange.com:

https://dba.stackexchange.com/questions/19507/running-total-with-count


I've also blogged more details about these comparisons here:

http://www.sqlperformance.com/2012/07/t-sql-queries/running-totals


Also for grouped/partitioned running totals, see the following posts:

http://sqlperformance.com/2014/01/t-sql-queries/grouped-running-totals

Partitioning results in a running totals query

Multiple Running Totals with Group By

Query with row by row calculation for running total

Improvement on previous answer following input from Thorsten

with numbered as (
    select *, ROW_NUMBER() OVER (ORDER BY [Week]) as RN
    from [Data]
)
,cte as (
    select [Week], [Due], [Slots], [RN]
        ,case when Due > Slots then Due - Slots else 0 end as [Total]
    from numbered
    where RN = 1

    union all

    select e.[Week], e.[Due], e.[Slots], e.[RN]
        ,case when cte.Total + e.Due - e.Slots > 0 then cte.Total + e.Due - e.Slots else 0 end as [Total]
    from numbered e
    inner join cte on cte.[RN] = e.[RN] - 1
)
select * from cte
OPTION (MAXRECURSION 0)

Many thanks Thorsten for all your help.

Calculate a Running Total in SQL Server

Update, if you are running SQL Server 2012 see: https://stackoverflow.com/a/10309947

The problem is that the SQL Server implementation of the OVER clause is somewhat limited.

Oracle (and ANSI-SQL) allow you to do things like:

SELECT somedate, somevalue,
    SUM(somevalue) OVER (ORDER BY somedate
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RunningTotal
FROM Table

SQL Server gives you no clean solution to this problem. My gut is telling me that this is one of those rare cases where a cursor is the fastest, though I will have to do some benchmarking on big result sets.

The update trick is handy, but I feel it's fairly fragile. It seems that if you are updating a full table, the update will proceed in the order of the primary key, so if you make your date an ascending primary key you will probably be safe. But you are relying on an undocumented SQL Server implementation detail (also, if the query ends up being parallelized across multiple processors, I wonder what will happen; see MAXDOP):

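For what it's worth, practitioners who do use the quirky update commonly pile on hints to coax a serial plan that follows the clustered index. A sketch against a table like the #t in the sample below; note these are community conventions, not documented guarantees of update order:

```sql
-- Commonly used quirky-update safeguards: TABLOCKX to serialize
-- access, INDEX(1) to request the clustered index, MAXDOP 1 to
-- prevent parallelism. None of these is a documented guarantee
-- that the update proceeds in index order.
declare @total int
set @total = 0

update t
set running_total = @total, @total = @total + total
from #t as t with (TABLOCKX, INDEX(1))
OPTION (MAXDOP 1)
```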
Full working sample:

drop table #t 
create table #t ( ord int primary key, total int, running_total int)

insert #t(ord,total) values (2,20)
-- notice the malicious re-ordering
insert #t(ord,total) values (1,10)
insert #t(ord,total) values (3,10)
insert #t(ord,total) values (4,1)

declare @total int
set @total = 0
update #t set running_total = @total, @total = @total + total

select * from #t
order by ord

ord         total       running_total
----------- ----------- -------------
1           10          10
2           20          30
3           10          40
4           1           41

You asked for a benchmark; this is the lowdown.

The fastest SAFE way of doing this is the cursor; it is an order of magnitude faster than the correlated sub-query or cross-join.

The absolute fastest way is the UPDATE trick. My only concern with it is that I am not certain that under all circumstances the update will proceed in a linear way. There is nothing in the query that explicitly says so.

Bottom line, for production code I would go with the cursor.

Test data:

create table #t ( ord int primary key, total int, running_total int)

set nocount on
declare @i int
set @i = 0
begin tran
while @i < 10000
begin
    insert #t (ord, total) values (@i, rand() * 100)
    set @i = @i + 1
end
commit

Test 1:

SELECT ord,total, 
(SELECT SUM(total)
FROM #t b
WHERE b.ord <= a.ord) AS b
FROM #t a

-- CPU 11731, Reads 154934, Duration 11135

Test 2:

SELECT a.ord, a.total, SUM(b.total) AS RunningTotal 
FROM #t a CROSS JOIN #t b
WHERE (b.ord <= a.ord)
GROUP BY a.ord,a.total
ORDER BY a.ord

-- CPU 16053, Reads 154935, Duration 4647

Test 3:

DECLARE @TotalTable table(ord int primary key, total int, running_total int)

DECLARE forward_cursor CURSOR FAST_FORWARD
FOR
SELECT ord, total
FROM #t
ORDER BY ord


OPEN forward_cursor

DECLARE @running_total int,
@ord int,
@total int
SET @running_total = 0

FETCH NEXT FROM forward_cursor INTO @ord, @total
WHILE (@@FETCH_STATUS = 0)
BEGIN
    SET @running_total = @running_total + @total
    INSERT @TotalTable VALUES(@ord, @total, @running_total)
    FETCH NEXT FROM forward_cursor INTO @ord, @total
END

CLOSE forward_cursor
DEALLOCATE forward_cursor

SELECT * FROM @TotalTable

-- CPU 359, Reads 30392, Duration 496

Test 4:

declare @total int 
set @total = 0
update #t set running_total = @total, @total = @total + total

select * from #t

-- CPU 0, Reads 58, Duration 139

Order by and apply a running total to the same column without using a temporary table

Following Aaron Bertrand's suggestion of using a cursor method:

DECLARE @st TABLE
(
Id Int PRIMARY KEY,
SaleAmount Numeric(10,2),
RunningTotal Numeric(10,2)
);

DECLARE
@Id INT,
@SaleAmount Numeric(10,2),
@RunningTotal Numeric(10,2) = 0;

DECLARE c CURSOR
LOCAL STATIC FORWARD_ONLY READ_ONLY
FOR
SELECT id, SaleAmount
FROM Sales
ORDER BY SaleAmount;

OPEN c;

FETCH NEXT FROM c INTO @Id, @SaleAmount;

WHILE @@FETCH_STATUS = 0
BEGIN
    SET @RunningTotal = @RunningTotal + @SaleAmount;

    INSERT @st(Id, SaleAmount, RunningTotal)
    SELECT @Id, @SaleAmount, @RunningTotal;

    FETCH NEXT FROM c INTO @Id, @SaleAmount;
END

CLOSE c;
DEALLOCATE c;

SELECT Id, SaleAmount, RunningTotal
FROM @st
WHERE RunningTotal <= 10
ORDER BY SaleAmount;

This is more code, and it still requires a table variable; however, the improvement in performance is significant.

Credit has to go to Aaron Bertrand for the excellent article on running totals he wrote.

Select sum of a column without cursor

Okay, I did this more for the challenge than because I necessarily think it's a good idea. I'm inclined to believe Aaron that a cursor may be more appropriate. Anyhow:

declare @Items table (ID int not null,LENGTH_IN_CM decimal(5,1) not null)
insert into @Items(ID,LENGTH_IN_CM) values
(1,1.0),
(2,1.0),
(3,9.0),
(4,5.0),
(5,15.0),
(6,3.0),
(7,6.0)

;With PossiblePages as (
    select ID as MinID, ID as MaxID, LENGTH_IN_CM as TotalLength
    from @Items
    union all
    select MinID, MaxID + 1, CONVERT(decimal(5,1), TotalLength + LENGTH_IN_CM)
    from PossiblePages pp
    inner join @Items it on pp.MaxID + 1 = it.ID
    where TotalLength + LENGTH_IN_CM <= 20.0
), LongPages as (
    select MinID, MAX(MaxID) as MaxID, MAX(TotalLength) as TotalLength
    from PossiblePages
    group by MinID
), FinalPages as (
    select MinID, MaxID, TotalLength
    from LongPages
    where MinID = 1
    union all
    select lp.MinID, lp.MaxID, lp.TotalLength
    from LongPages lp
    inner join FinalPages fp on lp.MinID = fp.MaxID + 1
), PageNumbers as (
    select MinID, MaxID, ROW_NUMBER() OVER (ORDER BY MinID) as PageNo
    from FinalPages
)
select * from PageNumbers

Result:

MinID       MaxID       PageNo
----------- ----------- --------------------
1           4           1
5           6           2
7           7           3

Which should be easy enough to join back to the original table, if you want to assign a page number to each row.

PossiblePages calculates every possible page - for every row, it acts as if that row is the first one on that page, and then works out how many additional rows can be appended to that, and the total length that that range of rows represents (there may be cleaner ways to calculate this expression, not sure at the moment).

LongPages then finds the longest value that PossiblePages calculated, for each starting row number.

Finally, FinalPages starts with the first page (which must, logically, be the one starting with ID 1 - you can always introduce another calculation if you're not guaranteed to start at 1 and need to find the earliest). Then, each next page is the one that starts from the row ID one higher than the previous page's last row.

You don't need PageNumbers, but as I said, I was thinking of joining back to the original table.
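As a sketch of that join-back, each item maps to the page whose ID range contains it. This assumes the CTE chain above is in scope, i.e. it replaces the final SELECT:

```sql
-- Sketch: assign each row its page number by joining the original
-- @Items table to the PageNumbers CTE on the ID range.
select it.ID, it.LENGTH_IN_CM, pn.PageNo
from @Items it
inner join PageNumbers pn
    on it.ID between pn.MinID and pn.MaxID
order by it.ID
```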


And as predicted by the commenters, I don't think this is going to perform well - just on the sample, I'm seeing at least 4 table scans to compute this.


Bonus insanity. This one only scans the table 3 times:

;With PageRows as (
    select ID as MinID, ID as MaxID, LENGTH_IN_CM as TotalLength
    from @Items
    where ID = 1
    union all
    select MinID, MaxID + 1, CONVERT(decimal(5,1), TotalLength + LENGTH_IN_CM)
    from PageRows pr
    inner join @Items ir on pr.MaxID = ir.ID - 1
    where TotalLength + LENGTH_IN_CM <= 20.0
    union all
    select ir.ID as MinID, ir.ID as MaxID, ir.LENGTH_IN_CM as TotalLength
    from PageRows pr
    inner join @Items ir on pr.MaxID = ir.ID - 1
    where TotalLength + LENGTH_IN_CM > 20.0
), PageNumbers as (
    select MinID, MAX(MaxID) as MaxID, ROW_NUMBER() OVER (ORDER BY MinID) as PageNo
    from PageRows
    group by MinID
)
select * from PageNumbers

