Calculate a Running Total in SQL Server

Calculate a Running Total in SQL Server

Update, if you are running SQL Server 2012 see: https://stackoverflow.com/a/10309947

The problem is that the SQL Server implementation of the Over clause is somewhat limited.

Oracle (and ANSI-SQL) allow you to do things like:

 SELECT somedate, somevalue,
SUM(somevalue) OVER(ORDER BY somedate
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
AS RunningTotal
FROM Table

SQL Server gives you no clean solution to this problem. My gut is telling me that this is one of those rare cases where a cursor is the fastest, though I will have to do some benchmarking on big results.

The update trick is handy but I feel its fairly fragile. It seems that if you are updating a full table then it will proceed in the order of the primary key. So if you set your date as a primary key ascending you will probably be safe. But you are relying on an undocumented SQL Server implementation detail (also if the query ends up being performed by two procs I wonder what will happen, see: MAXDOP):

Full working sample:

drop table #t 
create table #t ( ord int primary key, total int, running_total int)

insert #t(ord,total) values (2,20)
-- notice the malicious re-ordering
insert #t(ord,total) values (1,10)
insert #t(ord,total) values (3,10)
insert #t(ord,total) values (4,1)

declare @total int
set @total = 0
update #t set running_total = @total, @total = @total + total

select * from #t
order by ord

ord total running_total
----------- ----------- -------------
1 10 10
2 20 30
3 10 40
4 1 41

You asked for a benchmark this is the lowdown.

The fastest SAFE way of doing this would be the Cursor, it is an order of magnitude faster than the correlated sub-query of cross-join.

The absolute fastest way is the UPDATE trick. My only concern with it is that I am not certain that under all circumstances the update will proceed in a linear way. There is nothing in the query that explicitly says so.

Bottom line, for production code I would go with the cursor.

Test data:

create table #t ( ord int primary key, total int, running_total int)

set nocount on
declare @i int
set @i = 0
begin tran
while @i < 10000
begin
insert #t (ord, total) values (@i, rand() * 100)
set @i = @i +1
end
commit

Test 1:

SELECT ord,total, 
(SELECT SUM(total)
FROM #t b
WHERE b.ord <= a.ord) AS b
FROM #t a

-- CPU 11731, Reads 154934, Duration 11135

Test 2:

SELECT a.ord, a.total, SUM(b.total) AS RunningTotal 
FROM #t a CROSS JOIN #t b
WHERE (b.ord <= a.ord)
GROUP BY a.ord,a.total
ORDER BY a.ord

-- CPU 16053, Reads 154935, Duration 4647

Test 3:

DECLARE @TotalTable table(ord int primary key, total int, running_total int)

DECLARE forward_cursor CURSOR FAST_FORWARD
FOR
SELECT ord, total
FROM #t
ORDER BY ord


OPEN forward_cursor

DECLARE @running_total int,
@ord int,
@total int
SET @running_total = 0

FETCH NEXT FROM forward_cursor INTO @ord, @total
WHILE (@@FETCH_STATUS = 0)
BEGIN
SET @running_total = @running_total + @total
INSERT @TotalTable VALUES(@ord, @total, @running_total)
FETCH NEXT FROM forward_cursor INTO @ord, @total
END

CLOSE forward_cursor
DEALLOCATE forward_cursor

SELECT * FROM @TotalTable

-- CPU 359, Reads 30392, Duration 496

Test 4:

declare @total int 
set @total = 0
update #t set running_total = @total, @total = @total + total

select * from #t

-- CPU 0, Reads 58, Duration 139

Fastestest way of calculating Running Totals in SQL

Unless your table has so few rows that you can count them on one hand, the built-in window function is going to be faster.

Why? First, it is built-in and designed for this purpose.

Second, it actually calculates a cumulative sum. So, when calculating the 10th value, it is using the result from the 9th cumulative sum and just adding one more value to it.

The join approach is a particularly egregious way of doing the calculation. If you have 100 rows in the table, then it expands the table to 100 * 99 / 2 rows (give or take) -- and then has to aggregate way more than 100 rows. You can see that this just gets worse as your table gets bigger.

That said, if you have three rows in the table, you might find that the join works better. Such is the law of larger numbers: sometimes surprising things happen on smaller numbers.

Calculate running total / running balance

For those not using SQL Server 2012 or above, a cursor is likely the most efficient supported and guaranteed method outside of CLR. There are other approaches such as the "quirky update" which can be marginally faster but not guaranteed to work in the future, and of course set-based approaches with hyperbolic performance profiles as the table gets larger, and recursive CTE methods that often require direct #tempdb I/O or result in spills that yield roughly the same impact.



INNER JOIN - do not do this:

The slow, set-based approach is of the form:

SELECT t1.TID, t1.amt, RunningTotal = SUM(t2.amt)
FROM dbo.Transactions AS t1
INNER JOIN dbo.Transactions AS t2
ON t1.TID >= t2.TID
GROUP BY t1.TID, t1.amt
ORDER BY t1.TID;

The reason this is slow? As the table gets larger, each incremental row requires reading n-1 rows in the table. This is exponential and bound for failures, timeouts, or just angry users.



Correlated subquery - do not do this either:

The subquery form is similarly painful for similarly painful reasons.

SELECT TID, amt, RunningTotal = amt + COALESCE(
(
SELECT SUM(amt)
FROM dbo.Transactions AS i
WHERE i.TID < o.TID), 0
)
FROM dbo.Transactions AS o
ORDER BY TID;


Quirky update - do this at your own risk:

The "quirky update" method is more efficient than the above, but the behavior is not documented, there are no guarantees about order, and the behavior might work today but could break in the future. I'm including this because it is a popular method and it is efficient, but that doesn't mean I endorse it. The primary reason I even answered this question instead of closing it as a duplicate is because the other question has a quirky update as the accepted answer.

DECLARE @t TABLE
(
TID INT PRIMARY KEY,
amt INT,
RunningTotal INT
);

DECLARE @RunningTotal INT = 0;

INSERT @t(TID, amt, RunningTotal)
SELECT TID, amt, RunningTotal = 0
FROM dbo.Transactions
ORDER BY TID;

UPDATE @t
SET @RunningTotal = RunningTotal = @RunningTotal + amt
FROM @t;

SELECT TID, amt, RunningTotal
FROM @t
ORDER BY TID;


Recursive CTEs

This first one relies on TID to be contiguous, no gaps:

;WITH x AS
(
SELECT TID, amt, RunningTotal = amt
FROM dbo.Transactions
WHERE TID = 1
UNION ALL
SELECT y.TID, y.amt, x.RunningTotal + y.amt
FROM x
INNER JOIN dbo.Transactions AS y
ON y.TID = x.TID + 1
)
SELECT TID, amt, RunningTotal
FROM x
ORDER BY TID
OPTION (MAXRECURSION 10000);

If you can't rely on this, then you can use this variation, which simply builds a contiguous sequence using ROW_NUMBER():

;WITH y AS 
(
SELECT TID, amt, rn = ROW_NUMBER() OVER (ORDER BY TID)
FROM dbo.Transactions
), x AS
(
SELECT TID, rn, amt, rt = amt
FROM y
WHERE rn = 1
UNION ALL
SELECT y.TID, y.rn, y.amt, x.rt + y.amt
FROM x INNER JOIN y
ON y.rn = x.rn + 1
)
SELECT TID, amt, RunningTotal = rt
FROM x
ORDER BY x.rn
OPTION (MAXRECURSION 10000);

Depending on the size of the data (e.g. columns we don't know about), you may find better overall performance by stuffing the relevant columns only in a #temp table first, and processing against that instead of the base table:

CREATE TABLE #x
(
rn INT PRIMARY KEY,
TID INT,
amt INT
);

INSERT INTO #x (rn, TID, amt)
SELECT ROW_NUMBER() OVER (ORDER BY TID),
TID, amt
FROM dbo.Transactions;

;WITH x AS
(
SELECT TID, rn, amt, rt = amt
FROM #x
WHERE rn = 1
UNION ALL
SELECT y.TID, y.rn, y.amt, x.rt + y.amt
FROM x INNER JOIN #x AS y
ON y.rn = x.rn + 1
)
SELECT TID, amt, RunningTotal = rt
FROM x
ORDER BY TID
OPTION (MAXRECURSION 10000);

DROP TABLE #x;

Only the first CTE method will provide performance rivaling the quirky update, but it makes a big assumption about the nature of the data (no gaps). The other two methods will fall back and in those cases you may as well use a cursor (if you can't use CLR and you're not yet on SQL Server 2012 or above).



Cursor

Everybody is told that cursors are evil, and that they should be avoided at all costs, but this actually beats the performance of most other supported methods, and is safer than the quirky update. The only ones I prefer over the cursor solution are the 2012 and CLR methods (below):

CREATE TABLE #x
(
TID INT PRIMARY KEY,
amt INT,
rt INT
);

INSERT #x(TID, amt)
SELECT TID, amt
FROM dbo.Transactions
ORDER BY TID;

DECLARE @rt INT, @tid INT, @amt INT;
SET @rt = 0;

DECLARE c CURSOR LOCAL STATIC READ_ONLY FORWARD_ONLY
FOR SELECT TID, amt FROM #x ORDER BY TID;

OPEN c;

FETCH c INTO @tid, @amt;

WHILE @@FETCH_STATUS = 0
BEGIN
SET @rt = @rt + @amt;
UPDATE #x SET rt = @rt WHERE TID = @tid;
FETCH c INTO @tid, @amt;
END

CLOSE c; DEALLOCATE c;

SELECT TID, amt, RunningTotal = rt
FROM #x
ORDER BY TID;

DROP TABLE #x;


SQL Server 2012 or above

New window functions introduced in SQL Server 2012 make this task a lot easier (and it performs better than all of the above methods as well):

SELECT TID, amt, 
RunningTotal = SUM(amt) OVER (ORDER BY TID ROWS UNBOUNDED PRECEDING)
FROM dbo.Transactions
ORDER BY TID;

Note that on larger data sets, you'll find that the above performs much better than either of the following two options, since RANGE uses an on-disk spool (and the default uses RANGE). However it is also important to note that the behavior and results can differ, so be sure they both return correct results before deciding between them based on this difference.

SELECT TID, amt, 
RunningTotal = SUM(amt) OVER (ORDER BY TID)
FROM dbo.Transactions
ORDER BY TID;

SELECT TID, amt,
RunningTotal = SUM(amt) OVER (ORDER BY TID RANGE UNBOUNDED PRECEDING)
FROM dbo.Transactions
ORDER BY TID;


CLR

For completeness, I'm offering a link to Pavel Pawlowski's CLR method, which is by far the preferable method on versions prior to SQL Server 2012 (but not 2000 obviously).

http://www.pawlowski.cz/2010/09/sql-server-and-fastest-running-totals-using-clr/



Conclusion

If you are on SQL Server 2012 or above, the choice is obvious - use the new SUM() OVER() construct (with ROWS vs. RANGE). For earlier versions, you'll want to compare the performance of the alternative approaches on your schema, data and - taking non-performance-related factors in mind - determine which approach is right for you. It very well may be the CLR approach. Here are my recommendations, in order of preference:

  1. SUM() OVER() ... ROWS, if on 2012 or above
  2. CLR method, if possible
  3. First recursive CTE method, if possible
  4. Cursor
  5. The other recursive CTE methods
  6. Quirky update
  7. Join and/or correlated subquery

For further information with performance comparisons of these methods, see this question on http://dba.stackexchange.com:

https://dba.stackexchange.com/questions/19507/running-total-with-count


I've also blogged more details about these comparisons here:

http://www.sqlperformance.com/2012/07/t-sql-queries/running-totals


Also for grouped/partitioned running totals, see the following posts:

http://sqlperformance.com/2014/01/t-sql-queries/grouped-running-totals

Partitioning results in a running totals query

Multiple Running Totals with Group By

Sum of total by date with running total

Here's a version that squeezes the size of the code down compared to other examples. It should also be a little faster than some as there's no sub queries forcing multiple reads of the table. The trick to remember when using running totals and a group by statement is that you don't apply the window function to the raw column but to an aggregate of the raw column (hence I nest a SUM inside a SUM).

SELECT 
CAST(OrderCheckOutDate AS date) AS OrderCheckOutDate,
SUM (OrderTotalFixed) AS DayTotal,
SUM (SUM (OrderTotalFixed)) OVER (ORDER BY CAST(OrderCheckOutDate AS date)) AS RunningTotal
FROM
tblOrder
Where
UserExternalUserLevel = 0
And
OrderCheckOutDate > '2019-11-01'
GROUP BY CAST(OrderCheckOutDate AS date)

You can see this working at dbfiddle

If your actual use case is more complicated for example you want the running total to reset each month that can be handled by adding PARTITION in to the window function.

Partitioning by month will look something like this:

SELECT 
CAST(OrderCheckOutDate AS date) AS OrderCheckOutDate,
EOMONTH(CAST(OrderCheckOutDate AS date)) AS MonthEnd,
SUM (OrderTotalFixed) AS DayTotal,
SUM (SUM (OrderTotalFixed)) OVER (PARTITION BY EOMONTH(CAST(OrderCheckOutDate AS date)) ORDER BY CAST(OrderCheckOutDate AS date) ) AS RunningTotal
FROM
tblOrder
Where
UserExternalUserLevel = 0
And
OrderCheckOutDate > '2019-11-01'
GROUP BY CAST(OrderCheckOutDate AS date)

And again see it in action with dbfiddle

Calculate Running total over calculated column SQL Server

If using SQL Server 2012+, using SUM() with the OVER() Clause would allow you to achieve a running total.

For example,

SELECT TimeElapsed,
CurrentValue,
CummuSum = SUM(CurrentValue) OVER (ORDER BY TimeElapsed)
FROM
(
SELECT TimeElapsed = DATEDIFF(HOUR, Date1, Date2),
CurrentValue = CAST(COUNT(Date2) AS FLOAT) / (SELECT COUNT(Col3) FROM myTable)
FROM myTable
GROUP BY DATEDIFF(HOUR, Date1, Date2)
) T
ORDER BY TimeElapsed;

An alternative way of achieving this (which would work in SQL Server 2008), is using a CTE with a subquery in the select statement.

For example,

WITH CTE AS
(
SELECT TimeElapsed = DATEDIFF(HOUR, Date1, Date2),
CurrentValue = CAST(COUNT(Date2) AS FLOAT) / (SELECT COUNT(Col3) FROM myTable)
FROM myTable
GROUP BY DATEDIFF(HOUR, Date1, Date2)
)
SELECT TimeElapsed,
CurrentValue,
CummuSum = (SELECT SUM(CurrentValue) FROM CTE WHERE TimeElapsed <= C.TimeElapsed)
FROM CTE C;

Running total for SQL column that is part of subquery

You have some syntax errors, as well as some improvements to make:

  • As mentioned in the other answer, the derived table needs an alias
  • Use [] to quote column names if necessary (preferably don't have such column names in the first place).
  • You cannot have ORDER BY inside a derived table or subquery, nor does it make sense to do so.
  • You can replace the SELECT SUM subquery with SUM(SUM) OVER () window function.
  • The running total window function must have ROWS UNBOUNDED PRECEDING if TimePeriod may have duplicates. It's also faster.
  • Because you are ordering by TimePeriod DESC, it may be faster to do the running total in the same order as the main ORDER BY but with ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING which works out to the same results. The benefit is one less sort in the query plan.
SELECT
TimePeriod,
[Total Per Period],
[Percentage of Total],
SUM([Percentage of Total]) OVER (ORDER BY TimePeriod DESC ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
FROM
(SELECT
TimePeriod,
SUM(Amount) AS [Total Per Period],
CAST( ROUND( SUM(Amount) / SUM(SUM(Amount)) OVER () , 3) AS DECIMAL(12,3)) AS [Percentage of Total]
FROM
MyDatabase.MyTable
GROUP BY
TimePeriod
) t
ORDER BY
TimePeriod DESC;

A further improvement would be to combine the whole thing into one level:

SELECT
TimePeriod,
SUM(Amount) AS [Total Per Period],
CAST( ROUND(
SUM(Amount) /
SUM(SUM(Amount)) OVER ()
, 3) AS DECIMAL(12,3)) AS [Percentage of Total],
CAST( ROUND(
SUM(SUM(Amount)) OVER (ORDER BY TimePeriod DESC ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) /
SUM(SUM(Amount)) OVER ()
, 3) AS DECIMAL(12,3))
FROM
MyDatabase.MyTable
GROUP BY
TimePeriod
ORDER BY
TimePeriod DESC;

Note that the results may be slightly different due to rounding.

SQL Fiddle



Related Topics



Leave a reply



Submit