How do I calculate a running total in SQL without using a cursor?
You might want to take a look at the update to local variable solution here: http://geekswithblogs.net/Rhames/archive/2008/10/28/calculating-running-totals-in-sql-server-2005---the-optimal.aspx
DECLARE @SalesTbl TABLE (DayCount smallint, Sales money, RunningTotal money)
DECLARE @RunningTotal money
SET @RunningTotal = 0
INSERT INTO @SalesTbl
SELECT DayCount, Sales, null
FROM Sales
ORDER BY DayCount
UPDATE @SalesTbl
SET @RunningTotal = RunningTotal = @RunningTotal + Sales
FROM @SalesTbl
SELECT * FROM @SalesTbl
Outperforms all other methods, but has some doubts about guaranteed row order. Seems to work fine when temp table is indexed though..
- Nested sub-query 9300 ms
- Self join 6100 ms
- Cursor 400 ms
- Update to local variable 140 ms
How could I re-write this cursor to calculate running total?
An example with rows unbounded preceding:
SELECT TOP 1000 [Dato]
,[Department]
,[Amount]
,runningtotal = SUM(amount) over(order by dato ROWS UNBOUNDED PRECEDING)
FROM [LegOgSpass].[dbo].[amounts]
Result
Calculate running total / running balance
For those not using SQL Server 2012 or above, a cursor is likely the most efficient supported and guaranteed method outside of CLR. There are other approaches such as the "quirky update" which can be marginally faster but not guaranteed to work in the future, and of course set-based approaches with hyperbolic performance profiles as the table gets larger, and recursive CTE methods that often require direct #tempdb I/O or result in spills that yield roughly the same impact.
INNER JOIN - do not do this:
The slow, set-based approach is of the form:
SELECT t1.TID, t1.amt, RunningTotal = SUM(t2.amt)
FROM dbo.Transactions AS t1
INNER JOIN dbo.Transactions AS t2
ON t1.TID >= t2.TID
GROUP BY t1.TID, t1.amt
ORDER BY t1.TID;
The reason this is slow? As the table gets larger, each incremental row requires reading n-1 rows in the table. This is exponential and bound for failures, timeouts, or just angry users.
Correlated subquery - do not do this either:
The subquery form is similarly painful for similarly painful reasons.
SELECT TID, amt, RunningTotal = amt + COALESCE(
(
SELECT SUM(amt)
FROM dbo.Transactions AS i
WHERE i.TID < o.TID), 0
)
FROM dbo.Transactions AS o
ORDER BY TID;
Quirky update - do this at your own risk:
The "quirky update" method is more efficient than the above, but the behavior is not documented, there are no guarantees about order, and the behavior might work today but could break in the future. I'm including this because it is a popular method and it is efficient, but that doesn't mean I endorse it. The primary reason I even answered this question instead of closing it as a duplicate is because the other question has a quirky update as the accepted answer.
DECLARE @t TABLE
(
TID INT PRIMARY KEY,
amt INT,
RunningTotal INT
);
DECLARE @RunningTotal INT = 0;
INSERT @t(TID, amt, RunningTotal)
SELECT TID, amt, RunningTotal = 0
FROM dbo.Transactions
ORDER BY TID;
UPDATE @t
SET @RunningTotal = RunningTotal = @RunningTotal + amt
FROM @t;
SELECT TID, amt, RunningTotal
FROM @t
ORDER BY TID;
Recursive CTEs
This first one relies on TID to be contiguous, no gaps:
;WITH x AS
(
SELECT TID, amt, RunningTotal = amt
FROM dbo.Transactions
WHERE TID = 1
UNION ALL
SELECT y.TID, y.amt, x.RunningTotal + y.amt
FROM x
INNER JOIN dbo.Transactions AS y
ON y.TID = x.TID + 1
)
SELECT TID, amt, RunningTotal
FROM x
ORDER BY TID
OPTION (MAXRECURSION 10000);
If you can't rely on this, then you can use this variation, which simply builds a contiguous sequence using ROW_NUMBER()
:
;WITH y AS
(
SELECT TID, amt, rn = ROW_NUMBER() OVER (ORDER BY TID)
FROM dbo.Transactions
), x AS
(
SELECT TID, rn, amt, rt = amt
FROM y
WHERE rn = 1
UNION ALL
SELECT y.TID, y.rn, y.amt, x.rt + y.amt
FROM x INNER JOIN y
ON y.rn = x.rn + 1
)
SELECT TID, amt, RunningTotal = rt
FROM x
ORDER BY x.rn
OPTION (MAXRECURSION 10000);
Depending on the size of the data (e.g. columns we don't know about), you may find better overall performance by stuffing the relevant columns only in a #temp table first, and processing against that instead of the base table:
CREATE TABLE #x
(
rn INT PRIMARY KEY,
TID INT,
amt INT
);
INSERT INTO #x (rn, TID, amt)
SELECT ROW_NUMBER() OVER (ORDER BY TID),
TID, amt
FROM dbo.Transactions;
;WITH x AS
(
SELECT TID, rn, amt, rt = amt
FROM #x
WHERE rn = 1
UNION ALL
SELECT y.TID, y.rn, y.amt, x.rt + y.amt
FROM x INNER JOIN #x AS y
ON y.rn = x.rn + 1
)
SELECT TID, amt, RunningTotal = rt
FROM x
ORDER BY TID
OPTION (MAXRECURSION 10000);
DROP TABLE #x;
Only the first CTE method will provide performance rivaling the quirky update, but it makes a big assumption about the nature of the data (no gaps). The other two methods will fall back and in those cases you may as well use a cursor (if you can't use CLR and you're not yet on SQL Server 2012 or above).
Cursor
Everybody is told that cursors are evil, and that they should be avoided at all costs, but this actually beats the performance of most other supported methods, and is safer than the quirky update. The only ones I prefer over the cursor solution are the 2012 and CLR methods (below):
CREATE TABLE #x
(
TID INT PRIMARY KEY,
amt INT,
rt INT
);
INSERT #x(TID, amt)
SELECT TID, amt
FROM dbo.Transactions
ORDER BY TID;
DECLARE @rt INT, @tid INT, @amt INT;
SET @rt = 0;
DECLARE c CURSOR LOCAL STATIC READ_ONLY FORWARD_ONLY
FOR SELECT TID, amt FROM #x ORDER BY TID;
OPEN c;
FETCH c INTO @tid, @amt;
WHILE @@FETCH_STATUS = 0
BEGIN
SET @rt = @rt + @amt;
UPDATE #x SET rt = @rt WHERE TID = @tid;
FETCH c INTO @tid, @amt;
END
CLOSE c; DEALLOCATE c;
SELECT TID, amt, RunningTotal = rt
FROM #x
ORDER BY TID;
DROP TABLE #x;
SQL Server 2012 or above
New window functions introduced in SQL Server 2012 make this task a lot easier (and it performs better than all of the above methods as well):
SELECT TID, amt,
RunningTotal = SUM(amt) OVER (ORDER BY TID ROWS UNBOUNDED PRECEDING)
FROM dbo.Transactions
ORDER BY TID;
Note that on larger data sets, you'll find that the above performs much better than either of the following two options, since RANGE uses an on-disk spool (and the default uses RANGE). However it is also important to note that the behavior and results can differ, so be sure they both return correct results before deciding between them based on this difference.
SELECT TID, amt,
RunningTotal = SUM(amt) OVER (ORDER BY TID)
FROM dbo.Transactions
ORDER BY TID;
SELECT TID, amt,
RunningTotal = SUM(amt) OVER (ORDER BY TID RANGE UNBOUNDED PRECEDING)
FROM dbo.Transactions
ORDER BY TID;
CLR
For completeness, I'm offering a link to Pavel Pawlowski's CLR method, which is by far the preferable method on versions prior to SQL Server 2012 (but not 2000 obviously).
http://www.pawlowski.cz/2010/09/sql-server-and-fastest-running-totals-using-clr/
Conclusion
If you are on SQL Server 2012 or above, the choice is obvious - use the new SUM() OVER()
construct (with ROWS
vs. RANGE
). For earlier versions, you'll want to compare the performance of the alternative approaches on your schema, data and - taking non-performance-related factors in mind - determine which approach is right for you. It very well may be the CLR approach. Here are my recommendations, in order of preference:
SUM() OVER() ... ROWS
, if on 2012 or above- CLR method, if possible
- First recursive CTE method, if possible
- Cursor
- The other recursive CTE methods
- Quirky update
- Join and/or correlated subquery
For further information with performance comparisons of these methods, see this question on http://dba.stackexchange.com:
https://dba.stackexchange.com/questions/19507/running-total-with-count
I've also blogged more details about these comparisons here:
http://www.sqlperformance.com/2012/07/t-sql-queries/running-totals
Also for grouped/partitioned running totals, see the following posts:
http://sqlperformance.com/2014/01/t-sql-queries/grouped-running-totals
Partitioning results in a running totals query
Multiple Running Totals with Group By
Query with row by row calculation for running total
Improvement on previous answer following input from Thorsten
with numbered as (
select *, ROW_NUMBER() OVER (ORDER BY [Week]) as RN
from [Data]
)
,cte as (
select [Week], [Due], [Slots], [RN]
,case when Due > Slots then Due - Slots else 0 end as [Total]
from numbered
where RN = 1
union all
select e.[Week], e.[Due], e.[Slots], e.[RN]
, case when cte.Total + e.Due - e.Slots > 0 then cte.Total + e.Due - e.Slots else 0 end as [Total]
from numbered e
inner join cte on cte.[RN] = e.[RN] - 1
)
select * from cte
OPTION (MAXRECURSION 0)
Many thanks Thorsten for all your help.
Calculate a Running Total in SQL Server
Update, if you are running SQL Server 2012 see: https://stackoverflow.com/a/10309947
The problem is that the SQL Server implementation of the Over clause is somewhat limited.
Oracle (and ANSI-SQL) allow you to do things like:
SELECT somedate, somevalue,
SUM(somevalue) OVER(ORDER BY somedate
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
AS RunningTotal
FROM Table
SQL Server gives you no clean solution to this problem. My gut is telling me that this is one of those rare cases where a cursor is the fastest, though I will have to do some benchmarking on big results.
The update trick is handy but I feel its fairly fragile. It seems that if you are updating a full table then it will proceed in the order of the primary key. So if you set your date as a primary key ascending you will probably
be safe. But you are relying on an undocumented SQL Server implementation detail (also if the query ends up being performed by two procs I wonder what will happen, see: MAXDOP):
Full working sample:
drop table #t
create table #t ( ord int primary key, total int, running_total int)
insert #t(ord,total) values (2,20)
-- notice the malicious re-ordering
insert #t(ord,total) values (1,10)
insert #t(ord,total) values (3,10)
insert #t(ord,total) values (4,1)
declare @total int
set @total = 0
update #t set running_total = @total, @total = @total + total
select * from #t
order by ord
ord total running_total
----------- ----------- -------------
1 10 10
2 20 30
3 10 40
4 1 41
You asked for a benchmark this is the lowdown.
The fastest SAFE way of doing this would be the Cursor, it is an order of magnitude faster than the correlated sub-query of cross-join.
The absolute fastest way is the UPDATE trick. My only concern with it is that I am not certain that under all circumstances the update will proceed in a linear way. There is nothing in the query that explicitly says so.
Bottom line, for production code I would go with the cursor.
Test data:
create table #t ( ord int primary key, total int, running_total int)
set nocount on
declare @i int
set @i = 0
begin tran
while @i < 10000
begin
insert #t (ord, total) values (@i, rand() * 100)
set @i = @i +1
end
commit
Test 1:
SELECT ord,total,
(SELECT SUM(total)
FROM #t b
WHERE b.ord <= a.ord) AS b
FROM #t a
-- CPU 11731, Reads 154934, Duration 11135
Test 2:
SELECT a.ord, a.total, SUM(b.total) AS RunningTotal
FROM #t a CROSS JOIN #t b
WHERE (b.ord <= a.ord)
GROUP BY a.ord,a.total
ORDER BY a.ord
-- CPU 16053, Reads 154935, Duration 4647
Test 3:
DECLARE @TotalTable table(ord int primary key, total int, running_total int)
DECLARE forward_cursor CURSOR FAST_FORWARD
FOR
SELECT ord, total
FROM #t
ORDER BY ord
OPEN forward_cursor
DECLARE @running_total int,
@ord int,
@total int
SET @running_total = 0
FETCH NEXT FROM forward_cursor INTO @ord, @total
WHILE (@@FETCH_STATUS = 0)
BEGIN
SET @running_total = @running_total + @total
INSERT @TotalTable VALUES(@ord, @total, @running_total)
FETCH NEXT FROM forward_cursor INTO @ord, @total
END
CLOSE forward_cursor
DEALLOCATE forward_cursor
SELECT * FROM @TotalTable
-- CPU 359, Reads 30392, Duration 496
Test 4:
declare @total int
set @total = 0
update #t set running_total = @total, @total = @total + total
select * from #t
-- CPU 0, Reads 58, Duration 139
Order by and apply a running total to the same column without using a temporary table
Following Aaron Bertrand's suggestion of using a cursor method :
DECLARE @st TABLE
(
Id Int PRIMARY KEY,
SaleAmount Numeric(10,2),
RunningTotal Numeric(10,2)
);
DECLARE
@Id INT,
@SaleAmount Numeric(10,2),
@RunningTotal Numeric(10,2) = 0;
DECLARE c CURSOR
LOCAL STATIC FORWARD_ONLY READ_ONLY
FOR
SELECT id, SaleAmount
FROM Sales
ORDER BY SaleAmount;
OPEN c;
FETCH NEXT FROM c INTO @Id, @SaleAmount;
WHILE @@FETCH_STATUS = 0
BEGIN
SET @RunningTotal = @RunningTotal + @SaleAmount;
INSERT @st(Id, SaleAmount, RunningTotal)
SELECT @Id, @SaleAmount, @RunningTotal;
FETCH NEXT FROM c INTO @Id, @SaleAmount;
END
CLOSE c;
DEALLOCATE c;
SELECT Id, SaleAmount, RunningTotal
FROM @st
WHERE RunningTotal<=10
ORDER BY SaleAmount;
This is an increase in code and still requires a table variable. However the improvement in performance is significant.
Credit has to go to Aaron Bertrand for the excellent article on running totals he wrote.
Select sum of a column without cursor
Okay, I did this more for the challenge than because I necessarily think it's a good idea. I'm inclined to believe Aaron that a cursor may be more appropriate. Anyhow:
declare @Items table (ID int not null,LENGTH_IN_CM decimal(5,1) not null)
insert into @Items(ID,LENGTH_IN_CM) values
(1,1.0),
(2,1.0),
(3,9.0),
(4,5.0),
(5,15.0),
(6,3.0),
(7,6.0)
;With PossiblePages as (
select ID as MinID,ID as MaxID,LENGTH_IN_CM as TotalLength from @Items
union all
select MinID,MaxID + 1,CONVERT(decimal(5,1),TotalLength + LENGTH_IN_CM)
from
PossiblePages pp
inner join
@Items it
on
pp.MaxID + 1 = it.ID
where
TotalLength + LENGTH_IN_CM <= 20.0
), LongPages as (
select MinID,MAX(MaxID) as MaxID,MAX(TotalLength) as TotalLength from PossiblePages group by MinID
), FinalPages as (
select MinID,MaxID,TotalLength from LongPages where MinID = 1
union all
select lp.MinID,lp.MaxID,lp.TotalLength
from
LongPages lp
inner join
FinalPages fp
on
lp.MinID = fp.MaxID + 1
), PageNumbers as (
select MinID,MaxID,ROW_NUMBER() OVER (ORDER BY MinID) as PageNo
from FinalPages
)
select * from PageNumbers
Result:
MinID MaxID PageNo
----------- ----------- --------------------
1 4 1
5 6 2
7 7 3
Which should be easy enough to join back to the original table, if you want to assign a page number to each row.
PossiblePages
calculates every possible page - for every row, it acts as if that row is the first one on that page, and then works out how many additional rows can be appended to that, and the total length that that range of rows represents (there may be cleaner ways to calculate this expression, not sure at the moment).
LongPages
then finds the longest value that PossiblePages
calculated, for each starting row number.
Finally, FinalPages
starts with the first page (that must, logically, be the one started with ID
1 - you can always introduce another calculation if you're not guaranteed to start at 1, and need to find the earliest). Then, the next page is the one that starts from the row ID one higher than the previous row.
You don't need PageNumbers
, but as I said, I was thinking of joining back to the original table.
And as predicted by the commenters, I don't think this is going to perform well - just on the sample, I'm seeing at least 4 table scans to compute this.
Bonus insanity. This one only scans the table 3 times:
;With PageRows as (
select ID as MinID,ID as MaxID,LENGTH_IN_CM as TotalLength from @Items where ID=1
union all
select MinID,MaxID + 1,CONVERT(decimal(5,1),TotalLength + LENGTH_IN_CM)
from
PageRows pr
inner join
@Items ir
on
pr.MaxID = ir.ID-1
where
TotalLength + LENGTH_IN_CM <= 20.0
union all
select ir.ID as MinID,ir.ID as MaxID,ir.LENGTH_IN_CM as TotalLength
from
PageRows pr
inner join
@Items ir
on
pr.MaxID = ir.ID-1
where
TotalLength + LENGTH_IN_CM > 20.0
), PageNumbers as (
select MinID,MAX(MaxID) as MaxID,ROW_NUMBER() OVER (ORDER BY MinID) as PageNo
from PageRows
group by MinID
)
select * from PageNumbers
Related Topics
Address Standardization Within a Database
Calendar Table - Week Number of Month
Splitting Delimited Values in a SQL Column into Multiple Rows
Oracle - Ora-01489: Result of String Concatenation Is Too Long
Postgres Window Function and Group by Exception
Calculate Working Hours Between 2 Dates in Postgresql
Difference Between Subquery and Correlated Subquery
SQL Server Converting Varbinary to String
Eliminating Duplicate Values Based on Only One Column of the Table
Paging SQL Server 2005 Results
Convert Date Format into Dd/Mmm/Yyyy Format in SQL Server
How to Perform a Group by on an Aliased Column in SQL Server
MySQL - How Many Columns Is Too Many
SQL Query to Find Nth Highest Salary from a Salary Table
Explain Join VS. Left Join and Where Condition Performance Suggestion in More Detail
How to Get Second Largest or Third Largest Entry from a Table