SQL Performance: Where VS Where(Row_Number)

SQL performance: WHERE vs WHERE(ROW_NUMBER)

The 2nd answer is your best choice. It takes into account the fact that you could have holes in your ID column. I'd rewrite it as a CTE though instead of a subquery...

;WITH MyCTE AS
(SELECT *,
ROW_NUMBER() OVER (ORDER BY ID) AS row
FROM Table)
SELECT *
FROM MyCTE
WHERE row >= @start
AND row <= @end

Is there a performance difference in using a GROUP BY with MAX() as the aggregate vs ROW_NUMBER over partition by?

The group by should be faster. The row number has to assign a row to all rows in the table. It does this before filtering out the ones it doesn't want.

The second query is, by far, the better construct. In the first, you have to be sure that the columns in the partition clause match the columns that you want. More importantly, "group by" is a well-understood construct in SQL. I would also speculate that the group by might make better use of indexes, but that is speculation.

ROW_NUMBER() Performance optimization

You could try creating an Indexed View on the two tables:

CREATE VIEW dbo.YourIndexedView
WITH SCHEMABINDING
AS
SELECT az.ArticleID,
az.ChannnelID,
az.ZoneID,
a.LastEditDate,
a.LastEditDateTime,
az.ArticleOrder
FROM dbo.Article_tbl a
INNER JOIN dbo.ArticleZone_tbl az
ON a.ArticleID = az.AtricleID;

GO
CREATE UNIQUE CLUSTERED INDEX UQ_YourIndexView_ArticleID_ChannelID_ZoneID
ON dbo.YourIndexedView (ArticleID, ChannelID, ZoneID);

Once you have your clustered index in place you can create a nonclustered index that would assist in the sorting:

CREATE NONCLUSTERED INDEX IX_YourIndexedView_LastEditDate_ArticleOrder_LastEditDateTime
ON dbo.YourIndexedView (LastEditDate DESC, ArticleOrder ASC, LastEditDateTime DESC);

You can then reference this in your query:

WITH OrderedOrders AS
( SELECT RowNum = ROW_NUMBER() OVER(ORDER BY LastEditDate DESC, ArticleOrder ASC, LastEditDateTime DESC),
ArticleID,
ChannelID,
ZoneID,
LastEditDateTime,
ArticleOrder
FROM dbo.YourIndexedView WITH (NOEXPAND)
WHERE ChannelID = 1
AND ZoneID = 0
)
SELECT *
FROM OrderedOrders
WHERE RowNum BETWEEN 1 AND 10;

N.B. I may have missed some columns from your article table, but I couldn't infer them from the question

Furthermore, if your query is always going to have the same zone and channel, you could filter the view, then your clustered index column simply becomes ArticleID:

CREATE VIEW dbo.YourIndexedView
WITH SCHEMABINDING
AS
SELECT az.ArticleID,
az.ChannnelID,
az.ZoneID,
a.LastEditDate,
a.LastEditDateTime,
az.ArticleOrder
FROM Article_tbl a
INNER JOIN ArticleZone_tbl az
ON a.ArticleID = az.AtricleID
WHERE az.ChannelID = 1
AND Az.ZoneID = 1;

GO
CREATE UNIQUE CLUSTERED INDEX UQ_YourIndexView_ArticleID
ON dbo.YourIndexedView (ArticleID);

Which means your indexes will be smaller, and faster to use.

Improve the performance with cte ROW_NUMBER()

One core principle when doing SQL queries with mutliple steps needs to be: eliminate as much data as possible as early as possible. Your CTE loads all rows from TransactionSendQueue, when you only want the latest transaction per Tran_ID. The more data that's being handled, the higher risk of data loaded being written to disk, which is extremely detrimental to performance. The more data that's written to disk, the worse the impact is. You can view your execution plan to check if this is the case but I'd say it's likely considering the execution time.

The CTE should only return one row per row that could possibly be updated in your #TempTran table. You can use an additional CTE to retrieve the latest update first, and then use that information in your ctetran to reduce the amount of data (rows) being search ed through in the update statement.

WITH LatestTran AS --the lastest transaction
(
SELECT
Tran_ID,
MAX(LastUpdate) AS LastUpdate
FROM
TransactionSendQueue
WHERE
STATUS = '1' --where 1 mean complete
GROUP BY
Tran_ID
), ctetran AS
(
SELECT
Tran_ID,
Field2,
Field3,
Field4,
Field5,
Field6,
Field7,
Field8,
Field9
FROM
TransactionSendQueue TSQ
INNER JOIN LatestTran LT ON
TSQ.Tran_ID = LT.Tran_ID AND
TSQ.LastUpdate = LT.LastUpdate
)


UPDATE temp
SET STATUS = CASE
WHEN temp.f2 = cte.Field2
AND temp.f3 = cte.Field3
AND temp.f4 = cte.Field4
AND temp.f5 = cte.Field5
AND temp.f6 = cte.Field6
AND temp.f7 = cte.Field7
AND temp.f8 = cte.Field8
AND temp.f9 = cte.Field9
THEN '2' -- where 2 mean skip
ELSE '3' --where 3 mean ready to execute
END
FROM #TempTran temp
INNER JOIN ctetran cte ON temp.Tran_ID = cte.Tran_ID

How big performance increase this will be is dependent on how many Batch_ID you have per Tran_ID, the more the bigger perfomance boost.

If the query is still running slow, you could also look into using an index for the LastUpdate column in the TransactionSendQueue table, since the query is now using that in a join statement.

Please let me know how much the query time is reduced, would be interesting to know.

ROW_NUMBER Performance

Your problem might be parameter sniffing.
SQL doesn't know what @Skip and @Take are going to be and this could be throwing the plan.

Its worth comparing the data in Prod & Dev - Often Dev data can be a lot more "skewy" especially if its being inserted manually.
Look at see what are the Avg, min and Max

Try sticking a with RECOMPILE on the stored proc and see if this helps.
Alternately - DBCC FREEPROCCACHE will empty the proc cache and create a new plan.

I just realized you are not partitioning your Row_Number() query - this means that in effect you are just running a TOP N query order by Fullname- seems strange as it will give random results based on the number of Extracts and employees

What are the differences between the older row_number() and the newer OFFSET + FETCH based pagination in SQL Server?

Using ROW_NUMBER() works fine - it's just more work than necessary; you need to write a "skeleton" CTE around your actual query, add the ROW_NUMBER() column to your output set, and then filter on that.

Using the new OFFSET / FETCH is simpler - and yes, it's also better for performance, as these two links can show you:

  • New T-SQL features in SQL Server 2012
  • Comparing performance for different SQL Server paging

So overall: if you're using SQL Server 2012 - then you should definitely use OFFSET/FETCH rather than ROW_NUMBER() for paging



Related Topics



Leave a reply



Submit