Is There a Performance Difference in Using a Group by with Max() as the Aggregate VS Row_Number Over Partition By

Is there a performance difference in using a GROUP BY with MAX() as the aggregate vs ROW_NUMBER over partition by?

The GROUP BY should be faster. ROW_NUMBER() has to assign a number to every row in the table, and it does this before the rows it doesn't want are filtered out.

The second query (the GROUP BY version) is, by far, the better construct. In the first, you have to be sure that the columns in the PARTITION BY clause match the columns that you want. More importantly, GROUP BY is a well-understood construct in SQL. I suspect the GROUP BY may also make better use of indexes, but that is speculation.
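To put the two shapes side by side, here is a minimal sketch. It uses Python's built-in sqlite3 module (SQLite 3.25+ is needed for window functions) and a hypothetical `readings` table with made-up data. It only shows that both queries return the same maximum per group. It says nothing about relative speed, which depends on the engine, the indexes and the data, so compare execution plans on your own database.

```python
import sqlite3

# In-memory database with a small, hypothetical table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE readings (sensor TEXT, value INTEGER);
    INSERT INTO readings VALUES
        ('a', 3), ('a', 7), ('a', 5),
        ('b', 2), ('b', 9);
""")

# Approach 1: plain aggregate, one row per group.
group_by_rows = conn.execute("""
    SELECT sensor, MAX(value) AS max_value
    FROM readings
    GROUP BY sensor
    ORDER BY sensor
""").fetchall()

# Approach 2: number every row inside its partition, then keep rn = 1.
# ROW_NUMBER() is computed for all rows before the outer WHERE filters them.
row_number_rows = conn.execute("""
    SELECT sensor, value AS max_value
    FROM (
        SELECT sensor, value,
               ROW_NUMBER() OVER (PARTITION BY sensor ORDER BY value DESC) AS rn
        FROM readings
    ) t
    WHERE rn = 1
    ORDER BY sensor
""").fetchall()

print(group_by_rows)    # [('a', 7), ('b', 9)]
print(row_number_rows)  # [('a', 7), ('b', 9)]
```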

SQL: difference between PARTITION BY and GROUP BY

They're used in different places. GROUP BY modifies the entire query, like:

select customerId, count(*) as orderCount
from Orders
group by customerId

But PARTITION BY just works on a window function, like ROW_NUMBER():

select row_number() over (partition by customerId order by orderId)
as OrderNumberForThisCustomer
from Orders

  • GROUP BY normally reduces the number of rows returned by rolling
    them up and calculating averages or sums for each group.
  • PARTITION BY does not affect the number of rows returned, but it
    changes how a window function's result is calculated.
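The difference in row counts can be checked with a small sketch. It uses Python's sqlite3 module (SQLite 3.25+) and a made-up `Orders` table with the column names from the queries above. The GROUP BY query returns one row per customer, while the PARTITION BY query keeps every order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (orderId INTEGER, customerId INTEGER);
    INSERT INTO Orders VALUES (1, 10), (2, 10), (3, 20), (4, 10), (5, 20);
""")

# GROUP BY collapses the rows: one output row per customer.
grouped = conn.execute("""
    SELECT customerId, COUNT(*) AS orderCount
    FROM Orders
    GROUP BY customerId
    ORDER BY customerId
""").fetchall()

# PARTITION BY keeps every row; ROW_NUMBER() restarts for each customer.
windowed = conn.execute("""
    SELECT orderId, customerId,
           ROW_NUMBER() OVER (PARTITION BY customerId ORDER BY orderId)
               AS OrderNumberForThisCustomer
    FROM Orders
    ORDER BY orderId
""").fetchall()

print(len(grouped), len(windowed))  # 2 5
```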

Get top 1 row of each group

;WITH cte AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY DocumentID ORDER BY DateCreated DESC) AS rn
FROM DocumentStatusLogs
)
SELECT *
FROM cte
WHERE rn = 1

If you expect 2 entries per day (rows that tie on DateCreated), this will arbitrarily pick one. To get both entries for a day, use DENSE_RANK instead.
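The tie behaviour can be seen in a short sketch using Python's sqlite3 module (SQLite 3.25+). The `DocumentStatusLogs` rows below are made up, and DateCreated is assumed to hold plain dates so that two entries on the same day tie. The helper `latest` is a hypothetical name that runs the CTE above with either ranking function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DocumentStatusLogs (
        ID INTEGER PRIMARY KEY, DocumentID INTEGER, Status TEXT, DateCreated TEXT
    );
    INSERT INTO DocumentStatusLogs (DocumentID, Status, DateCreated) VALUES
        (1, 'Draft',    '2024-01-01'),
        (1, 'Review',   '2024-01-02'),
        (1, 'Approved', '2024-01-02'),  -- same day as 'Review': a tie
        (2, 'Draft',    '2024-01-05');
""")

def latest(rank_function):
    # rank_function is a fixed, trusted string: "ROW_NUMBER" or "DENSE_RANK".
    sql = f"""
        WITH cte AS (
            SELECT *,
                   {rank_function}() OVER (
                       PARTITION BY DocumentID ORDER BY DateCreated DESC) AS rn
            FROM DocumentStatusLogs
        )
        SELECT DocumentID, Status FROM cte WHERE rn = 1
        ORDER BY DocumentID, Status
    """
    return conn.execute(sql).fetchall()

one_per_doc = latest("ROW_NUMBER")  # document 1: only one of the tied rows
all_ties = latest("DENSE_RANK")     # document 1: both rows from 2024-01-02
print(all_ties)  # [(1, 'Approved'), (1, 'Review'), (2, 'Draft')]
```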

As for normalised or not, it depends on whether you want to:

  • maintain status in 2 places
  • preserve status history
  • ...

As it stands, you preserve status history. If you want the latest status in the parent table too (which is denormalisation), you'd need a trigger to maintain "status" in the parent, or drop this status history table.
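Such a trigger might look like the following minimal sketch. It is written for SQLite via Python's sqlite3 module, and the parent table `Documents` and the trigger name `trg_latest_status` are hypothetical. Other engines use different trigger syntax. The trigger re-reads the newest log row, rather than copying the inserted one, so that a back-dated entry does not overwrite a newer status.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Documents (DocumentID INTEGER PRIMARY KEY, Status TEXT);
    CREATE TABLE DocumentStatusLogs (
        DocumentID INTEGER REFERENCES Documents(DocumentID),
        Status TEXT,
        DateCreated TEXT
    );

    -- Denormalised copy: after each new log row, store the latest status
    -- for that document in the parent table.
    CREATE TRIGGER trg_latest_status
    AFTER INSERT ON DocumentStatusLogs
    BEGIN
        UPDATE Documents
        SET Status = (
            SELECT Status FROM DocumentStatusLogs
            WHERE DocumentID = NEW.DocumentID
            ORDER BY DateCreated DESC
            LIMIT 1
        )
        WHERE DocumentID = NEW.DocumentID;
    END;

    INSERT INTO Documents (DocumentID) VALUES (1);
    INSERT INTO DocumentStatusLogs VALUES (1, 'Draft',    '2024-01-01');
    INSERT INTO DocumentStatusLogs VALUES (1, 'Review',   '2024-01-03');
    -- Back-dated entry: must not replace the newer 'Review' status.
    INSERT INTO DocumentStatusLogs VALUES (1, 'Draft v2', '2024-01-02');
""")

current = conn.execute(
    "SELECT Status FROM Documents WHERE DocumentID = 1").fetchone()[0]
print(current)  # Review
```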

Comparison Group by VS Over Partition By

Yes, it may make a difference.

The second query is an example of an inline view.
It is a very useful method for writing reports that need various kinds of counts or other aggregate functions.

Oracle executes the subquery and then uses the resulting rows as a view in the FROM clause.

Where performance is concerned, I would generally recommend an inline view over the other subquery types.

One more thing: the second query will return every row that ties for the maximum, while the first will return only one max row.
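The two queries from the question are not reproduced here. The sketch below assumes the first used ROW_NUMBER() and the second an inline view that aggregates with MAX() and joins back to the table. It runs on SQLite through Python's sqlite3 module (3.25+), not Oracle, and the `emp` table is made up. It shows why the inline view keeps every tied maximum while ROW_NUMBER() keeps exactly one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (dept TEXT, name TEXT, salary INTEGER);
    INSERT INTO emp VALUES
        ('IT', 'Ann', 500), ('IT', 'Bob', 500), ('IT', 'Cid', 300),
        ('HR', 'Dee', 400), ('HR', 'Eve', 350);
""")

# Inline view: aggregate in a subquery in FROM, then join back to the table.
# Every row that ties for the maximum survives the join.
inline_view = conn.execute("""
    SELECT e.dept, e.name, e.salary
    FROM emp e
    JOIN (SELECT dept, MAX(salary) AS max_salary
          FROM emp GROUP BY dept) m
      ON e.dept = m.dept AND e.salary = m.max_salary
    ORDER BY e.dept, e.name
""").fetchall()

# ROW_NUMBER(): exactly one row per dept, even when salaries tie.
row_number = conn.execute("""
    SELECT dept, name, salary
    FROM (SELECT e.*, ROW_NUMBER() OVER (
                 PARTITION BY dept ORDER BY salary DESC) AS rn
          FROM emp e) t
    WHERE rn = 1
    ORDER BY dept
""").fetchall()

print(inline_view)  # [('HR', 'Dee', 400), ('IT', 'Ann', 500), ('IT', 'Bob', 500)]
```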


Row_number over partition and find the max rn value

Use the MAX window function.

SELECT T.*,MAX(rn) OVER(PARTITION BY OrderNo) AS rn_max
FROM (
select OrderNO,PartCode,Quantity,row_number() over(partition by OrderNO order by DateEntered desc) as rn
from YourTable
) T

Edit: An easier option is to use count as suggested by @Jason A. Long in the comments.

select OrderNO
,PartCode
,Quantity
,row_number() over(partition by OrderNO order by DateEntered desc) as rn
,count(*) over(partition by OrderNO) as maxrn
from YourTable
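The two answers can be checked against each other with a sketch that computes both columns in one query. It uses Python's sqlite3 module (SQLite 3.25+) and made-up rows in `YourTable`. Because ROW_NUMBER() numbers each partition 1..n without gaps, MAX(rn) over the partition always equals COUNT(*) over the same partition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE YourTable (OrderNO INTEGER, PartCode TEXT, Quantity INTEGER,
                            DateEntered TEXT);
    INSERT INTO YourTable VALUES
        (1, 'P1', 5, '2024-01-01'),
        (1, 'P2', 2, '2024-01-03'),
        (1, 'P3', 1, '2024-01-02'),
        (2, 'P4', 7, '2024-01-04');
""")

rows = conn.execute("""
    SELECT T.OrderNO, T.PartCode, T.rn,
           MAX(rn) OVER (PARTITION BY OrderNO) AS rn_max,  -- first answer
           COUNT(*) OVER (PARTITION BY OrderNO) AS maxrn   -- COUNT(*) variant
    FROM (
        SELECT OrderNO, PartCode, Quantity,
               ROW_NUMBER() OVER (PARTITION BY OrderNO
                                  ORDER BY DateEntered DESC) AS rn
        FROM YourTable
    ) T
    ORDER BY T.OrderNO, T.rn
""").fetchall()

for r in rows:
    print(r)
```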

Need Help for SQL MIN MAX Group By and AGGREGATE

This is a gaps-and-islands problem, where you want to group together "adjacent" rows that have the same holeid and alteration.

Here is one approach using window functions: the difference between the two row numbers defines the groups.

select
max(id) max_id,
min([from]) min_from,
max([to]) max_to,
alteration
from (
select
a.*,
row_number() over(partition by holeid order by [from]) rn1,
row_number() over(partition by holeid, alteration order by [from]) rn2
from dbo.alt a
) t
group by holeid, alteration, rn1 - rn2
order by min_from

Demo on DB Fiddle:


min_from | max_to | alteration
:------- | :----- | :---------
0.00 | 132.60 | AA-LT-1
132.60 | 171.28 | ARG-1-MSI
171.28 | 216.80 | AA-LT-1
216.80 | 232.60 | ARG-2-Kaol
232.60 | 256.90 | ARG-1-MSI
256.90 | 265.70 | ARG-2-Kaol
265.70 | 290.10 | ARG-1-MSI
290.10 | 294.85 | ARG-2-Kaol
294.85 | 325.00 | ARG-1-MSI
325.00 | 332.10 | ARG-2-Kaol
332.10 | 382.70 | ARG-1-MSI
382.70 | 396.10 | ARG-2-Kaol
396.10 | 416.20 | ARG-1-MSI

Note: your sample data has no id column, so max_id does not appear in the results above.
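The row-number-difference trick can be run end to end on a tiny, made-up drill hole. This sketch uses Python's sqlite3 module (SQLite 3.25+). SQLite accepts the `[from]`/`[to]` bracket quoting, so the query is the one above with only the `dbo.` schema prefix dropped. Within one island, rn1 and rn2 increase together, so rn1 - rn2 stays constant. When another alteration interrupts the run, rn1 keeps counting but rn2 does not, so the difference jumps to a new value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE alt (id INTEGER, holeid TEXT, [from] REAL, [to] REAL,
                      alteration TEXT);
    INSERT INTO alt VALUES
        (1, 'H1',  0.0, 10.0, 'AA-LT-1'),
        (2, 'H1', 10.0, 20.0, 'AA-LT-1'),
        (3, 'H1', 20.0, 25.0, 'ARG-1-MSI'),
        (4, 'H1', 25.0, 30.0, 'AA-LT-1'),
        (5, 'H1', 30.0, 40.0, 'AA-LT-1');
""")

islands = conn.execute("""
    select max(id) max_id, min([from]) min_from, max([to]) max_to, alteration
    from (
        select a.*,
               row_number() over(partition by holeid order by [from]) rn1,
               row_number() over(partition by holeid, alteration
                                 order by [from]) rn2
        from alt a
    ) t
    group by holeid, alteration, rn1 - rn2
    order by min_from
""").fetchall()

for row in islands:
    print(row)
# (2, 0.0, 20.0, 'AA-LT-1')
# (3, 20.0, 25.0, 'ARG-1-MSI')
# (5, 25.0, 40.0, 'AA-LT-1')
```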

How to parse first_value aggregate in a group by statement [SNOWFLAKE] SQL

FIRST_VALUE is not an aggregate function but a window function, so you get an error when you use it together with a GROUP BY. If you want to use it with a GROUP BY, put an ANY_VALUE around it.

Here is some data, in a CTE, that I will use below:

with data(id, seq, val) as (
select * from values
(1, 1, 10),
(1, 2, 11),
(1, 3, 12),
(1, 4, 13),
(2, 1, 20),
(2, 2, 21),
(2, 3, 22)
)

So, to show that FIRST_VALUE is a window function, we can just use it on its own:

select *
,first_value(val)over(partition by id order by seq) as first_val
from data

ID | SEQ | VAL | FIRST_VAL
:- | :-- | :-- | :--------
1 | 1 | 10 | 10
1 | 2 | 11 | 10
1 | 3 | 12 | 10
1 | 4 | 13 | 10
2 | 1 | 20 | 20
2 | 2 | 21 | 20
2 | 3 | 22 | 20
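The same data can be reproduced outside Snowflake with a sketch in Python's sqlite3 module (SQLite 3.25+). SQLite writes the CTE body as a bare `VALUES` list instead of `select * from values`. The second query is a swapped-in alternative to the ANY_VALUE approach, because SQLite has no ANY_VALUE. It computes the window function in a subquery and aggregates afterwards, using MIN() as a stand-in. That is safe here only because first_val is constant within each id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Same sample data as above, in SQLite's CTE syntax.
DATA_CTE = """
    WITH data(id, seq, val) AS (
        VALUES (1, 1, 10), (1, 2, 11), (1, 3, 12), (1, 4, 13),
               (2, 1, 20), (2, 2, 21), (2, 3, 22)
    )
"""

# FIRST_VALUE as a window function: one output row per input row.
windowed = conn.execute(DATA_CTE + """
    SELECT id, seq, val,
           FIRST_VALUE(val) OVER (PARTITION BY id ORDER BY seq) AS first_val
    FROM data
    ORDER BY id, seq
""").fetchall()

# Window first, in a subquery; GROUP BY afterwards in the outer query.
grouped = conn.execute(DATA_CTE + """
    SELECT id, MIN(first_val) AS first_val, SUM(val) AS total
    FROM (
        SELECT id, val,
               FIRST_VALUE(val) OVER (PARTITION BY id ORDER BY seq) AS first_val
        FROM data
    ) t
    GROUP BY id
    ORDER BY id
""").fetchall()

print(grouped)  # [(1, 10, 46), (2, 20, 63)]
```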

Get records with max value for each group of grouped SQL results

There's a super-simple way to do this in MySQL:

select * 
from (select * from mytable order by `Group`, age desc, Person) x
group by `Group`

This works because MySQL allows you to leave non-GROUP BY columns unaggregated. In that case MySQL, in practice, returns the values from the first row of each group, although the documentation does not guarantee which row is used. The solution is to first order the data so that, for each group, the row you want comes first, and then group by the columns you want the value for.

You avoid complicated subqueries that try to find the max() etc., and also the problem of returning multiple rows when more than one row has the same maximum value (as the other answers would do).

Note: This is a MySQL-only solution. All other databases I know of will throw an SQL syntax error with the message "non aggregated columns are not listed in the group by clause" or similar. Because this solution relies on undocumented behavior, the more cautious may want to include a test asserting that it still works, in case a future version of MySQL changes this behavior.

Version 5.7 update:

Since version 5.7, the sql-mode setting includes ONLY_FULL_GROUP_BY by default, so to make this work you must not have this option (edit the option file for the server to remove this setting).

Is there any difference between GROUP BY and DISTINCT

MusiGenesis' response is functionally the correct one with regard to your question as stated. SQL Server is smart enough to realize that if you are using "Group By" without any aggregate functions, what you actually mean is "Distinct", and it therefore generates an execution plan as if you'd simply used "Distinct."

However, I think it's important to note Hank's response as well - cavalier treatment of "Group By" and "Distinct" could lead to some pernicious gotchas down the line if you're not careful. It's not entirely correct to say that this is "not a question about aggregates" because you're asking about the functional difference between two SQL query keywords, one of which is meant to be used with aggregates and one of which is not.

A hammer can work to drive in a screw sometimes, but if you've got a screwdriver handy, why bother?

(for the purposes of this analogy, Hammer : Screwdriver :: GroupBy : Distinct and screw => get list of unique values in a table column)
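At the level of results, the equivalence is easy to check. This sketch uses Python's sqlite3 module and a made-up single-column table. It only shows that both queries return the same set of unique values. It does not show anything about SQL Server's execution plans, which you would need to inspect on SQL Server itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (color TEXT);
    INSERT INTO t VALUES ('red'), ('blue'), ('red'), ('green'), ('blue');
""")

# The "screwdriver": DISTINCT states the intent directly.
distinct_rows = conn.execute(
    "SELECT DISTINCT color FROM t ORDER BY color").fetchall()

# The "hammer": GROUP BY with no aggregate gives the same unique values.
group_by_rows = conn.execute(
    "SELECT color FROM t GROUP BY color ORDER BY color").fetchall()

print(distinct_rows)  # [('blue',), ('green',), ('red',)]
```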


