SQL: Select Maximum Value for Each Unique Key

SQL: Select maximum value for each unique key?

Does SQL Server Compact support windowed functions?

Alternative 1--Includes all rows that tie. Does not include any row for a thread if all of that thread's rows have null for HitCount:

SELECT ThreadId, FunctionId, HitCount
FROM (SELECT ThreadId, FunctionId, HitCount,
             MAX(HitCount) OVER (PARTITION BY ThreadId) AS MaxHitCount
      FROM Samples
      WHERE FunctionId NOT IN
            (SELECT CalleeId FROM Callers)) t
WHERE HitCount = MaxHitCount
ORDER BY ThreadId, HitCount DESC

Alternative 2--Includes all rows that tie. If a thread has no row with a non-null HitCount, returns all rows for that thread:

SELECT ThreadId, FunctionId, HitCount
FROM (SELECT ThreadId, FunctionId, HitCount,
             RANK() OVER (PARTITION BY ThreadId ORDER BY HitCount DESC) AS R
      FROM Samples
      WHERE FunctionId NOT IN
            (SELECT CalleeId FROM Callers)) t
WHERE R = 1
ORDER BY ThreadId, HitCount DESC

Alternative 3--Non-deterministically picks one row in case of ties and discards the others. Includes a row even if all rows for a given thread have null HitCount:

SELECT ThreadId, FunctionId, HitCount
FROM (SELECT ThreadId, FunctionId, HitCount,
             ROW_NUMBER() OVER (PARTITION BY ThreadId ORDER BY HitCount DESC) AS R
      FROM Samples
      WHERE FunctionId NOT IN
            (SELECT CalleeId FROM Callers)) t
WHERE R = 1
ORDER BY ThreadId, HitCount DESC
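The tie and null behaviour of the three windowed alternatives can be checked on a toy table. The following is only a sketch: it uses SQLite (3.25 or later, through Python's sqlite3 module) as a stand-in for SQL Server, the sample rows are invented, the column names ThreadId and FunctionId are assumed, and the Callers filter is left out for brevity.

```python
import sqlite3

# Hypothetical sample data: thread 1 has a tie at HitCount 9,
# thread 2 has only NULL hit counts.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Samples (ThreadId INTEGER, FunctionId INTEGER, HitCount INTEGER);
    INSERT INTO Samples VALUES
        (1, 10, 9), (1, 11, 9), (1, 12, 3),
        (2, 20, NULL), (2, 21, NULL);
""")

def top_rows(window_expr, predicate):
    """Run one windowed alternative and return the rows it keeps."""
    sql = f"""
        SELECT ThreadId, FunctionId, HitCount
        FROM (SELECT ThreadId, FunctionId, HitCount,
                     {window_expr} AS w
              FROM Samples) t
        WHERE {predicate}
        ORDER BY ThreadId, FunctionId
    """
    return conn.execute(sql).fetchall()

# Alternative 1: compare each row with the per-thread maximum.
by_max = top_rows("MAX(HitCount) OVER (PARTITION BY ThreadId)", "HitCount = w")
# Alternative 2: keep every row ranked first within its thread.
by_rank = top_rows(
    "RANK() OVER (PARTITION BY ThreadId ORDER BY HitCount DESC)", "w = 1")
# Alternative 3: keep exactly one row per thread.
by_row_number = top_rows(
    "ROW_NUMBER() OVER (PARTITION BY ThreadId ORDER BY HitCount DESC)", "w = 1")

print(by_max)
print(by_rank)
print(by_row_number)
```

On this data, MAX() drops thread 2 entirely, RANK() returns both tied rows and both null rows, and ROW_NUMBER() returns one row per thread; which of the tied rows it picks is not guaranteed.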

Alternatives 4 and 5--Use older constructs for when the windowed functions aren't available, and state the intent a little more cleanly than joins do. Benchmark if speed is a priority. Both return all rows that participate in a tie. Alternative 4 returns rows where HitCount is null: a null never compares as less than another value, so NOT EXISTS keeps such rows, including when no non-null values are available for HitCount. Alternative 5 does not return rows where HitCount is null.

SELECT *
FROM Samples s1
WHERE FunctionId NOT IN
      (SELECT CalleeId FROM Callers)
  AND NOT EXISTS
      (SELECT *
       FROM Samples s2
       WHERE s1.ThreadId = s2.ThreadId   -- compare rows within the same thread
         AND s1.HitCount < s2.HitCount)
ORDER BY ThreadId, HitCount DESC

SELECT *
FROM Samples s1
WHERE FunctionId NOT IN
      (SELECT CalleeId FROM Callers)
  AND HitCount =
      (SELECT MAX(HitCount)
       FROM Samples s2
       WHERE s1.ThreadId = s2.ThreadId)  -- maximum within the same thread
ORDER BY ThreadId, HitCount DESC
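To see how the two older constructs differ on nulls, here is a small sketch run against SQLite through Python's sqlite3 module. The rows are made up, the Callers filter is omitted, and the subqueries are correlated on ThreadId on the assumption that the thread is the key the maximum is wanted for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Samples (ThreadId INTEGER, FunctionId INTEGER, HitCount INTEGER);
    INSERT INTO Samples VALUES
        (1, 10, 9), (1, 11, 9), (1, 12, 3),
        (2, 20, NULL);
""")

# Alternative 4: keep a row when no row in the same thread beats it.
not_exists_rows = conn.execute("""
    SELECT ThreadId, FunctionId, HitCount
    FROM Samples s1
    WHERE NOT EXISTS (SELECT 1
                      FROM Samples s2
                      WHERE s2.ThreadId = s1.ThreadId
                        AND s1.HitCount < s2.HitCount)
    ORDER BY ThreadId, FunctionId
""").fetchall()

# Alternative 5: keep a row when it equals the per-thread maximum.
max_subquery_rows = conn.execute("""
    SELECT ThreadId, FunctionId, HitCount
    FROM Samples s1
    WHERE HitCount = (SELECT MAX(HitCount)
                      FROM Samples s2
                      WHERE s2.ThreadId = s1.ThreadId)
    ORDER BY ThreadId, FunctionId
""").fetchall()

print(not_exists_rows)
print(max_subquery_rows)
```

Both queries return the two tied rows for thread 1. Only the NOT EXISTS form returns the null row for thread 2, because `HitCount = NULL` is never true in the second query.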

Get a max record for each unique column value in a table

Using MAX() as a window function:

SELECT t.A, t.B, t.C
FROM
(
SELECT A, B, C, MAX(C) OVER (PARTITION BY A) max_C
FROM yourTable
) t
WHERE t.C = t.max_C

If you want to retrieve only a single max record for each group of A values, then you should use the method suggested by @GurV, which is the row number:

SELECT t.A, t.B, t.C
FROM
(
    SELECT A, B, C, ROW_NUMBER() OVER (PARTITION BY A ORDER BY C DESC, B DESC) row_num
    FROM yourTable
) t
WHERE t.row_num = 1

Note carefully the ORDER BY C DESC, B DESC inside the call to ROW_NUMBER(). This places the max C records at the top of each partition and breaks ties among them by descending B. Only one row per partition is retained, though.
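As a quick check of the ordering inside ROW_NUMBER(), here is a sketch using SQLite (3.25 or later) through Python's sqlite3 module. The rows are invented, and the ordering is written as C DESC, B DESC on the assumption that the intent is "largest C first, then largest B".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE yourTable (A TEXT, B INTEGER, C INTEGER);
    INSERT INTO yourTable VALUES
        ('x', 1, 5), ('x', 2, 5), ('x', 3, 4),
        ('y', 7, 1);
""")

# Number rows within each A group: highest C first, ties broken by highest B.
rows = conn.execute("""
    SELECT t.A, t.B, t.C
    FROM (SELECT A, B, C,
                 ROW_NUMBER() OVER (PARTITION BY A
                                    ORDER BY C DESC, B DESC) AS row_num
          FROM yourTable) t
    WHERE t.row_num = 1
    ORDER BY t.A
""").fetchall()

print(rows)
```

Group 'x' has two rows with C = 5; the one with the larger B is kept. With ORDER BY C (ascending) the row with the smallest C would be numbered 1 instead.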

Select max value for each category with same unique key

You first need to aggregate to get the sum per combination of uniquekey, city, test2 and test3.

Then, to get the row with the highest sum per city, number the rows with the row_number() window function, partitioning by city and ordering by the sum descending, and keep only the rows numbered 1.

SELECT city,
test2,
test3,
value1
FROM (SELECT city,
test2,
test3,
sum(value1) value1,
row_number() OVER (PARTITION BY city
ORDER BY sum(value1) DESC) rn
FROM nypd
WHERE city IN ('NYC', 'LAX')
GROUP BY uniquekey,
city,
test2,
test3) x
WHERE rn = 1;

However, SQLite versions prior to 3.25.0 don't support row_number(). There you can use EXISTS with a correlated subquery that checks for sums greater than the current sum or, in case of a tie, for another row with a greater uniquekey. The aggregation can be put in a CTE so it doesn't need to be repeated in the subquery.

WITH cte
AS
(
SELECT uniquekey,
city,
test2,
test3,
sum(value1) value1
FROM nypd
WHERE city IN ('NYC', 'LAX')
GROUP BY uniquekey,
city,
test2,
test3
)
SELECT c1.city,
c1.test2,
c1.test3,
c1.value1
FROM cte c1
WHERE NOT EXISTS (SELECT *
FROM cte c2
WHERE c2.city = c1.city
AND (c2.value1 > c1.value1
OR c2.value1 = c1.value1
AND c2.uniquekey > c1.uniquekey));
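The tie-break in the NOT EXISTS version can be tried out with Python's sqlite3 module. This is only an illustration: the nypd rows below are invented, chosen so that two NYC keys end up with the same sum.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nypd (uniquekey INTEGER, city TEXT, test2 TEXT,
                       test3 TEXT, value1 INTEGER);
    INSERT INTO nypd VALUES
        (1, 'NYC', 'a', 'p', 4), (1, 'NYC', 'a', 'p', 6),  -- sums to 10
        (2, 'NYC', 'b', 'q', 10),                          -- ties with key 1
        (3, 'LAX', 'c', 'r', 7),
        (4, 'LAX', 'd', 's', 2),
        (5, 'SFO', 'e', 't', 99);                          -- filtered out
""")

rows = conn.execute("""
    WITH cte AS (
        SELECT uniquekey, city, test2, test3, sum(value1) AS value1
        FROM nypd
        WHERE city IN ('NYC', 'LAX')
        GROUP BY uniquekey, city, test2, test3
    )
    SELECT c1.city, c1.test2, c1.test3, c1.value1
    FROM cte c1
    WHERE NOT EXISTS (SELECT 1
                      FROM cte c2
                      WHERE c2.city = c1.city
                        AND (c2.value1 > c1.value1
                             OR (c2.value1 = c1.value1
                                 AND c2.uniquekey > c1.uniquekey)))
    ORDER BY c1.city
""").fetchall()

print(rows)
```

For NYC, keys 1 and 2 both sum to 10, and the row with the greater uniquekey (2) survives, so exactly one row per city comes back.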

Get the max column value for each unique ID

The problem is that there are ties.

Some customers place more than one order per day, so occasionally a customer will have placed more than one order on the date that is their max date.

To fix this, you need to use MAX() on some column that is always unique in the Orders table (or at least unique within a given date). This is easy if you can depend on an auto-increment primary key in the Orders table:

SELECT *
FROM customers
INNER JOIN
(
SELECT CustomerID, max(orderid) as orderid
FROM orders
GROUP BY CustomerID
) Sub1
ON customers.id = Sub1.CustomerID
INNER JOIN orders
ON orders.CustomerID = Sub1.CustomerID
AND orders.orderid = Sub1.orderid

This assumes that orderid increases in lock-step with increasing dates. That is, you'll never have an order with a greater auto-inc id but an earlier date. That might happen if you allow data to be entered out of chronological order, e.g. back-dating orders.
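Both the fix and its caveat show up on a small data set. The sketch below uses Python's sqlite3 module; the customers, the orders and the order_date column name are all made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (orderid INTEGER PRIMARY KEY,
                         CustomerID INTEGER, order_date TEXT);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders VALUES
        (100, 1, '2024-03-01'),
        (101, 1, '2024-03-05'),
        (102, 1, '2024-03-05'),  -- second order on Ann's latest date
        (103, 2, '2024-02-10'),
        (104, 2, '2024-01-15');  -- back-dated: higher id, earlier date
""")

# One row per customer: the order with the greatest orderid.
rows = conn.execute("""
    SELECT customers.name, orders.orderid, orders.order_date
    FROM customers
    INNER JOIN (SELECT CustomerID, MAX(orderid) AS orderid
                FROM orders
                GROUP BY CustomerID) Sub1
        ON customers.id = Sub1.CustomerID
    INNER JOIN orders
        ON orders.CustomerID = Sub1.CustomerID
       AND orders.orderid = Sub1.orderid
    ORDER BY customers.id
""").fetchall()

# Bob's true latest date, for comparison with the row picked above.
bob_latest_date = conn.execute(
    "SELECT MAX(order_date) FROM orders WHERE CustomerID = 2").fetchone()[0]

print(rows)
print(bob_latest_date)
```

Ann gets a single row even though she placed two orders on her latest date. Bob's row is the back-dated order 104, not his most recent date, which is exactly the case the lock-step assumption rules out.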

How to get MAX() value and Record Primary Key where the value occurs in a Database?

So I found this solution. This would give the Primary Key or the record ID. It also allows for the possibility that there is more than one record with the MAX() or MIN() value.

SELECT ID, Name, Hours AS [Max] FROM Professions
WHERE Hours = (SELECT MAX(Hours) FROM Professions)
ORDER BY ID ASC

(Screenshot not reproduced: the same query run against another database, where the same concept applies.)

I actually wrote this code while working on a completely different problem.
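For what it's worth, the query can be tried on a toy table with Python's sqlite3 module. The Professions rows are invented, and the result column is aliased MaxHours here instead of [Max].

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Professions (ID INTEGER PRIMARY KEY, Name TEXT, Hours INTEGER);
    INSERT INTO Professions VALUES
        (1, 'Nurse', 60), (2, 'Pilot', 45), (3, 'Chef', 60);
""")

# Every record whose Hours equals the table-wide maximum, with its key.
rows = conn.execute("""
    SELECT ID, Name, Hours AS MaxHours
    FROM Professions
    WHERE Hours = (SELECT MAX(Hours) FROM Professions)
    ORDER BY ID ASC
""").fetchall()

print(rows)
```

Two records share the maximum of 60 hours, and both come back with their IDs.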

How can I SELECT rows with MAX(Column value), PARTITION by another column in MYSQL?

You are so close! All you need to do is select BOTH the home and its max date time, then join back to the topten table on BOTH fields:

SELECT tt.*
FROM topten tt
INNER JOIN
(SELECT home, MAX(datetime) AS MaxDateTime
FROM topten
GROUP BY home) groupedtt
ON tt.home = groupedtt.home
AND tt.datetime = groupedtt.MaxDateTime
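To see why the join has to use both fields, here is a sketch with Python's sqlite3 module. The topten rows and the player column are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE topten (id INTEGER PRIMARY KEY, home INTEGER,
                         datetime TEXT, player TEXT);
    INSERT INTO topten VALUES
        (1, 10, '2009-03-04', 'john'),
        (2, 11, '2009-03-04', 'amy'),
        (3, 10, '2009-03-03', 'borat'),
        (4, 11, '2009-03-05', 'sue');
""")

grouped = """(SELECT home, MAX(datetime) AS MaxDateTime
              FROM topten
              GROUP BY home) groupedtt"""

# Join on home AND the max datetime: one latest row per home.
both_fields = [r[0] for r in conn.execute(f"""
    SELECT tt.id
    FROM topten tt
    INNER JOIN {grouped}
        ON tt.home = groupedtt.home
       AND tt.datetime = groupedtt.MaxDateTime
    ORDER BY tt.id
""")]

# Join on home only: every row matches its group, so nothing is filtered.
home_only = [r[0] for r in conn.execute(f"""
    SELECT tt.id
    FROM topten tt
    INNER JOIN {grouped}
        ON tt.home = groupedtt.home
    ORDER BY tt.id
""")]

print(both_fields)
print(home_only)
```

With both conditions only rows 1 and 4 remain; with the home condition alone all four rows come back.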

SQL select only rows with max value on a column

At first glance...

All you need is a GROUP BY clause with the MAX aggregate function:

SELECT id, MAX(rev)
FROM YourTable
GROUP BY id

It's never that simple, is it?

I just noticed you need the content column as well.

This is a very common question in SQL: find the whole data for the row with some max value in a column per some group identifier. I heard that a lot during my career. Actually, it was one of the questions I answered in my current job's technical interview.

It is, actually, so common that the Stack Overflow community has created a single tag just to deal with questions like that: greatest-n-per-group.

Basically, you have two approaches to solve that problem:

Joining with simple group-identifier, max-value-in-group Sub-query

In this approach, you first find the group-identifier, max-value-in-group (already solved above) in a sub-query. Then you join your table to the sub-query with equality on both group-identifier and max-value-in-group:

SELECT a.id, a.rev, a.contents
FROM YourTable a
INNER JOIN (
SELECT id, MAX(rev) rev
FROM YourTable
GROUP BY id
) b ON a.id = b.id AND a.rev = b.rev

Left Joining with self, tweaking join conditions and filters

In this approach, you left join the table with itself. Equality goes in the group-identifier. Then, 2 smart moves:

  1. The second join condition requires the left-side value to be less than the right-side value
  2. When you do step 1, the row(s) that actually have the max value will have NULL in the right side (it's a LEFT JOIN, remember?). Then, we filter the joined result, showing only the rows where the right side is NULL.

So you end up with:

SELECT a.*
FROM YourTable a
LEFT OUTER JOIN YourTable b
ON a.id = b.id AND a.rev < b.rev
WHERE b.id IS NULL;

Conclusion

Both approaches bring the exact same result.

If you have two rows with max-value-in-group for group-identifier, both rows will be in the result in both approaches.
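The tie behaviour of both approaches can be compared side by side. This is a sketch with Python's sqlite3 module on invented rows in which id 1 has two rows at its max rev; no rev is NULL here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE YourTable (id INTEGER, rev INTEGER, contents TEXT);
    INSERT INTO YourTable VALUES
        (1, 1, 'a'), (1, 2, 'b'), (1, 2, 'c'),  -- two rows tie at rev 2
        (2, 1, 'd');
""")

# Approach 1: join back to the (id, max rev) sub-query.
inner_join_rows = conn.execute("""
    SELECT a.id, a.rev, a.contents
    FROM YourTable a
    INNER JOIN (SELECT id, MAX(rev) AS rev
                FROM YourTable
                GROUP BY id) b
        ON a.id = b.id AND a.rev = b.rev
    ORDER BY a.id, a.rev, a.contents
""").fetchall()

# Approach 2: left self-join, keep rows with no higher rev in the group.
left_join_rows = conn.execute("""
    SELECT a.id, a.rev, a.contents
    FROM YourTable a
    LEFT OUTER JOIN YourTable b
        ON a.id = b.id AND a.rev < b.rev
    WHERE b.id IS NULL
    ORDER BY a.id, a.rev, a.contents
""").fetchall()

print(inner_join_rows)
print(left_join_rows)
```

On this data the two result sets are identical, and both tied rows for id 1 appear in each.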

Both approaches are ANSI SQL compatible and thus will work with your favorite RDBMS, regardless of its "flavor".

Both approaches are also performance friendly; however, your mileage may vary (RDBMS, DB structure, indexes, etc.). So when you pick one approach over the other, benchmark. And make sure you pick the one which makes the most sense to you.


