Get Top Row(S) with Highest Value, with Ties

Get top row(s) with highest value, with ties

The first query fails if any row has quantity IS NULL (as Gordon demonstrates).

The second query only fails if all rows have quantity IS NULL. So it should be usable in most cases. (And it's faster.)

Postgres 13 or newer

Use the standard SQL clause WITH TIES:

SELECT id
FROM product
ORDER BY quantity DESC NULLS LAST
FETCH FIRST 1 ROWS WITH TIES;

db<>fiddle here

Works with any amount of NULL values.

The manual:

SQL:2008 introduced a different syntax to achieve the same result,
which PostgreSQL also supports. It is:

OFFSET start { ROW | ROWS }
FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } { ONLY | WITH TIES }

In this syntax, the start or count value is required by the standard
to be a literal constant, a parameter, or a variable name; as a
PostgreSQL extension, other expressions are allowed, but will
generally need to be enclosed in parentheses to avoid ambiguity. If
count is omitted in a FETCH clause, it defaults to 1. The WITH TIES
option is used to return any additional rows that tie for the last
place in the result set according to the ORDER BY clause; ORDER BY is
mandatory in this case. ROW and ROWS as well as FIRST and NEXT are
noise words that don't influence the effects of these clauses.

Notably, WITH TIES cannot be used with the (non-standard) short syntax LIMIT n.

It's the fastest possible solution. Faster than either of your current queries. More important for performance: have an index on (quantity). Or a more specialized covering index to allow index-only scans (a bit faster, yet):

CREATE INDEX ON product (quantity DESC NULLS LAST) INCLUDE (id);

See:

  • Do covering indexes in PostgreSQL help JOIN columns?

We need NULLS LAST to keep NULL values last in descending order. See:

  • Sort by column ASC, but NULL values first?

Postgres 12 or older

A NULL-safe query:

SELECT id, quantity
FROM product
WHERE quantity IS NOT DISTINCT FROM (SELECT MAX(quantity) FROM product);

Or, probably faster:

SELECT id, quantity
FROM (
SELECT *, rank() OVER (ORDER BY quantity DESC NULLS LAST) AS rnk
FROM product
) sub
WHERE rnk = 1;

See:

  • PostgreSQL equivalent for TOP n WITH TIES: LIMIT "with ties"?

Faster alternatives for big tables:

  • Equivalent for FETCH FIRST WITH TIES in PostgreSQL 11 with comparable performance

Python pandas: Retrieve highest value of each rows with TIES

Calculate the max values per Row Number using groupby.transform('max') and then filter:

df[df.Rank == df.Rank.groupby(df['Row Number']).transform('max')]

Row Number Rank
0 702 20
1 702 20
3 100 5
4 100 5

SQL query with Highest value with tie

Use RANK() window function:

SELECT t.dept_name, t.name, t.tot_cred
FROM (
SELECT dept_name, name, tot_cred,
RANK() OVER(PARTITION BY dept_name ORDER BY tot_cred DESC) rn
FROM university.student
) t
WHERE t.rn = 1

This is an alternative in case you can't use window functions:

SELECT s.dept_name, s.name, s.tot_cred
FROM university.student s
INNER JOIN (
SELECT dept_name, MAX(tot_cred) tot_cred
FROM university.student
GROUP BY dept_name
) t ON t.dept_name = s.dept_name AND s.tot_cred = t.tot_cred

or with NOT EXISTS:

SELECT s.dept_name, s.name, s.tot_cred 
FROM university.student s
WHERE NOT EXISTS (
SELECT 1 FROM university.student
WHERE dept_name = s.dept_name AND tot_cred > s.tot_cred
)

Get records with max value for each group of grouped SQL results

There's a super-simple way to do this in mysql:

select * 
from (select * from mytable order by `Group`, age desc, Person) x
group by `Group`

This works because in mysql you're allowed to not aggregate non-group-by columns, in which case mysql just returns the first row. The solution is to first order the data such that for each group the row you want is first, then group by the columns you want the value for.

You avoid complicated subqueries that try to find the max() etc, and also the problems of returning multiple rows when there are more than one with the same maximum value (as the other answers would do)

Note: This is a mysql-only solution. All other databases I know will throw an SQL syntax error with the message "non aggregated columns are not listed in the group by clause" or similar. Because this solution uses undocumented behavior, the more cautious may want to include a test to assert that it remains working should a future version of MySQL change this behavior.

Version 5.7 update:

Since version 5.7, the sql-mode setting includes ONLY_FULL_GROUP_BY by default, so to make this work you must not have this option (edit the option file for the server to remove this setting).

SQL select only rows with max value on a column


At first glance...

All you need is a GROUP BY clause with the MAX aggregate function:

SELECT id, MAX(rev)
FROM YourTable
GROUP BY id

It's never that simple, is it?

I just noticed you need the content column as well.

This is a very common question in SQL: find the whole data for the row with some max value in a column per some group identifier. I heard that a lot during my career. Actually, it was one the questions I answered in my current job's technical interview.

It is, actually, so common that Stack Overflow community has created a single tag just to deal with questions like that: greatest-n-per-group.

Basically, you have two approaches to solve that problem:

Joining with simple group-identifier, max-value-in-group Sub-query

In this approach, you first find the group-identifier, max-value-in-group (already solved above) in a sub-query. Then you join your table to the sub-query with equality on both group-identifier and max-value-in-group:

SELECT a.id, a.rev, a.contents
FROM YourTable a
INNER JOIN (
SELECT id, MAX(rev) rev
FROM YourTable
GROUP BY id
) b ON a.id = b.id AND a.rev = b.rev

Left Joining with self, tweaking join conditions and filters

In this approach, you left join the table with itself. Equality goes in the group-identifier. Then, 2 smart moves:

  1. The second join condition is having left side value less than right value
  2. When you do step 1, the row(s) that actually have the max value will have NULL in the right side (it's a LEFT JOIN, remember?). Then, we filter the joined result, showing only the rows where the right side is NULL.

So you end up with:

SELECT a.*
FROM YourTable a
LEFT OUTER JOIN YourTable b
ON a.id = b.id AND a.rev < b.rev
WHERE b.id IS NULL;

Conclusion

Both approaches bring the exact same result.

If you have two rows with max-value-in-group for group-identifier, both rows will be in the result in both approaches.

Both approaches are SQL ANSI compatible, thus, will work with your favorite RDBMS, regardless of its "flavor".

Both approaches are also performance friendly, however your mileage may vary (RDBMS, DB Structure, Indexes, etc.). So when you pick one approach over the other, benchmark. And make sure you pick the one which make most of sense to you.

How can I SELECT the first row with MAX(Column value)?

Why does your second query not work...

select   Item_No,
Quantity
from Rec_details
group by Item_No,
Quantity
having Quantity=max(Quantity);

You are grouping by both Item_No and Quantity and the Item_No appears to be the primary key and contain unique values so each group will only contain one row. The HAVING clause looks within the group so it will check that the value of quantity is the maximum value within that group but there is only one value within the group so this will always be true. Your query is the equivalent of:

SELECT DISTINCT
Item_No,
Quantity
FROM Rec_details;

Some other ways to get the maximum value:

SQL Fiddle

Oracle 11g R2 Schema Setup:

create table Rec_details (item_no, Quantity ) AS
SELECT 12507,1 FROM DUAL UNION ALL
SELECT 12549,4 FROM DUAL UNION ALL
SELECT 12100,8 FROM DUAL UNION ALL
SELECT 12501,2 FROM DUAL UNION ALL
SELECT 12201,7 FROM DUAL UNION ALL
SELECT 12509,3 FROM DUAL UNION ALL
SELECT 12080,1 FROM DUAL;

Query 1 - Get one row with maximum quantity and latest item_no (using 1 table scan):

SELECT MAX( item_no ) KEEP ( DENSE_RANK LAST ORDER BY Quantity ) AS Item_no,
MAX( Quantity ) AS Quantity
FROM Rec_Details

Results:

| ITEM_NO | QUANTITY |
|---------|----------|
| 12100 | 8 |

Query 2 - Get one row with maximum quantity and latest item_no (using 1 table scan):

SELECT *
FROM (
SELECT *
FROM Rec_details
ORDER BY Quantity DESC, Item_no DESC
)
WHERE ROWNUM = 1

Results:

| ITEM_NO | QUANTITY |
|---------|----------|
| 12100 | 8 |

Query 3 - Get all rows with maximum quantity (using 1 table scan):

SELECT Item_no, Quantity
FROM (
SELECT r.*,
RANK() OVER ( ORDER BY Quantity DESC ) AS rnk
FROM Rec_details r
)
WHERE rnk = 1

Results:

| ITEM_NO | QUANTITY |
|---------|----------|
| 12100 | 8 |

Query 4 - Get all rows with maximum quantity (using 2 table scans):

SELECT Item_no,
Quantity
FROM Rec_Details
WHERE Quantity = ( SELECT MAX( Quantity ) FROM Rec_Details )

Results:

| ITEM_NO | QUANTITY |
|---------|----------|
| 12100 | 8 |

Query 5 - Get one row with maximum Quantity and latest Item_No using Oracle 12 Syntax (1 table scan):

SELECT *
FROM Rec_Details
ORDER BY Quantity DESC, Item_No DESC
FETCH FIRST ROW ONLY;

Query 5 - Get all rows with maximum Quantity using Oracle 12 Syntax (1 table scan):

SELECT *
FROM Rec_Details
ORDER BY Quantity DESC
FETCH FIRST ROW WITH TIES;

R - find out if a the column containing the max value of a row is tied with another column

An option would be to get the max and check the count of max elements by row

df$tie <- apply(df[1:5], 1, function(x) sum(x == max(x)) > 1)
df$tie
#[1] FALSE FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

Or using rowSums

rowSums(df[1:5] == do.call(pmax, df[1:5])) > 1
#[1] FALSE FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

NOTE: Both the methods are generalized and can take care of multiple conditions

How can I SELECT rows with MAX(Column value), PARTITION by another column in MYSQL?

You are so close! All you need to do is select BOTH the home and its max date time, then join back to the topten table on BOTH fields:

SELECT tt.*
FROM topten tt
INNER JOIN
(SELECT home, MAX(datetime) AS MaxDateTime
FROM topten
GROUP BY home) groupedtt
ON tt.home = groupedtt.home
AND tt.datetime = groupedtt.MaxDateTime

Select the row with the maximum value in each group

Here's a data.table solution:

require(data.table) ## 1.9.2
group <- as.data.table(group)

If you want to keep all the entries corresponding to max values of pt within each group:

group[group[, .I[pt == max(pt)], by=Subject]$V1]
# Subject pt Event
# 1: 1 5 2
# 2: 2 17 2
# 3: 3 5 2

If you'd like just the first max value of pt:

group[group[, .I[which.max(pt)], by=Subject]$V1]
# Subject pt Event
# 1: 1 5 2
# 2: 2 17 2
# 3: 3 5 2

In this case, it doesn't make a difference, as there aren't multiple maximum values within any group in your data.



Related Topics



Leave a reply



Submit