Best-Performance Query for "Select Max in Group"

Improving speed of SQL query with MAX, WHERE, and GROUP BY on three different columns

There is a great Stack Overflow post on optimizing the selection of rows with the max value in a column: https://stackoverflow.com/a/7745635/633063

This seems a little messy but works very well:

SELECT example1.name, MAX(example1.id)
FROM exampletable example1
INNER JOIN (
    SELECT name, MAX(dateAdded) AS dateAdded
    FROM exampletable
    WHERE dateAdded <= '2014-01-20 12:00:00'
    GROUP BY name
) maxDateByElement
    ON example1.name = maxDateByElement.name
   AND example1.dateAdded = maxDateByElement.dateAdded
GROUP BY example1.name;
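
To see what this query returns, here is a minimal sketch that runs it against an in-memory SQLite database from Python. The sample rows are invented for illustration, SQLite merely stands in for whatever database the original answer targeted, and the final GROUP BY column is qualified with the table alias so the reference is unambiguous.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE exampletable (id INTEGER PRIMARY KEY, name TEXT, dateAdded TEXT)"
)
# Hypothetical sample data: ISO-formatted timestamps compare correctly as text.
conn.executemany(
    "INSERT INTO exampletable (id, name, dateAdded) VALUES (?, ?, ?)",
    [
        (1, "a", "2014-01-18 09:00:00"),
        (2, "a", "2014-01-19 09:00:00"),
        (3, "a", "2014-01-19 09:00:00"),  # ties with id 2 on the latest date
        (4, "a", "2014-01-21 09:00:00"),  # after the cutoff, ignored
        (5, "b", "2014-01-10 09:00:00"),
        (6, "b", "2014-01-25 09:00:00"),  # after the cutoff, ignored
    ],
)

QUERY = """
SELECT example1.name, MAX(example1.id)
FROM exampletable example1
INNER JOIN (
    SELECT name, MAX(dateAdded) AS dateAdded
    FROM exampletable
    WHERE dateAdded <= ?
    GROUP BY name
) maxDateByElement
    ON example1.name = maxDateByElement.name
   AND example1.dateAdded = maxDateByElement.dateAdded
GROUP BY example1.name
ORDER BY example1.name
"""

# For each name: find the latest dateAdded up to the cutoff, then the
# highest id among the rows that share that date.
latest_ids = conn.execute(QUERY, ("2014-01-20 12:00:00",)).fetchall()
print(latest_ids)
```

The outer MAX(example1.id) is what collapses the tie between ids 2 and 3 into a single row per name.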

Get records with max value for each group of grouped SQL results

There's a super-simple way to do this in MySQL:

select * 
from (select * from mytable order by `Group`, age desc, Person) x
group by `Group`

This works because MySQL lets you leave non-GROUP BY columns unaggregated, in which case it just returns the values from the first row it encounters in each group. The solution is to first order the data so that, for each group, the row you want comes first, then group by the columns you want the value for.

You avoid complicated subqueries that try to find the max() etc., and also the problem of returning multiple rows when more than one row has the same maximum value (as the other answers would do).
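
The tie problem mentioned above can be made concrete with a small sketch. This is not the ORDER BY / GROUP BY trick itself (which depends on MySQL-specific behavior); it is a swapped-in illustration, run on SQLite from Python, of how a plain join on MAX(age) returns both tied rows and how an explicit tie-breaker on Person avoids that. The table contents are invented, and the column names are taken from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE mytable (Person TEXT, "Group" INTEGER, age INTEGER)')
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        ("Bob", 1, 32),
        ("Jill", 1, 34),
        ("Shawn", 1, 42),
        ("Jake", 2, 29),
        ("Paul", 2, 36),
        ("Laura", 2, 36),  # same maximum age as Paul
    ],
)

# Join each row to its group's maximum age: tied rows all survive.
JOIN_ON_MAX = """
SELECT t.Person, t."Group", t.age
FROM mytable t
INNER JOIN (
    SELECT "Group", MAX(age) AS age FROM mytable GROUP BY "Group"
) m ON m."Group" = t."Group" AND m.age = t.age
ORDER BY t."Group", t.Person
"""

# Keep a row only if no other row in its group is older, or equally old
# with an alphabetically earlier Person (the tie-breaker).
TIE_BROKEN = """
SELECT t.Person, t."Group", t.age
FROM mytable t
WHERE NOT EXISTS (
    SELECT 1
    FROM mytable o
    WHERE o."Group" = t."Group"
      AND (o.age > t.age OR (o.age = t.age AND o.Person < t.Person))
)
ORDER BY t."Group"
"""

rows_with_ties = conn.execute(JOIN_ON_MAX).fetchall()
one_per_group = conn.execute(TIE_BROKEN).fetchall()
print(rows_with_ties)
print(one_per_group)
```

The tie-breaker mirrors the ordering used in the MySQL query (age descending, then Person), so both approaches aim at the same row per group.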

Note: This is a MySQL-only solution. All other databases I know of will throw an SQL syntax error with the message "non aggregated columns are not listed in the group by clause" or similar. Because this solution relies on undocumented behavior, the more cautious may want to include a test asserting that it keeps working should a future version of MySQL change this behavior.

Version 5.7 update:

Since version 5.7, the default sql_mode includes ONLY_FULL_GROUP_BY, so for this to work you must disable that mode (edit the server's option file to remove the setting).

MAX vs Top 1 - which is better?

Performance is generally similar if your table is indexed.

Worth considering, though: TOP usually only makes sense if you're ordering your results (otherwise, top of what?).

Ordering a result requires more processing.

MIN/MAX doesn't always require ordering. (It depends, but often you don't need ORDER BY or GROUP BY, etc.)

In your two examples, I'd expect the speed and execution plan to be very similar. You can always turn to your stats to make sure, but I doubt the difference would be significant.
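
As a rough way to check this yourself, the sketch below compares the two forms on an indexed column using SQLite from Python. Everything here is an assumption for illustration: the readings table and its data are made up, and SQLite's ORDER BY ... LIMIT 1 stands in for TOP 1, which SQLite does not support. The plans are printed rather than asserted, since their wording varies between versions and engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, value INTEGER)")
# 37 and 1000 are coprime, so this inserts each of 0..999 exactly once,
# in a scrambled order.
conn.executemany(
    "INSERT INTO readings (value) VALUES (?)",
    [((i * 37) % 1000,) for i in range(1000)],
)
conn.execute("CREATE INDEX idx_readings_value ON readings(value)")

MAX_SQL = "SELECT MAX(value) FROM readings"
TOP_SQL = "SELECT value FROM readings ORDER BY value DESC LIMIT 1"

max_value = conn.execute(MAX_SQL).fetchone()[0]
top_value = conn.execute(TOP_SQL).fetchone()[0]
print(max_value, top_value)

# Show how the engine plans each statement.
for sql in (MAX_SQL, TOP_SQL):
    print(sql)
    for row in conn.execute("EXPLAIN QUERY PLAN " + sql):
        print("   ", row)
```

On your own database, the equivalent step is to compare the actual execution plans of the two statements.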

MySQL Query Optimization Group By with Max

The query looks fine. All you can do is provide an appropriate index. That would be an index on the columns in the WHERE clause at least. Start with the most restrictive column. So,

  • how many rows match active = 1?
  • how many rows match deletedOn IS NULL?
  • how many rows match updatedOn <= timestamp '2019-03-25 21:00:00'?

Pick the one that matches the fewest rows. Say it's active, then updatedOn, then deletedOn. This gives you:

create index idx on audit_frame_master(active, updatedOn, deletedOn);

As you want to group by frame_id and then find the maximum id, you can add those columns in this order:

create index idx on audit_frame_master(active, updatedOn, deletedOn, frame_id, id);

This is a covering index. If the DBMS uses it, it doesn't even have to access the table.

The DBMS may or may not use this index. It's just an offer. If the DBMS thinks it will be more work to go through the index than to simply read the table sequentially, it won't use it. Just try.
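
One way to "just try" is to create the index and look at the query plan. The sketch below does this on SQLite from Python. The original query is not shown in this answer, so the SELECT here is an assumed reconstruction from the WHERE conditions and the frame_id / id columns mentioned above. The sample rows are invented, and SQLite's planner is only a stand-in for MySQL's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE audit_frame_master (
        id INTEGER PRIMARY KEY,
        frame_id INTEGER,
        active INTEGER,
        updatedOn TEXT,
        deletedOn TEXT
    )
    """
)
# Hypothetical rows: (id, frame_id, active, updatedOn, deletedOn)
conn.executemany(
    "INSERT INTO audit_frame_master VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, 1, "2019-03-20 10:00:00", None),
        (2, 10, 1, "2019-03-24 10:00:00", None),
        (3, 10, 1, "2019-03-26 10:00:00", None),  # updated after the cutoff
        (4, 20, 1, "2019-03-21 10:00:00", None),
        (5, 20, 0, "2019-03-22 10:00:00", None),  # not active
        (6, 20, 1, "2019-03-23 10:00:00", "2019-03-24 00:00:00"),  # deleted
    ],
)
conn.execute(
    "CREATE INDEX idx ON audit_frame_master"
    "(active, updatedOn, deletedOn, frame_id, id)"
)

# Assumed shape of the query under discussion.
QUERY = """
SELECT frame_id, MAX(id)
FROM audit_frame_master
WHERE active = 1
  AND deletedOn IS NULL
  AND updatedOn <= '2019-03-25 21:00:00'
GROUP BY frame_id
ORDER BY frame_id
"""

result = conn.execute(QUERY).fetchall()
print(result)

# Inspect the plan to see whether the index was chosen.
for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY):
    print(row)
```

In MySQL the corresponding check would be to run EXPLAIN on the real query and see whether the index is picked.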

Optimizing a very slow select max group by query on Sybase ASE 15.5

So finally the nonclustered index on (id, version desc) did the trick without any change to the query. Index creation also takes one hour, and the query responds in a few seconds. But I guess it's still better than having another table that could cause data integrity issues.

MAX, GROUP BY query taking a long time across large table

From your query, I'm guessing VisitMovement does not have an EndDate, so the join exists only to use the EndDate from the Visit table. If so, why not join just the ID and EndDate columns from Visit rather than the full table?

So, you can do this:

SELECT 
MAX(VM.VisitMovementID) as VisitMovementID
FROM
VisitMovement VM
INNER JOIN
(SELECT VisitID, EndDate FROM Visit WHERE EndDate > @RecentlyLeftDate) V ON V.VisitID = VM.VisitID
WHERE
V.EndDate > @RecentlyLeftDate
GROUP BY
V.VisitID

Adding WHERE EndDate > @RecentlyLeftDate inside the INNER JOIN reduces the rows retrieved from the Visit table: only the records that fit that timeline are read, not all 1,347,957 records!

You may also adjust your indexes: make sure the identity columns are among the index key columns (with the right sort order for each column), and add frequently used columns as included columns.

Alternative method:
This is another approach that came to mind; you would need to try it and check the results:

SELECT 
MAX(VM.VisitMovementID) as VisitMovementID
FROM
VisitMovement VM
WHERE
VisitID IN (SELECT VisitID FROM Visit WHERE EndDate > @RecentlyLeftDate)
GROUP BY
VM.VisitID
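
To check that the two forms agree, here is a minimal sketch on SQLite from Python. The table contents are invented, the schema is reduced to the columns the queries touch, and a named parameter stands in for the @RecentlyLeftDate variable. The IN form groups by VM.VisitID because that query has no V alias to refer to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Visit (VisitID INTEGER PRIMARY KEY, EndDate TEXT);
    CREATE TABLE VisitMovement (
        VisitMovementID INTEGER PRIMARY KEY,
        VisitID INTEGER
    );
    INSERT INTO Visit VALUES (1, '2020-01-01'), (2, '2020-03-01'), (3, '2020-03-15');
    INSERT INTO VisitMovement VALUES
        (100, 1), (101, 1), (200, 2), (201, 2), (202, 2), (300, 3);
    """
)

# Join against a pre-filtered subset of Visit.
JOIN_FORM = """
SELECT MAX(VM.VisitMovementID) AS VisitMovementID
FROM VisitMovement VM
INNER JOIN (
    SELECT VisitID, EndDate FROM Visit WHERE EndDate > :cutoff
) V ON V.VisitID = VM.VisitID
GROUP BY V.VisitID
ORDER BY 1
"""

# Filter with IN instead of joining.
IN_FORM = """
SELECT MAX(VM.VisitMovementID) AS VisitMovementID
FROM VisitMovement VM
WHERE VM.VisitID IN (SELECT VisitID FROM Visit WHERE EndDate > :cutoff)
GROUP BY VM.VisitID
ORDER BY 1
"""

params = {"cutoff": "2020-02-01"}
join_ids = [row[0] for row in conn.execute(JOIN_FORM, params)]
in_ids = [row[0] for row in conn.execute(IN_FORM, params)]
print(join_ids, in_ids)
```

Which form is faster depends on the engine and the indexes, so compare the execution plans on the real tables.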

