Improving speed of SQL query with MAX, WHERE, and GROUP BY on three different columns
There is a great Stack Overflow post on optimizing the selection of rows with the max value in a column: https://stackoverflow.com/a/7745635/633063
This seems a little messy but works very well:
SELECT example1.name, MAX(example1.id)
FROM exampletable example1
INNER JOIN (
    -- latest dateAdded per name, up to the cutoff
    SELECT name, MAX(dateAdded) AS dateAdded
    FROM exampletable
    WHERE dateAdded <= '2014-01-20 12:00:00'
    GROUP BY name
) maxDateByElement
    ON example1.name = maxDateByElement.name
   AND example1.dateAdded = maxDateByElement.dateAdded
-- qualified, because both sides of the join expose a name column
GROUP BY example1.name;
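To see what the query above returns, it can be run against a tiny in-memory table. This is only a sketch using Python's built-in sqlite3 module rather than MySQL; the sample rows and the helper name latest_id_per_name are made up for illustration, and only the table and column names come from the query.

```python
import sqlite3

# Hypothetical sample data; only the table and column names come from the query above.
ROWS = [
    (1, "a", "2014-01-10 09:00:00"),
    (2, "a", "2014-01-19 09:00:00"),
    (3, "a", "2014-01-25 09:00:00"),  # after the cutoff used below, so it is ignored
    (4, "b", "2014-01-05 09:00:00"),
    (5, "b", "2014-01-05 09:00:00"),  # same date as id 4: MAX(id) breaks the tie
]

QUERY = """
SELECT example1.name, MAX(example1.id)
FROM exampletable example1
INNER JOIN (
    SELECT name, MAX(dateAdded) AS dateAdded
    FROM exampletable
    WHERE dateAdded <= ?
    GROUP BY name
) maxDateByElement
    ON example1.name = maxDateByElement.name
   AND example1.dateAdded = maxDateByElement.dateAdded
GROUP BY example1.name
ORDER BY example1.name
"""


def latest_id_per_name(cutoff):
    """Return (name, id) for the newest row per name at or before the cutoff."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE exampletable (id INTEGER PRIMARY KEY, name TEXT, dateAdded TEXT)"
    )
    conn.executemany("INSERT INTO exampletable VALUES (?, ?, ?)", ROWS)
    result = conn.execute(QUERY, (cutoff,)).fetchall()
    conn.close()
    return result


print(latest_id_per_name("2014-01-20 12:00:00"))
```

The dates are stored as ISO-formatted text here, so plain string comparison orders them correctly; with a real DATETIME column in MySQL that detail does not matter.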
Get records with max value for each group of grouped SQL results
There's a super-simple way to do this in MySQL:
select *
from (select * from mytable order by `Group`, age desc, Person) x
group by `Group`
This works because MySQL lets you leave non-group-by columns unaggregated, in which case it typically returns the values from the first row it encounters in each group. The solution is to first order the data so that, for each group, the row you want comes first, then group by the columns you want the value for.
You avoid complicated subqueries that try to find the max() etc., and also the problem of returning multiple rows when more than one row has the same maximum value (as the other answers would do).
Note: This is a MySQL-only solution. All other databases I know will throw an SQL syntax error with the message "non aggregated columns are not listed in the group by clause" or similar. Because this solution relies on behavior that MySQL does not guarantee (the documentation describes the value chosen for a non-aggregated column as nondeterministic), the more cautious may want to include a test asserting that it still works, in case a future version of MySQL changes this behavior.
Version 5.7 update:
Since version 5.7, the sql-mode setting includes ONLY_FULL_GROUP_BY by default, so to make this work you must not have this option enabled (edit the option file for the server to remove this setting).
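If ONLY_FULL_GROUP_BY has to stay on, or the query must run on another database, a window function is one portable way to get a single row per group. To be clear, this is a different technique from the ORDER BY plus GROUP BY trick above, not a restatement of it. The sketch below runs it through Python's built-in sqlite3 module (it assumes the bundled SQLite is 3.25 or newer, which is when window functions arrived; MySQL supports them from 8.0). The sample rows and the function name oldest_per_group are invented for illustration.

```python
import sqlite3

# Hypothetical rows; the table and column names follow the query above.
ROWS = [
    ("Bob", 1, 32),
    ("Jill", 1, 34),
    ("Shawn", 1, 42),
    ("Jake", 2, 29),
    ("Paul", 2, 36),
    ("Laura", 2, 39),
    ("Anna", 2, 39),  # ties with Laura on age; Person breaks the tie
]

QUERY = """
SELECT Person, "Group", age
FROM (
    SELECT Person, "Group", age,
           ROW_NUMBER() OVER (
               PARTITION BY "Group"
               ORDER BY age DESC, Person
           ) AS rn
    FROM mytable
) ranked
WHERE rn = 1
ORDER BY "Group"
"""


def oldest_per_group():
    """Return exactly one row per group: highest age, ties broken by Person."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE mytable (Person TEXT, "Group" INTEGER, age INTEGER)')
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(oldest_per_group())
```

The ORDER BY inside the window mirrors the ordering of the derived table in the original trick (age desc, Person), so ties are resolved the same way, but here the choice of row is stated in the query instead of depending on which row the server happens to pick.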
MAX vs Top 1 - which is better?
Performance is generally similar if your table is indexed.
Worth considering, though: TOP usually only makes sense if you're ordering your results (otherwise, top of what?).
Ordering a result requires more processing.
An aggregate such as MIN or MAX doesn't always require ordering. (It depends, but often you don't need ORDER BY or GROUP BY, etc.)
In your two examples, I'd expect the speed and the execution plan to be very similar. You can always check the statistics to make sure, but I doubt the difference would be significant.
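One way to check this on your own data is to run both forms and look at the plans side by side. Here is a rough sketch with Python's built-in sqlite3 module; SQLite has no TOP, so ORDER BY ... LIMIT 1 stands in for TOP 1, and the table t, column x, index name and function name are all made up. The plan text differs between engines and versions, so treat it as something to read, not to rely on.

```python
import sqlite3


def max_vs_top1(values):
    """Run MAX(x) and ORDER BY x DESC LIMIT 1 on the same indexed, non-empty table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.execute("CREATE INDEX idx_t_x ON t (x)")
    conn.executemany("INSERT INTO t VALUES (?)", [(v,) for v in values])

    queries = {
        "max": "SELECT MAX(x) FROM t",
        "top1": "SELECT x FROM t ORDER BY x DESC LIMIT 1",  # stand-in for TOP 1
    }
    out = {}
    for label, sql in queries.items():
        value = conn.execute(sql).fetchone()[0]
        # The last column of each EXPLAIN QUERY PLAN row is the readable detail.
        plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
        out[label] = {"value": value, "plan": plan}
    conn.close()
    return out


report = max_vs_top1([3, 9, 4])
for label, info in report.items():
    print(label, info["value"], info["plan"])
```

Both queries return the same value; whether the plans match is exactly the thing to inspect on the real table and the real engine.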
MySQL Query Optimization Group By with Max
The query looks fine. All you can do is provide an appropriate index. That would be an index on the columns in the WHERE clause at least. Start with the most restrictive column. So,
- how many rows match active = 1?
- how many rows match deletedOn IS NULL?
- how many rows match updatedOn <= timestamp '2019-03-25 21:00:00'?
Pick the one that matches the fewest rows. Say it's active, then updatedOn, then deletedOn. This gives you:
create index idx on audit_frame_master(active, updatedOn, deletedOn);
As you want to group by frame_id and then find the maximum id, you can add those in this order:
create index idx on audit_frame_master(active, updatedOn, deletedOn, frame_id, id);
This is a covering index. If the DBMS uses it, it doesn't even have to access the table.
The DBMS may or may not use this index. It's just an offer. If the DBMS thinks it will be more work to go through the index than to simply read the table sequentially, then it won't use it. Just try.
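"Just try" can look like the following sketch, which creates the index and asks the engine for its plan. It uses Python's built-in sqlite3 module instead of MySQL, so the plan output will not match what MySQL's EXPLAIN prints. The query text is an assumption reconstructed from the conditions listed above (the original query is not shown here), and the sample rows and function name are hypothetical.

```python
import sqlite3

# Hypothetical rows: (id, frame_id, active, updatedOn, deletedOn)
ROWS = [
    (1, 10, 1, "2019-03-20 10:00:00", None),
    (2, 10, 1, "2019-03-24 10:00:00", None),
    (3, 10, 1, "2019-03-26 10:00:00", None),                   # updated after the cutoff
    (4, 20, 0, "2019-03-21 10:00:00", None),                   # not active
    (5, 20, 1, "2019-03-22 10:00:00", "2019-03-23 00:00:00"),  # deleted
    (6, 20, 1, "2019-03-19 10:00:00", None),
]

# Assumed shape of the query, rebuilt from the WHERE conditions discussed above.
QUERY = """
SELECT frame_id, MAX(id)
FROM audit_frame_master
WHERE active = 1
  AND deletedOn IS NULL
  AND updatedOn <= ?
GROUP BY frame_id
ORDER BY frame_id
"""


def max_id_per_frame(cutoff):
    """Return (rows, plan) for the grouped MAX query with the covering index in place."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE audit_frame_master ("
        "id INTEGER NOT NULL, frame_id INTEGER, active INTEGER, "
        "updatedOn TEXT, deletedOn TEXT)"
    )
    conn.execute(
        "CREATE INDEX idx ON audit_frame_master "
        "(active, updatedOn, deletedOn, frame_id, id)"
    )
    conn.executemany("INSERT INTO audit_frame_master VALUES (?, ?, ?, ?, ?)", ROWS)
    rows = conn.execute(QUERY, (cutoff,)).fetchall()
    # The last column of each plan row is the human-readable description.
    plan = [r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + QUERY, (cutoff,))]
    conn.close()
    return rows, plan


rows, plan = max_id_per_frame("2019-03-25 21:00:00")
print(rows)
print(plan)
```

If the plan mentions the index, the offer was accepted; if it shows a plain table scan, the optimizer decided the index was not worth it for this data.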
Optimizing a very slow select max group by query on Sybase ASE 15.5
So finally the nonclustered index on (id, version desc) did the trick without having to change anything in the query. Creating the index takes about an hour, and the query now responds in a few seconds. I guess that's still better than having another table that could cause data integrity issues.
MAX, GROUP BY query taking a long time across large table
From your query, I'm guessing VisitMovement does not have EndDate, so the join is there only to use EndDate from the Visit table. If so, why not join just ID and EndDate from the Visit table, rather than joining the full table?
So, you can do this:
SELECT
MAX(VM.VisitMovementID) as VisitMovementID
FROM
VisitMovement VM
INNER JOIN
(SELECT VisitID, EndDate FROM Visit WHERE EndDate > @RecentlyLeftDate) V ON V.VisitID = VM.VisitID
WHERE
V.EndDate > @RecentlyLeftDate
GROUP BY
V.VisitID
Adding WHERE EndDate > @RecentlyLeftDate inside the INNER JOIN reduces the number of records retrieved from the Visit table, so it retrieves only the records that fit that timeline rather than all 1,347,957 records!
You may also adjust your indexes: make sure the identity columns are among the index key columns (with the right sort order for each column), and add the frequently used columns as included columns.
Alternative method:
This is another approach that came to mind; you'll need to check it and give it a try:
SELECT
MAX(VM.VisitMovementID) as VisitMovementID
FROM
VisitMovement VM
WHERE
VM.VisitID IN (SELECT VisitID FROM Visit WHERE EndDate > @RecentlyLeftDate)
GROUP BY
VM.VisitID
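Before swapping one form for the other, it is worth confirming that both return the same rows. A small sketch of that check follows, using Python's built-in sqlite3 module rather than SQL Server, so :RecentlyLeftDate replaces the T-SQL variable @RecentlyLeftDate. The sample data and the function name compare_variants are hypothetical, and the sketch assumes VisitID is unique in Visit.

```python
import sqlite3

# Hypothetical sample data: Visit(VisitID, EndDate), VisitMovement(VisitMovementID, VisitID)
VISITS = [(1, "2020-01-01"), (2, "2020-03-01"), (3, "2020-04-01")]
MOVEMENTS = [(100, 1), (101, 1), (200, 2), (201, 2), (202, 2), (300, 3)]

JOIN_QUERY = """
SELECT MAX(VM.VisitMovementID) AS VisitMovementID
FROM VisitMovement VM
INNER JOIN (
    SELECT VisitID, EndDate FROM Visit WHERE EndDate > :RecentlyLeftDate
) V ON V.VisitID = VM.VisitID
GROUP BY V.VisitID
ORDER BY VisitMovementID
"""

IN_QUERY = """
SELECT MAX(VM.VisitMovementID) AS VisitMovementID
FROM VisitMovement VM
WHERE VM.VisitID IN (
    SELECT VisitID FROM Visit WHERE EndDate > :RecentlyLeftDate
)
GROUP BY VM.VisitID
ORDER BY VisitMovementID
"""


def compare_variants(recently_left_date):
    """Run the join form and the IN form on the same data and return both results."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Visit (VisitID INTEGER PRIMARY KEY, EndDate TEXT)")
    conn.execute(
        "CREATE TABLE VisitMovement (VisitMovementID INTEGER PRIMARY KEY, VisitID INTEGER)"
    )
    conn.executemany("INSERT INTO Visit VALUES (?, ?)", VISITS)
    conn.executemany("INSERT INTO VisitMovement VALUES (?, ?)", MOVEMENTS)
    params = {"RecentlyLeftDate": recently_left_date}
    join_rows = conn.execute(JOIN_QUERY, params).fetchall()
    in_rows = conn.execute(IN_QUERY, params).fetchall()
    conn.close()
    return join_rows, in_rows


join_rows, in_rows = compare_variants("2020-02-01")
print(join_rows, in_rows)
```

Matching results on a toy table say nothing about speed; on the real tables, compare the actual execution plans of the two forms.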