Optimize groupwise maximum query
Assuming relatively few rows in options and many rows in records.
Typically, you would have a look-up table options that is referenced from records.option_id, ideally with a foreign key constraint. If you don't have one, I suggest creating it to enforce referential integrity:
CREATE TABLE options (
option_id int PRIMARY KEY
, option text UNIQUE NOT NULL
);
INSERT INTO options
SELECT DISTINCT option_id, 'option' || option_id -- dummy option names
FROM records;
Then there is no need to emulate a loose index scan any more and this becomes very simple and fast. Correlated subqueries can use a plain index on (option_id, id).
SELECT option_id, (SELECT max(id)
FROM records
WHERE option_id = o.option_id) AS max_id
FROM options o
ORDER BY 1;
This includes options with no match in table records. You get NULL for max_id, and you can easily remove such rows in an outer SELECT if needed.
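For example, a minimal sketch wrapping the query above to drop the unmatched rows:

```sql
SELECT option_id, max_id
FROM  (
   SELECT option_id
        , (SELECT max(id)
           FROM   records
           WHERE  option_id = o.option_id) AS max_id
   FROM   options o
   ) sub
WHERE  max_id IS NOT NULL  -- remove options with no match in records
ORDER  BY option_id;
```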
Or (same result):
SELECT option_id, (SELECT id
FROM records
WHERE option_id = o.option_id
ORDER BY id DESC NULLS LAST
LIMIT 1) AS max_id
FROM options o
ORDER BY 1;
This may be slightly faster. The subquery uses the sort order DESC NULLS LAST, same as the aggregate function max(), which ignores NULL values. Sorting just DESC would place NULL values first:
- Why do NULL values come first when ordering DESC in a PostgreSQL query?
The perfect index for this:
CREATE INDEX on records (option_id, id DESC NULLS LAST);
Index sort order doesn't matter much while columns are defined NOT NULL.
There can still be a sequential scan on the small table options; that's just the fastest way to fetch all rows. The ORDER BY may bring in an index (only) scan to fetch pre-sorted rows. The big table records is only accessed via (bitmap) index scan or, if possible, index-only scan.
db<>fiddle here - showing two index-only scans for the simple case
Old sqlfiddle
Or use LATERAL joins for a similar effect in Postgres 9.3+:
- Optimize GROUP BY query to retrieve latest row per user
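A LATERAL version of the query above might look like this; a sketch assuming the same options and records tables:

```sql
SELECT o.option_id, r.max_id
FROM   options o
LEFT   JOIN LATERAL (
   SELECT max(id) AS max_id
   FROM   records
   WHERE  option_id = o.option_id
   ) r ON true
ORDER  BY o.option_id;
```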
mysql groupwise max as second where condition
Avoiding the inner join can improve the query:
SELECT *
FROM   `test`
WHERE  `master_id` = 0
OR     `id` IN (
       SELECT t1.id
       FROM  (SELECT *
              FROM   test t2
              WHERE  t2.master_id != 0
              ORDER  BY t2.date ASC) t1
       GROUP  BY t1.master_id
       )
ORDER  BY `date`;
MySQL Query Optimization Group By with Max
The query looks fine. All you can do is provide an appropriate index. That would be an index on the columns in the WHERE clause at least. Start with the most restrictive column. So:
- how many rows match active = 1?
- how many rows match deletedOn IS NULL?
- how many rows match updatedOn <= timestamp '2019-03-25 21:00:00'?
Pick the one that matches the fewest rows. Say it's active, then updatedOn, then deletedOn. This gives you:
create index idx on audit_frame_master(active, updatedOn, deletedOn);
As you then want to group by frame_id and find the maximum id, you can append those columns in this order:
create index idx on audit_frame_master(active, updatedOn, deletedOn, frame_id, id);
This is a covering index. If the DBMS uses it, it doesn't even have to access the table.
The DBMS may or may not use this index. It's just an offer. If the DBMS thinks it will be too much work to go through an index rather than simply read the table sequentially, it won't use it. Just try.
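For illustration, the covering index above targets a query of roughly this shape; the original query is not shown in this answer, so the exact column usage is an assumption reconstructed from the conditions listed above:

```sql
-- assumed query shape, reconstructed from the WHERE conditions above
SELECT frame_id, MAX(id) AS max_id
FROM   audit_frame_master
WHERE  active = 1
  AND  updatedOn <= timestamp '2019-03-25 21:00:00'
  AND  deletedOn IS NULL
GROUP  BY frame_id;
```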
MYSQL query slow SELECT DISTINCT
SOLUTION
By adding an index on count, I reduced the time by more than 95%!
Now the whole operation takes about 1-1.5 ms.
The copy_to_temp table is down from 10 ms to 0.5 ms; it still accounts for 65% of the total time, but that is sufficiently fast for my needs.
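The original schema isn't shown here; as a hedged sketch, the index in question would look something like this (table and column names are assumptions):

```sql
-- hypothetical names: the actual table and column are not shown above
CREATE INDEX idx_count ON `copy_to_temp` (`count`);
```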
Groupwise maximum record lookup for contracts and latest status
This is called a groupwise-maximum problem.
It looks like your locks table gets updated sometimes, and those updates change the stamp timestamp column. So your problem is to report the latest (most recent in time) locks record for each contractID. Start with a subquery to determine the latest stamp for each contract.
SELECT MAX(stamp) stamp, contractID
FROM locks
GROUP BY contractID
Then use that subquery in your main query to choose the appropriate row of locks.
SELECT c.id ,c.partner ,l.stamp ,l.`type`
FROM contracts c
LEFT JOIN (
SELECT MAX(stamp) stamp, contractID
FROM locks
GROUP BY contractID
) latest ON c.contractID=latest.contractID
LEFT JOIN locks l ON c.contractID = l.contractID
AND latest.stamp = l.stamp
WHERE c.partner="2000000301"
ORDER BY c.id ASC
Notice that the latest locks record is not necessarily the one with the largest id value.
This index will help the query's performance when your locks table is large, by enabling the subquery to do a loose index scan.
ALTER TABLE locks ADD INDEX contractid_stamp (contractID, stamp);
And, you don't need both a PRIMARY KEY and a UNIQUE KEY on the same column. The PRIMARY KEY serves the purpose of guaranteeing uniqueness. Putting both keys on the table slows down INSERTs for no good reason.
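If such a redundant UNIQUE key exists, it can be dropped; the index name below is a placeholder, so check SHOW CREATE TABLE locks for the real one:

```sql
-- 'id_unique' is a hypothetical index name; use the actual name
ALTER TABLE locks DROP INDEX id_unique;
```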
Awful MySQL LEFT JOIN Performance for groupwise maximum
This should give the same results and perform much better.
SELECT p1.*
FROM prod_prices p1
INNER JOIN
( SELECT ID, MAX(Date) AS Date
FROM prod_prices
GROUP BY ID
) AS p2
ON p1.ID = p2.ID
AND p1.Date = p2.Date
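A composite index covering the grouping column and the max column should support the derived table; this suggestion is not part of the original answer:

```sql
ALTER TABLE prod_prices ADD INDEX id_date (ID, Date);
```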