Optimize Groupwise Maximum Query


This assumes relatively few rows in options and many rows in records.

Typically, you would have a look-up table options that is referenced from records.option_id, ideally with a foreign key constraint. If you don't, I suggest creating one to enforce referential integrity:

CREATE TABLE options (
  option_id int  PRIMARY KEY
, option    text UNIQUE NOT NULL
);

INSERT INTO options
SELECT DISTINCT option_id, 'option' || option_id -- dummy option names
FROM   records;

Then there is no need to emulate a loose index scan any more and this becomes very simple and fast. Correlated subqueries can use a plain index on (option_id, id).

SELECT option_id, (SELECT max(id)
                   FROM   records
                   WHERE  option_id = o.option_id) AS max_id
FROM   options o
ORDER  BY 1;

This includes options with no match in table records. You get NULL for max_id and you can easily remove such rows in an outer SELECT if needed.
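A minimal sketch of that outer SELECT, assuming the tables and columns defined above (the subquery alias sub is arbitrary):

```sql
-- Wrap the query and drop options that have no row in records
SELECT *
FROM  (
   SELECT option_id, (SELECT max(id)
                      FROM   records
                      WHERE  option_id = o.option_id) AS max_id
   FROM   options o
   ) sub
WHERE  max_id IS NOT NULL
ORDER  BY 1;
```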

Or (same result):

SELECT option_id, (SELECT id
                   FROM   records
                   WHERE  option_id = o.option_id
                   ORDER  BY id DESC NULLS LAST
                   LIMIT  1) AS max_id
FROM   options o
ORDER  BY 1;

This may be slightly faster. The subquery uses the sort order DESC NULLS LAST, which matches the aggregate function max() in that max() ignores NULL values. Sorting with just DESC would put NULL first:

Why do NULL values come first when ordering DESC in a PostgreSQL query?
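To see the difference in isolation, here is a small self-contained sketch using a throwaway VALUES list (the alias t and column x are made up for illustration):

```sql
-- Default for DESC is NULLS FIRST: returns NULL, 2, 1
SELECT x
FROM  (VALUES (1), (NULL), (2)) t(x)
ORDER  BY x DESC;

-- With NULLS LAST: returns 2, 1, NULL
SELECT x
FROM  (VALUES (1), (NULL), (2)) t(x)
ORDER  BY x DESC NULLS LAST;
```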

The perfect index for this:

CREATE INDEX ON records (option_id, id DESC NULLS LAST);

Index sort order doesn't matter much while columns are defined NOT NULL.

There can still be a sequential scan on the small table options. That is simply the fastest way to fetch all rows. The ORDER BY may bring in an index (only) scan to fetch pre-sorted rows.

The big table records is only accessed via (bitmap) index scan or, if possible, index-only scan.
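To check which plan you actually get, something along these lines should work, assuming the tables and index from above. Index-only scans depend on the visibility map being reasonably current, hence the VACUUM first:

```sql
-- Refresh the visibility map and planner statistics
VACUUM ANALYZE records;

-- Show the actual plan, timings, and buffer usage
EXPLAIN (ANALYZE, BUFFERS)
SELECT option_id, (SELECT max(id)
                   FROM   records
                   WHERE  option_id = o.option_id) AS max_id
FROM   options o
ORDER  BY 1;
```

Look for "Index Only Scan" (or "Index Scan") on records in the output rather than "Seq Scan".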

db<>fiddle here - showing two index-only scans for the simple case


Or use LATERAL joins for a similar effect in Postgres 9.3+:

Optimize GROUP BY query to retrieve latest row per user
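A rough sketch of the LATERAL variant, assuming the same options and records tables as above. LEFT JOIN ... ON true keeps options without any match in records (max_id is NULL for those), like the correlated subqueries do:

```sql
SELECT o.option_id, r.max_id
FROM   options o
LEFT   JOIN LATERAL (
   SELECT id AS max_id
   FROM   records
   WHERE  option_id = o.option_id
   ORDER  BY id DESC NULLS LAST
   LIMIT  1                      -- only the greatest id per option
   ) r ON true
ORDER  BY 1;
```

The advantage over a correlated subquery in the SELECT list is that the lateral subquery can return more than one column of the chosen row.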

mysql groupwise max as second where condition

Avoiding the inner join can improve the query:

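SELECT *
FROM `test`
WHERE `master_id` = 0
OR `id` IN (
    -- Picks one id per master_id. Selecting the non-aggregated t1.id relies on
    -- MySQL's non-standard GROUP BY handling (ONLY_FULL_GROUP_BY disabled);
    -- which row is picked is not guaranteed, even with the inner ORDER BY.
    SELECT t1.id
    FROM (SELECT *
        FROM test t2
        WHERE t2.master_id != 0
        ORDER BY t2.date ASC) t1
    GROUP BY t1.master_id
)
ORDER BY `date`;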

MySQL Query Optimization Group By with Max

The query looks fine. All you can do is provide an appropriate index. That would be an index on the columns in the WHERE clause at least. Start with the most restrictive column. So,

how many rows match active = 1?
how many rows match deletedOn IS NULL?
how many rows match updatedOn <= timestamp '2019-03-25 21:00:00'?
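These counts can be gathered in a single pass over the table. A sketch in MySQL syntax, where a comparison evaluates to 1 or 0 and can therefore be summed (the column aliases are made up):

```sql
SELECT COUNT(*)                                           AS total_rows,
       SUM(active = 1)                                    AS active_rows,
       SUM(deletedOn IS NULL)                             AS not_deleted_rows,
       SUM(updatedOn <= timestamp '2019-03-25 21:00:00')  AS updated_rows
FROM   audit_frame_master;
```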

Pick the one that matches the fewest rows. Say it's active, then updatedOn, then deletedOn. This gives you:

create index idx on audit_frame_master(active, updatedOn, deletedOn);

Since you then group by frame_id and look for the maximum id, you can append those two columns, in this order, to the index definition instead:

create index idx on audit_frame_master(active, updatedOn, deletedOn, frame_id, id);

This is a covering index. If the DBMS uses it, it doesn't even have to access the table.

The DBMS may or may not use this index. It's just an offer. If the DBMS thinks it would be more work to go through the index than to simply read the table sequentially, it won't use it. Just try.
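One way to try it is EXPLAIN. The query below is only a reconstruction from the conditions listed above, not necessarily your exact statement:

```sql
EXPLAIN
SELECT frame_id, MAX(id) AS max_id
FROM   audit_frame_master
WHERE  active = 1
AND    deletedOn IS NULL
AND    updatedOn <= timestamp '2019-03-25 21:00:00'
GROUP  BY frame_id;
```

If the key column of the output shows idx and the Extra column contains "Using index", MySQL answers the query from the covering index alone.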

MYSQL query slow SELECT DISTINCT

SOLUTION

By adding an index for count, I reduced the time by more than 95%.
Now the whole operation takes about 1-1.5 ms.

The copy_to_temp table step is down from 10 ms to 0.5 ms. It still accounts for 65% of the total time, but that is sufficiently fast for my needs.


Groupwise maximum record lookup for contracts and latest status

This is called a groupwise-maximum problem.

It looks like your locks table gets updated sometimes, and those updates change the stamp timestamp column. So your problem is to report out the latest -- most recent in time -- locks record for each contractID. Start with a subquery to determine the latest stamp for each contract.

                 SELECT MAX(stamp) stamp, contractID
                   FROM locks
                  GROUP BY contractID

Then use that subquery in your main query to choose the appropriate row of locks.

SELECT c.id ,c.partner ,l.stamp ,l.`type`
  FROM contracts c
  LEFT JOIN (
                 SELECT MAX(stamp) stamp, contractID
                   FROM locks
                  GROUP BY contractID
       ) latest ON c.contractID=latest.contractID  
  LEFT JOIN locks l   ON c.contractID = l.contractID
                     AND latest.stamp = l.stamp
 WHERE c.partner="2000000301"
 ORDER BY c.id ASC

Notice that the latest locks record is not necessarily the one with the largest id value.

This index will help the query's performance when your locks table is large, by enabling the subquery to do a loose index scan.

ALTER TABLE locks ADD INDEX contractid_stamp (contractID, stamp);
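To verify that the subquery really gets a loose index scan once the index exists, you can run it through EXPLAIN:

```sql
EXPLAIN
SELECT MAX(stamp) AS stamp, contractID
FROM   locks
GROUP  BY contractID;
```

MySQL reports a loose index scan as "Using index for group-by" in the Extra column.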

And, you don't need both a PRIMARY KEY and a UNIQUE KEY on the same column. The PRIMARY KEY serves the purpose of guaranteeing uniqueness. Putting both keys on the table slows down INSERTs for no good reason.
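A sketch of how the redundant key could be removed. The index name id_unique is hypothetical, so list the existing indexes first and substitute the real name:

```sql
-- List all indexes on the table to find the redundant UNIQUE KEY
SHOW INDEX FROM locks;

-- Drop it by name (id_unique is a placeholder, not your actual index name)
ALTER TABLE locks DROP INDEX id_unique;
```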

Awful MySQL LEFT JOIN Performance for groupwise maximum

This should give the same results and perform much better.

SELECT  p1.*
FROM    prod_prices p1
        INNER JOIN
        (   SELECT  ID, MAX(Date) AS Date
            FROM    prod_prices
            GROUP BY ID
        ) AS p2
            ON p1.ID = p2.ID 
            AND p1.Date = p2.Date
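As with any groupwise maximum of this shape, a composite index on the grouping column plus the maximized column is likely to help both the subquery and the join. A sketch, assuming no such index exists yet (the index name id_date is made up):

```sql
ALTER TABLE prod_prices ADD INDEX id_date (ID, `Date`);
```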
