Groupwise Maximum

Groupwise maximum

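You can use this query instead of a subquery. In my tests on a larger data set it returned results in about 75% less time; subqueries take more time. It joins each row to every later row of the same security and keeps only the rows that have no later match (p2.id IS NULL).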

SELECT p1.id, 
p1.security,
p1.buy_date
FROM positions p1
left join
positions p2
on p1.security = p2.security
and p1.buy_date < p2.buy_date
where
p2.id is null;
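To see what this anti-join returns, here is a minimal sketch that runs the same query through Python's built-in sqlite3 module as a stand-in for MySQL. The sample rows are made up, and the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE positions (id INTEGER PRIMARY KEY, security TEXT, buy_date TEXT);
INSERT INTO positions (id, security, buy_date) VALUES
  (1, 'AAPL', '2023-01-05'),
  (2, 'AAPL', '2023-03-10'),
  (3, 'MSFT', '2023-02-01'),
  (4, 'MSFT', '2023-01-15'),
  (5, 'GOOG', '2023-04-20');
""")

# Pair each row p1 with every later row p2 of the same security;
# rows with no later partner (p2.id IS NULL) are the latest per security.
rows = conn.execute("""
SELECT p1.id, p1.security, p1.buy_date
FROM positions p1
LEFT JOIN positions p2
  ON p1.security = p2.security
 AND p1.buy_date < p2.buy_date
WHERE p2.id IS NULL
ORDER BY p1.security
""").fetchall()

for row in rows:
    print(row)
```

If two rows of one security share the same latest buy_date, both come back, because neither is "less than" the other.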

MySQL groupwise minimum and maximum

Subqueries are not required. This query returns the minimum and maximum price for each article and dealer:

SELECT article, dealer, max(price) as max_price, min(price) as min_price
FROM shop s1
group by 1,2
order by 1,2

If the same article has several dealers and you want the minimum and maximum per article regardless of dealer, group by article only:

SELECT article, max(price) as max_price, min(price) as min_price
FROM shop s1
group by 1
order by 1
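A quick way to compare the two groupings is to run both against a few made-up shop rows. This sketch uses Python's sqlite3 module in place of MySQL and groups by column name rather than by position (GROUP BY 1,2), which is equivalent here.

```python
import sqlite3

# Hypothetical shop rows: (article, dealer, price).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shop (article INTEGER, dealer TEXT, price REAL)")
conn.executemany(
    "INSERT INTO shop (article, dealer, price) VALUES (?, ?, ?)",
    [(1, "A", 3.45), (1, "A", 2.99), (1, "B", 3.99), (2, "A", 10.99), (2, "C", 9.50)],
)

# Min and max price for each (article, dealer) pair.
per_dealer = conn.execute("""
SELECT article, dealer, MAX(price) AS max_price, MIN(price) AS min_price
FROM shop
GROUP BY article, dealer
ORDER BY article, dealer
""").fetchall()

# Min and max price for each article, across all of its dealers.
per_article = conn.execute("""
SELECT article, MAX(price) AS max_price, MIN(price) AS min_price
FROM shop
GROUP BY article
ORDER BY article
""").fetchall()

print(per_dealer)   # [(1, 'A', 3.45, 2.99), (1, 'B', 3.99, 3.99), (2, 'A', 10.99, 10.99), (2, 'C', 9.5, 9.5)]
print(per_article)  # [(1, 3.99, 2.99), (2, 10.99, 9.5)]
```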

Group-wise Maximum of a Certain Column

Standard SQL would reject your query because, in an aggregate query, you cannot SELECT nonaggregated columns that are not part of the GROUP BY clause.

You're using a MySQL extension of SQL, described in the MySQL manual:

MySQL extends the use of GROUP BY so that the select list can refer to
nonaggregated columns not named in the GROUP BY clause. This means
that the preceding query is legal in MySQL. You can use this feature
to get better performance by avoiding unnecessary column sorting and
grouping. However, this is useful primarily when all values in each
nonaggregated column not named in the GROUP BY are the same for each
group. The server is free to choose any value from each group, so
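unless they are the same, the values chosen are indeterminate.

Since MySQL 5.7.5 the ONLY_FULL_GROUP_BY SQL mode is enabled by default, and with it MySQL also rejects such queries unless the nonaggregated columns are functionally dependent on the GROUP BY columns.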
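The usual way to get the whole row holding each group's maximum, without relying on that extension, is to join the table back to a derived table of per-group maxima. The sketch below assumes a hypothetical shop(article, dealer, price) table, since the original query isn't shown, and uses Python's sqlite3 as a stand-in for MySQL.

```python
import sqlite3

# Hypothetical shop rows: (article, dealer, price).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shop (article INTEGER, dealer TEXT, price REAL)")
conn.executemany(
    "INSERT INTO shop (article, dealer, price) VALUES (?, ?, ?)",
    [(1, "A", 3.45), (1, "B", 3.99), (2, "A", 10.99),
     (3, "B", 1.45), (3, "C", 1.69), (3, "D", 1.25), (4, "D", 19.95)],
)

# s2 holds the maximum price per article; joining it back to shop picks out
# the row(s) carrying that maximum, so dealer always comes from the matching
# row instead of being an arbitrary value from the group.
rows = conn.execute("""
SELECT s1.article, s1.dealer, s1.price
FROM shop s1
JOIN (
  SELECT article, MAX(price) AS price
  FROM shop
  GROUP BY article
) s2 ON s1.article = s2.article AND s1.price = s2.price
ORDER BY s1.article
""").fetchall()

print(rows)
```

Ties on the maximum price return one row per tied dealer.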

Groupwise maximum in larger query

You're looking for the latest row in your Email table for each distinct application_id.

Your subquery to get that isn't quite right. Here's how you get that.

SELECT s.application_id, e.student_email_id
FROM email e
JOIN (
SELECT MAX(tstamp) tstamp, application_id
FROM email
GROUP BY application_id
) s ON e.application_id = s.application_id AND e.tstamp = s.tstamp

There's another way to do this that might be more efficient. It only works if id is an auto-increment column, so that a higher id always means a row inserted later.

SELECT e.application_id, e.student_email_id
FROM email e
JOIN (
SELECT MAX(id) id
FROM email
GROUP BY application_id
) s ON e.id = s.id

Either of these queries gets the latest student_email_id for each application_id. The second one uses the subquery to find the highest id for each application_id, then joins on that id to fetch the matching student_email_id.
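Here is a small sketch that runs both versions against made-up email rows, using Python's sqlite3 in place of MySQL. The ids are assigned in insertion order, which mimics the auto-increment assumption the second version depends on. Note that in the second version application_id has to come from e, because the derived table s only contains id.

```python
import sqlite3

# Hypothetical email rows: (id, application_id, student_email_id, tstamp).
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE email (
  id INTEGER PRIMARY KEY, application_id INTEGER,
  student_email_id INTEGER, tstamp TEXT)""")
conn.executemany(
    "INSERT INTO email VALUES (?, ?, ?, ?)",
    [(1, 10, 100, "2024-01-01 09:00"),
     (2, 10, 101, "2024-01-02 09:00"),
     (3, 20, 200, "2024-01-01 10:00"),
     (4, 20, 201, "2024-01-03 10:00"),
     (5, 10, 102, "2024-01-04 09:00")],
)

# Version 1: latest tstamp per application_id, joined back on both columns.
by_tstamp = conn.execute("""
SELECT s.application_id, e.student_email_id
FROM email e
JOIN (
  SELECT MAX(tstamp) AS tstamp, application_id
  FROM email
  GROUP BY application_id
) s ON e.application_id = s.application_id AND e.tstamp = s.tstamp
ORDER BY s.application_id
""").fetchall()

# Version 2: highest id per application_id, joined back on id alone.
by_id = conn.execute("""
SELECT e.application_id, e.student_email_id
FROM email e
JOIN (
  SELECT MAX(id) AS id
  FROM email
  GROUP BY application_id
) s ON e.id = s.id
ORDER BY e.application_id
""").fetchall()

print(by_tstamp)  # [(10, 102), (20, 201)]
print(by_id)      # [(10, 102), (20, 201)]
```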

Your subquery was this. It doesn't get what you hoped for.

SELECT MAX(tstamp) AS tstamp, id, student_email_id, application_id /*wrong*/
FROM email
GROUP BY id, student_email_id, application_id

You grouped this by id, and since id is unique every row forms its own group, so you get all the detail rows back. That's not what you want. Even this

SELECT MAX(tstamp) AS tstamp, student_email_id, application_id /*wrong*/
FROM email
GROUP BY student_email_id, application_id

will give you more than one record for each application_id value.
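To see why, this sketch runs the second wrong query on hypothetical rows in which application 10 has two different student_email_id values. Python's sqlite3 stands in for MySQL. Each (student_email_id, application_id) pair becomes its own group, so application 10 shows up twice.

```python
import sqlite3

# Hypothetical rows: application 10 has student_email_id 100 (twice) and 101.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE email (
  id INTEGER PRIMARY KEY, application_id INTEGER,
  student_email_id INTEGER, tstamp TEXT)""")
conn.executemany(
    "INSERT INTO email VALUES (?, ?, ?, ?)",
    [(1, 10, 100, "2024-01-01"),
     (2, 10, 101, "2024-01-02"),
     (3, 10, 100, "2024-01-03"),
     (4, 20, 200, "2024-01-01")],
)

# One group per (student_email_id, application_id) pair, not per application_id.
rows = conn.execute("""
SELECT MAX(tstamp) AS tstamp, student_email_id, application_id
FROM email
GROUP BY student_email_id, application_id
ORDER BY application_id, student_email_id
""").fetchall()

for row in rows:
    print(row)
```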

So the query you need is:

SELECT application.*, email1.student_email_id AS email_student_email_id
FROM application
LEFT JOIN (
SELECT e.application_id, e.student_email_id
FROM email e
JOIN (
SELECT MAX(id) id
FROM email
GROUP BY application_id
) s ON e.id = s.id
) AS email1 ON email1.application_id = application.id
WHERE application.status = 'returned'
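A minimal end-to-end check of the combined query, assuming hypothetical application(id, status) and email tables and using Python's sqlite3 as a stand-in for MySQL. Application 30 has no email rows, so the LEFT JOIN keeps it with NULL (None in Python).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE application (id INTEGER PRIMARY KEY, status TEXT);
CREATE TABLE email (
  id INTEGER PRIMARY KEY, application_id INTEGER,
  student_email_id INTEGER, tstamp TEXT);
INSERT INTO application VALUES (10, 'returned'), (20, 'pending'), (30, 'returned');
INSERT INTO email VALUES
  (1, 10, 100, '2024-01-01'),
  (2, 10, 101, '2024-01-02'),
  (3, 20, 200, '2024-01-01'),
  (4, 20, 201, '2024-01-03'),
  (5, 10, 102, '2024-01-04');
""")

# Innermost: highest id per application; middle: the matching email row;
# outer: every returned application, with NULL when it has no email.
rows = conn.execute("""
SELECT application.*, email1.student_email_id AS email_student_email_id
FROM application
LEFT JOIN (
  SELECT e.application_id, e.student_email_id
  FROM email e
  JOIN (
    SELECT MAX(id) AS id
    FROM email
    GROUP BY application_id
  ) s ON e.id = s.id
) AS email1 ON email1.application_id = application.id
WHERE application.status = 'returned'
ORDER BY application.id
""").fetchall()

print(rows)  # [(10, 'returned', 102), (30, 'returned', None)]
```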

When you're designing queries like this, it's smart to test from the inside out, starting with the innermost subquery.

Optimize groupwise maximum query

This assumes relatively few rows in options and many rows in records.

Typically, you would have a lookup table options that is referenced from records.option_id, ideally with a foreign key constraint. If you don't, I suggest creating one to enforce referential integrity:

CREATE TABLE options (
option_id int PRIMARY KEY
, option text UNIQUE NOT NULL
);

INSERT INTO options
SELECT DISTINCT option_id, 'option' || option_id -- dummy option names
FROM records;

Then there is no need to emulate a loose index scan anymore, and this becomes very simple and fast. The correlated subqueries can use a plain index on (option_id, id).

SELECT option_id, (SELECT max(id)
FROM records
WHERE option_id = o.option_id) AS max_id
FROM options o
ORDER BY 1;

This includes options with no match in table records. You get NULL for max_id and you can easily remove such rows in an outer SELECT if needed.
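The answer targets PostgreSQL, but the correlated-subquery pattern itself is portable. This sketch runs it through Python's sqlite3 with made-up records and one extra option (3) that has no records, to show the NULL (None) row. The index name records_option_id_id is hypothetical, and SQLite's planner is of course not Postgres's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE records (id INTEGER PRIMARY KEY, option_id INTEGER NOT NULL);
INSERT INTO records (id, option_id) VALUES (1, 1), (2, 1), (3, 2), (4, 1), (5, 2);

-- Lookup table built from the distinct option_ids, plus one unused option.
CREATE TABLE options (option_id INTEGER PRIMARY KEY, "option" TEXT UNIQUE NOT NULL);
INSERT INTO options SELECT DISTINCT option_id, 'option' || option_id FROM records;
INSERT INTO options VALUES (3, 'option3');

-- Multicolumn index the correlated subquery can use.
CREATE INDEX records_option_id_id ON records (option_id, id);
""")

# One correlated subquery per row of the small options table.
rows = conn.execute("""
SELECT option_id,
       (SELECT max(id) FROM records WHERE option_id = o.option_id) AS max_id
FROM options o
ORDER BY 1
""").fetchall()

print(rows)  # [(1, 4), (2, 5), (3, None)]
```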

Or (same result):

SELECT option_id, (SELECT id
FROM records
WHERE option_id = o.option_id
ORDER BY id DESC NULLS LAST
LIMIT 1) AS max_id
FROM options o
ORDER BY 1;

This may be slightly faster. The subquery sorts by DESC NULLS LAST to match the aggregate function max(), which ignores NULL values. Sorting by plain DESC would put NULL first:

  • Why do NULL values come first when ordering DESC in a PostgreSQL query?

The perfect index for this:

CREATE INDEX on records (option_id, id DESC NULLS LAST);

Index sort order doesn't matter much as long as the columns are defined NOT NULL.

There can still be a sequential scan on the small table options; that's simply the fastest way to fetch all its rows. The ORDER BY may bring in an index (or index-only) scan to fetch pre-sorted rows.

The big table records is only accessed via (bitmap) index scan or, if possible, index-only scan.

Or use LATERAL joins for a similar effect in Postgres 9.3+:

  • Optimize GROUP BY query to retrieve latest row per user

Groupwise maximum record lookup for contracts and latest status

This is called a groupwise-maximum problem.

It looks like your locks table gets updated sometimes, and those updates change the stamp timestamp column. So your problem is to report the latest (most recent) locks record for each contractID. Start with a subquery to determine the latest stamp for each contract:

SELECT MAX(stamp) stamp, contractID
FROM locks
GROUP BY contractID

Then use that subquery in your main query to choose the appropriate row of locks.

SELECT c.id ,c.partner ,l.stamp ,l.`type`
FROM contracts c
LEFT JOIN (
SELECT MAX(stamp) stamp, contractID
FROM locks
GROUP BY contractID
) latest ON c.contractID=latest.contractID
LEFT JOIN locks l ON c.contractID = l.contractID
AND latest.stamp = l.stamp
WHERE c.partner = '2000000301'
ORDER BY c.id ASC

Notice that the latest locks record is not necessarily the one with the largest id value.
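Here is a sketch of that point with made-up data, using Python's sqlite3 in place of MySQL. For contract C1 the lock with the larger id (2) has the older stamp, so the query returns lock 1; contract C2 has no locks and comes back with NULLs. The contracts columns (id, contractID, partner) are assumed from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contracts (id INTEGER PRIMARY KEY, contractID TEXT, partner TEXT);
CREATE TABLE locks (id INTEGER PRIMARY KEY, contractID TEXT, stamp TEXT, type TEXT);
INSERT INTO contracts VALUES
  (1, 'C1', '2000000301'),
  (2, 'C2', '2000000301'),
  (3, 'C3', '9999999999');
-- For C1 the row with the larger id (2) has the older stamp.
INSERT INTO locks VALUES
  (1, 'C1', '2024-01-05 12:00', 'open'),
  (2, 'C1', '2024-01-01 08:00', 'closed'),
  (3, 'C3', '2024-02-01 09:00', 'open');
""")

# Latest stamp per contract, then the locks row carrying that stamp.
rows = conn.execute("""
SELECT c.id, c.partner, l.stamp, l.type
FROM contracts c
LEFT JOIN (
  SELECT MAX(stamp) AS stamp, contractID
  FROM locks
  GROUP BY contractID
) latest ON c.contractID = latest.contractID
LEFT JOIN locks l ON c.contractID = l.contractID
                 AND latest.stamp = l.stamp
WHERE c.partner = '2000000301'
ORDER BY c.id ASC
""").fetchall()

print(rows)
```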

This index will help the query's performance when your locks table is large, by enabling the subquery to do a loose index scan.

ALTER TABLE locks ADD INDEX contractid_stamp (contractID, stamp);

Also, you don't need both a PRIMARY KEY and a UNIQUE KEY on the same column. The PRIMARY KEY already guarantees uniqueness, and the extra key slows down INSERTs for no good reason.

MySQL groupwise max as second WHERE condition

Avoiding the inner join can improve the query:

SELECT *
FROM `test`
WHERE `master_id` =0
OR `id` IN (
SELECT t1.id
FROM (SELECT *
FROM test t2
WHERE t2.master_id!=0
ORDER BY t2.date ASC) t1
GROUP BY t1.master_id
)
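ORDER BY `date`;

Be aware that the inner GROUP BY t1.master_id selects t1.id without aggregating it, so it relies on MySQL's nonstandard GROUP BY extension: the server may return any row from each group, and the ORDER BY in the derived table does not guarantee which one. MySQL 5.7.5 and later, where ONLY_FULL_GROUP_BY is on by default, reject the query outright.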
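If the goal is one row per nonzero master_id with the earliest date (which is what the inner ORDER BY date ASC appears to aim for; that intent is an assumption), a deterministic alternative is a correlated MIN() subquery. This sketch runs it through Python's sqlite3 with made-up rows; swap MIN for MAX to get the latest row per master_id instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE test (id INTEGER PRIMARY KEY, master_id INTEGER, date TEXT);
INSERT INTO test VALUES
  (1, 0, '2024-01-03'),
  (2, 1, '2024-01-02'),
  (3, 1, '2024-01-01'),
  (4, 2, '2024-01-05'),
  (5, 2, '2024-01-04'),
  (6, 0, '2024-01-06');
""")

# Every master_id = 0 row, plus, for each other master_id, the row whose
# date equals that group's MIN(date).
rows = conn.execute("""
SELECT t.*
FROM test t
WHERE t.master_id = 0
   OR t.date = (SELECT MIN(t2.date) FROM test t2 WHERE t2.master_id = t.master_id)
ORDER BY t.date
""").fetchall()

print([r[0] for r in rows])  # [3, 1, 5, 6]
```

Rows that tie on a group's earliest date are all returned.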

How to select the max aliased value in MySQL

You are getting an error because you cannot reference a column alias in the WHERE clause, and your query references recent_meeting_date there. Besides, recent_meeting_date is an aggregate: WHERE filters individual rows before grouping, so MAX(meeting_date) has not been computed yet at that point. Use the HAVING clause instead, which filters groups after aggregation. For more information about WHERE vs. HAVING, take a look at this Stack Overflow question.

Here is the full query with the having clause:

SELECT *, MAX(meeting_date) AS recent_meeting_date
FROM driver
INNER JOIN meeting_attendee ON meeting_attendee.attendee_email = driver.driver_email
INNER JOIN meeting ON meeting.meeting_id = meeting_attendee.meeting_id
GROUP BY driver_id
HAVING recent_meeting_date < UTC_TIMESTAMP
ORDER BY driver_id;
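The WHERE vs. HAVING difference can be checked with a small sketch using Python's sqlite3 in place of MySQL and made-up drivers and meetings. It selects only driver_id and the aggregate instead of SELECT *, and a fixed CUTOFF string replaces UTC_TIMESTAMP so the result is repeatable; both are assumptions for illustration. SQLite's error message for the WHERE version differs from MySQL's, but both reject it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE driver (driver_id INTEGER PRIMARY KEY, driver_email TEXT);
CREATE TABLE meeting (meeting_id INTEGER PRIMARY KEY, meeting_date TEXT);
CREATE TABLE meeting_attendee (meeting_id INTEGER, attendee_email TEXT);
INSERT INTO driver VALUES (1, 'a@example.com'), (2, 'b@example.com');
INSERT INTO meeting VALUES (1, '2024-01-01'), (2, '2024-06-01'), (3, '2024-02-01');
INSERT INTO meeting_attendee VALUES
  (1, 'a@example.com'), (3, 'a@example.com'), (2, 'b@example.com');
""")

CUTOFF = "2024-03-01"  # hypothetical fixed stand-in for UTC_TIMESTAMP

# Filtering on the aggregate alias in WHERE fails: WHERE runs before grouping.
where_failed = False
try:
    conn.execute("""
    SELECT d.driver_id, MAX(m.meeting_date) AS recent_meeting_date
    FROM driver d
    JOIN meeting_attendee ma ON ma.attendee_email = d.driver_email
    JOIN meeting m ON m.meeting_id = ma.meeting_id
    WHERE recent_meeting_date < ?
    GROUP BY d.driver_id
    """, (CUTOFF,)).fetchall()
except sqlite3.Error:
    where_failed = True

# HAVING filters the groups after MAX() has been computed.
rows = conn.execute("""
SELECT d.driver_id, MAX(m.meeting_date) AS recent_meeting_date
FROM driver d
JOIN meeting_attendee ma ON ma.attendee_email = d.driver_email
JOIN meeting m ON m.meeting_id = ma.meeting_id
GROUP BY d.driver_id
HAVING MAX(m.meeting_date) < ?
ORDER BY d.driver_id
""", (CUTOFF,)).fetchall()

print(where_failed)  # True
print(rows)          # [(1, '2024-02-01')]
```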
