MySQL Group by Behavior

MySQL GROUP BY behavior (when using a derived table with order by)

This query:

SELECT *
FROM (SELECT * FROM tbl order by timestamp) as tb2
GROUP BY userID;

Relies on a MySQL GROUP BY extension, which is documented in the MySQL manual under "MySQL Handling of GROUP BY". You are specifically relying on two assumptions: that all the columns in each output row come from the same source row, and that this row is the first one encountered. MySQL specifically warns against making these assumptions:

MySQL extends the use of GROUP BY so that the select list can refer to
nonaggregated columns not named in the GROUP BY clause. This means
that the preceding query is legal in MySQL. You can use this feature
to get better performance by avoiding unnecessary column sorting and
grouping. However, this is useful primarily when all values in each
nonaggregated column not named in the GROUP BY are the same for each
group. The server is free to choose any value from each group, so
unless they are the same, the values chosen are indeterminate.

So, you cannot depend on this behavior. It is easy enough to work around. Here is an example query:

select t.*
from tbl t
where not exists (select 1 from tbl t2 where t2.userid = t.userid and t2.timestamp > t.timestamp)

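This returns, for each userid, the row with the latest timestamp (flip the comparison to < to get the earliest row instead). With an index on tbl(userid, timestamp) this may even work faster than the GROUP BY version. MySQL does a notoriously poor job of optimizing aggregations.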
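To make the workaround concrete, here is a small runnable sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example is self-contained; the query itself is plain SQL). The `payload` column and the sample rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL; rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (userid INTEGER, timestamp INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [(1, 10, "a"), (1, 30, "b"), (1, 20, "c"), (2, 5, "d"), (2, 7, "e")],
)
# Composite index on (userid, timestamp) to support the NOT EXISTS probe.
conn.execute("CREATE INDEX idx_user_ts ON tbl (userid, timestamp)")

# Keep a row only if no other row for the same user has a later timestamp.
latest_rows = conn.execute(
    """
    SELECT t.*
    FROM tbl t
    WHERE NOT EXISTS (SELECT 1 FROM tbl t2
                      WHERE t2.userid = t.userid AND t2.timestamp > t.timestamp)
    ORDER BY t.userid
    """
).fetchall()
print(latest_rows)
```

Each user contributes exactly one row here because the timestamps are unique per user; if two rows for a user shared the maximum timestamp, both would be returned.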

Why MySQL's GROUP BY and Oracle's GROUP BY behaviours are different

The MySQL designers put in their nonstandard extension to GROUP BY in an attempt to make development easier and certain queries more efficient.

Here's their rationale.

https://dev.mysql.com/doc/refman/8.0/en/group-by-handling.html

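There is a server mode called ONLY_FULL_GROUP_BY which disables the nonstandard extensions. It is enabled by default as of MySQL 5.7.5. You can set this mode using the statement below; note that it replaces the session's entire sql_mode with this single value, so any other modes that were set are dropped.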

SET SESSION sql_mode = 'ONLY_FULL_GROUP_BY';

Here's a quote from that page, with emphasis added.

If ONLY_FULL_GROUP_BY is disabled, a MySQL extension to the standard SQL use of GROUP BY permits the select list, HAVING condition, or ORDER BY list to refer to nonaggregated columns even if the columns are not functionally dependent on GROUP BY columns... In this case, the server is free to choose any value from each group, so unless they are the same, the values chosen are nondeterministic, which is probably not what you want.

The important word here is nondeterministic. What does that mean? It means random, but worse. If the server chose random values, that implies it would return different values in different queries, so you have a chance of catching the problem when you test your software. But nondeterministic in this context means the server chooses the same value every time, until it doesn't.

Why might it change the value it chooses? A server upgrade is one reason. A change to table size might be another. The point is, the server is free to return whatever value it wants.

I wish people newly learning SQL would set this ONLY_FULL_GROUP_BY mode; they'd get much more predictable results from their queries, and the server would reject nondeterministic queries.

Unexpected GROUP BY behavior MySQL?

You can use the following, using a sub-query to get the last revisions of the quotes. You can INNER JOIN this sub-query with your table to get the full row of the latest revision per group (number):

SELECT quotes.*
FROM quotes
INNER JOIN (
    SELECT number, MAX(revision) AS revision
    FROM quotes
    GROUP BY number
) max_quotes
    ON quotes.number = max_quotes.number
   AND quotes.revision = max_quotes.revision
ORDER BY quotes.id DESC;

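As a quick check of the join-against-MAX pattern, here is a minimal sketch run through Python's built-in sqlite3 module (a stand-in for MySQL, chosen only so the example runs anywhere). The `body` column and the sample rows are hypothetical; only `id`, `number`, and `revision` come from the query above.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL; rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE quotes (id INTEGER PRIMARY KEY, number INTEGER, revision INTEGER, body TEXT)"
)
conn.executemany(
    "INSERT INTO quotes VALUES (?, ?, ?, ?)",
    [(1, 100, 1, "first"), (2, 100, 2, "second"), (3, 200, 1, "only"), (4, 100, 3, "third")],
)

# The subquery finds the highest revision per number; the join pulls the full row.
latest_quotes = conn.execute(
    """
    SELECT quotes.*
    FROM quotes
    INNER JOIN (
        SELECT number, MAX(revision) AS revision
        FROM quotes
        GROUP BY number
    ) max_quotes
        ON quotes.number = max_quotes.number
       AND quotes.revision = max_quotes.revision
    ORDER BY quotes.id DESC
    """
).fetchall()
print(latest_quotes)
```

Because every selected column in the subquery is either grouped or aggregated, this form is also accepted when ONLY_FULL_GROUP_BY is enabled.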

MySQL group/order behaves differently in 5.7

You should go with the query below:

SELECT
    *
FROM tbl
INNER JOIN
(
    SELECT
        other_id,
        language_id,
        MAX(dateCreated) max_date_created
    FROM tbl
    WHERE other_id = 5
    GROUP BY other_id, language_id
) AS t
ON tbl.language_id = t.language_id AND tbl.other_id = t.other_id AND
tbl.dateCreated = t.max_date_created;

Using GROUP BY without an aggregate function returns an arbitrary row from each group. You should not rely on which row GROUP BY returns; MySQL does not guarantee it.

Quoting from this post

In a nutshell, MySQL allows omitting some columns from the GROUP BY,
for performance purposes. However, this works only if the omitted
columns all have the same value (within a grouping); otherwise, the
values returned by the query are indeed indeterminate, as properly
guessed by others in this post. To be sure, adding an ORDER BY clause
would not re-introduce any form of deterministic behavior.

Although not at the core of the issue, this example shows how using *
rather than an explicit enumeration of desired columns is often a bad
idea.

Excerpt from MySQL 5.0 documentation:

When using this feature, all rows in each group should have the same
values for the columns that are omitted from the GROUP BY part. The
server is free to return any value from the group, so the results are
indeterminate unless all values are the same.

GROUP BY behavior when no aggregate functions are present in the SELECT clause

Read MySQL documentation on this particular point.

In a nutshell, MySQL allows omitting some columns from the GROUP BY, for performance purposes. However, this works only if the omitted columns all have the same value (within a grouping); otherwise, the values returned by the query are indeed indeterminate, as properly guessed by others in this post. To be sure, adding an ORDER BY clause would not re-introduce any form of deterministic behavior.

Although not at the core of the issue, this example shows how using * rather than an explicit enumeration of desired columns is often a bad idea.

Excerpt from MySQL 5.0 documentation:


When using this feature, all rows in each group should have the same values
for the columns that are omitted from the GROUP BY part. The server is free
to return any value from the group, so the results are indeterminate unless
all values are the same.

Weird behavior with MySQL GroupBy on multiple columns

I would use a correlated subquery, but not with fancy logic:

select u.*
from users u
where u.weight = (select min(u2.weight)
                  from users u2
                  where u2.height = u.height and u2.sex = u.sex
                 );

That is, return the users whose weight is the minimum for the height/sex combination.
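The following sketch runs that correlated subquery on a few made-up rows, using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption for the sake of a self-contained example). The `name` column and the data are hypothetical; `sex`, `height`, and `weight` come from the query above.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL; rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, sex TEXT, height INTEGER, weight INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?)",
    [
        ("ann", "f", 160, 50),
        ("bea", "f", 160, 55),
        ("cat", "f", 170, 60),
        ("dan", "m", 180, 80),
        ("eli", "m", 180, 80),  # ties with dan for the minimum weight
        ("fox", "m", 180, 90),
    ],
)

# For each row, compare its weight to the minimum within the same height/sex group.
rows = conn.execute(
    """
    SELECT u.*
    FROM users u
    WHERE u.weight = (SELECT MIN(u2.weight)
                      FROM users u2
                      WHERE u2.height = u.height AND u2.sex = u.sex)
    ORDER BY u.name
    """
).fetchall()
lightest_names = [r[0] for r in rows]
print(lightest_names)
```

Note that ties are kept: when two users share the minimum weight for a height/sex combination, both rows come back, so this is not strictly one row per group.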

Weird behavior of grouping in MySQL

Does this work for you?

SELECT
DATE_SUB(
date, INTERVAL WEEKDAY(DATE_SUB(date, INTERVAL -1 DAY)) DAY
) AS week_start,
SUM(value) as value__sum
FROM
metrics
GROUP BY
week_start
ORDER BY
week_start ASC;

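I shifted the date by one day inside of WEEKDAY. Subtracting an interval of -1 day adds a day, and since WEEKDAY returns 0 for Monday through 6 for Sunday, the result is that each date is mapped to the Sunday on or before it, so weeks start on Sunday.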
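The date arithmetic can be checked outside the database. The sketch below mirrors the week_start expression in Python, whose date.weekday() uses the same numbering as MySQL's WEEKDAY() (0 = Monday through 6 = Sunday). The function name and the sample dates are chosen for illustration only.

```python
from datetime import date, timedelta


def week_start(d: date) -> date:
    # Mirrors DATE_SUB(date, INTERVAL WEEKDAY(DATE_SUB(date, INTERVAL -1 DAY)) DAY).
    shifted = d + timedelta(days=1)  # subtracting an interval of -1 day adds one day
    # weekday() of the shifted date is the number of days since the last Sunday.
    return d - timedelta(days=shifted.weekday())


samples = [date(2023, 12, 31), date(2024, 1, 3), date(2024, 1, 6), date(2024, 1, 7)]
for d in samples:
    print(d, "->", week_start(d))
```

Here Wednesday 2024-01-03 and Saturday 2024-01-06 both map to Sunday 2023-12-31, while a Sunday maps to itself.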
