Postgresql Group by Different from MySQL

PostgreSQL GROUP BY different from MySQL?

MySQL's totally non standards compliant GROUP BY can be emulated by Postgres' DISTINCT ON. Consider this:

MySQL:

SELECT a,b,c,d,e FROM table GROUP BY a

This delivers 1 row per value of a (which one, you don't really know). Well actually you can guess, because MySQL doesn't know about hash aggregates, so it will probably use a sort... but it will only sort on a, so the order of the rows could be random. Unless it uses a multicolumn index instead of sorting. Well, anyway, it's not specified by the query.

Postgres:

SELECT DISTINCT ON (a) a,b,c,d,e FROM table ORDER BY a,b,c

This delivers 1 row per value of a, this row will be the first one in the sort according to the ORDER BY specified by the query. Simple.

Note that here, it's not an aggregate I'm computing. So GROUP BY actually makes no sense. DISTINCT ON makes a lot more sense.

Rails is married to MySQL, so I'm not surprised that it generates SQL that doesn't work in Postgres.

PostgreSQL equivalent for MySQL GROUP BY

Back-ticks are a non-standard MySQL thing. Use the canonical double quotes to quote identifiers (possible in MySQL, too). That is, if your table in fact is named "MY_TABLE" (all upper case). If you (more wisely) named it my_table (all lower case), then you can remove the double quotes or use lower case.

Also, I use ct instead of count as alias, because it is bad practice to use function names as identifiers.

Simple case

This would work with PostgreSQL 9.1:

SELECT *, count(id) ct
FROM my_table
GROUP BY primary_key_column(s)
ORDER BY ct DESC;

It requires primary key column(s) in the GROUP BY clause. The results are identical to a MySQL query, but ct would always be 1 (or 0 if id IS NULL) - useless to find duplicates.

Group by other than primary key columns

If you want to group by other column(s), things get more complicated. This query mimics the behavior of your MySQL query - and you can use *.

SELECT DISTINCT ON (1, some_column)
count(*) OVER (PARTITION BY some_column) AS ct
,*
FROM my_table
ORDER BY 1 DESC, some_column, id, col1;

This works because DISTINCT ON (PostgreSQL specific), like DISTINCT (SQL-Standard), are applied after the window function count(*) OVER (...). Window functions (with the OVER clause) require PostgreSQL 8.4 or later and are not available in MySQL.

Works with any table, regardless of primary or unique constraints.

The 1 in DISTINCT ON and ORDER BY is just shorthand to refer to the ordinal number of the item in the SELECT list.

SQL Fiddle to demonstrate both side by side.

More details in this closely related answer:

  • Select first row in each GROUP BY group?


count(*) vs. count(id)

If you are looking for duplicates, you are better off with count(*) than with count(id). There is a subtle difference if id can be NULL, because NULL values are not counted - while count(*) counts all rows. If id is defined NOT NULL, results are the same, but count(*) is generally more appropriate (and slightly faster, too).

Translate query with GROUP BY from MySQL to Postgres and SQLite

I suspect the purpose of your query is to use the MySQL quirk to pick an arbitrary row for un-aggregated columns that are also not listed in the existing GROUP BY clause (which is against the SQL standard and not supported by most other RDBMS). By ordering in a subquery you make MySQL pick the row with the smallest field_three per gropup, so I'll assume you want:

the smallest field_three for every field_one, and field_two from the same row to go with it.


Your original query does not work in Postgres which follows the SQL standard here. If a SELECT has a GROUP BY clause, all output columns have to be listed or be aggregated. Consider:

  • How do I port query with GROUP BY clause to PostgreSQL?

Standard SQL with window function

One possible standard-SQL solution would be with the window function row_number() in a subquery:

SELECT field_one, field_two, field_three
FROM (
SELECT field_one, field_two, field_three
, row_number() OVER(PARTITION BY field_one ORDER BY field_three) rn
FROM table_one
) sub
WHERE sub.rn = 1

Works in Postgres, but not in SQLite or MySQL which do not support window functions, yet.

Basic standard SQL

This query works in all three RDBMS (and almost anywhere else, too), but requires unique maxima in field_three (returns multiple rows if there are ties for a maximum field_three per field_one).

SELECT t1.*
FROM table_one t1
LEFT JOIN table_one t2 ON t1.field_one = t2.field_one
AND t1.field_three < t2.field_three
WHERE t2.field_one IS NULL

You can resolve ties if you have any unique (set of) column(s), but it's unwieldy. Related:

  • Fetch the row which has the Max value for a column

Postgres

(Besides supporting all standard SQL solutions) Postgres also has the powerful DISTINCT ON (an extension of the standard DISTINCT, but not against the standard like the MySQL and SQLite quirks):

SELECT DISTINCT ON (field_one)
field_one, field_two, field_three
FROM table_one
ORDER BY field_one, field_three

You can resolve ties by adding more columns to the ORDER BY. Details:

  • Select first row in each GROUP BY group?

SQLite

... has a similar quirk as MySQL (violating the SQL standard). From the release notes:

Queries of the form: "SELECT max(x), y FROM table" returns the value of y on the same row that contains the maximum x value.

So:

SELECT field_one, field_two, max(field_three) AS field_three
FROM table_one
GROUP BY field_one

field_two is taken from the row of max(field_three). Related:

  • How can I select the set of rows where each item has the greatest timestamp?

Join and join condition work the same everywhere:

LEFT JOIN (SELECT ...) abc ON cur_table.some_field = abc.field_one

Group by clause in mySQL and postgreSQL, why the error in postgreSQL?

You need to use AGGREGATE FUNCTION:

Aggregate functions compute a single result from a set of input
values.

SELECT col2, MIN(col3) AS col3, MIN(col1) AS col1
FROM the_table
GROUP BY col2;

db<>fiddle demo


MySQL Handling of GROUP BY:

In standard SQL, a query that includes a GROUP BY clause cannot refer
to nonaggregated columns in the select list that are not named in the
GROUP BY clause

and:

MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause. This means that the preceding query is legal in MySQL. You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate

So with MySQL version without explicit aggregate function you may end up with undetermininistic values. I strongly suggest to use specific aggregate function.


EDIT:

From MySQL Handling of GROUP BY:

SQL92 and earlier does not permit queries for which the select list, HAVING condition, or ORDER BY list refer to nonaggregated columns that are not named in the GROUP BY clause.

SQL99 and later permits such nonaggregates per optional feature T301 if they are functionally dependent on GROUP BY columns: If such a relationship exists between name and custid, the query is legal. This would be the case, for example, were custid a primary key of customers.

Example:

SELECT o.custid, c.name, MAX(o.payment)
FROM orders AS o
JOIN customers AS c
ON o.custid = c.custid
GROUP BY o.custid;

PostgreSQL - GROUP BY clause

Postgres 9.1 or later, quoting the release notes of 9.1 ...

Allow non-GROUP BY columns in the query target list when the primary
key is specified in the GROUP BY clause (Peter Eisentraut)

The SQL standard allows this behavior, and because of the primary key,
the result is unambiguous.

Related:

  • Return a grouped list with occurrences using Rails and PostgreSQL

The queries in the question and in @Michael's answer have the logic backwards. We want to count how many tags match per article, not how many articles have a certain tag. So we need to GROUP BY w_article.id, not by a_tags.id.

list all articles with that tag, and also how many of given tags they match

To fix this:

SELECT count(t.tag) AS ct, a.*  -- any column from table a allowed ...
FROM a_tags t
JOIN w_articles2tag a2t ON a2t.tag = t.id
JOIN w_article a ON a.id = a2t.article
WHERE t.tag IN ('css', 'php')
GROUP BY a.id -- ... since PK is in GROUP BY
LIMIT 9;

Assuming id is the primary key of w_article.

However, this form will be faster while doing the same:

SELECT a.*, ct
FROM (
SELECT a2t.article AS id, count(*) AS ct
FROM a_tags t
JOIN w_articles2tag a2t ON a2t.tag = t.id
GROUP BY 1
LIMIT 9 -- LIMIT early - cheaper
) sub
JOIN w_article a USING (id); -- attached alias to article in the sub

Closely related answer from just yesterday:

  • Why does the following join increase the query time significantly?

GROUP BY query that works in MySQL is rejected by PostgreSQL

The error explains it: you need to use an aggregate function to make this deterministic, eg:

SELECT units.*, max(activity_logs.created_at)
FROM "units" left join activity_logs on units.id=activity_logs.unit_id
GROUP BY units.id
ORDER BY max(activity_logs.created_at) desc;

Group By 2 columns and Count then Sort the count for each key pair

I think you want two sort keys:

ORDER BY device, times DESC;

If you want to sort the devices as well, so the biggest device is first, you can use window functions:

ORDER BY sum(times) over (partition by device) DESC, device, times DESC

postgres group by error

As i dont see any need for group by clause in your query, try this query:

SELECT * 
FROM user_ips
WHERE IP = '%s'
ORDER BY datetime

GROUP BY should be used when you are trying to say grouping records by say ip addresses and you are trying to apply aggregate functions likes MIN/MAX/SUM/AVG etc.

PostgreSQL Group By not working as expected - wants too many inclusions

The GROUP BY statement aggregates rows with the same values in the columns included in the group by into a single row. Because this row isn't the same as the original row, you can't have a column that is not in the group by or in an aggregate function. To get what you want, you will probably have to select without the ID column, then join the result to the original table. I don't know PostgreSQL syntax, but I assume it would be something like the following.

SELECT Table_Name.ID, aggregate.Account_ID, aggregate.MIteration
(SELECT Account_ID, MAX(Iteration) AS MIteration
FROM Table_Name
GROUP BY Account_ID) aggregate
LEFT JOIN Table_Name ON aggregate.Account_ID = Table_Name.Account_ID AND
aggregate.MIteration = Tabel_Name.Iteration


Related Topics



Leave a reply



Submit