Why No "SELECT foo.* ... GROUP BY foo.id" in Postgres

Why no SELECT foo.* ... GROUP BY foo.id in Postgres?

Just in case other people stumble over this question:

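Starting with PostgreSQL 9.1, it's sufficient to list a table's primary key column(s) in the GROUP BY clause. The table's other columns are functionally dependent on the key, so they may appear in the SELECT list without being aggregated (which means the example from the question works now).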
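
A minimal sketch of the query shape this enables, run through Python's built-in sqlite3 module as a stand-in for a PostgreSQL connection. The tables foo and bar, their columns, and the rows are hypothetical. Caveat: SQLite is more permissive than PostgreSQL and accepts bare columns in any grouped query, so this shows only the query and its result, not the pre-9.1 error.

```python
import sqlite3

# Hypothetical schema: foo has a primary key id; bar references foo.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE foo (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE bar (id INTEGER PRIMARY KEY, foo_id INTEGER REFERENCES foo(id));
INSERT INTO foo VALUES (1, 'alpha'), (2, 'beta');
INSERT INTO bar (foo_id) VALUES (1), (1), (2);
""")

# Every foo column depends on foo.id (the primary key), so PostgreSQL 9.1+
# accepts foo.* while grouping by foo.id alone.
rows = conn.execute("""
SELECT foo.*, count(bar.id) AS bar_count
FROM foo
LEFT JOIN bar ON bar.foo_id = foo.id
GROUP BY foo.id
ORDER BY foo.id
""").fetchall()

for row in rows:
    print(row)
```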

Trouble performing Postgres group by non-ID column to get ID containing max value

Try:

SELECT *
FROM (
  SELECT t.*,
         row_number() OVER (PARTITION BY user_id, foo_id ORDER BY effective_at DESC) x
  FROM user_foos t
) sub -- PostgreSQL before 16 requires an alias on a subquery in FROM
WHERE x = 1
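
To check the shape of the query above, here is a sketch that runs it against an in-memory SQLite database through Python's sqlite3. SQLite has supported window functions since 3.25. The user_foos rows and the id column are hypothetical. The alias sub is there because PostgreSQL before 16 rejects an unaliased subquery in FROM.

```python
import sqlite3

# Hypothetical user_foos rows: (id, user_id, foo_id, effective_at)
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_foos (id INTEGER PRIMARY KEY, user_id INT, foo_id INT, effective_at TEXT);
INSERT INTO user_foos VALUES
  (1, 10, 1, '2021-01-01'),
  (2, 10, 1, '2021-06-01'),
  (3, 10, 2, '2021-03-01'),
  (4, 20, 1, '2021-02-01');
""")

# Number rows inside each (user_id, foo_id) group, newest first,
# then keep only the first row of every group.
latest = conn.execute("""
SELECT id, user_id, foo_id, effective_at
FROM (
  SELECT t.*,
         row_number() OVER (PARTITION BY user_id, foo_id ORDER BY effective_at DESC) AS x
  FROM user_foos t
) sub
WHERE x = 1
ORDER BY user_id, foo_id
""").fetchall()

print(latest)
```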

Error: Id must appear in group by clause, Postgresql?

SELECT * means "give me all columns" from the table(s) in the FROM clause. I presume that's not just UserId, but a bunch of other columns as well. Which ones? I can't tell, because you SELECT * FROM "Deals". Consider avoiding SELECT * anywhere except in quick & dirty tests.

Therefore, either list all of them in the GROUP BY clause (which you probably don't want), or SELECT only UserId along with an aggregated column; for example:

select okd."UserId", min(okd.position) 
FROM (your current FROM clause)
group by okd."UserId"
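
A sketch of the suggested query against a hypothetical "Deals" table, using Python's sqlite3 as a stand-in database. The position values and the okd alias on "Deals" are assumptions, since the original FROM clause isn't shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical "Deals" table with a quoted, mixed-case "UserId" column.
conn.executescript("""
CREATE TABLE "Deals" (id INTEGER PRIMARY KEY, "UserId" INT, position INT);
INSERT INTO "Deals" ("UserId", position) VALUES (1, 3), (1, 1), (2, 5);
""")

# Only the grouping column and an aggregate are selected, which satisfies
# PostgreSQL's GROUP BY rule.
rows = conn.execute("""
SELECT okd."UserId", min(okd.position) AS min_position
FROM "Deals" okd
GROUP BY okd."UserId"
ORDER BY okd."UserId"
""").fetchall()

print(rows)
```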

[EDIT, based on Oracle (as I have it), but applies to your database too]

Have a look at the following examples:

This works OK: I'm selecting the department number and the sum of salaries of all employees who work in each department:

SQL> select deptno, sum(sal)
from emp
group by deptno
order by deptno;

    DEPTNO   SUM(SAL)
---------- ----------
        10       8750
        20       6775
        30       9400

I'd like to include job as well, i.e. sum salaries per department and job. If I add a new column to the SELECT list but not to the GROUP BY clause, the query fails:

SQL> select deptno, job, sum(sal)
from emp
group by deptno
order by deptno;
select deptno, job, sum(sal)
               *
ERROR at line 1:
ORA-00979: not a GROUP BY expression

Therefore, you have two options:

  • revert to the first query (i.e. remove JOB and keep DEPTNO only), or
  • include the additional column in the GROUP BY clause:

SQL> select deptno, job, sum(sal)
from emp
group by deptno, job
order by deptno, job;

    DEPTNO JOB         SUM(SAL)
---------- --------- ----------
        10 CLERK           1300
        10 MANAGER         2450
        10 PRESIDENT       5000
        20 ANALYST         3000
        20 CLERK            800
        20 MANAGER         2975
        30 CLERK            950
        30 MANAGER         2850
        30 SALESMAN        5600
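
Both GROUP BY variants can be reproduced with Python's sqlite3. The salaries are an assumption: they are the classic SCOTT.EMP demo values for the twelve employees listed in the analytic-function output further down, and they reproduce the totals shown above.

```python
import sqlite3

# Assumed sample rows (ename, job, deptno, sal), SCOTT.EMP-style.
EMP = [
    ("MILLER", "CLERK", 10, 1300), ("CLARK", "MANAGER", 10, 2450),
    ("KING", "PRESIDENT", 10, 5000), ("FORD", "ANALYST", 20, 3000),
    ("SMITH", "CLERK", 20, 800), ("JONES", "MANAGER", 20, 2975),
    ("JAMES", "CLERK", 30, 950), ("BLAKE", "MANAGER", 30, 2850),
    ("TURNER", "SALESMAN", 30, 1500), ("WARD", "SALESMAN", 30, 1250),
    ("ALLEN", "SALESMAN", 30, 1600), ("MARTIN", "SALESMAN", 30, 1250),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, job TEXT, deptno INT, sal INT)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?, ?)", EMP)

# One row per department.
per_dept = conn.execute(
    "SELECT deptno, sum(sal) FROM emp GROUP BY deptno ORDER BY deptno"
).fetchall()

# Adding job to the SELECT list means adding it to GROUP BY too:
# one row per (department, job) pair.
per_dept_job = conn.execute(
    "SELECT deptno, job, sum(sal) FROM emp "
    "GROUP BY deptno, job ORDER BY deptno, job"
).fetchall()

print(per_dept)
print(per_dept_job)
```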

[A LITTLE BIT MORE]

Yet another thing: there's a way to aggregate values without using the GROUP BY clause; in Oracle, that's what analytic functions do. PostgreSQL offers the same feature, with the same OVER (...) syntax, under the name window functions. Here's an example:

SQL> select deptno, ename, job, sum(sal) over (partition by deptno) sum_sal_dept
from emp
order by deptno, job;

    DEPTNO ENAME      JOB       SUM_SAL_DEPT
---------- ---------- --------- ------------
        10 MILLER     CLERK             8750
        10 CLARK      MANAGER           8750
        10 KING       PRESIDENT         8750
        20 FORD       ANALYST           6775
        20 SMITH      CLERK             6775
        20 JONES      MANAGER           6775
        30 JAMES      CLERK             9400
        30 BLAKE      MANAGER           9400
        30 TURNER     SALESMAN          9400
        30 WARD       SALESMAN          9400
        30 ALLEN      SALESMAN          9400
        30 MARTIN     SALESMAN          9400

See? Without a GROUP BY clause, I've calculated the sum of salaries per department, while every employee row is still returned.
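
The window-function form can be run the same way. This sketch uses Python's sqlite3 and the same assumed SCOTT-style salaries, which reproduce the totals in the output above. SQLite and PostgreSQL both accept this OVER (PARTITION BY ...) syntax. Every employee row survives, and each one carries its department's total.

```python
import sqlite3

# Assumed sample rows (ename, job, deptno, sal), SCOTT.EMP-style.
EMP = [
    ("MILLER", "CLERK", 10, 1300), ("CLARK", "MANAGER", 10, 2450),
    ("KING", "PRESIDENT", 10, 5000), ("FORD", "ANALYST", 20, 3000),
    ("SMITH", "CLERK", 20, 800), ("JONES", "MANAGER", 20, 2975),
    ("JAMES", "CLERK", 30, 950), ("BLAKE", "MANAGER", 30, 2850),
    ("TURNER", "SALESMAN", 30, 1500), ("WARD", "SALESMAN", 30, 1250),
    ("ALLEN", "SALESMAN", 30, 1600), ("MARTIN", "SALESMAN", 30, 1250),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, job TEXT, deptno INT, sal INT)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?, ?)", EMP)

# The window sum is computed per deptno, but rows are not collapsed.
# ename is added to ORDER BY only to make the row order deterministic.
rows = conn.execute("""
SELECT deptno, ename, job,
       sum(sal) OVER (PARTITION BY deptno) AS sum_sal_dept
FROM emp
ORDER BY deptno, job, ename
""").fetchall()

for row in rows:
    print(row)
```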

must appear in the GROUP BY clause or be used in an aggregate function

Yes, this is a common aggregation problem. Before SQL3 (SQL:1999), every selected non-aggregated field had to appear in the GROUP BY clause[*].

To work around this issue, calculate the aggregate in a subquery and then join it back to the table itself to get the additional columns you need to show:

SELECT m.cname, m.wmname, t.mx
FROM (
SELECT cname, MAX(avg) AS mx
FROM makerar
GROUP BY cname
) t JOIN makerar m ON m.cname = t.cname AND t.mx = m.avg
;

 cname  | wmname |         mx
--------+--------+--------------------
 canada | zoro   | 2.0000000000000000
 spain  | usopp  | 5.0000000000000000
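
A sketch of the subquery-plus-join approach against an in-memory SQLite database via Python's sqlite3. The makerar rows are assumptions. canada/zoro and spain/usopp take the averages shown above, while luffy's 1.0 is a made-up value below spain's maximum.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical makerar rows; luffy's average is assumed.
conn.executescript("""
CREATE TABLE makerar (cname TEXT, wmname TEXT, avg REAL);
INSERT INTO makerar VALUES
  ('canada', 'zoro', 2.0),
  ('spain', 'luffy', 1.0),
  ('spain', 'usopp', 5.0);
""")

# Compute the max per cname, then join back to recover wmname.
rows = conn.execute("""
SELECT m.cname, m.wmname, t.mx
FROM (
  SELECT cname, MAX(avg) AS mx
  FROM makerar
  GROUP BY cname
) t
JOIN makerar m ON m.cname = t.cname AND t.mx = m.avg
ORDER BY m.cname
""").fetchall()

print(rows)
```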

But you may also use window functions, which look simpler:

SELECT cname, wmname, MAX(avg) OVER (PARTITION BY cname) AS mx
FROM makerar
;

The only catch with this method is that it shows all rows (window functions do not collapse rows the way GROUP BY does). But each row carries the correct MAX for its country (i.e. computed at the cname level), so it's up to you:

 cname  | wmname |         mx
--------+--------+--------------------
 canada | zoro   | 2.0000000000000000
 spain  | luffy  | 5.0000000000000000
 spain  | usopp  | 5.0000000000000000
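
The window-function version, sketched with Python's sqlite3. The makerar rows are hypothetical, and luffy's 1.0 average is an assumed value. Note that luffy's row still shows spain's maximum, 5.0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical makerar rows; luffy's average is assumed.
conn.executescript("""
CREATE TABLE makerar (cname TEXT, wmname TEXT, avg REAL);
INSERT INTO makerar VALUES
  ('canada', 'zoro', 2.0),
  ('spain', 'luffy', 1.0),
  ('spain', 'usopp', 5.0);
""")

# MAX over the cname partition is attached to every row; nothing is grouped.
rows = conn.execute("""
SELECT cname, wmname, MAX(avg) OVER (PARTITION BY cname) AS mx
FROM makerar
ORDER BY cname, wmname
""").fetchall()

print(rows)
```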

A solution, arguably less elegant, that shows only the (cname, wmname) tuples matching the max value is:

SELECT DISTINCT /* DISTINCT matters if makerar holds duplicate (cname, wmname) rows; ROW_NUMBER() keeps only one row per cname even when several tie for the max */
m.cname, m.wmname, t.avg AS mx
FROM (
SELECT cname, wmname, avg, ROW_NUMBER() OVER (PARTITION BY cname ORDER BY avg DESC) AS rn
FROM makerar
) t JOIN makerar m ON m.cname = t.cname AND m.wmname = t.wmname AND t.rn = 1
;

 cname  | wmname |         mx
--------+--------+--------------------
 canada | zoro   | 2.0000000000000000
 spain  | usopp  | 5.0000000000000000
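
Because the subquery already carries wmname, the self-join isn't strictly needed. This sketch, run through Python's sqlite3, filters on rn = 1 directly. The makerar rows are hypothetical, and luffy's 1.0 average is an assumed value. ROW_NUMBER() breaks ties for the maximum arbitrarily. Using RANK() in its place would keep every tied row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical makerar rows; luffy's average is assumed.
conn.executescript("""
CREATE TABLE makerar (cname TEXT, wmname TEXT, avg REAL);
INSERT INTO makerar VALUES
  ('canada', 'zoro', 2.0),
  ('spain', 'luffy', 1.0),
  ('spain', 'usopp', 5.0);
""")

# Rank rows inside each cname by avg, highest first; keep the top one.
rows = conn.execute("""
SELECT cname, wmname, avg AS mx
FROM (
  SELECT cname, wmname, avg,
         ROW_NUMBER() OVER (PARTITION BY cname ORDER BY avg DESC) AS rn
  FROM makerar
) t
WHERE rn = 1
ORDER BY cname
""").fetchall()

print(rows)
```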

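[*]: Interestingly enough, even though the spec allows selecting non-grouped fields (as long as they are functionally dependent on the grouped ones), major engines have been reluctant to support it. Oracle and SQL Server don't allow it at all. MySQL used to accept any non-grouped column by default, but since 5.7.5 the ONLY_FULL_GROUP_BY SQL mode is enabled by default. With that mode, MySQL accepts only non-grouped columns that are functionally dependent on the GROUP BY columns. Getting the old permissive behavior back requires removing ONLY_FULL_GROUP_BY from sql_mode.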

PostgreSQL column 'foo' does not exist

You accidentally created the column name with a trailing space, and presumably phpPgAdmin created the column name with double quotes around it:

create table your_table (
"foo " -- ...
)

That would give you a column that looked like it was called foo everywhere but you'd have to double quote it and include the space whenever you use it:

select ... from your_table where "foo " is not null

The best practice is to use lower-case, unquoted column names with PostgreSQL. There should be a setting somewhere in phpPgAdmin that tells it not to quote identifiers (such as table and column names), but alas, I don't use phpPgAdmin, so I don't know where that setting is (or even whether it exists).
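
A quick way to see the trailing-space problem, sketched with Python's sqlite3. Like PostgreSQL, SQLite keeps trailing spaces in quoted identifiers. The table is hypothetical. The final step renames the column to a plain, unquoted name with ALTER TABLE ... RENAME COLUMN, which both PostgreSQL and SQLite (3.25+) support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table whose quoted column name ends in a space.
conn.execute('CREATE TABLE your_table ("foo " INTEGER)')
conn.execute("INSERT INTO your_table VALUES (1)")

# Unquoted foo does not match the column named "foo ".
try:
    conn.execute("SELECT foo FROM your_table")
    unquoted_ok = True
except sqlite3.OperationalError as exc:
    unquoted_ok = False
    print("unquoted:", exc)

# Quoting the name, trailing space included, finds the column.
value = conn.execute(
    'SELECT "foo " FROM your_table WHERE "foo " IS NOT NULL'
).fetchone()[0]

# Rename it to a plain lower-case name so no quoting is needed afterwards.
conn.execute('ALTER TABLE your_table RENAME COLUMN "foo " TO foo')
renamed = conn.execute("SELECT foo FROM your_table").fetchone()[0]
print(value, renamed)
```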

PostgreSQL select max with group by and additional value

SELECT DISTINCT ON (ID)
ID, Country, Area
FROM foo
ORDER BY ID, Area DESC NULLS LAST;

Detailed explanation and links to faster alternatives for special cases:

  • Select first row in each GROUP BY group?
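
SQLite, used here through Python's sqlite3 as a stand-in, has no DISTINCT ON. This sketch therefore emulates it with ROW_NUMBER(), a swapped-in technique rather than DISTINCT ON itself. For each ID, it keeps the first row under Area DESC NULLS LAST. ORDER BY Area IS NULL, Area DESC spells out NULLS LAST in a form both SQLite and PostgreSQL accept. The foo rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rows, including NULL areas that must not win.
conn.executescript("""
CREATE TABLE foo (ID INT, Country TEXT, Area INT);
INSERT INTO foo VALUES
  (1, 'A', 100), (1, 'B', 300), (1, 'C', NULL),
  (2, 'D', NULL), (2, 'E', 50);
""")

# "Area IS NULL" is 0 for non-NULL and 1 for NULL, so NULLs sort last;
# within the non-NULL rows, the largest Area comes first.
rows = conn.execute("""
SELECT ID, Country, Area
FROM (
  SELECT ID, Country, Area,
         ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Area IS NULL, Area DESC) AS rn
  FROM foo
) t
WHERE rn = 1
ORDER BY ID
""").fetchall()

print(rows)
```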

How to replace a PostgreSQL pivot table field (foo_id) in the SELECT of a request by its value taken from the foo table?

You need to join the table foo, assuming both have a foo_id to join the tables:

SELECT
f.name,
COUNT(fb.foo_id) AS total
FROM foo_bar fb
JOIN foo f
ON f.foo_id = fb.foo_id
GROUP BY f.name
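
A sketch of the join with hypothetical foo and foo_bar rows, run through Python's sqlite3. One design note: grouping by f.name alone merges distinct foo rows that happen to share a name. If names aren't unique, GROUP BY f.foo_id, f.name avoids that.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical lookup table and pivot table.
conn.executescript("""
CREATE TABLE foo (foo_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE foo_bar (foo_id INT REFERENCES foo(foo_id), bar_id INT);
INSERT INTO foo VALUES (1, 'first'), (2, 'second');
INSERT INTO foo_bar VALUES (1, 10), (1, 11), (2, 10);
""")

# Replace foo_id by the name from foo and count pivot rows per name.
rows = conn.execute("""
SELECT f.name, COUNT(fb.foo_id) AS total
FROM foo_bar fb
JOIN foo f ON f.foo_id = fb.foo_id
GROUP BY f.name
ORDER BY f.name
""").fetchall()

print(rows)
```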

