Postgresql - Group by Clause

PostgreSQL - GROUP BY clause

Postgres 9.1 or later, quoting the release notes of 9.1 ...

Allow non-GROUP BY columns in the query target list when the primary
key is specified in the GROUP BY clause (Peter Eisentraut)

The SQL standard allows this behavior, and because of the primary key,
the result is unambiguous.

Related:

  • Return a grouped list with occurrences using Rails and PostgreSQL

The queries in the question and in @Michael's answer have the logic backwards. We want to count how many tags match per article, not how many articles have a certain tag. So we need to GROUP BY w_article.id, not by a_tags.id.

list all articles with that tag, and also how many of given tags they match

To fix this:

SELECT count(t.tag) AS ct, a.*  -- any column from table a allowed ...
FROM a_tags t
JOIN w_articles2tag a2t ON a2t.tag = t.id
JOIN w_article a ON a.id = a2t.article
WHERE t.tag IN ('css', 'php')
GROUP BY a.id -- ... since PK is in GROUP BY
LIMIT 9;

Assuming id is the primary key of w_article.

However, this form will be faster while doing the same:

SELECT a.*, ct
FROM (
SELECT a2t.article AS id, count(*) AS ct
FROM a_tags t
JOIN w_articles2tag a2t ON a2t.tag = t.id
GROUP BY 1
LIMIT 9 -- LIMIT early - cheaper
) sub
JOIN w_article a USING (id); -- attached alias to article in the sub

Closely related answer from just yesterday:

  • Why does the following join increase the query time significantly?

How do I port query with GROUP BY clause to PostgreSQL?

Like @Andomar already provided: Most RDBMS require to group by every column that appears unaggregated - anywhere else in the query (including the SELECT list, but also in the WHERE clause etc.)

  • PGError: ERROR: aggregates not allowed in WHERE clause on a AR query of an object and its has_many objects

The SQL standard also defines that expressions in the GROUP BY clause shall also cover functionally dependent expressions. Postgres implemented that the PK column covers all columns of the same table.

  • PostgreSQL - GROUP BY clause

So op.id covers the whole table and this should work for your current query:

GROUP BY op.id, c.name, 5, t.name, p.name

5 being a positional reference to the SELECT list, which is also allowed in Postgres. It's just notational shorthand for repeating the long expression:

CASE
WHEN op.receiving_account_id IS NOT NULL THEN
CASE
WHEN op.account_id IS NULL THEN ac2.name
ELSE ac.name || ' -> ' || ac2.name
END
ELSE ac.name
END
  • Concatenate multiple result rows of one column into one, group by another column
  • Select first row in each GROUP BY group?

I derive from your names that you have a n:m relationship between operation and tag, implemented with operation_tag. All other joins don't seem to multiply rows, so it would be more efficient to aggregate tags separately - like @Andomar hinted, just get the logic right.

This should work:

SELECT op.id
, op.name
, c.name
, CASE -- amountsign
WHEN op.receiving_account_id IS NOT NULL THEN
CASE WHEN op.account_id IS NULL THEN '+' ELSE '=' END
ELSE '-'
END || ' ' || op.amount || ' zł' AS amount
, CASE -- account
WHEN op.receiving_account_id IS NOT NULL THEN
CASE
WHEN op.account_id IS NULL THEN ac2.name
ELSE ac.name || ' -> ' || ac2.name
END
ELSE ac.name
END AS account
, t.name AS type
, to_char(op.date, 'DD.MM.YY') || ' ' || op.time AS date -- see below
, p.name AS place
, ot.tags
FROM operation op
LEFT JOIN category c ON op.category_id = c.id
LEFT JOIN type t ON op.type_id = t.id
LEFT JOIN account ac ON op.account_id = ac.id
LEFT JOIN account ac2 ON op.receiving_account_id = ac2.id
LEFT JOIN place p ON op.place_id = p.id
LEFT JOIN (
SELECT operation_id, string_agg(t.name, ', ') AS tags
FROM operation_tag ot
LEFT JOIN tag t ON t.id = ot.tag_id
GROUP BY 1
) ot ON op.id = ot.operation_id

ORDER BY op.date DESC, op.time DESC;

Asides

You can replace:

CASE --date
WHEN op.time IS NOT NULL THEN to_char(op.date, 'DD.MM.YY') || ' ' || op.time
ELSE to_char(op.date, 'DD.MM.YY')
END AS date

with this shorter equivalent:

concat_ws(' ', to_char(op.date, 'DD.MM.YY'), op.time) AS date

But since both columns are defined NOT NULL, you can furher simplify to:

to_char(op.date, 'DD.MM.YY') || ' ' || op.time AS date

Careful with your ORDER BY you have at least one input column also named date. If you use the unqualified name, it will refer to the output column - which is what you want (as clarified in the comment). Details:

  • PostgreSQL: How to return rows with respect to a found row (relative results)?

However, sorting by the text representation would not sort according to your timeline correctly. Sort by original values instead as suggested in my query above.

PostgreSQL - GROUP BY clause or be used in an aggregate function

I think you are trying to aggregate and group by on the same column. It depends on what data you want. Ether do this:

SELECT 
cars.name,
cars.created_at,
cars.updated_at,
COUNT(cars.id) AS counter
FROM cars
LEFT JOIN users
ON cars.id=users.car_id
GROUP BY cars.name, cars.created_at, cars.updated_at
ORDER BY counter DESC

Or you want to count all maybe? Then like this:

SELECT
cars.id,
cars.name,
cars.created_at,
cars.updated_at,
COUNT(*) AS counter
FROM cars
LEFT JOIN users
ON cars.id=users.car_id
GROUP BY cars.id, cars.name, cars.created_at, cars.updated_at
ORDER BY counter DESC

PostgreSQL problem with the Group By Clause

You can use LAG() window function to check the previous value of Letter for each row and SUM() window function to create the groups of rows of consecutive occurrences.

Then aggregate in each group:

SELECT Letter, MAX(R_Value) Max_RValue
FROM (
SELECT *, SUM(flag::int) OVER (ORDER BY R_Value DESC) grp
FROM (
SELECT *, Letter <> LAG(Letter, 1, '') OVER (ORDER BY R_Value DESC) flag
FROM tablename
) t
) t
GROUP BY grp, Letter;

Or, simpler just select the rows where the Letter changes:

SELECT Letter, R_Value
FROM (
SELECT *, LAG(Letter, 1, '') OVER (ORDER BY R_Value DESC) prev_Letter
FROM tablename
) t
WHERE Letter <> prev_Letter;

See the demo.

PostgreSQL GROUP BY clause

You could use the rank window function to rank the employees per city, and then wrap that query with another query to get only the highest one per city:

SELECT ename, city, salary
FROM (SELECT ename, city, salary,
RANK() OVER (PARTITION BY city ORDER BY salary DESC) AS rk
FROM employee) t
WHERE rk = 1

avoiding group by clause while using aggregate function in sql query

Your main problem is that th.created_date is a timestamp which you are using as as one of your GROUP BY columns. The time is different on each record so there is no grouping on that column.

I'm not certain why you need to cast dates as Character Varying, but in any case if you cast to date first (e.g. th.created_date::DATE) as both the column you select and in GROUP BY. Then it will give you the sum for each date (rather than for each microsecond).

Optimized Postgresql like and group by clause

It sounds like you need an index skip-scan. PostgreSQL currently doesn't implement those automatically but you can emulate it with a recursive CTE. People are working on adding this to the planner so it will be chosen automatically, but even if they succeed it would probably not work with your case-folding LIKE condition. I couldn't see how to integrate the case-folding LIKE condition into the recursive CTE, but if you return all distinct genders preserving case, you can then filter that small list quickly without needing to use an index.

WITH RECURSIVE t AS (
SELECT min(gender) AS gender FROM employees
UNION ALL
SELECT (SELECT min(gender) FROM employees WHERE gender > t.gender)
FROM t WHERE t.gender IS NOT NULL
)
SELECT gender FROM t WHERE gender IS NOT NULL and lower(gender) like 'f%';

This took less than 2 ms for me, but it does require you add a plain index on gender, which you don't seem to have already.

Error: Id must appear in group by clause, Postgresql?

SELECT * means "give me all columns" from table(s) involved in the FROM clause. I presume that it is not just UserId, but bunch of other columns as well. Which ones? Can't tell, as you SELECT * FROM "Deals". Consider avoiding SELECT * anywhere but for quick & dirty testing purposes.

Therefore, either enumerate all of them in the GROUP BY clause (which you probably don't want), or SELECT only UserId along with aggregated column; for example,

select okd."UserId", min(okd.position) 
FROM (your current FROM clause)
group by okd."UserId"

[EDIT, based on Oracle (as I have it), but applies to your database too]

Have a look at the following examples:

This works OK - I'm selecting department number and sum salaries of all employees who work in those departments:

SQL> select deptno, sum(sal)
2 from emp
3 group by deptno
4 order by deptno;

DEPTNO SUM(SAL)
---------- ----------
10 8750
20 6775
30 9400

I'd like to include job as well, i.e. sum salaries per department and job. If I include a new column into the SELECT but don't have it in GROUP BY, it'll fail:

SQL> select deptno, job, sum(sal)
2 from emp
3 group by deptno
4 order by deptno;
select deptno, job, sum(sal)
*
ERROR at line 1:
ORA-00979: not a GROUP BY expression

Therefore, you have two options:

  • one is to revert back to the first query (i.e. remove JOB and have DEPTNO only), or
  • include additional column into the GROUP BY clause

SQL> select deptno, job, sum(sal)
2 from emp
3 group by deptno, job
4 order by deptno, job;

DEPTNO JOB SUM(SAL)
---------- --------- ----------
10 CLERK 1300
10 MANAGER 2450
10 PRESIDENT 5000
20 ANALYST 3000
20 CLERK 800
20 MANAGER 2975
30 CLERK 950
30 MANAGER 2850
30 SALESMAN 5600

[A LITTLE BIT MORE]

Yet another thing: there's a way to aggregate values without using the GROUP BY clause; in Oracle, that's what analytic functions do. I don't know whether there's something like that in the database system you use, but you might check it. Here's an example:

SQL> select deptno, ename, job, sum(sal) over (partition by deptno) sum_sal_dept
2 from emp
3 order by deptno, job;

DEPTNO ENAME JOB SUM_SAL_DEPT
---------- ---------- --------- ------------
10 MILLER CLERK 8750
10 CLARK MANAGER 8750
10 KING PRESIDENT 8750
20 FORD ANALYST 6775
20 SMITH CLERK 6775
20 JONES MANAGER 6775
30 JAMES CLERK 9400
30 BLAKE MANAGER 9400
30 TURNER SALESMAN 9400
30 WARD SALESMAN 9400
30 ALLEN SALESMAN 9400
30 MARTIN SALESMAN 9400

See? Without the GROUP BY clause, I've calculated sum of salaries per departments.



Related Topics



Leave a reply



Submit