Why Can't I Exclude Dependent Columns from 'Group By' When I Aggregate by a Key

Why can't I exclude dependent columns from `GROUP BY` when I aggregate by a key?

Because only the PK covers all columns of an underlying table in the GROUP BY clause. Hence your first query works. A UNIQUE constraint does not.

The combination of a non-deferrable UNIQUE and a NOT NULL constraint would also qualify. But that's not implemented - as well as some other functional dependencies known to the SQL standard. Peter Eisentraut, the principal author of the feature, had more in mind, but it was determined at the time that the demand is low and associated costs might be high. See the discussion about the feature on pgsql-hackers.

The manual:

When GROUP BY is present, or any aggregate functions are present, it
is not valid for the SELECT list expressions to refer to ungrouped
columns except within aggregate functions or when the ungrouped column
is functionally dependent on the grouped columns, since there would
otherwise be more than one possible value to return for an ungrouped
column. A functional dependency exists if the grouped columns (or a
subset thereof) are the primary key of the table containing the
ungrouped column.

And more explicitly:

PostgreSQL recognizes functional dependency (allowing columns to be
omitted from GROUP BY) only when a table's primary key is included in
the GROUP BY list. The SQL standard specifies additional conditions
that should be recognized.

Since c.vin is UNIQUE NOT NULL, you can fix your second query by using the PK column instead:

...
group by c.id;

Aside, while referential integrity is enforced and the whole table is queried, both of the given queries can be substantially cheaper: aggregate rows in appraisal before the join. This removes the need to GROUP BY in the outer SELECT a priori. Like:

SELECT c.vin, c.color, c.brand
, a.min_appraisal
, a.max_appraisal
FROM car c
LEFT JOIN (
SELECT car_vin
, min(price) AS min_appraisal
, max(price) AS max_appraisal
FROM appraisal
GROUP BY car_vin
) a ON a.car_vin = c.vin;

See:

  • Multiple array_agg() calls in a single query

Related:

  • SQL statement working in MySQL not working in Postgresql - Sum & group_by rails 3
  • PostgreSQL - GROUP BY clause

SQL Server group by foreign key and select dependant columns

In the question I think you mean Foreign key rather than Primary key... the field is a Primary key in another table Academic_unit but is looking at, say, student_unit records which have an FK to Academic_unit.

So the question is for the field alumno.NOM_UNIDAD_ACADEM_SCD - do you GROUP BY it, MAX() it or JOIN it later?

Personally I suggest just

  • trying all three and see which ones run the fastest - which is best really depends on specific circumstances - and they often run very similarly
  • use the simplest version if they run at similar speeds - which is likely to be the GROUP BY version

In particular, the GROUP BY and MAX() should result in almost identical plans as they are sorted the same way.

The 'join it later' approach can have some speed advantages in certain circumstances (particularly when it's not just being joined to a reference table, but to a broader set of sub-queries), but I'm often wary about these. They have the disadvantage of making your code a bit more complex - which can have issues if you use the data for other things, or if SQL Server has bad estimates for the amount of data it expects. In this case, as this is just linking to the reference table alumno, it's unlikely to give any specific advantage.

In your code for option 3 above, you still have links to BANNER_ENCUESTA.ALUMNO_SCD AS alumno. The advantage of doing the join later would be to remove that from the initial grouping component, then link to it later to get the specific values e.g.,

  • In the GROUP BY within the CTE, also group by ria.COD_ALUMNO_SCD, but remove BANNER_ENCUESTA.ALUMNO_SCD AS alumno from the FROM clause
  • Put BANNER_ENCUESTA.ALUMNO_SCD AS alumno into the main SELECT part of the query, and join to banner_questions on that field

Note there is also a fourth option (temporary tables) which is used when

  • SQL Server gets estimates for how many rows it expects really wrong - and makes a really bad plan
  • You're joining not to reference tables, but to views (particularly if they have 'TOP' expressions or 'GROUP BY' in them) - in these cases, SQL Server may sometimes run the view completely once for every row in the join.

In these cases, it can be useful to split the query into two parts along the lines of #3, but instead of a CTE, you save it into a temporary table e.g., SELECT .... INTO #temp FROM ... GROUP BY.

You then use the temporary table, joined to the view that was problematic, and it will often run better.

GROUP BY not working when I run the following: SELECT jobtitle, salary, COUNT(*) FROM employees GROUP BY salary;

Historically, mysql, unlike most other sql servers, has allowed arbitrary fields to be selected when using group by. In your example, jobtitle would be returned from some arbitrary one of the employees rows for the returned salary. There's no guarantee that it would even be the same for a given salary between two runs of the query.

Since this is a common source of bugs, mysql has moved away from allowing this, first allowing opting in to stricter validation and then making it the default. This is controlled by the SQL_MODE system variable. You can revert to the old, bad way by doing set session sql_mode=replace(@@sql_mode,'ONLY_FULL_GROUP_BY',''); before your query.

But it is better to follow the stricter rules. You can tell your query explicitly how to find a non-arbitrary jobtitle for each salary, like:

SELECT MIN(jobtitle) AS jobtitle, salary, COUNT(*) FROM employees GROUP BY salary;

or

SELECT MAX(jobtitle) AS jobtitle, salary, COUNT(*) FROM employees GROUP BY salary;

or you can simply group by both, potentially returning multiple rows for a given salary when there is more than one jobtitle for that salary:

SELECT jobtitle, salary, COUNT(*) FROM employees GROUP BY salary, jobtitle;

I can't seem to translate from SQL to postgresql correctly when using SELECT, why is this wrong?

Every field that is not part of a domain (or aggregate) function in the select list needs to be in the group-by clause. In your case, that's symbol, name, price, and total.

Furthermore, name and total are terrible field names, as they are reserved words. In SQL Server, you would enclose those in square brackets. In PostgreSQL you would put them in double-quotes.

But really, you should not use those names.

SELECT list is not in GROUP BY clause and contains nonaggregated column .... incompatible with sql_mode=only_full_group_by

This

Expression #1 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'returntr_prod.tbl_customer_pod_uploads.id' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by

will be simply solved by changing the sql mode in MySQL by this command,

SET GLOBAL sql_mode=(SELECT REPLACE(@@sql_mode,'ONLY_FULL_GROUP_BY',''));

This too works for me..
I used this, because in my project there are many Queries like this so I just changed this sql mode to only_full_group_by

OR simply include all columns in the GROUP BY clause that was specified by the SELECT statement. The sql_mode can be left enabled.

Thank You... :-)

PostgreSQL - GROUP BY clause

Postgres 9.1 or later, quoting the release notes of 9.1 ...

Allow non-GROUP BY columns in the query target list when the primary
key is specified in the GROUP BY clause (Peter Eisentraut)

The SQL standard allows this behavior, and because of the primary key,
the result is unambiguous.

Related:

  • Return a grouped list with occurrences using Rails and PostgreSQL

The queries in the question and in @Michael's answer have the logic backwards. We want to count how many tags match per article, not how many articles have a certain tag. So we need to GROUP BY w_article.id, not by a_tags.id.

list all articles with that tag, and also how many of given tags they match

To fix this:

SELECT count(t.tag) AS ct, a.*  -- any column from table a allowed ...
FROM a_tags t
JOIN w_articles2tag a2t ON a2t.tag = t.id
JOIN w_article a ON a.id = a2t.article
WHERE t.tag IN ('css', 'php')
GROUP BY a.id -- ... since PK is in GROUP BY
LIMIT 9;

Assuming id is the primary key of w_article.

However, this form will be faster while doing the same:

SELECT a.*, ct
FROM (
SELECT a2t.article AS id, count(*) AS ct
FROM a_tags t
JOIN w_articles2tag a2t ON a2t.tag = t.id
GROUP BY 1
LIMIT 9 -- LIMIT early - cheaper
) sub
JOIN w_article a USING (id); -- attached alias to article in the sub

Closely related answer from just yesterday:

  • Why does the following join increase the query time significantly?

SQL statement working in MySQL not working in Postgresql - Sum & group_by rails 3

You just have to add the portfolio_values.id column to the GROUP BY statement - like so:

portfolio_values.select("portfolio_values.*, SUM(portfolio_values.value) as totals").group("portfolio_values.id, portfolio_values.day").map(&:totals)

EDIT

While this is valid SQL, it's not a useful query - see other post.



Related Topics



Leave a reply



Submit