Aggregate a Single Column in Query with Many Columns

Aggregate a single column in query with many columns


Simple query

This can be much simpler with PostgreSQL 9.1 or later. As explained in this closely related answer:

  • PGError: ERROR: aggregates not allowed in WHERE clause on a AR query of an object and its has_many objects

It is enough to GROUP BY the primary key of a table. Since:

foo1 is a primary key

.. you can simplify your example to:

SELECT foo1, foo2, foo3, foo4, foo5, foo6, string_agg(aggregated_field, ', ')
FROM tbl1
GROUP BY 1
ORDER BY foo7, foo8; -- have to be spelled out, since not in select list!

Query with multiple tables

However, since you have:

many more fields and LEFT JOINs, the important part is that all these fields have 1 to 1 or 1 to 0 relationship except one field that is 1 to n which I want to aggregate

.. it should be faster and simpler to aggregate first, join later:

SELECT t1.foo1, t1.foo2, ...
, t2.bar1, t2.bar2, ...
, a.aggregated_col
FROM tbl1 t1
LEFT JOIN tbl2 t2 ON ...
...
LEFT JOIN (
SELECT some_id, string_agg(agg_col, ', ') AS aggregated_col
FROM agg_tbl a ON ...
GROUP BY some_id
) a ON a.some_id = ?.some_id
ORDER BY ...

This way the big portion of your query does not need aggregation at all.

I recently provided a test case in an SQL Fiddle to prove the point in this related answer:

  • PostgreSQL - order by an array

Since you are referring to this related answer: No, DISTINCT is not going to help at all in this case.

SQL Group By - Generate multiple aggregate columns from single column

You can do it using correlated subqueries like this:

SELECT 
Company,
Date,
(SELECT COUNT(*) FROM MyTable AS T1
WHERE T1.Flag='Y' AND T1.Company=T2.Company AND T1.Date=T2.Date) AS Count_Y,
(SELECT COUNT(*) FROM MyTable AS T1
WHERE T1.Flag='N' AND T1.Company=T2.Company AND T1.Date=T2.Date) AS Count_N
FROM MyTable AS T2
GROUP BY Company, Date

You can also do it more concisely, but perhaps with (arguably) slighly less readability using the SUM trick:

SELECT 
Company,
Date,
SUM(CASE WHEN Flag='Y' THEN 1 ELSE 0 END) AS Count_Y,
SUM(CASE WHEN Flag='N' THEN 1 ELSE 0 END) AS Count_N,
FROM MyTable
GROUP BY Company, Date

In Oracle/PLSQL, the DECODE function can be used to replace the CASE for the even more concise:

SELECT 
Company,
Date,
SUM(DECODE(Flag,'Y',1,0)) AS Count_Y,
SUM(DECODE(Flag,'N',1,0)) AS Count_N,
FROM MyTable
GROUP BY Company, Date

mysql group by single column but aggregate on multiple columns?

just move condition from every WHERE to SUM

SELECT 
user_id,
SUM(rank = 1) AS rank1,
SUM(rank = 2) AS rank2,
SUM(rank = 2 and is_fresh = 1 ) AS rank2_fresh,
SUM(IF(rank = 1, age, 0))/SUM(rank = 1) AS rank_1_avg_age
...
FROM users
GROUP BY user_id

Select multiple columns from a table, but group by one

I use this trick to group by one column when I have a multiple columns selection:

SELECT MAX(id) AS id,
Nume,
MAX(intrare) AS intrare,
MAX(iesire) AS iesire,
MAX(intrare-iesire) AS stoc,
MAX(data) AS data
FROM Produse
GROUP BY Nume
ORDER BY Nume

This works.

How to aggregate values in two columns in multiple records into one

You can unpivot and aggregate:

select firstname, lastname, string_agg(pt, ', ') as points
from (select t.*, v.pt,
row_number() over (partition by firstname, lastname, pt order by pt) as seqnum
from t cross apply
(values (t.startpoint), (t.endpoint)) as v(pt)
) t
where seqnum = 1
group by firstname, lastname;

Unfortunately, string_agg() doesn't support distinct. However, this is easily remedied by using row_number().

Edit:

If you wanted to identify each separate connected component, then you can use a recursive CTE:

with cte as (
select id, firstname, lastname,
convert(varchar(max), concat(startpoint, ', ', endpoint)) as points,
endpoint
from t
where not exists (select 1 from t t2 where t2.endpoint = t.startpoint)
union all
select cte.id, cte.firstname, cte.lastname,
concat(cte.point, ', ', cte.endpoint), t.endpoint
from cte join
t
on t.startpoint = cte.endpoint and t.id = cte.id
)
select *
from cte;

Here is a db<>fiddle.

Aggregate Function on multiple columns in SQL Server

I see at least 2 methods to get those results. A group by or a pivot

In the example below the 2 methods are shown.

CREATE TABLE #Temp (Id INT, code VARCHAR(5), Fname VARCHAR(20), CompanyId INT, FieldName VARCHAR(20), Value INT);

insert into #Temp (Id, code, Fname, CompanyId, FieldName, Value)
values
(465,00133,'JENN WILSON',1,'ERA',1573),
(465,00133,'JENN WILSON',1,'ESHIFTALLOW',3658),
(465,00133,'JENN WILSON',1,'NETPAY',51560);

with Q AS (
SELECT Id, code, Fname, CompanyId,
sum(case when FieldName = 'ERA' then Value end) as ERA,
sum(case when FieldName = 'ESHIFTALLOW' then Value end) as ESHIFTALLOW,
sum(case when FieldName = 'NETPAY' then Value end) as NETPAY
from #Temp
group by Id, code, Fname, CompanyId
)
select Id, code, Fname, CompanyId, 'Field1' as FieldName, (ERA + ESHIFTALLOW) as Value from Q
union all
select Id, code, Fname, CompanyId, 'Field2', (NETPAY - ERA + ESHIFTALLOW) from Q
;

with Q AS (
SELECT Id, code, Fname, CompanyId,
(ERA + ESHIFTALLOW) as Field1,
(NETPAY - ERA + ESHIFTALLOW) as Field2
FROM (SELECT * FROM #Temp) s
PIVOT ( SUM(VALUE) FOR FieldName IN (ERA, ESHIFTALLOW, NETPAY)) p
)
select Id, code, Fname, CompanyId, 'Field1' as FieldName, Field1 as Value from Q
union all
select Id, code, Fname, CompanyId, 'Field2', Field2 from Q
;

Note that SUM(VALUE) was used instead of MAX(VALUE). In this case it will yield the same results. It's just a choice really.

Aggregate functions on a single column only?

PostgreSql supports window functions, so the easy way is this:

SELECT  Pizza, 
Ingredient,
MAX(Amount) OVER(PARTITION BY Ingredient) As MaxAmount
FROM Recipe

Reading the question again, following Damien's comment, I think that what you are asking will not get you the results you want.

In the beginning of the question, you wrote:

My learning goal: is to find how to find an ingredient and see which recipe uses any given ingredient the most. see which recipe uses any given ingredient the most.

Later you wrote:

I want the name to be appended to result set of (a)

These statements conflict.

To know which pizza is using the most of a specific ingredient, as you stated in your first statement, use the (b) query from your question. You can order the results of it by ingredient, following the MAX(AMOUNT) column in a descending order - this will enable you to see what pizza is using the most of each ingredient easily.

SELECT Name, Ingredient, MAX(Amount) AS MaxAmount
FROM Recipe
GROUP BY Ingredient,Name
ORDER BY Ingredient, MaxAmount DESC;

The query in my answer, however, will get you what you what you are asking in your second statement - get the maximum value for each ingredient, grouped only by ingredient, but adding the pizza name to the result set. (In other words - append the pizza name to the result set of (a))

How to group by one column with single aggregate function but select multiple columns on oracle?

You can use Analytic Functions:

SELECT sum(sales_count) OVER (PARTITION BY sales_date), 
store, sales_date, product
FROM sales_log_bak
where sales_date > DATE '2017-03-01' and sales_date < DATE '2017-04-05';

Note, date and count are reserved words in Oracle, you should not use them for column names.

Using group by on multiple columns

Group By X means put all those with the same value for X in the one group.

Group By X, Y means put all those with the same values for both X and Y in the one group.

To illustrate using an example, let's say we have the following table, to do with who is attending what subject at a university:

Table: Subject_Selection

+---------+----------+----------+
| Subject | Semester | Attendee |
+---------+----------+----------+
| ITB001 | 1 | John |
| ITB001 | 1 | Bob |
| ITB001 | 1 | Mickey |
| ITB001 | 2 | Jenny |
| ITB001 | 2 | James |
| MKB114 | 1 | John |
| MKB114 | 1 | Erica |
+---------+----------+----------+

When you use a group by on the subject column only; say:

select Subject, Count(*)
from Subject_Selection
group by Subject

You will get something like:

+---------+-------+
| Subject | Count |
+---------+-------+
| ITB001 | 5 |
| MKB114 | 2 |
+---------+-------+

...because there are 5 entries for ITB001, and 2 for MKB114

If we were to group by two columns:

select Subject, Semester, Count(*)
from Subject_Selection
group by Subject, Semester

we would get this:

+---------+----------+-------+
| Subject | Semester | Count |
+---------+----------+-------+
| ITB001 | 1 | 3 |
| ITB001 | 2 | 2 |
| MKB114 | 1 | 2 |
+---------+----------+-------+

This is because, when we group by two columns, it is saying "Group them so that all of those with the same Subject and Semester are in the same group, and then calculate all the aggregate functions (Count, Sum, Average, etc.) for each of those groups". In this example, this is demonstrated by the fact that, when we count them, there are three people doing ITB001 in semester 1, and two doing it in semester 2. Both of the people doing MKB114 are in semester 1, so there is no row for semester 2 (no data fits into the group "MKB114, Semester 2")

Hopefully that makes sense.



Related Topics



Leave a reply



Submit