SQL Grouping by All the Columns

Group by all columns from one table

No, you don't have to type them all, because you don't need to use group by. Instead, use a correlated subquery:

select c.*,
       (select max(TLO.BILL_DATE)
        from TLORDER TLO
        where TLO.CUSTOMER = c.CLIENT_ID or
              TLO.ORIGIN = c.CLIENT_ID or
              TLO.DESTINATION = c.CLIENT_ID
       ) as LAST_BILL_DATE
from client c;
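To see the correlated subquery at work, here is a minimal sketch that runs it against an in-memory SQLite database through Python's sqlite3 module. The NAME column and the sample rows are made up for illustration; only the table names and the CLIENT_ID, CUSTOMER, ORIGIN, DESTINATION and BILL_DATE columns come from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: NAME and the sample rows are assumptions.
    CREATE TABLE client (CLIENT_ID TEXT PRIMARY KEY, NAME TEXT);
    CREATE TABLE TLORDER (CUSTOMER TEXT, ORIGIN TEXT, DESTINATION TEXT, BILL_DATE TEXT);
    INSERT INTO client VALUES ('A', 'Acme'), ('B', 'Bolt'), ('C', 'Crate');
    INSERT INTO TLORDER VALUES
        ('A', 'A', 'B', '2020-01-15'),
        ('B', 'B', 'B', '2020-03-01');
""")

# One output row per client, with no GROUP BY and no column list.
rows = conn.execute("""
    SELECT c.*,
           (SELECT MAX(TLO.BILL_DATE)
            FROM TLORDER TLO
            WHERE TLO.CUSTOMER = c.CLIENT_ID OR
                  TLO.ORIGIN = c.CLIENT_ID OR
                  TLO.DESTINATION = c.CLIENT_ID
           ) AS LAST_BILL_DATE
    FROM client c
    ORDER BY c.CLIENT_ID
""").fetchall()

for row in rows:
    print(row)
# ('A', 'Acme', '2020-01-15')
# ('B', 'Bolt', '2020-03-01')
# ('C', 'Crate', None)
```

Client C has no orders, so its subquery returns NULL instead of dropping the row.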

If you used group by, then you would have to list all the columns. Do note that ANSI SQL supports grouping by only a primary or unique key in this case, because every other column of that table is functionally dependent on the key. So, assuming CLIENT_ID is the primary key of client, this would be ANSI-compliant:

select c.*, max(TLO.BILL_DATE)
from client c left join
     TLORDER TLO
     on TLO.CUSTOMER = c.CLIENT_ID or
        TLO.ORIGIN = c.CLIENT_ID or
        TLO.DESTINATION = c.CLIENT_ID
group by c.CLIENT_ID;

I don't believe that DB2 supports this construct, although a few other databases do (PostgreSQL and MySQL, for example).
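As a rough check of the join form, the sketch below runs it in SQLite through Python's sqlite3 module, grouping by CLIENT_ID on the assumption that it is the primary key of client. The NAME column and the sample rows are invented. Note that SQLite accepts the query because it tolerates bare columns in aggregate queries in general, not because it checks functional dependency, so this shows the result of the query rather than proving standards compliance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: NAME and the sample rows are assumptions.
    CREATE TABLE client (CLIENT_ID TEXT PRIMARY KEY, NAME TEXT);
    CREATE TABLE TLORDER (CUSTOMER TEXT, ORIGIN TEXT, DESTINATION TEXT, BILL_DATE TEXT);
    INSERT INTO client VALUES ('A', 'Acme'), ('B', 'Bolt'), ('C', 'Crate');
    INSERT INTO TLORDER VALUES
        ('A', 'A', 'B', '2020-01-15'),
        ('B', 'B', 'B', '2020-03-01');
""")

# Group by the key only; the other client columns ride along with it.
joined = conn.execute("""
    SELECT c.*, MAX(TLO.BILL_DATE)
    FROM client c LEFT JOIN
         TLORDER TLO
         ON TLO.CUSTOMER = c.CLIENT_ID OR
            TLO.ORIGIN = c.CLIENT_ID OR
            TLO.DESTINATION = c.CLIENT_ID
    GROUP BY c.CLIENT_ID
    ORDER BY c.CLIENT_ID
""").fetchall()

for row in joined:
    print(row)
# ('A', 'Acme', '2020-01-15')
# ('B', 'Bolt', '2020-03-01')
# ('C', 'Crate', None)
```

The left join keeps client C, which has no matching orders, with a NULL maximum.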

Why do I need to explicitly specify all columns in a SQL GROUP BY clause - why not GROUP BY *?

It's hard to know exactly what the designers of the SQL language were thinking when they wrote the standard, but here's my opinion.

SQL, as a general rule, requires you to explicitly state your expectations and your intent. The language does not try to "guess what you meant", and automatically fill in the blanks. This is a good thing.

When you write a query the most important consideration is that it yields correct results. If you made a mistake, it's probably better that the SQL parser informs you, rather than making a guess about your intent and returning results that may not be correct. The declarative nature of SQL (where you state what you want to retrieve rather than the steps to retrieve it) already makes it easy to inadvertently make mistakes. Introducing fuzziness into the language syntax would not make this better.

In fact, every case I can think of where the language allows for shortcuts has caused problems. Take, for instance, natural joins, where you can omit the names of the columns you want to join on and let the database infer them from the column names. Once the column names change (as they naturally do over time), the semantics of existing queries change with them. This is bad ... very bad. You really don't want this kind of magic happening behind the scenes in your database code.
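The natural join hazard is easy to reproduce. The following sketch uses SQLite through Python's sqlite3 module; the customers and orders tables and their columns are hypothetical. The same query text is run before and after a schema change that gives both tables a column named status.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables, made up for this illustration.
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1, 50), (11, 2, 75);
""")

query = "SELECT name, amount FROM customers NATURAL JOIN orders ORDER BY name"

# Only customer_id is shared, so the join matches on customer_id.
before = conn.execute(query).fetchall()

# Both tables gain a column called status, with unrelated meanings.
conn.executescript("""
    ALTER TABLE customers ADD COLUMN status TEXT DEFAULT 'active';
    ALTER TABLE orders ADD COLUMN status TEXT DEFAULT 'shipped';
""")

# The unchanged query now also requires customers.status = orders.status.
after = conn.execute(query).fetchall()

print(before)  # [('Ann', 50), ('Bob', 75)]
print(after)   # []
```

Nothing in the query changed, yet it silently went from two rows to none. An explicit ON customers.customer_id = orders.customer_id would have kept returning both rows.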

One consequence of this design choice, however, is that SQL is a verbose language in which you must explicitly express your intent. This can mean writing more code than you would like, and griping about why certain constructs are so verbose ... but at the end of the day, it is what it is.

Select all columns with GROUP BY one column

In PostgreSQL, use distinct on:

select distinct on (key) *
from t
order by key, name;

Notice that the order by clause determines which row will win the ties.
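DISTINCT ON is a PostgreSQL extension. Where it is not available, a similar one-row-per-key result can be sketched with the ROW_NUMBER() window function, which is a different technique from the one above. The example below runs in SQLite (3.25 or later, for window function support) through Python's sqlite3 module; the val column and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table: val and the sample rows are assumptions.
    CREATE TABLE t (key INTEGER, name TEXT, val INTEGER);
    INSERT INTO t VALUES (1, 'bob', 10), (1, 'alice', 20), (2, 'carol', 30);
""")

# Number the rows inside each key by name, then keep the first of each.
# The window's ORDER BY plays the role of the ORDER BY in the DISTINCT ON query.
first_rows = conn.execute("""
    SELECT key, name, val
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (PARTITION BY key ORDER BY name) AS rn
        FROM t
    ) AS ranked
    WHERE rn = 1
    ORDER BY key
""").fetchall()

print(first_rows)  # [(1, 'alice', 20), (2, 'carol', 30)]
```

For key 1, 'alice' sorts before 'bob', so that whole row is the one kept.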

Apply aggregate function to all columns on table with group by

You can use DISTINCT ON to get one row per group and join that with the total scores calculated by a GROUP BY query. With this approach there will be a score column containing the value from some arbitrary row in each group (the query has no ORDER BY to pick one) and a separate column for the total score.

WITH total_scores AS (
    SELECT age, name, SUM(score) AS total_score
    FROM test_table
    GROUP BY age, name
)
SELECT DISTINCT ON (tt.age, tt.name)
       tt.*, ts.total_score
FROM test_table tt
JOIN total_scores ts ON tt.age = ts.age AND tt.name = ts.name;

That said, it seems that you could normalize your data into two tables, one containing rows that have duplicate values (i.e. everything else except score) and another table containing score and a foreign key pointing to the first table.
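A minimal sketch of that normalized layout, run in SQLite through Python's sqlite3 module, might look like this. The table names person and person_score, the person_id key and the sample rows are all hypothetical; only age, name and score come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical normalized schema: one row per person, one row per score.
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, age INTEGER, name TEXT);
    CREATE TABLE person_score (
        person_id INTEGER REFERENCES person(person_id),
        score INTEGER
    );
    INSERT INTO person VALUES (1, 30, 'Ann'), (2, 25, 'Bob');
    INSERT INTO person_score VALUES (1, 10), (1, 15), (2, 7);
""")

# The repeated attributes live in one place, so the total is a plain join
# plus GROUP BY, with no need to pick an arbitrary row per group.
totals = conn.execute("""
    SELECT p.person_id, p.age, p.name, SUM(s.score) AS total_score
    FROM person p
    JOIN person_score s ON s.person_id = p.person_id
    GROUP BY p.person_id, p.age, p.name
    ORDER BY p.person_id
""").fetchall()

print(totals)  # [(1, 30, 'Ann', 25), (2, 25, 'Bob', 7)]
```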


