Using Group by on Multiple Columns

GROUP BY X means put all rows with the same value for X into one group.

GROUP BY X, Y means put all rows with the same values for both X and Y into one group.

To illustrate with an example, let's say we have the following table, recording who is attending which subject at a university:

Table: Subject_Selection

+---------+----------+----------+
| Subject | Semester | Attendee |
+---------+----------+----------+
| ITB001  | 1        | John     |
| ITB001  | 1        | Bob      |
| ITB001  | 1        | Mickey   |
| ITB001  | 2        | Jenny    |
| ITB001  | 2        | James    |
| MKB114  | 1        | John     |
| MKB114  | 1        | Erica    |
+---------+----------+----------+

When you group by the Subject column only, for example:

select Subject, Count(*)
from Subject_Selection
group by Subject

You will get something like:

+---------+-------+
| Subject | Count |
+---------+-------+
| ITB001  | 5     |
| MKB114  | 2     |
+---------+-------+

...because there are 5 entries for ITB001 and 2 for MKB114.

If we were to group by two columns:

select Subject, Semester, Count(*)
from Subject_Selection
group by Subject, Semester

we would get this:

+---------+----------+-------+
| Subject | Semester | Count |
+---------+----------+-------+
| ITB001  | 1        | 3     |
| ITB001  | 2        | 2     |
| MKB114  | 1        | 2     |
+---------+----------+-------+

This is because grouping by two columns says "Group the rows so that all of those with the same Subject and Semester are in the same group, and then calculate all the aggregate functions (Count, Sum, Average, etc.) for each of those groups". In this example, counting shows three people doing ITB001 in semester 1 and two doing it in semester 2. Both of the people doing MKB114 are in semester 1, so there is no row for semester 2 (no data fits into the group "MKB114, Semester 2").
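To try both queries without a database server, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory copy of the Subject_Selection table. The helper name count_by is made up for this illustration, and the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

# The sample rows from the Subject_Selection table above.
ROWS = [
    ("ITB001", 1, "John"),
    ("ITB001", 1, "Bob"),
    ("ITB001", 1, "Mickey"),
    ("ITB001", 2, "Jenny"),
    ("ITB001", 2, "James"),
    ("MKB114", 1, "John"),
    ("MKB114", 1, "Erica"),
]


def count_by(columns):
    """Return COUNT(*) per group for the given grouping columns (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Subject_Selection (Subject TEXT, Semester INTEGER, Attendee TEXT)"
    )
    conn.executemany("INSERT INTO Subject_Selection VALUES (?, ?, ?)", ROWS)
    # Column names come from trusted code here, never from user input.
    cols = ", ".join(columns)
    sql = f"SELECT {cols}, COUNT(*) FROM Subject_Selection GROUP BY {cols} ORDER BY {cols}"
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


by_subject = count_by(["Subject"])
by_subject_semester = count_by(["Subject", "Semester"])
print(by_subject)           # one row per Subject
print(by_subject_semester)  # one row per (Subject, Semester) pair that has data
```

Note that no row comes back for MKB114 in semester 2: GROUP BY only produces groups for combinations that actually occur in the data.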

Hopefully that makes sense.

How to group by multiple columns and order by date in SQL?

To order groups by their minimum start date, you can use a window MIN() in the ORDER BY clause:

SELECT Id, Day, Start_Date, End_Date
FROM mytable  -- placeholder name; TABLE itself is a reserved word
-- GROUP BY Id, Day, Start_Date, End_Date
ORDER BY
    MIN(Start_Date) OVER(PARTITION BY Id),
    Id,
    Day,
    Start_Date,
    End_Date

Note: the fact that you are not using any aggregate function in the SELECT clause leads me to suspect that you don't actually need a GROUP BY clause. I commented out that part of the query; feel free to add it back if, for some reason that I cannot think of, you do need it (if you just need to remove duplicates, use SELECT DISTINCT, which makes the intent clearer).

Demo on DB Fiddle:


Id | Day | Start_Date | End_Date
:----------------------------------- | --: | :------------------ | :------------------
13D377E8-7674-4BE8-ACDF-472B634342D3 | 1 | 26/11/2019 00:00:00 | 26/11/2019 00:00:00
13D377E8-7674-4BE8-ACDF-472B634342D3 | 2 | 27/11/2019 00:00:00 | 27/11/2019 00:00:00
13D377E8-7674-4BE8-ACDF-472B634342D3 | 3 | 28/11/2019 00:00:00 | 28/11/2019 00:00:00
78C8F3AD-DE5B-48BD-849A-6E39C7EC6200 | 1 | 27/11/2019 00:00:00 | 27/11/2019 00:00:00
78C8F3AD-DE5B-48BD-849A-6E39C7EC6200 | 2 | 28/11/2019 00:00:00 | 28/11/2019 00:00:00
78C8F3AD-DE5B-48BD-849A-6E39C7EC6200 | 3 | 29/11/2019 00:00:00 | 29/11/2019 00:00:00
B73ECD8B-5760-4F92-94E5-CF5270AEE36B | 1 | 28/11/2019 00:00:00 | 28/11/2019 00:00:00
B73ECD8B-5760-4F92-94E5-CF5270AEE36B | 2 | 29/11/2019 00:00:00 | 29/11/2019 00:00:00
B73ECD8B-5760-4F92-94E5-CF5270AEE36B | 3 | 30/11/2019 00:00:00 | 30/11/2019 00:00:00
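The ordering can also be checked locally. The sketch below is an illustration under assumptions, not the original demo: it uses Python's sqlite3 module (window functions need SQLite 3.25 or later), a made-up table name schedule, short made-up Ids, and ISO-formatted date strings so that text comparison matches date order. The Ids are chosen so that alphabetical order differs from the order of each group's earliest start date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE schedule (Id TEXT, Day INTEGER, Start_Date TEXT, End_Date TEXT)"
)
conn.executemany(
    "INSERT INTO schedule VALUES (?, ?, ?, ?)",
    [
        # Group A starts last, group B starts first, group C is in between.
        ("A", 1, "2019-11-28", "2019-11-28"),
        ("A", 2, "2019-11-29", "2019-11-29"),
        ("B", 1, "2019-11-26", "2019-11-26"),
        ("B", 2, "2019-11-27", "2019-11-27"),
        ("C", 1, "2019-11-27", "2019-11-27"),
        ("C", 2, "2019-11-28", "2019-11-28"),
    ],
)

# Sort whole groups by their earliest Start_Date, then rows within each group.
ordered = conn.execute(
    """
    SELECT Id, Day, Start_Date, End_Date
    FROM schedule
    ORDER BY
        MIN(Start_Date) OVER (PARTITION BY Id),
        Id,
        Day
    """
).fetchall()

ordered_ids = [row[0] for row in ordered]
print(ordered_ids)
```

Every row of a group carries the same window minimum, so the groups stay together while being sorted by their earliest date.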

How to GROUP BY multiple columns with multiple HAVING values in MySQL?

You can set both conditions in the same HAVING clause:

SELECT `query` 
FROM `analytics`
WHERE `date` >= '2021-01-01'
GROUP BY `query`
HAVING COUNT(*) >= 3 AND COUNT(DISTINCT `user`) >= 2;
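As a quick check of how the two HAVING conditions combine, here is a small sketch using Python's sqlite3 module. The sample rows and user names are invented for illustration, and the identifiers are double-quoted because SQLite does not use MySQL's backticks by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE analytics ("query" TEXT, "user" TEXT, "date" TEXT)')
conn.executemany(
    "INSERT INTO analytics VALUES (?, ?, ?)",
    [
        # 3 rows, 2 distinct users: passes both conditions.
        ("group by", "alice", "2021-02-01"),
        ("group by", "bob", "2021-02-02"),
        ("group by", "alice", "2021-02-03"),
        # 3 rows but only 1 distinct user: fails the second condition.
        ("having", "alice", "2021-03-01"),
        ("having", "alice", "2021-03-02"),
        ("having", "alice", "2021-03-03"),
        # 2 distinct users but only 2 rows: fails the first condition.
        ("order by", "alice", "2021-04-01"),
        ("order by", "bob", "2021-04-02"),
        # Would pass both, but WHERE removes these rows before grouping.
        ("old query", "alice", "2020-05-01"),
        ("old query", "bob", "2020-05-02"),
        ("old query", "carol", "2020-05-03"),
    ],
)

popular = conn.execute(
    """
    SELECT "query"
    FROM analytics
    WHERE "date" >= '2021-01-01'
    GROUP BY "query"
    HAVING COUNT(*) >= 3 AND COUNT(DISTINCT "user") >= 2
    """
).fetchall()
print(popular)
```

WHERE filters individual rows before the groups are formed; HAVING filters the groups afterwards, which is why aggregate conditions belong there.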

Grouping by multiple columns in MySQL across multiple tables (INNER JOIN)

I found the answer myself:

SELECT  o.orderid, s.suppliername, COUNT(p.productname) AS numberOfProducts
FROM Orders o
JOIN OrderDetails od
ON o.orderid = od.orderid
JOIN Products p
ON p.productid = od.productid
JOIN Suppliers s
ON s.supplierid = p.supplierid
GROUP BY o.orderid, s.suppliername
HAVING o.orderid = 10300;

The main issue was that the join condition
ON o.shipperid = s.supplierid had to be
ON s.supplierid = p.supplierid

Friendly users on SO helped me out on that :)
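For anyone who wants to run the corrected join, here is a minimal sketch with Python's sqlite3 module. The table contents (product and supplier names, the second order) are invented for illustration. It also runs a variant that moves the orderid filter from HAVING to WHERE, which is an addition to the answer above: because orderid is a grouping column, both forms return the same rows, and WHERE filters before the grouping work is done.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Orders (orderid INTEGER);
    CREATE TABLE OrderDetails (orderid INTEGER, productid INTEGER);
    CREATE TABLE Products (productid INTEGER, productname TEXT, supplierid INTEGER);
    CREATE TABLE Suppliers (supplierid INTEGER, suppliername TEXT);
    -- Hypothetical sample data
    INSERT INTO Orders VALUES (10300), (10301);
    INSERT INTO OrderDetails VALUES (10300, 1), (10300, 2), (10300, 3), (10301, 1);
    INSERT INTO Products VALUES (1, 'Widget', 1), (2, 'Gadget', 1), (3, 'Gizmo', 2);
    INSERT INTO Suppliers VALUES (1, 'Acme'), (2, 'Globex');
    """
)

# The placeholders let the same query run with the filter in either clause.
QUERY = """
    SELECT o.orderid, s.suppliername, COUNT(p.productname) AS numberOfProducts
    FROM Orders o
    JOIN OrderDetails od ON o.orderid = od.orderid
    JOIN Products p ON p.productid = od.productid
    JOIN Suppliers s ON s.supplierid = p.supplierid
    {where}
    GROUP BY o.orderid, s.suppliername
    {having}
    ORDER BY s.suppliername
"""

with_having = conn.execute(
    QUERY.format(where="", having="HAVING o.orderid = 10300")
).fetchall()
with_where = conn.execute(
    QUERY.format(where="WHERE o.orderid = 10300", having="")
).fetchall()
print(with_having)
print(with_where)
```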

Group by multiple columns and limit per group - Postgres

You can use rank(). To limit the number of records per conversation (i.e. a sender/receiver or receiver/sender pair), you can partition by least(sender_id, receiver_id), greatest(sender_id, receiver_id), so that both directions of a conversation fall into the same partition:

select t.id, t.sender_id, t.receiver_id, t.message, t.created_at
from (
    select
        t.*,
        -- rank messages within each conversation, newest first
        rank() over(
            partition by least(sender_id, receiver_id), greatest(sender_id, receiver_id)
            order by created_at desc
        ) rn
    from mytable t
) t
where rn <= 50
order by least(sender_id, receiver_id), greatest(sender_id, receiver_id), rn
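The same idea can be tried outside Postgres. The sketch below is an approximation under stated assumptions: it uses Python's sqlite3 module (window functions need SQLite 3.25 or later), and because SQLite has no least()/greatest(), it substitutes SQLite's two-argument scalar min()/max(), which play the same role here. The function name latest_per_conversation and the sample messages are made up for illustration.

```python
import sqlite3


def latest_per_conversation(messages, limit):
    """Return up to `limit` newest messages per unordered (sender, receiver) pair."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable (id INTEGER, sender_id INTEGER, receiver_id INTEGER,"
        " message TEXT, created_at TEXT)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?, ?)", messages)
    rows = conn.execute(
        """
        SELECT id, sender_id, receiver_id, message, created_at
        FROM (
            SELECT t.*,
                   rank() OVER (
                       -- min()/max() with two arguments stand in for least()/greatest()
                       PARTITION BY min(sender_id, receiver_id), max(sender_id, receiver_id)
                       ORDER BY created_at DESC
                   ) AS rn
            FROM mytable t
        )
        WHERE rn <= ?
        ORDER BY min(sender_id, receiver_id), max(sender_id, receiver_id), rn
        """,
        (limit,),
    ).fetchall()
    conn.close()
    return rows


sample = [
    (1, 1, 2, "hi", "2024-01-01 10:00"),
    (2, 2, 1, "hello", "2024-01-01 10:01"),
    (3, 1, 2, "how are you?", "2024-01-01 10:02"),
    (4, 1, 3, "ping", "2024-01-01 09:00"),
]
newest_two = latest_per_conversation(sample, 2)
print([row[0] for row in newest_two])
```

One design point worth knowing: rank() gives tied rows the same number, so a conversation can return more than the limit when several messages share a created_at value. If a strict cap matters, row_number() with a tie-breaking column in the ORDER BY is the usual alternative.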

Delete using group by with multiple columns

Use an INNER JOIN with your table:

DELETE t1 FROM myTable t1
INNER JOIN (
    SELECT col1, col2, col3 FROM (
        SELECT col1, col2, col3 FROM myTable
        GROUP BY col1, col2, col3 HAVING count(*) > 1
    ) t
) t2 ON t2.col1 = t1.col1 AND t2.col2 = t1.col2 AND t2.col3 = t1.col3;

But you should test it on a test database first, because I don't think that your select identifies the right rows. It would be better to have a UNIQUE column that identifies the correct rows, because this statement deletes every row in each duplicated group, not just the extra copies.
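The warning above can be demonstrated on throwaway data. This sketch uses Python's sqlite3 module, so the syntax differs from the MySQL statement: SQLite has no DELETE ... JOIN, so the first variant expresses the same match with a row-value IN subquery (SQLite 3.15 or later). The second variant is a swapped-in technique, not the answer's method: it uses SQLite's implicit rowid as the unique column and keeps the lowest rowid of each group. The sample rows are invented.

```python
import sqlite3


def make_table():
    """Build a fresh in-memory table with one triplicated row and one unique row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE myTable (col1 TEXT, col2 TEXT, col3 TEXT)")
    conn.executemany(
        "INSERT INTO myTable VALUES (?, ?, ?)",
        [("a", "b", "c"), ("a", "b", "c"), ("a", "b", "c"), ("x", "y", "z")],
    )
    return conn


# Variant 1: match on the grouped columns only.
# Every copy in a duplicated group matches, so none of them survive.
conn = make_table()
conn.execute(
    """
    DELETE FROM myTable
    WHERE (col1, col2, col3) IN (
        SELECT col1, col2, col3 FROM myTable
        GROUP BY col1, col2, col3 HAVING COUNT(*) > 1
    )
    """
)
remaining_all_deleted = conn.execute("SELECT * FROM myTable").fetchall()
conn.close()

# Variant 2: use a unique identifier (SQLite's rowid) to keep one row per group.
conn = make_table()
conn.execute(
    """
    DELETE FROM myTable
    WHERE rowid NOT IN (
        SELECT MIN(rowid) FROM myTable GROUP BY col1, col2, col3
    )
    """
)
remaining_one_kept = sorted(conn.execute("SELECT * FROM myTable").fetchall())
conn.close()

print(remaining_all_deleted)
print(remaining_one_kept)
```

In MySQL the second variant would need a real unique column such as an AUTO_INCREMENT id, since MySQL tables have no implicit rowid.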


