How does GROUP BY work?
GROUP BY returns a single row for each unique combination of the GROUP BY fields. So in your example, every distinct combination of (a1, a2) occurring in rows of Tab1 results in a row in the query representing the group of rows with the given combination of group by field values. Aggregate functions like SUM() are computed over the members of each group.
Using group by on multiple columns
Group By X means put all those with the same value for X in the one group.
Group By X, Y means put all those with the same values for both X and Y in the one group.
To illustrate using an example, let's say we have the following table, to do with who is attending what subject at a university:
Table: Subject_Selection
+---------+----------+----------+
| Subject | Semester | Attendee |
+---------+----------+----------+
| ITB001  | 1        | John     |
| ITB001  | 1        | Bob      |
| ITB001  | 1        | Mickey   |
| ITB001  | 2        | Jenny    |
| ITB001  | 2        | James    |
| MKB114  | 1        | John     |
| MKB114  | 1        | Erica    |
+---------+----------+----------+
When you use a group by on the subject column only; say:
select Subject, Count(*)
from Subject_Selection
group by Subject
You will get something like:
+---------+-------+
| Subject | Count |
+---------+-------+
| ITB001  | 5     |
| MKB114  | 2     |
+---------+-------+
...because there are 5 entries for ITB001, and 2 for MKB114.
If we were to group by two columns:
select Subject, Semester, Count(*)
from Subject_Selection
group by Subject, Semester
we would get this:
+---------+----------+-------+
| Subject | Semester | Count |
+---------+----------+-------+
| ITB001  | 1        | 3     |
| ITB001  | 2        | 2     |
| MKB114  | 1        | 2     |
+---------+----------+-------+
This is because, when we group by two columns, we are saying "Group them so that all of those with the same Subject and Semester are in the same group, and then calculate all the aggregate functions (Count, Sum, Average, etc.) for each of those groups". In this example, this is demonstrated by the fact that, when we count them, there are three people doing ITB001 in semester 1, and two doing it in semester 2. Both of the people doing MKB114 are in semester 1, so there is no row for semester 2 (no data fits into the group "MKB114, Semester 2").
Hopefully that makes sense.
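If you want to check the two results above yourself, here is a minimal sketch that loads the same Subject_Selection rows into an in-memory SQLite database from Python and runs both queries. SQLite and the helper name count_by are my own choices for illustration; any SQL database should give the same counts. The ORDER BY is added only to make the row order predictable.

```python
import sqlite3

# The same rows as the Subject_Selection table above.
ROWS = [
    ("ITB001", 1, "John"),
    ("ITB001", 1, "Bob"),
    ("ITB001", 1, "Mickey"),
    ("ITB001", 2, "Jenny"),
    ("ITB001", 2, "James"),
    ("MKB114", 1, "John"),
    ("MKB114", 1, "Erica"),
]


def count_by(columns):
    """Return COUNT(*) per group for the given GROUP BY column list.

    count_by is a hypothetical helper for this sketch. The column names
    come from fixed lists below, never from user input.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Subject_Selection "
        "(Subject TEXT, Semester INTEGER, Attendee TEXT)"
    )
    conn.executemany("INSERT INTO Subject_Selection VALUES (?, ?, ?)", ROWS)
    cols = ", ".join(columns)
    sql = (
        f"SELECT {cols}, COUNT(*) FROM Subject_Selection "
        f"GROUP BY {cols} ORDER BY {cols}"
    )
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


by_subject = count_by(["Subject"])
by_subject_semester = count_by(["Subject", "Semester"])

print(by_subject)           # one row per Subject
print(by_subject_semester)  # one row per (Subject, Semester) pair
```

Adding a column to the GROUP BY list can only split existing groups into smaller ones, which is why the five ITB001 rows become a group of three and a group of two.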
How does GroupBy in LINQ work?
Group by works by taking whatever you are grouping and putting it into a collection of items that match the key you specify in your group by clause.
If you have the following data:
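+-------------+------------+
| Member name | Group code |
+-------------+------------+
| Betty       | 123        |
| Mildred     | 123        |
| Charli      | 456        |
| Mattilda    | 456        |
+-------------+------------+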
And the following query
var query = from m in members
            group m by m.GroupCode into membersByGroupCode
            select membersByGroupCode;
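The group by will return one group per group code: the group with key 123 contains Betty and Mildred, and the group with key 456 contains Charli and Mattilda.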
You wouldn’t typically want to just select the grouping directly. What if we just want the group code and the member names without all of the other superfluous data?
We just need to perform a select to get the data that we are after:
var query = from m in members
            group m by m.GroupCode into membersByGroupCode
            let memberNames = from m2 in membersByGroupCode
                              select m2.Name
            select new
            {
                GroupCode = membersByGroupCode.Key,
                MemberNames = memberNames
            };
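Which returns one result per group code: GroupCode 123 with member names Betty and Mildred, and GroupCode 456 with member names Charli and Mattilda.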
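To make the "collection of items per key" idea concrete outside of LINQ, here is a rough Python analogue using the same four members. This is not LINQ and not how LINQ is implemented; the dictionary layout and the function name group_member_names are assumptions made for illustration only.

```python
from collections import defaultdict

# The same data as the member table above.
members = [
    {"name": "Betty", "group_code": 123},
    {"name": "Mildred", "group_code": 123},
    {"name": "Charli", "group_code": 456},
    {"name": "Mattilda", "group_code": 456},
]


def group_member_names(rows):
    """Collect member names into one list per group code (the grouping key)."""
    groups = defaultdict(list)
    for row in rows:
        # Each row lands in the collection that matches its key.
        groups[row["group_code"]].append(row["name"])
    return dict(groups)


names_by_group = group_member_names(members)
print(names_by_group)
# → {123: ['Betty', 'Mildred'], 456: ['Charli', 'Mattilda']}
```

The dictionary key plays the role of membersByGroupCode.Key, and each list plays the role of the projected MemberNames.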
How does group by work in sub queries?
Your derived table is missing an alias.
SELECT SUM(Mean) AS Total_Mean, Number
FROM (SELECT Name, AVG(Value) AS Mean, Number
      FROM Table1
      WHERE Category = 'Time'
      GROUP BY Name, Number) t -- alias for the derived table
GROUP BY Number;
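Here is a small runnable sketch of that two-level grouping, using Python's sqlite3 module. The rows in Table1 are invented for illustration, since the question's data is not shown, and the result column is named Total_Mean here. Note that SQLite itself does not insist on the alias; databases such as MySQL and SQL Server do, so keeping the alias makes the query portable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Table1 (Name TEXT, Value REAL, Number INTEGER, Category TEXT)"
)
# Hypothetical sample rows, not taken from the original question.
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?, ?)",
    [
        ("A", 10, 1, "Time"),
        ("A", 20, 1, "Time"),
        ("B", 30, 1, "Time"),
        ("C", 5, 2, "Time"),
        ("C", 100, 2, "Other"),  # removed by the WHERE clause
    ],
)

# Inner query: one mean per (Name, Number).
# Outer query: sum those means per Number.
totals = conn.execute(
    """
    SELECT SUM(Mean) AS Total_Mean, Number
    FROM (SELECT Name, AVG(Value) AS Mean, Number
          FROM Table1
          WHERE Category = 'Time'
          GROUP BY Name, Number) AS t  -- alias for the derived table
    GROUP BY Number
    ORDER BY Number
    """
).fetchall()
conn.close()

print(totals)
```

With this sample data the inner query yields means of 15 (A), 30 (B) and 5 (C), so the outer query returns 45 for Number 1 and 5 for Number 2.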
Understanding how WHERE works with GROUP BY and Aggregation
You have the order wrong. The WHERE clause goes before the GROUP BY:
select cu.CustomerID, cu.FirstName, cu.LastName, COUNT(si.InvoiceID) as inv
from Customer as cu
inner join SalesInvoice as si
on cu.CustomerID = si.CustomerID
where cu.FirstName = 'mark'
group by cu.CustomerID,cu.FirstName,cu.LastName
If you want to perform a filter after the GROUP BY, then you will use a HAVING clause:
select cu.CustomerID, cu.FirstName, cu.LastName, COUNT(si.InvoiceID) as inv
from Customer as cu
inner join SalesInvoice as si
on cu.CustomerID = si.CustomerID
group by cu.CustomerID,cu.FirstName,cu.LastName
having cu.FirstName = 'mark'
A HAVING clause is typically used for aggregate function filtering, so it makes sense that this would be applied after the GROUP BY.
To learn about the order of operations, here is an article explaining the order. From the article, the order of operations in SQL is:
To start out, I thought it would be good to look up the order in which SQL directives get executed as this will change the way I can optimize:
FROM clause
WHERE clause
GROUP BY clause
HAVING clause
SELECT clause
ORDER BY clause
Using this order you will apply the filter in the WHERE prior to a GROUP BY. The WHERE is used to limit the number of records.
Think of it this way: if you were applying the WHERE after, then you would return more records than you would want to group on. Applying it first reduces the recordset, then applies the grouping.
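A quick way to see the difference is to run both kinds of filter against the same data. The sketch below uses Python's sqlite3 module with made-up Customer and SalesInvoice rows (the real tables are not shown in the question), and the HAVING example filters on the invoice count, which is the typical aggregate use of HAVING.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Customer (CustomerID INTEGER, FirstName TEXT, LastName TEXT);
    CREATE TABLE SalesInvoice (InvoiceID INTEGER, CustomerID INTEGER);
    -- Hypothetical sample rows, not from the original question.
    INSERT INTO Customer VALUES
        (1, 'mark', 'smith'), (2, 'anna', 'jones'), (3, 'mark', 'lee');
    INSERT INTO SalesInvoice VALUES (10, 1), (11, 1), (12, 2), (13, 2), (14, 3);
    """
)

BASE = """
SELECT cu.CustomerID, cu.FirstName, cu.LastName, COUNT(si.InvoiceID) AS inv
FROM Customer AS cu
INNER JOIN SalesInvoice AS si ON cu.CustomerID = si.CustomerID
"""
GROUPING = "GROUP BY cu.CustomerID, cu.FirstName, cu.LastName "

# WHERE removes rows before any groups are formed.
where_rows = conn.execute(
    BASE + "WHERE cu.FirstName = 'mark' " + GROUPING + "ORDER BY cu.CustomerID"
).fetchall()

# HAVING removes whole groups after the aggregate has been computed.
having_rows = conn.execute(
    BASE + GROUPING + "HAVING COUNT(si.InvoiceID) >= 2 ORDER BY cu.CustomerID"
).fetchall()
conn.close()

print(where_rows)   # only customers named mark, whatever their invoice count
print(having_rows)  # only customers with at least two invoices, whatever their name
```

The WHERE version never groups anna's invoices at all, while the HAVING version has to build every group first and then throws away the ones that fail the count test.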
What does group by do exactly?
GROUP BY enables summaries. Specifically, it controls the use of summary functions like COUNT(), SUM(), AVG(), MIN(), MAX() etc. There isn't much to summarize in your example.
But, suppose you had a Deptname column. Then you could issue this query and get the average salary by Deptname.
SELECT AVG(Salary) Average,
Deptname
FROM Employee
GROUP BY Deptname
ORDER BY Deptname
If you want your result set put in a certain order, use ORDER BY.
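To show what "enables summaries" means in practice, here is a minimal sketch in Python with sqlite3 that runs AVG() once without GROUP BY and once with it. The Employee rows and department names are invented for illustration; your table will differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Deptname TEXT, Salary INTEGER)")
# Hypothetical sample rows, not from the original question.
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?)",
    [("Sales", 100), ("Sales", 200), ("IT", 300)],
)

# Without GROUP BY the aggregate summarizes the whole table: one row.
overall = conn.execute("SELECT AVG(Salary) FROM Employee").fetchall()

# With GROUP BY the aggregate is computed once per department.
per_dept = conn.execute(
    """
    SELECT AVG(Salary) AS Average, Deptname
    FROM Employee
    GROUP BY Deptname
    ORDER BY Deptname
    """
).fetchall()
conn.close()

print(overall)   # → [(200.0,)]
print(per_dept)  # → [(300.0, 'IT'), (150.0, 'Sales')]
```

The ORDER BY here is what puts IT before Sales; GROUP BY on its own makes no promise about row order.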
Why do we need GROUP BY with AGGREGATE FUNCTIONS?
It might be easier if you think of GROUP BY as "for each" for the sake of explanation. The query below:
SELECT empid, SUM (MonthlySalary)
FROM Employee
GROUP BY EmpID
is saying:
"Give me the sum of the MonthlySalary values for each empid"
So if your table looked like this:
+-----+-------------+
|empid|MonthlySalary|
+-----+-------------+
|1    |200          |
+-----+-------------+
|2    |300          |
+-----+-------------+
result:
+-+---+
|1|200|
+-+---+
|2|300|
+-+---+
Sum wouldn't appear to do anything, because the sum of one number is that number. On the other hand, if it looked like this:
+-----+-------------+
|empid|MonthlySalary|
+-----+-------------+
|1    |200          |
+-----+-------------+
|1    |300          |
+-----+-------------+
|2    |300          |
+-----+-------------+
result:
+-+---+
|1|500|
+-+---+
|2|300|
+-+---+
Then it would, because there are two rows with empid 1 to sum together. Not sure if this explanation helps or not, but I hope it makes things a little clearer.
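If the "for each" reading helps, here it is written out literally. The sketch below, in Python, computes the sums with an explicit loop over the second table's rows and then checks that SQLite's GROUP BY gives the same answer. The loop is only a mental model, not a claim about how a database engine actually executes the query.

```python
import sqlite3

# The rows from the second table above: (empid, MonthlySalary).
rows = [(1, 200), (1, 300), (2, 300)]

# "For each empid, add up its MonthlySalary values."
totals = {}
for empid, monthly_salary in rows:
    totals[empid] = totals.get(empid, 0) + monthly_salary

# The same question asked in SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (empid INTEGER, MonthlySalary INTEGER)")
conn.executemany("INSERT INTO Employee VALUES (?, ?)", rows)
sql_rows = conn.execute(
    "SELECT empid, SUM(MonthlySalary) FROM Employee GROUP BY empid ORDER BY empid"
).fetchall()
conn.close()

print(totals)    # → {1: 500, 2: 300}
print(sql_rows)  # → [(1, 500), (2, 300)]
```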