How Does Group by Work

How does GROUP BY work?

GROUP BY returns a single row for each unique combination of the GROUP BY fields. So in your example, every distinct combination of (a1, a2) occurring in rows of Tab1 produces one row in the query result, representing the group of rows that share that combination of values. Aggregate functions like SUM() are computed over the members of each group.

Using group by on multiple columns

Group By X means put all those with the same value for X in the one group.

Group By X, Y means put all those with the same values for both X and Y in the one group.

To illustrate using an example, let's say we have the following table, to do with who is attending what subject at a university:

Table: Subject_Selection

+---------+----------+----------+
| Subject | Semester | Attendee |
+---------+----------+----------+
| ITB001  |        1 | John     |
| ITB001  |        1 | Bob      |
| ITB001  |        1 | Mickey   |
| ITB001  |        2 | Jenny    |
| ITB001  |        2 | James    |
| MKB114  |        1 | John     |
| MKB114  |        1 | Erica    |
+---------+----------+----------+

When you use a group by on the Subject column only, say:

select Subject, Count(*)
from Subject_Selection
group by Subject

You will get something like:

+---------+-------+
| Subject | Count |
+---------+-------+
| ITB001  |     5 |
| MKB114  |     2 |
+---------+-------+

...because there are 5 entries for ITB001, and 2 for MKB114.

If we were to group by two columns:

select Subject, Semester, Count(*)
from Subject_Selection
group by Subject, Semester

we would get this:

+---------+----------+-------+
| Subject | Semester | Count |
+---------+----------+-------+
| ITB001  |        1 |     3 |
| ITB001  |        2 |     2 |
| MKB114  |        1 |     2 |
+---------+----------+-------+

This is because, when we group by two columns, we are saying "Group them so that all of those with the same Subject and Semester are in the same group, and then calculate all the aggregate functions (Count, Sum, Average, etc.) for each of those groups". In this example, this is demonstrated by the fact that, when we count them, there are three people doing ITB001 in semester 1, and two doing it in semester 2. Both of the people doing MKB114 are in semester 1, so there is no row for semester 2 (no data fits into the group "MKB114, Semester 2").
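If you want to try this yourself, here is a minimal sketch that loads the Subject_Selection rows from above into an in-memory SQLite database and runs both queries. The use of Python's sqlite3 module and the added ORDER BY (only there to make the output order predictable) are my assumptions, not part of the original example.

```python
import sqlite3

# In-memory database holding the Subject_Selection table from the example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Subject_Selection (Subject TEXT, Semester INTEGER, Attendee TEXT)"
)
conn.executemany(
    "INSERT INTO Subject_Selection VALUES (?, ?, ?)",
    [
        ("ITB001", 1, "John"),
        ("ITB001", 1, "Bob"),
        ("ITB001", 1, "Mickey"),
        ("ITB001", 2, "Jenny"),
        ("ITB001", 2, "James"),
        ("MKB114", 1, "John"),
        ("MKB114", 1, "Erica"),
    ],
)

# One group per distinct Subject.
by_subject = conn.execute(
    "SELECT Subject, COUNT(*) FROM Subject_Selection "
    "GROUP BY Subject ORDER BY Subject"
).fetchall()

# One group per distinct (Subject, Semester) pair.
by_subject_and_semester = conn.execute(
    "SELECT Subject, Semester, COUNT(*) FROM Subject_Selection "
    "GROUP BY Subject, Semester ORDER BY Subject, Semester"
).fetchall()

print(by_subject)               # [('ITB001', 5), ('MKB114', 2)]
print(by_subject_and_semester)  # [('ITB001', 1, 3), ('ITB001', 2, 2), ('MKB114', 1, 2)]
```

Note that no ('MKB114', 2, 0) row shows up: a group only exists if at least one row falls into it.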

Hopefully that makes sense.

How does GroupBy in LINQ work?

Group by works by taking whatever you are grouping and putting it into a collection of items that match the key you specify in your group by clause.

If you have the following data:

Member name     Group code
Betty           123
Mildred         123
Charli          456
Mattilda        456

And the following query

var query = from m in members
            group m by m.GroupCode into membersByGroupCode
            select membersByGroupCode;

The group by will return the following results:

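[Image: the two resulting groups, one keyed by group code 123 (Betty, Mildred) and one keyed by 456 (Charli, Mattilda)]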

You wouldn’t typically want to just select the grouping directly. What if we just want the group code and the member names without all of the other superfluous data?

We just need to perform a select to get the data that we are after:

var query = from m in members
            group m by m.GroupCode into membersByGroupCode
            let memberNames = from m2 in membersByGroupCode
                              select m2.Name
            select new
            {
                GroupCode = membersByGroupCode.Key,
                MemberNames = memberNames
            };

Which returns the following results:

[Image: one result per group code, each with its member names: 123 (Betty, Mildred) and 456 (Charli, Mattilda)]
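To make the mechanics concrete outside of LINQ, here is a rough Python analogue of the two steps (bucket by key, then project each bucket). This is only a sketch of the idea, not how LINQ is implemented; the variable names are mine, and the data is the member list from above.

```python
from collections import defaultdict

# (member name, group code) pairs from the sample data above.
members = [
    ("Betty", 123),
    ("Mildred", 123),
    ("Charli", 456),
    ("Mattilda", 456),
]

# Step 1: put every member into the bucket for its key,
# roughly what `group m by m.GroupCode` does.
members_by_group_code = defaultdict(list)
for name, group_code in members:
    members_by_group_code[group_code].append((name, group_code))

# Step 2: project each bucket to just its key and the member names,
# roughly what the final `select new { ... }` does.
result = [
    {"GroupCode": code, "MemberNames": [name for name, _ in group]}
    for code, group in members_by_group_code.items()
]

for entry in result:
    print(entry)
# {'GroupCode': 123, 'MemberNames': ['Betty', 'Mildred']}
# {'GroupCode': 456, 'MemberNames': ['Charli', 'Mattilda']}
```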

How does group by work in sub queries?

Your derived table is missing an alias.

SELECT SUM(Mean) AS "Total Mean", Number
FROM (SELECT Name, avg(Value) Mean, Number
    FROM Table1
    WHERE Category = 'Time'
    GROUP BY Name, Number) t --alias for the derived table
GROUP BY Number;
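Here is a small sketch of that two-level grouping running end to end. The rows in Table1 are invented purely for illustration, and I am using SQLite through Python's sqlite3 module as an assumption. As far as I know, SQLite accepts a derived table without an alias, so this will not reproduce the original error, but the aliased form shown here should be portable to engines that insist on one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Table1 (Name TEXT, Value REAL, Number INTEGER, Category TEXT)"
)
# Hypothetical sample rows, not taken from the question.
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?, ?)",
    [
        ("A", 10.0, 1, "Time"),
        ("A", 20.0, 1, "Time"),  # mean for (A, 1) is 15
        ("B", 30.0, 1, "Time"),  # mean for (B, 1) is 30
        ("C", 40.0, 2, "Time"),  # mean for (C, 2) is 40
        ("C", 99.0, 2, "Cost"),  # dropped by the WHERE clause
    ],
)

# Inner query: one mean per (Name, Number).
# Outer query: sum of those means per Number.
totals = conn.execute(
    """
    SELECT SUM(Mean) AS TotalMean, Number
    FROM (SELECT Name, AVG(Value) AS Mean, Number
          FROM Table1
          WHERE Category = 'Time'
          GROUP BY Name, Number) AS t  -- alias for the derived table
    GROUP BY Number
    ORDER BY Number
    """
).fetchall()

print(totals)  # [(45.0, 1), (40.0, 2)]
```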

Understanding how WHERE works with GROUP BY and Aggregation

You have the order wrong. The WHERE clause goes before the GROUP BY:

select cu.CustomerID, cu.FirstName, cu.LastName, COUNT(si.InvoiceID) as inv
from Customer as cu
inner join SalesInvoice as si
   on cu.CustomerID = si.CustomerID
where cu.FirstName = 'mark'
group by cu.CustomerID, cu.FirstName, cu.LastName

If you want to perform a filter after the GROUP BY, then you will use a HAVING clause:

select cu.CustomerID, cu.FirstName, cu.LastName, COUNT(si.InvoiceID) as inv
from Customer as cu
inner join SalesInvoice as si
   on cu.CustomerID = si.CustomerID
group by cu.CustomerID, cu.FirstName, cu.LastName
having cu.FirstName = 'mark'

A HAVING clause is typically used for filtering on aggregate functions, so it makes sense that it is applied after the GROUP BY.

To learn about the order of operations, here is an article explaining the order. From the article, the order of operations in SQL is:

To start out, I thought it would be good to look up the order in which SQL directives get executed as this will change the way I can optimize:

FROM clause
WHERE clause
GROUP BY clause
HAVING clause
SELECT clause
ORDER BY clause

Using this order you will apply the filter in the WHERE prior to a GROUP BY. The WHERE is used to limit the number of records.

Think of it this way: if you applied the WHERE afterwards, you would be grouping more records than you actually want. Applying it first reduces the recordset and then applies the grouping.
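A quick way to see the difference is to run both kinds of filter against the same data. The sketch below uses SQLite via Python's sqlite3 module with made-up Customer and SalesInvoice rows (all values are hypothetical, and I left LastName out of the select list to keep it short). The HAVING example filters on the aggregate itself, which is something a WHERE clause cannot do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Customer (CustomerID INTEGER, FirstName TEXT, LastName TEXT);
    CREATE TABLE SalesInvoice (InvoiceID INTEGER, CustomerID INTEGER);
    -- Hypothetical rows: two customers named mark, one named anna.
    INSERT INTO Customer VALUES (1, 'mark', 'smith'), (2, 'anna', 'jones'), (3, 'mark', 'lee');
    INSERT INTO SalesInvoice VALUES (10, 1), (11, 1), (12, 2), (13, 3);
    """
)

base = """
    SELECT cu.CustomerID, cu.FirstName, COUNT(si.InvoiceID) AS inv
    FROM Customer AS cu
    INNER JOIN SalesInvoice AS si ON cu.CustomerID = si.CustomerID
"""

# WHERE: individual rows are filtered first, then the survivors are grouped.
where_rows = conn.execute(
    base
    + " WHERE cu.FirstName = 'mark'"
    + " GROUP BY cu.CustomerID, cu.FirstName ORDER BY cu.CustomerID"
).fetchall()

# HAVING: all rows are grouped first, then whole groups are filtered,
# here on the aggregate value.
having_rows = conn.execute(
    base
    + " GROUP BY cu.CustomerID, cu.FirstName"
    + " HAVING COUNT(si.InvoiceID) >= 2 ORDER BY cu.CustomerID"
).fetchall()

print(where_rows)   # [(1, 'mark', 2), (3, 'mark', 1)]
print(having_rows)  # [(1, 'mark', 2)]
```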

What does GROUP BY do exactly?

GROUP BY enables summaries. Specifically, it controls the use of summary functions like COUNT(), SUM(), AVG(), MIN(), MAX() etc. There isn't much to summarize in your example.

But, suppose you had a Deptname column. Then you could issue this query and get the average salary by Deptname.

SELECT AVG(Salary) Average,
       Deptname
  FROM Employee
 GROUP BY Deptname
 ORDER BY Deptname

If you want your result set put in a certain order, use ORDER BY.
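For what it's worth, here is a runnable sketch of that query against a tiny Employee table in SQLite (via Python's sqlite3). The Name column and all of the rows are assumptions I made up for the demonstration. It also runs the same AVG() without GROUP BY, to show that the aggregate then summarizes the whole table as a single group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Name TEXT, Deptname TEXT, Salary REAL)")
# Hypothetical employees, purely for illustration.
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [
        ("Ann", "Sales", 100.0),
        ("Bob", "Sales", 300.0),
        ("Cy", "IT", 500.0),
    ],
)

# With GROUP BY: one average per department.
avg_by_dept = conn.execute(
    "SELECT AVG(Salary) AS Average, Deptname FROM Employee "
    "GROUP BY Deptname ORDER BY Deptname"
).fetchall()

# Without GROUP BY: the whole table is one group, so one average overall.
overall_avg = conn.execute("SELECT AVG(Salary) FROM Employee").fetchone()[0]

print(avg_by_dept)  # [(500.0, 'IT'), (200.0, 'Sales')]
print(overall_avg)  # 300.0
```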

Why do we need GROUP BY with AGGREGATE FUNCTIONS?

It might be easier if you think of GROUP BY as "for each" for the sake of explanation. The query below:

SELECT empid, SUM(MonthlySalary)
FROM Employee
GROUP BY EmpID

is saying:

"Give me the sum of the MonthlySalary values for each empid"
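If the "for each" reading helps, it can be written out literally. The sketch below computes the per-empid sum with a plain loop and then with GROUP BY in SQLite, using three sample rows in which empid 1 appears twice. The row values and the use of Python's sqlite3 module are assumptions for the sake of the demonstration.

```python
import sqlite3
from collections import defaultdict

# Sample (empid, MonthlySalary) rows; empid 1 appears twice.
rows = [(1, 200), (1, 300), (2, 300)]

# "For each empid": keep a running sum per key.
sum_by_empid = defaultdict(int)
for empid, monthly_salary in rows:
    sum_by_empid[empid] += monthly_salary

# The same computation expressed with GROUP BY.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmpID INTEGER, MonthlySalary INTEGER)")
conn.executemany("INSERT INTO Employee VALUES (?, ?)", rows)
sql_result = conn.execute(
    "SELECT EmpID, SUM(MonthlySalary) FROM Employee "
    "GROUP BY EmpID ORDER BY EmpID"
).fetchall()

print(dict(sum_by_empid))  # {1: 500, 2: 300}
print(sql_result)          # [(1, 500), (2, 300)]
```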

So if your table looked like this:

+-----+-------------+
|empid|MonthlySalary|
+-----+-------------+
|1    |200          |
+-----+-------------+
|2    |300          |
+-----+-------------+

result:

+-+---+
|1|200|
+-+---+
|2|300|
+-+---+

Sum wouldn't appear to do anything because the sum of one number is that number. On the other hand if it looked like this:

+-----+-------------+
|empid|MonthlySalary|
+-----+-------------+
|1    |200          |
+-----+-------------+
|1    |300          |
+-----+-------------+
|2    |300          |
+-----+-------------+