Difference between GROUP BY and ORDER BY in SQL

What is the difference between GROUP BY and ORDER BY in SQL?

ORDER BY alters the order in which items are returned.

GROUP BY aggregates records by the specified columns, which allows you to apply aggregate functions (such as SUM, COUNT, AVG, etc.) to the non-grouped columns.
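Here is a minimal sketch of the contrast. It uses Python's built-in sqlite3 module against SQLite rather than any particular server, and the `animals` table and its rows are hypothetical. ORDER BY keeps every row and only reorders them. GROUP BY collapses rows that share a value, so an aggregate such as COUNT is computed once per group:

```python
import sqlite3

# In-memory database with a hypothetical animals table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE animals (name TEXT, species TEXT)")
conn.executemany(
    "INSERT INTO animals VALUES (?, ?)",
    [("Zoe", "orangutan"), ("Abe", "orangutan"), ("Max", "gorilla"), ("Ada", "orangutan")],
)

# ORDER BY: the same four rows, sorted by name.
ordered = conn.execute("SELECT name, species FROM animals ORDER BY name").fetchall()

# GROUP BY: one row per species, with COUNT applied to each group.
# (ORDER BY is added so the group order is predictable.)
grouped = conn.execute(
    "SELECT species, COUNT(*) FROM animals GROUP BY species ORDER BY species"
).fetchall()

print(ordered)
print(grouped)
```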

What is the difference between 'GROUP BY' and 'ORDER BY' in SQL?

No, you can't change the order of SQL clauses. They must appear in the following form:

SELECT <attribute and function list>
FROM <table list>
[ WHERE <condition> ]
[ GROUP BY <grouping attribute(s)> ]
[ HAVING <group condition> ]
[ ORDER BY <attribute list> ];

Retrieval queries in SQL consist of up to these six clauses, but only the first two (SELECT and FROM) are mandatory.
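The following sketch makes the required clause order concrete by using all six clauses in one query. It runs through Python's sqlite3 against a hypothetical `sales` table, and it also shows that SQLite rejects a query that places ORDER BY before GROUP BY:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (agent TEXT, region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("ann", "east", 10), ("ann", "east", 5), ("bob", "east", 7),
     ("bob", "west", 20), ("cy", "east", 1)],
)

# Clauses in the required order: SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY.
rows = conn.execute(
    """
    SELECT agent, SUM(amount) AS total
    FROM sales
    WHERE region = 'east'
    GROUP BY agent
    HAVING SUM(amount) > 2
    ORDER BY total DESC
    """
).fetchall()
print(rows)

# Putting ORDER BY before GROUP BY is a syntax error.
try:
    conn.execute("SELECT agent FROM sales ORDER BY agent GROUP BY agent")
    misplaced_ok = True
except sqlite3.OperationalError:
    misplaced_ok = False
```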

GROUP BY specifies grouping attributes (it doesn't guarantee the order in which the result appears), whereas ORDER BY specifies the order in which the query result is displayed.

Comment:

QUERY = "select * from animals where species = 'orangutan' order by name ;" Here, when I replace ORDER BY with GROUP BY, I don't see any difference.

Standard SQL doesn't guarantee the order in which the result appears. In addition, a query that includes a GROUP BY clause cannot refer to nonaggregated columns in the select list that are not named in the GROUP BY clause.

so:

select * 
from animals
where species = 'orangutan'
group by name;

This query is illegal in standard SQL because SELECT * includes every field of the table, not just the grouped column. See the MySQL documentation section "MySQL Handling of GROUP BY".

whereas,

select * 
from animals
where species = 'orangutan'
order by name;

is perfectly valid.
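This sketch shows why the two queries in the comment can look identical. When every orangutan name is unique, grouping by name produces one row per original row. The made-up data below includes a duplicate name, which exposes the difference. SQLite, like MySQL, tolerates non-grouped columns in the select list, so the portable GROUP BY form here selects only the grouped column plus an aggregate:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE animals (name TEXT, species TEXT)")
conn.executemany(
    "INSERT INTO animals VALUES (?, ?)",
    [("Kim", "orangutan"), ("Bo", "orangutan"), ("Kim", "orangutan"), ("Rex", "gorilla")],
)

# ORDER BY: every matching row is kept, just sorted.
ordered = conn.execute(
    "SELECT name FROM animals WHERE species = 'orangutan' ORDER BY name"
).fetchall()

# Standard-conforming GROUP BY: only the grouped column plus aggregates.
# The duplicate "Kim" rows collapse into one; ORDER BY fixes the output order.
grouped = conn.execute(
    "SELECT name, COUNT(*) FROM animals WHERE species = 'orangutan' "
    "GROUP BY name ORDER BY name"
).fetchall()
```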

Anyway, some SQL implementations, such as MySQL, also sort the output of GROUP BY (this applies to MySQL 5.7 and earlier; MySQL 8.0 no longer sorts implicitly). See:

12.17.3 MySQL Handling of GROUP BY

MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause. This means that the preceding query is legal in MySQL. You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However.....

How does GROUP BY order results?

Other Considerations When using ROLLUP

GROUP BY in MySQL sorts results, and you can use explicit ASC and DESC keywords with columns named in the GROUP BY list to specify sort order for individual columns.

Similar behavior may also occur in your DBMS.

If you tag the name of your DBMS, I or someone else can demonstrate the difference with an example. Below is an example in MySQL:

12.17.3 MySQL Handling of GROUP BY

.... However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate. Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause.

I have an Employee table like the one below:

mysql> select * FROM Employee;
+-----+------+-------------+------+
| SSN | Name | Designation | MSSN |
+-----+------+-------------+------+
| 1 | A | OWNER | 1 |
| 10 | G | WORKER | 5 |
| 11 | D | WORKER | 4 |
| 12 | E | WORKER | 4 |
| 2 | B | BOSS | 1 |
| 3 | F | BOSS | 1 |
| 4 | C | BOSS | 2 |
| 5 | H | BOSS | 2 |
| 6 | L | WORKER | 2 |
| 7 | I | BOSS | 2 |
| 8 | K | WORKER | 3 |
| 9 | J | WORKER | 7 |
+-----+------+-------------+------+
12 rows in set (0.00 sec)

If I apply GROUP BY Designation, MySQL forms the groups and sorts the result:

mysql> select * from Employee group by Designation;
+-----+------+-------------+------+
| SSN | Name | Designation | MSSN |
+-----+------+-------------+------+
| 2 | B | BOSS | 1 |
| 1 | A | OWNER | 1 |
| 10 | G | WORKER | 5 |
+-----+------+-------------+------+
3 rows in set (0.00 sec)

There are three rows, one for each of the three Designation values present in the table. According to the MySQL documentation, which row is picked for each group is indeterminate.

The ORDER BY query's result is quite different:

mysql> select * from Employee order by Designation;
+-----+------+-------------+------+
| SSN | Name | Designation | MSSN |
+-----+------+-------------+------+
| 4 | C | BOSS | 2 |
| 7 | I | BOSS | 2 |
| 5 | H | BOSS | 2 |
| 2 | B | BOSS | 1 |
| 3 | F | BOSS | 1 |
| 1 | A | OWNER | 1 |
| 12 | E | WORKER | 4 |
| 11 | D | WORKER | 4 |
| 6 | L | WORKER | 2 |
| 10 | G | WORKER | 5 |
| 8 | K | WORKER | 3 |
| 9 | J | WORKER | 7 |
+-----+------+-------------+------+
12 rows in set (0.00 sec)
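If you need one row per Designation with predictable contents, compute every non-grouped column through an aggregate instead of relying on the indeterminate pick. The sketch below loads the Employee rows above into SQLite through Python's sqlite3. Storing SSN as an integer and choosing MIN(SSN) as the representative are assumptions for illustration; the sort order of the MySQL output hints that SSN may actually be a string column there:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (SSN INTEGER, Name TEXT, Designation TEXT, MSSN INTEGER)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?)",
    [(1, "A", "OWNER", 1), (10, "G", "WORKER", 5), (11, "D", "WORKER", 4),
     (12, "E", "WORKER", 4), (2, "B", "BOSS", 1), (3, "F", "BOSS", 1),
     (4, "C", "BOSS", 2), (5, "H", "BOSS", 2), (6, "L", "WORKER", 2),
     (7, "I", "BOSS", 2), (8, "K", "WORKER", 3), (9, "J", "WORKER", 7)],
)

# One row per Designation; every non-grouped value comes from an aggregate,
# so nothing is left for the engine to pick arbitrarily.
per_group = conn.execute(
    "SELECT Designation, COUNT(*) AS headcount, MIN(SSN) AS lowest_ssn "
    "FROM Employee GROUP BY Designation ORDER BY Designation"
).fetchall()

# ORDER BY keeps all 12 rows; the SSN tiebreaker also makes the order
# within each Designation deterministic.
all_rows = conn.execute(
    "SELECT SSN FROM Employee ORDER BY Designation, SSN"
).fetchall()
```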

SQL Difference between two rows with group by

I think this is aggregation with lag():

select time, device, max(value) as max_value,
(max(value) - lag(max(value)) over (partition by device order by time)) as diff
from rawdata
group by time, device
order by time desc;

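Here is a runnable sketch of the same calculation in SQLite (window functions need SQLite 3.25 or later), run through Python's sqlite3. The `rawdata` rows are invented. The per-group MAX is computed in a derived table before LAG is applied, which is an equivalent way to write the query above. The output is ordered by device and time only to make it easy to read:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rawdata (time INTEGER, device TEXT, value REAL)")
conn.executemany(
    "INSERT INTO rawdata VALUES (?, ?, ?)",
    [(1, "a", 5), (1, "a", 7), (2, "a", 10), (3, "a", 4),
     (1, "b", 2), (2, "b", 3)],
)

rows = conn.execute(
    """
    SELECT time, device, max_value,
           max_value - LAG(max_value) OVER (PARTITION BY device ORDER BY time) AS diff
    FROM (
        -- One row per (time, device) holding the largest value.
        SELECT time, device, MAX(value) AS max_value
        FROM rawdata
        GROUP BY time, device
    ) AS g
    ORDER BY device, time
    """
).fetchall()
```

The first row of each device has no previous row, so its `diff` is NULL (`None` in Python).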

Is there any difference between GROUP BY and DISTINCT

MusiGenesis' response is functionally the correct one with regard to your question as stated. SQL Server is smart enough to realize that if you are using "Group By" without any aggregate functions, what you actually mean is "Distinct", so it generates an execution plan as if you had simply used "Distinct".

However, I think it's important to note Hank's response as well - cavalier treatment of "Group By" and "Distinct" could lead to some pernicious gotchas down the line if you're not careful. It's not entirely correct to say that this is "not a question about aggregates" because you're asking about the functional difference between two SQL query keywords, one of which is meant to be used with aggregates and one of which is not.

A hammer can work to drive in a screw sometimes, but if you've got a screwdriver handy, why bother?

(for the purposes of this analogy, Hammer : Screwdriver :: GroupBy : Distinct and screw => get list of unique values in a table column)
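This small sketch shows that both forms return the same unique values. It also shows the one thing only GROUP BY adds: an aggregate per value. It runs on SQLite via Python's sqlite3 with a hypothetical single-column table, so it says nothing about how SQL Server plans either query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (color TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?)", [("red",), ("blue",), ("red",), ("green",)]
)

# Both forms return the unique values; ORDER BY makes the comparison stable.
distinct_rows = conn.execute("SELECT DISTINCT color FROM t ORDER BY color").fetchall()
grouped_rows = conn.execute("SELECT color FROM t GROUP BY color ORDER BY color").fetchall()

# Only GROUP BY lets you attach an aggregate to each unique value.
counts = conn.execute(
    "SELECT color, COUNT(*) FROM t GROUP BY color ORDER BY color"
).fetchall()
```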

Does group by automatically guarantee order by?

GROUP BY does not necessarily order the data. A database is designed to retrieve data as fast as possible and sorts only when necessary.

So add the order by if you need a guaranteed order.
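Here is a sketch of that advice in SQLite via Python's sqlite3 (the `orders` table is hypothetical). The GROUP BY query alone returns the right groups, but in an unspecified order. The ORDER BY, here on the aggregate with the name as a tiebreaker, is what pins the order down:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("carol", 5), ("alice", 1), ("bob", 4), ("bob", 6), ("alice", 2)],
)

# GROUP BY alone: one row per customer, order unspecified.
unordered = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer"
).fetchall()

# GROUP BY + ORDER BY: order guaranteed (largest total first, name breaks ties).
ordered = conn.execute(
    "SELECT customer, SUM(amount) AS total FROM orders "
    "GROUP BY customer ORDER BY total DESC, customer"
).fetchall()
```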

Calculate row difference within groups

Use the LAG(<column>) analytic function to obtain the "previous" column value, as defined in the OVER clause. Then subtract the current value from it and multiply the result by -1 (equivalently, current minus previous; the result is negative when the value decreases). If the previous value isn't present (is NULL), take the current value instead.

Pseudo code would be:

If previous_order_value exists:
-1 * (previous_order_value - current_order_value)
Else
current_order_value

where previous_order_value comes from the preceding row with the same id and school_id, when rows are sorted by enrollment_start_date in ascending order.

SQL Code:

select
id,
school_id,
enrollment_start_date,
[order],
coalesce(-1 * (lag([order]) over (partition by id, school_id order by enrollment_start_date ) - [order]), [order]) as diff
from yourtable

Also note that ORDER is a reserved keyword in SQL Server, which is why your column name has to be wrapped in [ ]. I suggest using a different name for this column, if possible.
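Here is a sketch of the same query in SQLite (3.25 or later for LAG) via Python's sqlite3, with made-up rows. It uses the hypothetical column name `order_value` instead of the reserved word `order`. It also writes `order_value - LAG(order_value)`, which equals `-1 * (LAG(order_value) - order_value)`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE enrollments (id INTEGER, school_id INTEGER, "
    "enrollment_start_date TEXT, order_value INTEGER)"
)
conn.executemany(
    "INSERT INTO enrollments VALUES (?, ?, ?, ?)",
    [(1, 10, "2020-01-01", 3), (1, 10, "2020-06-01", 8), (1, 10, "2021-01-01", 6),
     (2, 10, "2020-03-01", 4)],
)

rows = conn.execute(
    """
    SELECT id, school_id, enrollment_start_date, order_value,
           -- Difference from the previous row in the same (id, school_id) group;
           -- the first row of each group has no previous row, so keep its value.
           COALESCE(order_value - LAG(order_value) OVER (
                        PARTITION BY id, school_id ORDER BY enrollment_start_date),
                    order_value) AS diff
    FROM enrollments
    ORDER BY id, school_id, enrollment_start_date
    """
).fetchall()
diffs = [r[-1] for r in rows]
```

ISO-formatted date strings sort chronologically, so storing the dates as TEXT is enough for the ORDER BY inside the window.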

Difference between HAVING and WHERE in SQL

The simple way to think about it is to consider the order in which the steps are applied.

Step 1: Where clause filters data

Step 2: GROUP BY is applied and aggregates are computed (SUM / MAX / MIN / etc.)

Step 3: Having clause filters the results

So in your 2 examples:

SELECT agentId, SUM(quantity) total_sales 
FROM sales s, houses h
WHERE s.houseId = h.houseId AND h.type = 'condo'
GROUP BY agentId
ORDER BY total_sales;

Step 1: Filter by HouseId and Condo

Step 2: Add up the quantity for each agent
(over the sales that match the houseId join and the condo type)

SELECT agentId, SUM(quantity) total_sales 
FROM sales s, houses h
GROUP BY agentId
HAVING s.houseId = h.houseId AND h.type = 'condo'
ORDER BY total_sales;

Step 1: No Filter

Step 2: Add up quantity of all houses

Step 3: Filter the results by houseid and condo.

Hopefully this clears up what is happening.

The easiest way to decide which you should use is:
- Use WHERE to filter the data
- Use HAVING to filter the results of an aggregation (SUM / MAX / MIN / ETC)
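Here is a sketch of the rule of thumb in SQLite via Python's sqlite3. To keep it short, the sales/houses join is collapsed into a single hypothetical `sales` table with a `type` column. WHERE removes non-condo rows before summing, while HAVING discards whole agent groups after summing:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (agentId INTEGER, type TEXT, quantity INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, "condo", 2), (1, "house", 5), (2, "condo", 1), (2, "condo", 1), (3, "house", 4)],
)

# WHERE filters rows before grouping: only condo sales are summed.
where_rows = conn.execute(
    "SELECT agentId, SUM(quantity) AS total_sales FROM sales "
    "WHERE type = 'condo' GROUP BY agentId ORDER BY agentId"
).fetchall()

# HAVING filters groups after aggregation: keep agents whose total exceeds 3.
having_rows = conn.execute(
    "SELECT agentId, SUM(quantity) AS total_sales FROM sales "
    "GROUP BY agentId HAVING SUM(quantity) > 3 ORDER BY agentId"
).fetchall()
```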

SQL to find time difference between two values in a group

For a fixed list of states, you can use window functions and aggregation:

select item,
sum(case when lag_state = 'st1' and state = 'st2' then timestampdiff(second, lag_time, time) end) as st1_to_st2,
sum(case when lag_state = 'st2' and state = 'st3' then timestampdiff(second, lag_time, time) end) as st2_to_st3
from (
select t.*,
lag(time) over(partition by item order by state) lag_time,
lag(state) over(partition by item order by state) lag_state
from mytable t
) t
group by item

If you want something more generic (i.e. a query that does not hardcode the states), I would recommend putting the values in rows rather than in columns:

select item, lag_state, state, 
sum(timestampdiff(second, lag_time, time)) as sum_diff
from (
select t.*,
lag(time) over(partition by item order by state) lag_time,
lag(state) over(partition by item order by state) lag_state
from mytable t
) t
group by item, lag_state, state
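Here is a sketch of the generic query in SQLite via Python's sqlite3. SQLite has no TIMESTAMPDIFF, so this sketch assumes `time` is stored as integer seconds and uses plain subtraction in place of `timestampdiff(second, lag_time, time)`. The rows are invented. The first state of each item has no predecessor, so its lag_state and sum_diff come back as NULL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (item TEXT, state TEXT, time INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [("x", "st1", 100), ("x", "st2", 160), ("x", "st3", 200),
     ("y", "st1", 0), ("y", "st2", 30)],
)

rows = conn.execute(
    """
    SELECT item, lag_state, state,
           SUM(time - lag_time) AS sum_diff  -- seconds, assuming integer-second times
    FROM (
        SELECT t.*,
               LAG(time)  OVER (PARTITION BY item ORDER BY state) AS lag_time,
               LAG(state) OVER (PARTITION BY item ORDER BY state) AS lag_state
        FROM mytable t
    ) t
    GROUP BY item, lag_state, state
    ORDER BY item, state
    """
).fetchall()
```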

SQL: difference between PARTITION BY and GROUP BY

They're used in different places. GROUP BY modifies the entire query, like:

select customerId, count(*) as orderCount
from Orders
group by customerId

But PARTITION BY just works on a window function, like ROW_NUMBER():

select row_number() over (partition by customerId order by orderId)
as OrderNumberForThisCustomer
from Orders
  • GROUP BY normally reduces the number of rows returned by rolling
    them up and calculating averages or sums for each group.
  • PARTITION BY does not affect the number of rows returned, but it
    changes how a window function's result is calculated.
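Here is a sketch contrasting the two queries in SQLite via Python's sqlite3, using a hypothetical `Orders` table with five rows. GROUP BY returns one row per customer, while the PARTITION BY window keeps all five rows and numbers them within each customer:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (orderId INTEGER, customerId INTEGER)")
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?)",
    [(1, 100), (2, 200), (3, 100), (4, 100), (5, 200)],
)

# GROUP BY: rows are rolled up, one per customer.
grouped = conn.execute(
    "SELECT customerId, COUNT(*) AS orderCount FROM Orders "
    "GROUP BY customerId ORDER BY customerId"
).fetchall()

# PARTITION BY: every order row is kept; the window numbers rows per customer.
numbered = conn.execute(
    "SELECT orderId, customerId, "
    "ROW_NUMBER() OVER (PARTITION BY customerId ORDER BY orderId) "
    "AS OrderNumberForThisCustomer "
    "FROM Orders ORDER BY orderId"
).fetchall()
```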

