What is the difference between GROUP BY and ORDER BY in SQL
ORDER BY alters the order in which items are returned.
GROUP BY aggregates records by the specified columns, which allows you to apply aggregate functions (such as SUM, COUNT, AVG) to the non-grouped columns.
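A minimal sketch of the two behaviours, run through Python's built-in sqlite3 module so it is self-contained. The `sales` table, its columns, and its rows are made up for illustration:

```python
import sqlite3

# Hypothetical in-memory table; names and data are assumptions for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("west", 30), ("east", 10), ("west", 5), ("east", 20)],
)

# ORDER BY: every row is returned, only the sequence changes.
ordered = conn.execute(
    "SELECT region, amount FROM sales ORDER BY amount"
).fetchall()

# GROUP BY: one row per region, with SUM applied to the non-grouped column.
# An ORDER BY is added because GROUP BY alone does not promise any order.
grouped = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

print(ordered)  # four rows, sorted by amount
print(grouped)  # two rows, one per region
```

The first query keeps all four rows, while the second collapses them to one row per region.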
What is the difference between 'GROUP BY' and 'ORDER BY' in SQL?
No, you can't change the order of SQL clauses. A query must take the following form:
SELECT <attribute and function list>
FROM <table list>
[ WHERE <condition> ]
[ GROUP BY <grouping attribute(s)> ]
[ HAVING <group condition> ]
[ ORDER BY <attribute list> ];
Retrieval queries in SQL consist of up to six clauses, but only the first two, SELECT and FROM, are mandatory.
GROUP BY specifies the grouping attributes (it doesn't guarantee the order in which the result appears), whereas ORDER BY specifies the order in which the result of a query is displayed.
comment:
QUERY = "select * from animals where species = 'orangutan' order by name ;" Here, when I replace order with group, I don't see any difference.
Standard SQL doesn't guarantee the order in which results appear, and a query that includes a GROUP BY clause cannot refer to nonaggregated columns in the select list that are not named in the GROUP BY clause.
so:
select *
from animals
where species = 'orangutan'
group by name;
This query is illegal in standard SQL because select * includes every column of the table, not just the grouped one. See the MySQL documentation section MySQL Handling of GROUP BY.
whereas,
select *
from animals
where species = 'orangutan'
order by name;
is perfectly valid.
Some SQL implementations also sort the grouped output. MySQL did so before version 8.0; see:
12.17.3 MySQL Handling of GROUP BY
MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause. This means that the preceding query is legal in MySQL. You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However.....
How does GROUP BY order results?
Other Considerations When using ROLLUP
GROUP BY in MySQL sorts results, and you can use explicit ASC and DESC keywords with columns named in the GROUP BY list to specify sort order for individual columns.
Similar behavior may also be found in your DBMS.
If you tag your question with the name of your DBMS, I or someone else can demonstrate the difference with an example. Below is an example in MySQL:
12.17.3 MySQL Handling of GROUP BY
.... However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate. Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause.
I have an Employee table, like below:
mysql> select * FROM Employee;
+-----+------+-------------+------+
| SSN | Name | Designation | MSSN |
+-----+------+-------------+------+
| 1 | A | OWNER | 1 |
| 10 | G | WORKER | 5 |
| 11 | D | WORKER | 4 |
| 12 | E | WORKER | 4 |
| 2 | B | BOSS | 1 |
| 3 | F | BOSS | 1 |
| 4 | C | BOSS | 2 |
| 5 | H | BOSS | 2 |
| 6 | L | WORKER | 2 |
| 7 | I | BOSS | 2 |
| 8 | K | WORKER | 3 |
| 9 | J | WORKER | 7 |
+-----+------+-------------+------+
12 rows in set (0.00 sec)
If I apply group by Designation, MySQL forms the groups and also orders the result:
mysql> select * from Employee group by Designation;
+-----+------+-------------+------+
| SSN | Name | Designation | MSSN |
+-----+------+-------------+------+
| 2 | B | BOSS | 1 |
| 1 | A | OWNER | 1 |
| 10 | G | WORKER | 5 |
+-----+------+-------------+------+
3 rows in set (0.00 sec)
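There are three rows, one for each Designation value present in the table; which row is picked for each group is indeterminate according to the MySQL documentation. (With the ONLY_FULL_GROUP_BY SQL mode enabled, which is the default since MySQL 5.7.5, this query is rejected instead.)
The result of the order by query is quite different: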
mysql> select * from Employee order by Designation;
+-----+------+-------------+------+
| SSN | Name | Designation | MSSN |
+-----+------+-------------+------+
| 4 | C | BOSS | 2 |
| 7 | I | BOSS | 2 |
| 5 | H | BOSS | 2 |
| 2 | B | BOSS | 1 |
| 3 | F | BOSS | 1 |
| 1 | A | OWNER | 1 |
| 12 | E | WORKER | 4 |
| 11 | D | WORKER | 4 |
| 6 | L | WORKER | 2 |
| 10 | G | WORKER | 5 |
| 8 | K | WORKER | 3 |
| 9 | J | WORKER | 7 |
+-----+------+-------------+------+
12 rows in set (0.00 sec)
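For comparison, here is a portable variant of the same idea, sketched with Python's built-in sqlite3 module instead of MySQL. It selects only the grouping column plus an aggregate, so no indeterminate values are involved. The rows mirror the Employee table above, except that SSN and MSSN are stored as integers, which is an assumption of this sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (SSN INTEGER, Name TEXT, Designation TEXT, MSSN INTEGER)"
)
rows = [
    (1, "A", "OWNER", 1), (2, "B", "BOSS", 1), (3, "F", "BOSS", 1),
    (4, "C", "BOSS", 2), (5, "H", "BOSS", 2), (6, "L", "WORKER", 2),
    (7, "I", "BOSS", 2), (8, "K", "WORKER", 3), (9, "J", "WORKER", 7),
    (10, "G", "WORKER", 5), (11, "D", "WORKER", 4), (12, "E", "WORKER", 4),
]
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?)", rows)

# GROUP BY collapses the table to one row per Designation.
# The explicit ORDER BY makes the output order deterministic.
per_designation = conn.execute(
    "SELECT Designation, COUNT(*) FROM Employee "
    "GROUP BY Designation ORDER BY Designation"
).fetchall()

# ORDER BY keeps all twelve rows and only sorts them.
sorted_rows = conn.execute(
    "SELECT SSN, Name, Designation FROM Employee ORDER BY Designation, SSN"
).fetchall()

print(per_designation)
print(len(sorted_rows))
```

Adding SSN as a second sort key removes the arbitrary ordering within each Designation that is visible in the MySQL output above.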
SQL Difference between two rows with group by
I think this is aggregation with lag():
select time, device, max(value) as max_value,
(max(value) - lag(max(value)) over (partition by device order by time)) as diff
from rawdata
group by time, device
order by time desc;
Here is a db<>fiddle.
Is there any difference between GROUP BY and DISTINCT
MusiGenesis' response is functionally the correct one with regard to your question as stated; the SQL Server is smart enough to realize that if you are using "Group By" and not using any aggregate functions, then what you actually mean is "Distinct" - and therefore it generates an execution plan as if you'd simply used "Distinct."
However, I think it's important to note Hank's response as well - cavalier treatment of "Group By" and "Distinct" could lead to some pernicious gotchas down the line if you're not careful. It's not entirely correct to say that this is "not a question about aggregates" because you're asking about the functional difference between two SQL query keywords, one of which is meant to be used with aggregates and one of which is not.
A hammer can work to drive in a screw sometimes, but if you've got a screwdriver handy, why bother?
(For the purposes of this analogy, Hammer : Screwdriver :: GroupBy : Distinct, and screw => get a list of unique values in a table column.)
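To make the comparison concrete, here is a small sketch using Python's built-in sqlite3 module. The `orders` table and its data are hypothetical. It shows that GROUP BY without aggregates returns the same values as DISTINCT, and that GROUP BY only earns its keep once an aggregate is involved:

```python
import sqlite3

# Hypothetical table and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, item TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", "pen"), ("bob", "ink"), ("ann", "ink"), ("bob", "pen"), ("ann", "pen")],
)

# The "screwdriver": DISTINCT states the intent directly.
via_distinct = conn.execute(
    "SELECT DISTINCT customer FROM orders ORDER BY customer"
).fetchall()

# The "hammer": GROUP BY with no aggregate returns the same values.
via_group_by = conn.execute(
    "SELECT customer FROM orders GROUP BY customer ORDER BY customer"
).fetchall()

# Where GROUP BY is the right tool: one aggregate value per group.
with_counts = conn.execute(
    "SELECT customer, COUNT(*) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()

print(via_distinct == via_group_by)
print(with_counts)
```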
Does group by automatically guarantee order by?
group by does not necessarily order the data. A DB is designed to grab the data as fast as possible and only sort if necessary.
So add the order by if you need a guaranteed order.
Calculate row difference within groups
Use the LAG(<column>) analytic function to obtain the "previous" value of a column, with "previous" defined by the OVER part. Subtract the current value from it and multiply by -1, which yields the current value minus the previous one. If there is no previous value (it is null), take the current value.
Pseudo code would be:
If previous_order_value exists:
-1 * (previous_order_value - current_order_value)
Else
current_order_value
where previous_order_value is based on the same id & school_id and is sorted by enrollment_start_date in ascending order
SQL Code:
select
id,
school_id,
enrollment_start_date,
[order],
coalesce(-1 * (lag([order]) over (partition by id, school_id order by enrollment_start_date ) - [order]), [order]) as diff
from yourtable
Also note that order is a reserved keyword in SQL Server, which is why your column name has to be wrapped in [ ]. I suggest using some other word for this column, if possible.
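The same expression can be tried out without SQL Server. The sketch below uses Python's built-in sqlite3 module (window functions need SQLite 3.25 or later). The `enrollment` table, its rows, and the column name `order_value` (used in place of the reserved word) are assumptions for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE enrollment ("
    "id INTEGER, school_id INTEGER, enrollment_start_date TEXT, order_value INTEGER)"
)
conn.executemany(
    "INSERT INTO enrollment VALUES (?, ?, ?, ?)",
    [
        (1, 10, "2020-01-01", 3),
        (1, 10, "2020-06-01", 5),
        (1, 10, "2021-01-01", 4),
        (2, 10, "2020-02-01", 7),
    ],
)

# LAG() fetches the previous order_value within the same (id, school_id),
# ordered by start date. COALESCE falls back to the current value when
# there is no previous row in the partition.
query = """
SELECT id, enrollment_start_date, order_value,
       COALESCE(
           -1 * (LAG(order_value) OVER (
                     PARTITION BY id, school_id
                     ORDER BY enrollment_start_date) - order_value),
           order_value) AS diff
FROM enrollment
ORDER BY id, school_id, enrollment_start_date
"""
diffs = [row[3] for row in conn.execute(query)]
print(diffs)
```

The first row of each partition keeps its own value, and the result can be negative when the value decreases from one enrollment to the next.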
Difference between HAVING and WHERE in SQL
The simple way to think about it is to consider the order in which the steps are applied.
Step 1: Where clause filters data
Step 2: Group by is implemented (SUM / MAX / MIN / ETC)
Step 3: Having clause filters the results
So in your 2 examples:
SELECT agentId, SUM(quantity) total_sales
FROM sales s, houses h
WHERE s.houseId = h.houseId AND h.type = 'condo'
GROUP BY agentId
ORDER BY total_sales;
Step 1: Filter by HouseId and Condo
Step 2: Add up the results
(number of houses that match the houseid and condo)
SELECT agentId, SUM(quantity) total_sales
FROM sales s, houses h
GROUP BY agentId
HAVING s.houseId = h.houseId AND h.type = 'condo'
ORDER BY total_sales;
Step 1: No Filter
Step 2: Add up quantity of all houses
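Step 3: Filter the results by houseid and condo. (Standard SQL does not allow this query, because HAVING may only refer to grouped columns or aggregates.)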
Hopefully this clears up what is happening.
The easiest way to decide which you should use is:
- Use WHERE to filter the data
- Use HAVING to filter the results of an aggregation (SUM / MAX / MIN / ETC)
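The rule of thumb above can be checked with a small sketch in Python's built-in sqlite3 module. The single `sales` table here is a simplified, hypothetical stand-in for the sales/houses join in the question:

```python
import sqlite3

# Hypothetical simplified schema: one table instead of the sales/houses join.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (agentId INTEGER, type TEXT, quantity INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, "condo", 2), (1, "house", 5), (2, "condo", 1), (2, "condo", 1), (3, "house", 4)],
)

# WHERE runs before grouping: non-condo rows never reach SUM().
where_result = conn.execute(
    "SELECT agentId, SUM(quantity) AS total_sales FROM sales "
    "WHERE type = 'condo' GROUP BY agentId ORDER BY agentId"
).fetchall()

# HAVING runs after grouping: every row is summed, then whole groups are dropped.
having_result = conn.execute(
    "SELECT agentId, SUM(quantity) AS total_sales FROM sales "
    "GROUP BY agentId HAVING SUM(quantity) >= 4 ORDER BY agentId"
).fetchall()

print(where_result)
print(having_result)
```

Agent 3 disappears from the first result because no condo rows survive the WHERE filter, while agent 2 disappears from the second because the group total is below the HAVING threshold.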
SQL to find time difference between two values in a group
For a fixed list of states, you can use window functions and aggregation:
select item,
sum(case when lag_state = 'st1' and state = 'st2' then timestampdiff(second, lag_time, time) end) as st1_to_st2,
sum(case when lag_state = 'st2' and state = 'st3' then timestampdiff(second, lag_time, time) end) as st2_to_st3
from (
select t.*,
lag(time) over(partition by item order by state) lag_time,
lag(state) over(partition by item order by state) lag_state
from mytable t
) t
group by item
If you want something more generic, i.e. something that does not hardcode the states, I would recommend putting the values in rows rather than in columns:
select item, lag_state, state,
sum(timestampdiff(second, lag_time, time)) as sum_diff
from (
select t.*,
lag(time) over(partition by item order by state) lag_time,
lag(state) over(partition by item order by state) lag_state
from mytable t
) t
group by item, lag_state, state
SQL: difference between PARTITION BY and GROUP BY
They're used in different places. GROUP BY modifies the entire query, like:
select customerId, count(*) as orderCount
from Orders
group by customerId
But PARTITION BY just works on a window function, like ROW_NUMBER():
select row_number() over (partition by customerId order by orderId)
as OrderNumberForThisCustomer
from Orders
GROUP BY normally reduces the number of rows returned by rolling them up and calculating averages or sums for each group. PARTITION BY does not affect the number of rows returned, but it changes how a window function's result is calculated.
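The row-count difference is easy to see side by side. This is a minimal sketch using Python's built-in sqlite3 module (window functions need SQLite 3.25 or later), with a made-up Orders table:

```python
import sqlite3

# Hypothetical Orders data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (orderId INTEGER, customerId TEXT)")
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?)",
    [(1, "a"), (2, "b"), (3, "a"), (4, "a"), (5, "b")],
)

# GROUP BY: five input rows collapse to one row per customer.
grouped = conn.execute(
    "SELECT customerId, COUNT(*) AS orderCount FROM Orders "
    "GROUP BY customerId ORDER BY customerId"
).fetchall()

# PARTITION BY: all five rows survive; the numbering restarts per customer.
windowed = conn.execute(
    "SELECT orderId, customerId, "
    "ROW_NUMBER() OVER (PARTITION BY customerId ORDER BY orderId) "
    "AS OrderNumberForThisCustomer "
    "FROM Orders ORDER BY orderId"
).fetchall()

print(grouped)
print(windowed)
```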