Is there any difference between GROUP BY and DISTINCT
MusiGenesis' response is functionally the correct one with regard to your question as stated; SQL Server is smart enough to realize that if you are using "Group By" and not using any aggregate functions, then what you actually mean is "Distinct", and therefore it generates an execution plan as if you'd simply used "Distinct".
However, I think it's important to note Hank's response as well - cavalier treatment of "Group By" and "Distinct" could lead to some pernicious gotchas down the line if you're not careful. It's not entirely correct to say that this is "not a question about aggregates" because you're asking about the functional difference between two SQL query keywords, one of which is meant to be used with aggregates and one of which is not.
A hammer can work to drive in a screw sometimes, but if you've got a screwdriver handy, why bother?
(For the purposes of this analogy, Hammer : Screwdriver :: GROUP BY : DISTINCT, and screw => getting a list of unique values in a table column.)
What is the difference between GROUP BY and DISTINCT?
DISTINCT refers to distinct records as a whole, not distinct fields in the record.
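A small sketch of what "records as a whole" means. The `orders` table, its columns, and its data are invented for illustration; they are not part of the original question.

```sql
-- Hypothetical table and data, for illustration only.
CREATE TABLE orders (customer VARCHAR(20), product VARCHAR(20));
INSERT INTO orders (customer, product) VALUES
    ('alice', 'pen'),
    ('alice', 'pen'),
    ('alice', 'ink'),
    ('bob',   'pen');

-- DISTINCT applies to the whole (customer, product) row, so 'alice'
-- still appears twice: once with 'pen' and once with 'ink'.
SELECT DISTINCT customer, product FROM orders;

-- To get each customer only once, select only that column.
SELECT DISTINCT customer FROM orders;
```

The first query returns three rows and the second returns two, because only the exact duplicate ('alice', 'pen') is collapsed in the first.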
Which is better: Distinct or Group By
In your example, both queries will generate the same execution plan so their performance will be the same.
However, they both have their own purpose. To make your code easier to understand, you should use distinct to eliminate duplicate rows and group by to apply aggregate operators (sum, count, max, ...).
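As a rough sketch of that guideline, assuming a hypothetical `sales` table (the name, columns, and data are made up for this example):

```sql
-- Hypothetical table and data, for illustration only.
CREATE TABLE sales (region VARCHAR(20), amount INT);
INSERT INTO sales (region, amount) VALUES
    ('north', 10), ('north', 25), ('south', 40);

-- Intent: "give me each region once" -> DISTINCT.
SELECT DISTINCT region FROM sales;

-- Same result set, but the intent is less obvious to a reader.
SELECT region FROM sales GROUP BY region;

-- Intent: "one row per region with a total" -> GROUP BY plus an aggregate.
SELECT region, SUM(amount) AS total_amount
FROM sales
GROUP BY region;
```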
What is difference between distinct and group by (without aggregate function)
GROUP BY lets you use aggregate functions, like AVG, MAX, MIN, SUM, and COUNT. DISTINCT, on the other hand, just removes duplicates.
You can read this answer too : https://stackoverflow.com/a/164544/4227703
MySQL - What is the difference between GROUP BY and DISTINCT?
This is a duplicate of "Is there any difference between GROUP BY and DISTINCT", where the topic has already been discussed.
If you still want an answer here: GROUP BY and DISTINCT each have their own use.
Distinct is used to filter unique records out of the records that satisfy the query criteria.
The GROUP BY clause is used to group the data to which the aggregate functions are applied, and the output is returned based on the columns in the GROUP BY clause. It has its own limitations: all the columns in the select list, apart from the aggregate functions, have to be part of the GROUP BY clause.
So even though you can get the same data from DISTINCT and from a GROUP BY clause, it's better to use DISTINCT. See the example below:
select col1,col2,col3,col4,col5,col6,col7,col8,col9 from table group by col1,col2,col3,col4,col5,col6,col7,col8,col9
can be written as
select distinct col1,col2,col3,col4,col5,col6,col7,col8,col9 from table
It makes your life easier when you have more columns in the select list. But at the same time, if you need to display sum(col10) along with the above columns, then you will have to use GROUP BY. In that case DISTINCT will not work.
For example:
select col1,col2,col3,col4,col5,col6,col7,col8,col9,sum(col10) from table group by col1,col2,col3,col4,col5,col6,col7,col8,col9
Hope this helps.
What's faster, SELECT DISTINCT or GROUP BY in MySQL?
They are essentially equivalent to each other (in fact, this is how some databases implement DISTINCT under the hood).
If one of them is faster, it's going to be DISTINCT. This is because, although the two are the same, a query optimizer would have to catch the fact that your GROUP BY is not taking advantage of any group members, just their keys. DISTINCT makes this explicit, so you can get away with a slightly dumber optimizer.
When in doubt, test!
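One way to test, sketched under the assumption that you are on MySQL (PostgreSQL accepts the same `EXPLAIN` prefix; other systems expose plans differently). The `events` table is hypothetical; substitute your own table and column.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE events (kind VARCHAR(20));

-- Compare the plans the optimizer picks for the two forms.
EXPLAIN SELECT DISTINCT kind FROM events;
EXPLAIN SELECT kind FROM events GROUP BY kind;
```

If the two plans match, there is no performance reason to prefer one form over the other for that query. Timing both on realistic data is still the more reliable check.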
Group By Vs Distinct in SQL
Why does this not work?
SELECT DISTINCT(continent), COUNT(name)
FROM world
WHERE population > 200000000;
That is simple. You have an aggregation query, because you have COUNT() in the SELECT. You have no GROUP BY, so any other column references in the SELECT must be arguments of aggregation functions. So, continent generates an error.
You seem to also be under the impression that the parentheses around continent have some significance. They do not. Not at all. SQL has a construct, SELECT DISTINCT, which selects distinct rows.
Also note that SELECT DISTINCT is almost never used together with aggregation functions.
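One possible rewrite, using the `world` table from the question and assuming the goal was a count of qualifying countries per continent (the question does not state the intent, and the `country_count` alias is my own):

```sql
-- Assumption: the asker wants, per continent, the number of countries
-- whose population exceeds 200 million.
SELECT continent, COUNT(name) AS country_count
FROM world
WHERE population > 200000000
GROUP BY continent;
```

With continent in the GROUP BY, it is no longer a bare column in an aggregation query, so the error goes away and no DISTINCT is needed.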
Does GROUP BY inherently imply DISTINCT?
You do not need the distinct in this query. In general, you don't need distinct with group by. There are actually some queries where distinct and group by go together, but they are very rare.
You need group by in this query, because you are using an aggregation function in the having clause. So, use:
SELECT a_uuid
FROM table
GROUP BY a_uuid
HAVING NOT bool_or(type = 'Purchase')
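For illustration, here is one of the rare cases where distinct and group by do go together. This is my own example, not from the original question: the `purchases` table and its data are hypothetical.

```sql
-- Hypothetical table and data, for illustration only.
CREATE TABLE purchases (a_uuid VARCHAR(36), item VARCHAR(20));
INSERT INTO purchases (a_uuid, item) VALUES
    ('u1', 'pen'), ('u1', 'ink'),
    ('u2', 'pen'), ('u2', 'cap'),
    ('u3', 'pen');

-- GROUP BY yields one row per a_uuid, but the selected expression is the
-- group size, not the key, so duplicates are possible (u1 and u2 both
-- have 2 rows). DISTINCT collapses them to the set of sizes that occur.
SELECT DISTINCT COUNT(*) AS rows_per_uuid
FROM purchases
GROUP BY a_uuid;
```

Here group by guarantees unique keys, not unique output rows, which is why distinct still has an effect: the query returns the two values 2 and 1 rather than three rows.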
DISTINCT with PARTITION BY vs. GROUPBY
Performance:
Winner: GROUP BY
Some very rudimentary testing on a large table with unindexed columns showed that, at least in my case, the two queries generated completely different query plans. The one for PARTITION BY was significantly slower.
The GROUP BY query plan included only a table scan and an aggregation operation, while the PARTITION BY plan had two nested loop self-joins. The PARTITION BY query took about 2800ms on the second run; the GROUP BY query took only 500ms.
Readability / Maintainability:
Winner: GROUP BY
Based on the opinions of the commenters here, PARTITION BY is less readable for most developers, so it will probably also be harder to maintain in the future.
Flexibility
Winner: PARTITION BY
PARTITION BY gives you more flexibility in choosing the grouping columns. With GROUP BY you can have only one set of grouping columns for all aggregated columns. With DISTINCT + PARTITION BY you can have different columns in each partition. Also, on some DBMSs you can choose from more aggregation/analytic functions in the OVER clause.
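A minimal sketch of the two forms being compared, assuming a DBMS with window function support (for example PostgreSQL, SQL Server, or MySQL 8.0 and later). The `staff` table, its columns, and its data are hypothetical.

```sql
-- Hypothetical table and data, for illustration only.
CREATE TABLE staff (dept VARCHAR(20), job VARCHAR(20), salary INT);
INSERT INTO staff (dept, job, salary) VALUES
    ('eng', 'dev', 100), ('eng', 'qa', 80), ('ops', 'dev', 90);

-- GROUP BY form: one set of grouping columns for every aggregate.
SELECT dept, COUNT(*) AS dept_headcount
FROM staff
GROUP BY dept;

-- DISTINCT + PARTITION BY form returning the same rows.
SELECT DISTINCT dept, COUNT(*) OVER (PARTITION BY dept) AS dept_headcount
FROM staff;

-- The flexibility point: each window can use a different partition.
SELECT DISTINCT
    dept,
    job,
    COUNT(*)    OVER (PARTITION BY dept) AS dept_headcount,
    AVG(salary) OVER (PARTITION BY job)  AS job_avg_salary
FROM staff;
```

The last query has no direct single-GROUP BY equivalent, since the headcount is computed per department and the average per job in the same result set.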