Is There Any Difference Between Group by and Distinct

Is there any difference between GROUP BY and DISTINCT

MusiGenesis' response is functionally the correct one with regard to your question as stated: SQL Server is smart enough to realize that if you use GROUP BY without any aggregate functions, what you actually mean is DISTINCT, and it therefore generates an execution plan as if you had simply used DISTINCT.
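Here is a minimal sketch of that equivalence. It assumes Python's built-in sqlite3 module as a stand-in for SQL Server, so it compares only the result sets, not the execution plans. The `orders` table and its rows are hypothetical.

```python
import sqlite3

# SQLite stands in for SQL Server here: this compares result sets, not plans.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO orders (customer) VALUES (?)",
    [("alice",), ("bob",), ("alice",), ("carol",), ("bob",)],
)

# GROUP BY with no aggregate functions: one row per distinct customer.
grouped = conn.execute(
    "SELECT customer FROM orders GROUP BY customer ORDER BY customer"
).fetchall()

# DISTINCT: the same unique values.
distinct = conn.execute(
    "SELECT DISTINCT customer FROM orders ORDER BY customer"
).fetchall()

print(grouped)              # [('alice',), ('bob',), ('carol',)]
print(grouped == distinct)  # True
```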

However, I think it's important to note Hank's response as well - cavalier treatment of "Group By" and "Distinct" could lead to some pernicious gotchas down the line if you're not careful. It's not entirely correct to say that this is "not a question about aggregates" because you're asking about the functional difference between two SQL query keywords, one of which is meant to be used with aggregates and one of which is not.

A hammer can work to drive in a screw sometimes, but if you've got a screwdriver handy, why bother?

(For the purposes of this analogy, Hammer : Screwdriver :: GROUP BY : DISTINCT, and the screw is getting a list of unique values in a table column.)

What is the difference between GROUP BY and DISTINCT?

DISTINCT refers to distinct records as a whole, not to distinct fields within a record.
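To illustrate, here is a small sketch using Python's sqlite3 module; the `people` table and its rows are hypothetical. Only the fully duplicated row disappears. Individual values such as 'Paris' or 'Ann' can still repeat across the remaining rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (city TEXT, name TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Paris", "Ann"), ("Paris", "Ann"), ("Paris", "Ben"), ("Rome", "Ann")],
)

# DISTINCT compares whole (city, name) rows, not each column separately.
rows = conn.execute(
    "SELECT DISTINCT city, name FROM people ORDER BY city, name"
).fetchall()
print(rows)  # [('Paris', 'Ann'), ('Paris', 'Ben'), ('Rome', 'Ann')]
```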

Which is better: Distinct or Group By

In your example, both queries will generate the same execution plan so their performance will be the same.

However, they both have their own purpose. To make your code easier to understand, you should use distinct to eliminate duplicate rows and group by to apply aggregate operators (sum, count, max, ...).
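Here is a sketch of that division of labor, again using Python's sqlite3 module and a hypothetical `sales` table. DISTINCT lists each region once, while GROUP BY computes COUNT, SUM, and MAX per region.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")  # hypothetical table
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10), ("north", 25), ("south", 5), ("south", 5), ("west", 40)],
)

# Eliminating duplicate rows: DISTINCT.
regions = conn.execute(
    "SELECT DISTINCT region FROM sales ORDER BY region"
).fetchall()

# Applying aggregate operators per group: GROUP BY.
summary = conn.execute(
    "SELECT region, COUNT(*), SUM(amount), MAX(amount) "
    "FROM sales GROUP BY region ORDER BY region"
).fetchall()

print(regions)  # [('north',), ('south',), ('west',)]
print(summary)  # [('north', 2, 35, 25), ('south', 2, 10, 5), ('west', 1, 40, 40)]
```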

What is the difference between DISTINCT and GROUP BY (without aggregate functions)?

GROUP BY lets you use aggregate functions such as AVG, MAX, MIN, SUM, and COUNT. DISTINCT, on the other hand, just removes duplicates.

You can also read this answer: https://stackoverflow.com/a/164544/4227703

MySQL - What is the difference between GROUP BY and DISTINCT?

Duplicate of

Is there any difference between GROUP BY and DISTINCT

It has already been discussed there.

If you still want to hear it here:

Well, GROUP BY and DISTINCT each have their own use.

Distinct is used to filter unique records out of the records that satisfy the query criteria.

The GROUP BY clause groups the rows on which the aggregate functions operate. The output has one row per distinct combination of the GROUP BY columns. It has its own limitations: for example, every column in the SELECT list other than the aggregate expressions has to be part of the GROUP BY clause.

So even though DISTINCT and GROUP BY can return the same data, it's better to use DISTINCT. See the example below:

SELECT col1, col2, col3, col4, col5, col6, col7, col8, col9
FROM my_table
GROUP BY col1, col2, col3, col4, col5, col6, col7, col8, col9

can be written as

SELECT DISTINCT col1, col2, col3, col4, col5, col6, col7, col8, col9
FROM my_table

It makes your life easier when you have more columns in the select list. At the same time, if you need to display SUM(col10) along with the above columns, then you have to use GROUP BY. In that case DISTINCT will not work.

For example:

SELECT col1, col2, col3, col4, col5, col6, col7, col8, col9, SUM(col10)
FROM my_table
GROUP BY col1, col2, col3, col4, col5, col6, col7, col8, col9
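Here is a minimal sketch of the same point at a smaller scale. It assumes Python's sqlite3 module and a hypothetical `my_table` in which col1 and col2 stand in for col1..col9. Both forms return the same distinct combinations, but only the GROUP BY form can also report SUM(col10) per combination.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: col1 and col2 stand in for col1..col9.
conn.execute("CREATE TABLE my_table (col1 TEXT, col2 TEXT, col10 INTEGER)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [("a", "x", 1), ("a", "x", 2), ("a", "y", 3), ("b", "x", 4)],
)

# GROUP BY without aggregates and DISTINCT return the same combinations.
by_group = conn.execute(
    "SELECT col1, col2 FROM my_table GROUP BY col1, col2 ORDER BY col1, col2"
).fetchall()
by_distinct = conn.execute(
    "SELECT DISTINCT col1, col2 FROM my_table ORDER BY col1, col2"
).fetchall()

# Adding SUM(col10) per combination requires GROUP BY.
totals = conn.execute(
    "SELECT col1, col2, SUM(col10) FROM my_table "
    "GROUP BY col1, col2 ORDER BY col1, col2"
).fetchall()

print(by_group == by_distinct)  # True
print(totals)  # [('a', 'x', 3), ('a', 'y', 3), ('b', 'x', 4)]
```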

Hope this helps.

What's faster, SELECT DISTINCT or GROUP BY in MySQL?

They are essentially equivalent to each other (in fact this is how some databases implement DISTINCT under the hood).

If one of them is faster, it's going to be DISTINCT. This is because, although the two are the same, a query optimizer would have to catch the fact that your GROUP BY is not taking advantage of any group members, just their keys. DISTINCT makes this explicit, so you can get away with a slightly dumber optimizer.

When in doubt, test!
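In that spirit, here is a rough timing harness. It assumes Python's built-in sqlite3 module and a hypothetical single-column table `t` filled with random keys. SQLite is not MySQL, so these numbers say nothing about MySQL's optimizer. To answer the question for your own server, run the same two queries there.

```python
import random
import sqlite3
import timeit

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (k INTEGER)")  # hypothetical test table
random.seed(0)
conn.executemany(
    "INSERT INTO t VALUES (?)",
    [(random.randint(0, 999),) for _ in range(100_000)],
)

q_distinct = "SELECT DISTINCT k FROM t"
q_group = "SELECT k FROM t GROUP BY k"

# Sanity check: both queries must return the same set of keys.
same = set(conn.execute(q_distinct).fetchall()) == set(conn.execute(q_group).fetchall())

# Time each query over several runs; results vary by machine and engine.
for label, q in [("DISTINCT", q_distinct), ("GROUP BY", q_group)]:
    secs = timeit.timeit(lambda: conn.execute(q).fetchall(), number=10)
    print(f"{label}: {secs:.3f}s for 10 runs")
print("same result set:", same)
```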

Group By Vs Distinct in SQL

Why does this not work?

SELECT DISTINCT(continent), COUNT(name)
FROM world
WHERE population > 200000000;

That is simple. You have an aggregation query, because you have COUNT() in the SELECT. You have no GROUP BY, so any other column references in the SELECT must be arguments of aggregation functions. So, continent generates an error.

You seem to also be under the impression that the parentheses around continent have some significance. They do not. Not at all. SQL has a construct, SELECT DISTINCT, which selects distinct rows.

Also note that DISTINCT is almost never used with aggregation functions.
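For reference, here is a corrected version of the query, sketched with Python's sqlite3 module. A few hypothetical rows stand in for the `world` table. SQLite is more permissive than most databases and would accept the original query's bare `continent` column, so this only demonstrates the fixed form: continent moves into GROUP BY, and neither DISTINCT nor the parentheses are needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rows standing in for the question's `world` table.
conn.execute("CREATE TABLE world (name TEXT, continent TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO world VALUES (?, ?, ?)",
    [
        ("country_1", "Asia", 900_000_000),
        ("country_2", "Asia", 300_000_000),
        ("country_3", "Asia", 50_000_000),
        ("country_4", "North America", 250_000_000),
        ("country_5", "Europe", 80_000_000),
    ],
)

# continent is not inside an aggregate, so it belongs in GROUP BY.
rows = conn.execute(
    """
    SELECT continent, COUNT(name)
    FROM world
    WHERE population > 200000000
    GROUP BY continent
    ORDER BY continent
    """
).fetchall()
print(rows)  # [('Asia', 2), ('North America', 1)]
```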

Does GROUP BY inherently imply DISTINCT?

You do not need the distinct in this query. In general, you don't need distinct with group by. There are actually some queries where distinct and group by go together, but they are very rare.

You need group by in this query, because you are using an aggregation function in the having clause. So, use:

SELECT a_uuid
FROM my_table
GROUP BY a_uuid
HAVING NOT bool_or(type = 'Purchase')
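PostgreSQL provides bool_or(), but SQLite does not. This sketch (Python's sqlite3 module, hypothetical `events` table) swaps in MAX() over the 0/1 result of the comparison. GROUP BY alone already returns each a_uuid once, which is why no DISTINCT is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (a_uuid TEXT, type TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("u1", "View"), ("u1", "Purchase"), ("u2", "View"), ("u2", "View"), ("u3", "Cart")],
)

# In SQLite, (type = 'Purchase') evaluates to 0 or 1, so MAX(...) = 0 means
# no row in the group had type 'Purchase' (standing in for NOT bool_or(...)).
rows = conn.execute(
    """
    SELECT a_uuid
    FROM events
    GROUP BY a_uuid
    HAVING MAX(type = 'Purchase') = 0
    ORDER BY a_uuid
    """
).fetchall()
print(rows)  # [('u2',), ('u3',)]
```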

DISTINCT with PARTITION BY vs. GROUP BY

Performance:

Winner: GROUP BY

Some very rudimentary testing on a large table with unindexed columns showed that, at least in my case, the two queries generated completely different query plans. The one for PARTITION BY was significantly slower.

The GROUP BY query plan included only a table scan and an aggregation operation, while the PARTITION BY plan had two nested-loop self-joins. The PARTITION BY query took about 2800 ms on the second run; the GROUP BY query took only 500 ms.

Readability / Maintainability:

Winner: GROUP BY

Based on the opinions of the commenters here, the PARTITION BY version is less readable for most developers, so it will probably also be harder to maintain in the future.

Flexibility:

Winner: PARTITION BY

PARTITION BY gives you more flexibility in choosing the grouping columns. With GROUP BY you can have only one set of grouping columns for all aggregated columns. With DISTINCT + PARTITION BY you can use different columns in each partition. Also, on some DBMSs you can choose from more aggregation/analytic functions in the OVER clause.
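Here is a small sketch of the equivalence and of the flexibility point. It assumes Python's sqlite3 module backed by SQLite 3.25 or later (needed for window functions) and a hypothetical `sales` table. It does not reproduce the performance comparison above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount INTEGER)")  # hypothetical
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("north", "pen", 10), ("north", "ink", 20), ("south", "pen", 5), ("north", "pen", 10)],
)

# GROUP BY: one set of grouping columns for every aggregate.
grouped = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

# DISTINCT + window function: the same result.
windowed = conn.execute(
    "SELECT DISTINCT region, SUM(amount) OVER (PARTITION BY region) "
    "FROM sales ORDER BY region"
).fetchall()

# Flexibility: each window aggregate can use its own partition.
mixed = conn.execute(
    """
    SELECT DISTINCT region, product,
           SUM(amount) OVER (PARTITION BY region)  AS region_total,
           SUM(amount) OVER (PARTITION BY product) AS product_total
    FROM sales
    ORDER BY region, product
    """
).fetchall()

print(grouped == windowed)  # True
print(mixed)  # [('north', 'ink', 40, 20), ('north', 'pen', 40, 25), ('south', 'pen', 5, 25)]
```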