Any reason for GROUP BY clause without aggregation function?
Is the GROUP BY statement in any way useful without an accompanying aggregate function?
Using DISTINCT would be a synonym in such a situation, but the reason you'd want (or have) to write a GROUP BY clause is to be able to add a HAVING clause.
If you need a HAVING clause, you have to define a GROUP BY: HAVING filters groups, and DISTINCT on its own gives you no groups to filter.
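A minimal sketch of that point, run through Python's built-in sqlite3 module. The orders table and its rows are made up for illustration; they are not from the question.

```python
import sqlite3

# Hypothetical table: one row per order, with deliberate duplicates.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", "Oslo"), ("ann", "Oslo"), ("bob", "Rome"), ("cy", "Oslo")],
)

# Without aggregates, DISTINCT and GROUP BY return the same set of rows.
distinct_rows = conn.execute(
    "SELECT DISTINCT customer, city FROM orders"
).fetchall()
grouped_rows = conn.execute(
    "SELECT customer, city FROM orders GROUP BY customer, city"
).fetchall()

# HAVING filters whole groups, which DISTINCT alone cannot express:
# keep only the (customer, city) pairs that occur more than once.
repeated_rows = conn.execute(
    "SELECT customer, city FROM orders "
    "GROUP BY customer, city HAVING COUNT(*) > 1"
).fetchall()
```

Here the select list still has no aggregate function; the aggregate appears only in HAVING, which is the case where GROUP BY does something DISTINCT cannot.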
PostgreSQL group by without aggregate function. Why does it work?
This is covered, but not especially obvious, in the docs:
When GROUP BY is present, or any aggregate functions are present, it is not valid for the SELECT list expressions to refer to ungrouped columns except within aggregate functions or when the ungrouped column is functionally dependent on the grouped columns, since there would otherwise be more than one possible value to return for an ungrouped column. A functional dependency exists if the grouped columns (or a subset thereof) are the primary key of the table containing the ungrouped column.
In this case, I'm guessing that id is the primary key of the table user, which would make name functionally dependent on id.
Why does MySQL allow group by queries WITHOUT aggregate functions?
I believe that it was to handle the case where grouping by one field would imply other fields are also being grouped:
SELECT user.id, user.name, COUNT(post.owner_id) AS posts
FROM user
LEFT OUTER JOIN post ON post.owner_id=user.id
GROUP BY user.id
In this case the user.name will always be unique per user.id, so there is convenience in not requiring the user.name in the GROUP BY clause (although, as you say, there is definite scope for problems).
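As a rough check of that claim, here is a sketch using Python's sqlite3 module. SQLite is similarly permissive about ungrouped columns, so it stands in for MySQL here; the sample rows are made up, and counting post.owner_id is my choice for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE post (owner_id INTEGER REFERENCES user(id));
    INSERT INTO user VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO post VALUES (1), (1), (1);
""")

BASE = (
    "SELECT user.id, user.name, COUNT(post.owner_id) AS posts "
    "FROM user LEFT OUTER JOIN post ON post.owner_id = user.id "
)

# Short form: user.name is left out of GROUP BY.
short_rows = sorted(conn.execute(BASE + "GROUP BY user.id").fetchall())

# Strict form: every non-aggregated column is listed in GROUP BY.
strict_rows = sorted(
    conn.execute(BASE + "GROUP BY user.id, user.name").fetchall()
)
```

Because id is the primary key, each group holds exactly one name, so both forms return the same rows. Group by a column that is not a key and the short form can return an arbitrary value for the ungrouped column.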
GROUP BY without aggregate function in SparkSQL
You can use the window function row_number().
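import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

// Partition by every column so identical rows share a partition, then keep the first row of each.
val columns = input.columns.map(col(_))
input.withColumn("rn", row_number().over(Window.partitionBy(columns: _*).orderBy(columns: _*)))
  .where("rn = 1")
  .drop("rn")
  .show()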
Why do columns in the select list without an aggregate function need to be part of the GROUP BY clause in MySQL?
When GROUP BY is present, or any aggregate functions are present, it is not valid for the SELECT list expressions to refer to ungrouped columns except within aggregate functions (such as SUM, MAX, or MIN, which return a single value for each group). Otherwise there would be more than one possible value to return for an ungrouped column, and SELECT won't just pick an arbitrary one for you.
However, there are multiple workarounds to this.
Option 1. As you did yourself, add the other column to the GROUP BY:
SELECT
matchid
, mdate
, COUNT(player)
FROM game
JOIN goal
ON id = matchid
WHERE (team1= 'POL' OR team2= 'POL')
GROUP BY matchid, mdate;
Option 2. Alternatively, apply an aggregate function to the other column. Since mdate is functionally dependent on matchid, every row in a group has the same mdate, so any aggregate that picks a value (MAX, MIN) returns it:
SELECT
matchid
, max(mdate) as mdate
, COUNT(player)
FROM game
JOIN goal
ON id = matchid
WHERE (team1= 'POL' OR team2= 'POL')
GROUP BY matchid;
Option 3. You can calculate the aggregate in a sub-query and then join the result back to the game table to get the additional columns you need:
select
t1.matchid
, t2.mdate
, t1.count_player
from
(SELECT
matchid
, COUNT(player) as count_player
FROM game
JOIN goal
ON id = matchid
WHERE (team1= 'POL' OR team2= 'POL')
GROUP BY matchid) t1
join game t2 on t1.matchid = t2.id;
Option 4. You can also use a window function and select the distinct tuples:
SELECT distinct
matchid
, mdate
, COUNT(player) over(partition by matchid) as count_player
FROM game
JOIN goal
ON id = matchid
WHERE (team1= 'POL' OR team2= 'POL');
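To check that the four options really agree, here is a small sketch using Python's sqlite3 module. It assumes an SQLite build of 3.25 or newer (needed for the window function in option 4); the table layout is reduced to the columns the queries use, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE game (id INTEGER PRIMARY KEY, mdate TEXT, team1 TEXT, team2 TEXT);
    CREATE TABLE goal (matchid INTEGER, player TEXT);
    INSERT INTO game VALUES
        (1, '2012-06-08', 'POL', 'GRE'),
        (2, '2012-06-12', 'POL', 'RUS'),
        (3, '2012-06-09', 'GER', 'POR');
    INSERT INTO goal VALUES
        (1, 'p1'), (1, 'p2'), (2, 'p3'), (2, 'p4'), (3, 'p5');
""")

# Shared FROM/WHERE part of all four options.
FILTER = (
    "FROM game JOIN goal ON id = matchid "
    "WHERE (team1 = 'POL' OR team2 = 'POL')"
)

queries = {
    # Option 1: list every non-aggregated column in GROUP BY.
    "option_1": f"SELECT matchid, mdate, COUNT(player) {FILTER} "
                "GROUP BY matchid, mdate",
    # Option 2: wrap the dependent column in an aggregate.
    "option_2": f"SELECT matchid, MAX(mdate), COUNT(player) {FILTER} "
                "GROUP BY matchid",
    # Option 3: aggregate in a sub-query, then join back to game.
    "option_3": "SELECT t1.matchid, t2.mdate, t1.count_player FROM "
                f"(SELECT matchid, COUNT(player) AS count_player {FILTER} "
                "GROUP BY matchid) t1 JOIN game t2 ON t1.matchid = t2.id",
    # Option 4: window function plus DISTINCT.
    "option_4": "SELECT DISTINCT matchid, mdate, "
                f"COUNT(player) OVER (PARTITION BY matchid) {FILTER}",
}

# Sort in Python because none of the queries has an ORDER BY.
results = {name: sorted(conn.execute(sql).fetchall())
           for name, sql in queries.items()}
```

On this sample data every entry in results holds the same two rows, one per match involving POL, each with its date and goal count.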
What is the difference between DISTINCT and GROUP BY (without aggregate function)?
GROUP BY lets you use aggregate functions, like AVG, MAX, MIN, SUM, and COUNT. DISTINCT, on the other hand, just removes duplicates.
You can read this answer too: https://stackoverflow.com/a/164544/4227703