Having Without Group By

Use of HAVING without GROUP BY not working as expected

The documentation is correct; i.e. you could run this statement:

select sum(wage) sum_of_all_wages
, count(1) count_of_all_records
from [dbo].[_abc]
having sum(wage) > 5

The reason your statement doesn't work is because of the select *, which means select every columns' value. When there is no group by, all records are aggregated; i.e. you only get 1 record in your result set which has to represent every record. As such, you can only* include values provided by applying aggregate functions to your columns; not the columns themselves.
* of course, you can also provide constants, so select 'x' constant, count(1) cnt from myTable would work.

There aren't many use cases I can think of where you'd want to use having without a group by, but certainly it can be done as shown above.

NB: If you wanted all rows where the wage was greater than 5, you'd use the where clause instead:

select * 
from [dbo].[_abc]
where wage > 5

Equally, if you want the sum of all wages greater than 5 you can do this

select sum(wage) sum_of_wage_over_5 
from [dbo].[_abc]
where wage > 5

Or if you wanted to compare the sum of wages over 5 with those under:

select case when wage > 5 then 1 else 0 end wage_over_five
, sum(wage) sum_of_wage
from [dbo].[_abc]
group by case when wage > 5 then 1 else 0 end

See runnable examples here.


Update based on comments:

Do you need having to use aggregate functions?

No. You can run select sum(wage) from [dbo].[_abc]. When an aggregate function is used without a group by clause, it's as if you're grouping by a constant; i.e. select sum(wage) from [dbo].[_abc] group by 1.

The documentation merely means that whilst normally you'd have a having statement with a group by statement, it's OK to exclude the group by / in such cases the having statement, like the select statement, will treat your query as if you'd specified group by 1

What's the point?

It's hard to think of many good use cases, since you're only getting one row back and the having statement is a filter on that.

One use case could be that you write code to monitor your licenses for some software; if you have less users than per-user-licenses all's good / you don't want to see the result since you don't care. If you have more users you want to know about it. E.g.

declare @totalUserLicenses int = 100
select count(1) NumberOfActiveUsers
, @totalUserLicenses NumberOfLicenses
, count(1) - @totalUserLicenses NumberOfAdditionalLicensesToPurchase
from [dbo].[Users]
where enabled = 1
having count(1) > @totalUserLicenses

Isn't the select irrelevant to the having clause?

Yes and no. Having is a filter on your aggregated data. Select says what columns/information to bring back. As such you have to ask "what would the result look like?" i.e. Given we've had to effectively apply group by 1 to make use of the having statement, how should SQL interpret select *? Since your table only has one column this would translate to select wage; but we have 5 rows, so 5 different values of wage, and only 1 row in the result to show this.

I guess you could say "I want to return all rows if their sum is greater than 5; otherwise I don't want to return any rows". Were that your requirement it could be achieved a variety of ways; one of which would be:

select *
from [dbo].[_abc]
where exists
(
select 1
from [dbo].[_abc]
having sum(wage) > 5
)

However, we have to write the code to meet the requirement, rather than expect the code to understand our intent.

Another way to think about having is as being a where statement applied to a subquery. I.e. your original statement effectively reads:

select wage
from
(
select sum(wage) sum_of_wage
from [dbo].[_abc]
group by 1
) singleRowResult
where sum_of_wage > 5

That won't run because wage is not available to the outer query; only sum_of_wage is returned.

Having Without Group By in MySQL

Don't use having without group by. Although MySQL supports that, this is not valid standard SQL, and the behavior you get will most likely be counter-intuitive.

The first query should be just a where clause:

select wage from abc where wage > 1

The second query just makes no sense: you have both an aggregated and a non-aggregated wage in the having clause. If you want the row that has the maximum wage, then you can order by and limit:

select wage
from abc
order by wage desc limit 1

Or if you want to allow ties, use a correlated subquery:

select *
from abc a
where wage = (select max(a1.wage) from abc)

Spark SQL HAVING clause without Group/Aggregate

In some databases / engines, when the GROUP BY is not used in conjunction with HAVING, HAVING defaults to a WHERE clause.

Normally the WHERE clause is used.

I would not rely on HAVING without a GROUP BY.

SQL - using 'HAVING' with 'EXISTS' without using 'GROUP BY'

well the error is clear in first query UnitPrice is not part of aggregation nor group by
whereas in your second query you are comparing p.unitprice from table "products p" which doesn't need to be part of aggregation or group by , your second query is equivalent to :

select * from products p
where p.unitprice > (select avg(unitprice) FROM products)

which maybe this is more clear , that sql caculate avg(unitprice) then compares it with unitprice column from product.

HAVING clause without GROUP BY in Oracle database using developer desktop and developer web

This is a great question AND puzzle!

Oracle SQL Developer Web is provided via Oracle REST Data Services (ORDS). There is a RESTful Web Service used to execute 'ad hoc' SQL statements and scripts.

Instead of bringing back all the rows from a query in a single call, we page them. And instead of holding a resultset open and process running, we stick to the RESTful way, and do all the work on a single call and response.

How do we make this happen?

Well, when you type in that query from your question and execute it, on the back end, that's not actually what gets executed.

We wrap that query with another SELECT, and use the ROW_NUMBER() OVER analytic function call. This allows us to 'window' the query results, in this case between rows 1 and 26, or the the first 25 rows of that query, your query.

SELECT *
FROM (
SELECT Q_.*,
ROW_NUMBER() OVER(
ORDER BY 1
) RN___
FROM (
select *
from sh.sales
where amount_sold > 1000
having amount_sold < 2000
) Q_
)
WHERE RN___ BETWEEN :1 AND :2

Ok, but so what?

Well, Optimizer figures out this query can still run, even if the having clause isn't appropriate.

The optimizer is always free to re-arrange a query before searching for best execution plans.

In this case, a 10053 trace shows that a query such as below that came from SQL Dev Web (I'm using EMP but the same applies for any table)

SELECT *
FROM (
SELECT Q_.*,
ROW_NUMBER() OVER(
ORDER BY 1
) RN___
FROM (
SELECT *
FROM emp
WHERE sal > 1000
HAVING sal < 2000
) Q_
)
WHERE RN___ BETWEEN :1 AND :2

got internally transformed to the following before being optimized for plans.

SELECT 
subq.EMPNO EMPNO,
subq.ENAME ENAME,
subq.JOB JOB,
subq.MGR MGR,
subq.HIREDATE HIREDATE,
subq.SAL SAL,subq.COMM COMM,
subq.DEPTNO DEPTNO,
subq.RN___ RN___
FROM
(SELECT
EMP.EMPNO EMPNO,
EMP.ENAME ENAME,
EMP.JOB JOB,EMP.MGR MGR,
EMP.HIREDATE HIREDATE,
EMP.SAL SAL,
EMP.COMM COMM,
EMP.DEPTNO DEPTNO,
ROW_NUMBER() OVER ( ORDER BY NULL ) RN___
FROM EMP EMP
WHERE EMP.SAL>1000 AND TO_NUMBER(:B1)>=TO_NUMBER(:B2)
) subq
WHERE subq.RN___>=TO_NUMBER(:B3)
AND subq.RN___<=TO_NUMBER(:B4)

Notice the HAVING has been transformed/optimized out of the query, which lets it pass through onto the execution phase.

Major to @connor-mcdonald of AskTom fame for helping me parse this out.

And so that's why it works in SQL Developer Web, but NOT in SQL Developer Desktop, where the query is executed exactly as written.

Sample Image

What does this SQL query with MIN() in HAVING clause do exactly?

When you use MAX (or other aggregate functions) in the select columns or the having clause, you cause an implicit GROUP BY () (that is, all rows are grouped together into a single result row).

And when grouping (whether all rows or with a specific GROUP BY), if you specify a column outside of an aggregate function (such as your dob =) that is not one of the things being aggregated on or something functionally dependent on it (for example, some other column in a table when you are grouping by the primary key for that table), one of two things will happen:

If you have enabled the ONLY_FULL_GROUP_BY sql_mode (which is the default in newer versions), you will receive an error:

In aggregated query without GROUP BY, expression ... contains nonaggregated column '...'; this is incompatible with sql_mode=only_full_group_by

If you have not enabled ONLY_FULL_GROUP_BY, a value from some arbitrary one of the grouped rows will be used. So it is possible your dob = MIN(dob) will be true (and it will definitely be true if all rows have the same dob), but you can't rely on it doing anything useful and should avoid doing this.



Related Topics



Leave a reply



Submit