SQL - HAVING vs. WHERE
The WHERE clause introduces a condition on individual rows; the HAVING clause introduces a condition on aggregations, i.e. results of selection where a single result, such as a count, average, min, max, or sum, has been produced from multiple rows. Your query calls for the second kind of condition (i.e. a condition on an aggregation), hence HAVING works correctly.
As a rule of thumb, use WHERE before GROUP BY and HAVING after GROUP BY. It is a rather primitive rule, but it is useful in more than 90% of the cases.
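The rule of thumb can be tried out with a small sketch. It uses Python's built-in sqlite3 module and a hypothetical orders table (the table, its columns, and its data are assumptions made for illustration, not part of the original question):

```python
import sqlite3

# Hypothetical sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, amount INTEGER);
    INSERT INTO orders VALUES
        ('ann', 10), ('ann', 50), ('bob', 5), ('bob', 7), ('cid', 100);
""")

# WHERE filters individual rows before grouping: only orders above 6 are kept.
# HAVING filters the groups afterwards: only customers with 2+ such orders remain.
rows = conn.execute("""
    SELECT customer, COUNT(*) AS n, SUM(amount) AS total
    FROM orders
    WHERE amount > 6
    GROUP BY customer
    HAVING COUNT(*) >= 2
    ORDER BY customer
""").fetchall()
print(rows)
```

Here bob's order of 5 is removed by WHERE, which leaves bob with a single order, so his group is then removed by HAVING.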
While you're at it, you may want to rewrite your query using the ANSI version of the join:
SELECT L.LectID, Fname, Lname
FROM Lecturers L
JOIN Lecturers_Specialization S ON L.LectID=S.LectID
GROUP BY L.LectID, Fname, Lname
HAVING COUNT(S.Expertise)>=ALL
(SELECT COUNT(Expertise) FROM Lecturers_Specialization GROUP BY LectID)
This would eliminate the WHERE that was used as a theta join condition.
WHERE vs HAVING
Why is it that you need to place columns you create yourself (for example "select 1 as number") after HAVING and not WHERE in MySQL?
WHERE is applied before GROUP BY; HAVING is applied after (and can filter on aggregates).
In general, you can reference aliases in neither of these clauses, but MySQL allows referencing SELECT-level aliases in GROUP BY, ORDER BY and HAVING.
And are there any downsides to doing "WHERE 1" instead (writing the whole definition instead of a column name)?
If your calculated expression does not contain any aggregates, putting it into the WHERE clause will most probably be more efficient.
MySql - HAVING vs WHERE
The difference between the HAVING and WHERE clauses in SQL is that the WHERE clause cannot be used with aggregates, but the HAVING clause can. One way to think of it is that the HAVING clause is an additional filter applied on top of the WHERE clause.
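A minimal sketch of that difference, assuming SQLite (through Python's sqlite3 module) as the engine and a hypothetical visits table invented for this example. Other databases reject the first query too, but with their own error messages:

```python
import sqlite3

# Hypothetical sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE visits (page TEXT, visitor TEXT);
    INSERT INTO visits VALUES
        ('home', 'a'), ('home', 'b'), ('home', 'c'), ('faq', 'a');
""")

# An aggregate in WHERE is rejected: WHERE runs per row, before any group exists.
try:
    conn.execute("SELECT page FROM visits WHERE COUNT(*) > 1 GROUP BY page")
    where_error = None
except sqlite3.OperationalError as exc:
    where_error = str(exc)

# The same condition in HAVING is evaluated once per group.
busy_pages = conn.execute(
    "SELECT page FROM visits GROUP BY page HAVING COUNT(*) > 1"
).fetchall()

print(where_error)
print(busy_pages)
```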
What is the difference between HAVING and WHERE in SQL?
HAVING specifies a search condition for a group or an aggregate function used in a SELECT statement.
Where vs Having SQL
For your information, apart from SELECT queries, you can use the WHERE clause with UPDATE and DELETE statements, but the HAVING clause can only be used with a SELECT query. For example:
update CUSTOMER set CUST_NAME='Johnny' WHERE CUST_ID=1; -- This line of code worked
update CUSTOMER set CUST_NAME='Johnny' HAVING CUST_ID=1; -- Incorrect syntax
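The same behaviour can be reproduced in a quick sketch, assuming SQLite via Python's sqlite3 module; the starting row is made up for the demonstration. SQLite reports the HAVING variant as a syntax error, and other engines word the error differently:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMER (CUST_ID INTEGER, CUST_NAME TEXT)")
conn.execute("INSERT INTO CUSTOMER VALUES (1, 'John')")  # hypothetical starting row

# UPDATE accepts a WHERE clause.
conn.execute("UPDATE CUSTOMER SET CUST_NAME = 'Johnny' WHERE CUST_ID = 1")

# UPDATE does not accept a HAVING clause: the statement fails to parse.
try:
    conn.execute("UPDATE CUSTOMER SET CUST_NAME = 'Jack' HAVING CUST_ID = 1")
    having_rejected = False
except sqlite3.OperationalError as exc:
    having_rejected = True
    print("rejected:", exc)

current_name = conn.execute(
    "SELECT CUST_NAME FROM CUSTOMER WHERE CUST_ID = 1"
).fetchone()[0]
print(current_name)
```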
The WHERE clause is used for filtering rows and it applies to each and every row, while the HAVING clause is used to filter groups of rows in SQL.
The WHERE and HAVING clauses can also be used together in a SELECT query that groups or aggregates rows:
SELECT CUST_ID, CUST_NAME, CUST_GENDER
FROM CUSTOMER
WHERE CUST_GENDER='MALE'
GROUP BY CUST_ID, CUST_NAME, CUST_GENDER
HAVING CUST_ID=8;
In this situation, the WHERE clause is applied first, on individual rows, and only rows which pass the condition are included when creating groups. Once the groups are created, the HAVING clause is used to filter them based on the condition specified.
WHERE vs. HAVING performance with GROUP BY
One of your assumptions is wrong: HAVING is slower than WHERE, because it filters results only after the rows have been accessed and hashed.
It's that hashing part that makes HAVING conditions more expensive than WHERE conditions. Hashing requires writing data, which can be more expensive both physically and algorithmically.
Theory
Hashing requires writing as well as reading data. Ideally, hashing the data will run in O(n) time. But in practice there will be hash collisions, which slow things down. And in practice not all the data will fit in memory.
Those two problems can be disastrous. In the worst case, with limited memory, the hashing requires multiple passes and the complexity approaches O(n^2). And writing to disk in the temporary tablespace is orders of magnitude slower than writing to memory.
Those are the kind of performance issues you need to worry about with databases. The constant time to run simple conditions and expressions is usually irrelevant compared to the time to read, write, and join the data.
That might be especially true in your environment. The operation TABLE ACCESS STORAGE FULL implies you are using Exadata. Depending on the platform, you might be taking advantage of SQL in silicon. Those high-level conditions may translate perfectly to low-level instructions executed on storage devices, which means your estimate of the cost of executing a clause may be several orders of magnitude too high.
Practice
Create a sample table with 100,000 rows:
create table customer(id number, status varchar2(100));
insert into customer
select
level,
case
when level <= 15000 then 'Deceased'
when level between 15001 and 50001 then 'Active'
else 'Dormant'
end
from dual
connect by level <= 100000;
begin
dbms_stats.gather_table_stats(user, 'customer');
end;
/
Running the code in a loop shows that the WHERE version is about twice as fast as the HAVING version.
--Run times (in seconds): 0.765, 0.78, 0.765
declare
type string_nt is table of varchar2(100);
type number_nt is table of number;
v_status string_nt;
v_count number_nt;
begin
for i in 1 .. 100 loop
SELECT status, count(status)
bulk collect into v_status, v_count
FROM customer
GROUP BY status
HAVING status != 'Active' AND status != 'Dormant';
end loop;
end;
/
--Run times (in seconds): 0.39, 0.39, 0.39
declare
type string_nt is table of varchar2(100);
type number_nt is table of number;
v_status string_nt;
v_count number_nt;
begin
for i in 1 .. 100 loop
SELECT status, count(status)
bulk collect into v_status, v_count
FROM customer
WHERE status != 'Active' AND status != 'Dormant'
GROUP BY status;
end loop;
end;
/
Difference between HAVING and WHERE Clause
Functionally, the two are equivalent.
The WHERE clause is saying: filter the data and then aggregate the results.
The HAVING clause is saying: aggregate the data and then filter the results.
Both return the same result, because the filtering is on the columns used for aggregation. Usually, HAVING uses aggregation functions; these are not allowed in the WHERE clause.
In general, the WHERE clause is going to be faster, because less data is being aggregated. You should use WHERE in this case.
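The claim that both forms return the same result when the filter is on the grouping column can be checked with a small sketch. It assumes SQLite via Python's sqlite3 module and a ten-row customer table whose contents are invented here; it says nothing about relative speed, which has to be measured on the real database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, status TEXT)")
# Hypothetical data: ids 1-3 Deceased, 4-7 Active, 8-10 Dormant.
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [(i, "Deceased" if i <= 3 else "Active" if i <= 7 else "Dormant")
     for i in range(1, 11)],
)

# Filter first, then aggregate.
where_rows = conn.execute("""
    SELECT status, COUNT(status) FROM customer
    WHERE status != 'Active' AND status != 'Dormant'
    GROUP BY status
""").fetchall()

# Aggregate first, then filter the groups.
having_rows = conn.execute("""
    SELECT status, COUNT(status) FROM customer
    GROUP BY status
    HAVING status != 'Active' AND status != 'Dormant'
""").fetchall()

print(where_rows, having_rows)
```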
HAVING vs WHERE vs GROUP BY clauses, when to use them and if you use ' '
The answer as per @O. Jones is a nested query:
SELECT post_id
, name
, Email
, CustomerId
, DeliveryDate
, DeliveryTime
, DeliveryType
, Zip
, OrderNote
, PaymentTotal
, OrderStatus
FROM ( SELECT t1.post_id
, t2.name
, MAX(CASE WHEN meta_key = 'value' THEN meta_value ELSE NULL END) as Email
, MAX(CASE WHEN meta_key = 'value' THEN meta_value ELSE NULL END) as CustomerId
, MAX(CASE WHEN meta_key = 'value' THEN meta_value ELSE NULL END) as DeliveryDate
, MAX(CASE WHEN meta_key = 'value' THEN meta_value ELSE NULL END) as DeliveryTime
, MAX(CASE WHEN meta_key = 'value' THEN meta_value ELSE NULL END) as DeliveryType
, MAX(CASE WHEN meta_key = 'value' THEN meta_value ELSE NULL END) as Zip
, MAX(CASE WHEN meta_key = 'value' THEN meta_value ELSE NULL END) as OrderNote
, MAX(CASE WHEN meta_key = 'value' THEN meta_value ELSE NULL END) as PaymentTotal
, MAX(CASE WHEN meta_key = 'value' THEN meta_value ELSE NULL END) as OrderStatus
FROM table_A t1
INNER
JOIN table_B t2
ON FIND_IN_SET(t1.post_id, t2.payment_ids)
GROUP
BY t1.post_id
, t2.name
) AS derived_table
WHERE OrderStatus RLIKE '%trans%|ready'
AND DeliveryDate >= CURRENT_DATE - INTERVAL 7 DAY
AND DeliveryType = 'pickup'
Which SQL statement is faster? (HAVING vs. WHERE...)
In theory (by which I mean the SQL standard), WHERE restricts the result set before rows are returned, whereas HAVING restricts it after all the rows have been brought in. So WHERE is faster. On DBMSs that comply with the SQL standard in this regard, use HAVING only where you cannot put the condition in a WHERE clause (like computed columns in some RDBMSs).
You can just see the execution plan for both and check for yourself, nothing will beat that (measurement for your specific query in your specific environment with your data.)
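As one way to look at the plans, here is a sketch that assumes SQLite, where the command is EXPLAIN QUERY PLAN (other systems use EXPLAIN, EXPLAIN PLAN FOR, or a graphical plan viewer). The customer table, the index, and the plan helper are hypothetical, and the plan text varies between SQLite versions, so the output is printed rather than interpreted:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, status TEXT)")
conn.execute("CREATE INDEX idx_status ON customer(status)")

def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

where_plan = plan(
    "SELECT status, COUNT(*) FROM customer "
    "WHERE status = 'Deceased' GROUP BY status"
)
having_plan = plan(
    "SELECT status, COUNT(*) FROM customer "
    "GROUP BY status HAVING status = 'Deceased'"
)

print("WHERE: ", where_plan)
print("HAVING:", having_plan)
```

An optimizer may rewrite one form into the other, so identical plans are possible; that is exactly why the plans are worth inspecting rather than guessing.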