Where VS Having

SQL - HAVING vs. WHERE

The WHERE clause introduces a condition on individual rows; the HAVING clause introduces a condition on aggregations, i.e. results where a single value, such as a count, average, min, max, or sum, has been produced from multiple rows. Your query calls for the second kind of condition (a condition on an aggregation), hence HAVING works correctly.

As a rule of thumb, use WHERE before GROUP BY and HAVING after GROUP BY. It is a rather primitive rule, but it is useful in more than 90% of the cases.
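As a minimal sketch of that rule of thumb, the snippet below runs both kinds of filter against an in-memory SQLite database from Python. The `orders` table and its data are made up for illustration; they are not part of the original question.

```python
import sqlite3

# Hypothetical sample data: one row per order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", 10), ("ann", 200), ("bob", 5), ("bob", 7), ("cat", 300)],
)

# WHERE runs before GROUP BY: each row is tested on its own,
# so ann's small order is dropped before the sum is computed.
row_filtered = conn.execute(
    "SELECT customer, SUM(amount) FROM orders "
    "WHERE amount >= 100 GROUP BY customer ORDER BY customer"
).fetchall()

# HAVING runs after GROUP BY: each group's aggregate is tested,
# so ann's total includes both of her orders.
group_filtered = conn.execute(
    "SELECT customer, SUM(amount) FROM orders "
    "GROUP BY customer HAVING SUM(amount) >= 100 ORDER BY customer"
).fetchall()

print(row_filtered)    # [('ann', 200), ('cat', 300)]
print(group_filtered)  # [('ann', 210), ('cat', 300)]
```

The same customers come back in both cases, but the totals differ, because the WHERE version discards rows before they are summed.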

While you're at it, you may want to rewrite your query using the ANSI version of the join:

SELECT  L.LectID, Fname, Lname
FROM Lecturers L
JOIN Lecturers_Specialization S ON L.LectID=S.LectID
GROUP BY L.LectID, Fname, Lname
HAVING COUNT(S.Expertise)>=ALL
(SELECT COUNT(Expertise) FROM Lecturers_Specialization GROUP BY LectID)

This would eliminate the WHERE clause that was used as a theta join condition.
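To illustrate that the two join styles are interchangeable, here is a small sketch using Python's sqlite3 module. The table names follow the query above, but the rows are invented, and the `>= ALL` subquery is left out because SQLite does not support the ALL quantifier; only the join forms are compared.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Lecturers (LectID INTEGER PRIMARY KEY, Fname TEXT, Lname TEXT);
    CREATE TABLE Lecturers_Specialization (LectID INTEGER, Expertise TEXT);
    -- Hypothetical sample rows, for illustration only.
    INSERT INTO Lecturers VALUES (1, 'Ada', 'Lovelace'), (2, 'Alan', 'Turing');
    INSERT INTO Lecturers_Specialization VALUES
        (1, 'Math'), (1, 'Computing'), (2, 'Logic');
""")

# Old style: comma join, with the join condition placed in WHERE.
implicit = conn.execute("""
    SELECT L.LectID, Fname, Lname, COUNT(S.Expertise)
    FROM Lecturers L, Lecturers_Specialization S
    WHERE L.LectID = S.LectID
    GROUP BY L.LectID, Fname, Lname
    ORDER BY L.LectID
""").fetchall()

# ANSI style: the join condition moves to ON, leaving WHERE free for row filters.
explicit = conn.execute("""
    SELECT L.LectID, Fname, Lname, COUNT(S.Expertise)
    FROM Lecturers L
    JOIN Lecturers_Specialization S ON L.LectID = S.LectID
    GROUP BY L.LectID, Fname, Lname
    ORDER BY L.LectID
""").fetchall()

print(implicit == explicit)  # True
```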

WHERE vs HAVING

Why is it that you need to place columns you create yourself (for example "select 1 as number") after HAVING and not WHERE in MySQL?

WHERE is applied before GROUP BY, HAVING is applied after (and can filter on aggregates).

In general, you can reference aliases in neither of these clauses, but MySQL allows referencing SELECT level aliases in GROUP BY, ORDER BY and HAVING.

And are there any downsides to doing "WHERE 1" instead (writing the whole definition instead of a column name)?

If your calculated expression does not contain any aggregates, putting it into the WHERE clause will most probably be more efficient.

MySql - HAVING vs WHERE

The difference between the HAVING and WHERE clauses in SQL is that the WHERE clause cannot be used with aggregates, but the HAVING clause can. One way to think of it is that the HAVING clause is an additional filter on top of the WHERE clause.
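A quick way to see this restriction is to try it. The sketch below uses SQLite through Python rather than MySQL, with a made-up `visits` table, so the exact error text will differ between engines; the point is only that the aggregate is rejected in WHERE and accepted in HAVING.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (page TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO visits VALUES (?)", [("home",), ("home",), ("about",)]
)

# An aggregate in WHERE is rejected: WHERE is evaluated per row,
# before any group exists for COUNT(*) to count.
try:
    conn.execute("SELECT page FROM visits WHERE COUNT(*) > 1 GROUP BY page")
    where_error = None
except sqlite3.Error as exc:
    where_error = str(exc)

# The same condition is fine in HAVING, which is evaluated per group.
busy_pages = conn.execute(
    "SELECT page, COUNT(*) FROM visits GROUP BY page HAVING COUNT(*) > 1"
).fetchall()

print(where_error is not None)  # True
print(busy_pages)               # [('home', 2)]
```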


What is the difference between HAVING and WHERE in SQL?

HAVING specifies a search condition for a group or an aggregate function used in a SELECT statement.


Where vs Having SQL

For your information, apart from SELECT queries, you can use the WHERE clause with UPDATE and DELETE statements, but the HAVING clause can only be used with a SELECT query. For example:

update CUSTOMER set CUST_NAME='Johnny' WHERE CUST_ID=1;  -- This statement works
update CUSTOMER set CUST_NAME='Johnny' HAVING CUST_ID=1; -- Incorrect syntax
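The two statements above can be tried out with a short script. This is a sketch against SQLite via Python, with a two-row `CUSTOMER` table invented for the test; other engines report the syntax error with different wording.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMER (CUST_ID INTEGER PRIMARY KEY, CUST_NAME TEXT)")
conn.executemany("INSERT INTO CUSTOMER VALUES (?, ?)", [(1, "John"), (2, "Mary")])

# UPDATE with WHERE: accepted, and exactly one row is changed.
updated = conn.execute(
    "UPDATE CUSTOMER SET CUST_NAME = 'Johnny' WHERE CUST_ID = 1"
).rowcount

# UPDATE with HAVING: rejected by the parser as a syntax error.
try:
    conn.execute("UPDATE CUSTOMER SET CUST_NAME = 'Johnny' HAVING CUST_ID = 1")
    having_error = None
except sqlite3.Error as exc:
    having_error = str(exc)

names = conn.execute("SELECT CUST_NAME FROM CUSTOMER ORDER BY CUST_ID").fetchall()
print(updated, having_error is not None, names)
```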

The WHERE clause filters rows and applies to each and every row, while the HAVING clause filters groups of rows.

The WHERE and HAVING clauses can also be used together in a SELECT query, typically one that uses GROUP BY or an aggregate function:

SELECT CUST_ID, CUST_NAME, CUST_GENDER
FROM CUSTOMER
WHERE CUST_GENDER='MALE'
GROUP BY CUST_ID, CUST_NAME, CUST_GENDER
HAVING CUST_ID=8;

In this situation, the WHERE clause is applied first to individual rows, and only rows that pass the condition are included when creating groups. Once the groups are created, the HAVING clause filters them based on the condition specified.
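The order of evaluation matters most when HAVING tests an aggregate. The following sketch (SQLite via Python) extends the `CUSTOMER` table with a hypothetical `CUST_CITY` column and invented rows, purely to show that the count seen by HAVING covers only rows that survived WHERE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMER (CUST_ID INTEGER PRIMARY KEY, "
    "CUST_GENDER TEXT, CUST_CITY TEXT)"  # CUST_CITY is an assumed column
)
conn.executemany(
    "INSERT INTO CUSTOMER VALUES (?, ?, ?)",
    [
        (1, "MALE", "Oslo"), (2, "MALE", "Oslo"), (3, "FEMALE", "Oslo"),
        (4, "FEMALE", "Rome"), (5, "FEMALE", "Rome"), (6, "MALE", "Rome"),
    ],
)

# WHERE removes non-matching rows first; HAVING then sees per-city counts
# of the remaining rows only (Oslo: 2, Rome: 1).
with_where = conn.execute(
    "SELECT CUST_CITY, COUNT(*) FROM CUSTOMER "
    "WHERE CUST_GENDER = 'MALE' "
    "GROUP BY CUST_CITY HAVING COUNT(*) >= 2 ORDER BY CUST_CITY"
).fetchall()

# Without WHERE, every row is counted, so both cities pass the same HAVING test.
without_where = conn.execute(
    "SELECT CUST_CITY, COUNT(*) FROM CUSTOMER "
    "GROUP BY CUST_CITY HAVING COUNT(*) >= 2 ORDER BY CUST_CITY"
).fetchall()

print(with_where)     # [('Oslo', 2)]
print(without_where)  # [('Oslo', 3), ('Rome', 3)]
```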

WHERE vs. HAVING performance with GROUP BY

One of your assumptions is wrong. HAVING is slower than WHERE, because it filters results only after accessing and hashing the rows.

It's that hashing part that makes HAVING conditions more expensive than WHERE conditions. Hashing requires writing data, which can be more expensive both physically and algorithmically.

Theory

Hashing requires writing as well as reading data. Ideally hashing the data will run in O(n) time. But in practice there will be hash collisions, which slow things down. And in practice not all the data will fit in memory.

Those two problems can be disastrous. In the worst case, with limited memory, the hashing requires multiple passes and the complexity approaches O(n^2). And writing to disk in the temporary tablespace is orders of magnitude slower than writing to memory.

Those are the kind of performance issues you need to worry about with databases. The constant time to run simple conditions and expressions is usually irrelevant compared to the time to read, write, and join the data.
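The effect of filtering before versus after hashing can be pictured with a toy model. The Python sketch below is not how Oracle (or any real engine) implements hash aggregation; `group_count` and `wanted` are hypothetical helpers that only count how many rows reach the hash table in each case.

```python
def group_count(rows, keep_row=None, keep_group=None):
    """Toy hash aggregation: return (counts per status, number of hash updates)."""
    table = {}
    hash_updates = 0
    for status in rows:
        if keep_row is not None and not keep_row(status):
            continue  # WHERE-style: rejected before touching the hash table
        table[status] = table.get(status, 0) + 1
        hash_updates += 1
    if keep_group is not None:
        # HAVING-style: every row was already hashed; groups are dropped afterwards
        table = {key: n for key, n in table.items() if keep_group(key)}
    return table, hash_updates


def wanted(status):
    return status not in ("Active", "Dormant")


# Scaled-down stand-in for the status distribution used in this answer.
rows = ["Deceased"] * 15 + ["Active"] * 35 + ["Dormant"] * 50

where_counts, where_updates = group_count(rows, keep_row=wanted)
having_counts, having_updates = group_count(rows, keep_group=wanted)

print(where_counts, where_updates)    # {'Deceased': 15} 15
print(having_counts, having_updates)  # {'Deceased': 15} 100
```

Both calls produce the same groups, but the HAVING-style call writes every row into the hash table first, which is the extra work described above.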

That might be especially true in your environment. The operation TABLE ACCESS STORAGE FULL implies you are using Exadata. Depending on the platform you might be taking advantage of SQL in silicon. Those high-level conditions may translate perfectly to low-level instructions executed on storage devices, which means your estimate of the cost of executing a clause may be several orders of magnitude too high.

Practice

Create a sample table with 100,000 rows:

create table customer(id number, status varchar2(100));

insert into customer
select
    level,
    case
        when level <= 15000 then 'Deceased'
        when level between 15001 and 50001 then 'Active'
        else 'Dormant'
    end
from dual
connect by level <= 100000;

begin
    dbms_stats.gather_table_stats(user, 'customer');
end;
/

Running the code in a loop shows that the WHERE version is about twice as fast as the HAVING version.

--Run times (in seconds): 0.765, 0.78, 0.765
declare
    type string_nt is table of varchar2(100);
    type number_nt is table of number;
    v_status string_nt;
    v_count number_nt;
begin
    for i in 1 .. 100 loop
        SELECT status, count(status)
        bulk collect into v_status, v_count
        FROM customer
        GROUP BY status
        HAVING status != 'Active' AND status != 'Dormant';
    end loop;
end;
/

--Run times (in seconds): 0.39, 0.39, 0.39
declare
    type string_nt is table of varchar2(100);
    type number_nt is table of number;
    v_status string_nt;
    v_count number_nt;
begin
    for i in 1 .. 100 loop
        SELECT status, count(status)
        bulk collect into v_status, v_count
        FROM customer
        WHERE status != 'Active' AND status != 'Dormant'
        GROUP BY status;
    end loop;
end;
/

Difference between HAVING and WHERE Clause

Functionally, the two are equivalent.

The WHERE clause is saying:

Filter the data and then aggregate the results.

The HAVING clause is saying:

Aggregate the data and then filter the results.

Both return the same result, because the filter is on the columns used for grouping. Usually, HAVING conditions use aggregate functions; these are not allowed in WHERE.

In general, the WHERE clause is going to be faster, because less data is being aggregated. You should use WHERE in this case.
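The equivalence is easy to check on a small data set. Here is a sketch using SQLite through Python, with an assumed `customer(id, status)` table and a made-up distribution of statuses; it compares only the results, not the timings, since those depend on the engine and the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, status TEXT)")

# Hypothetical distribution of statuses, for illustration only.
statuses = ["Deceased"] * 150 + ["Active"] * 350 + ["Dormant"] * 500
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    list(enumerate(statuses, start=1)),
)

# Filter the rows, then aggregate what is left.
where_result = conn.execute(
    "SELECT status, COUNT(status) FROM customer "
    "WHERE status != 'Active' AND status != 'Dormant' "
    "GROUP BY status"
).fetchall()

# Aggregate every row, then filter the groups.
having_result = conn.execute(
    "SELECT status, COUNT(status) FROM customer "
    "GROUP BY status "
    "HAVING status != 'Active' AND status != 'Dormant'"
).fetchall()

print(where_result == having_result)  # True
print(where_result)                   # [('Deceased', 150)]
```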

HAVING vs WHERE vs GROUP BY clauses, when to use them and if you use ' '

The answer as per @O. Jones is a nested query:

SELECT post_id
, name
, Email
, CustomerId
, DeliveryDate
, DeliveryTime
, DeliveryType
, Zip
, OrderNote
, PaymentTotal
, OrderStatus
FROM ( SELECT t1.post_id
, t2.name
, MAX(CASE WHEN meta_key = 'value' THEN meta_value ELSE NULL END) as Email
, MAX(CASE WHEN meta_key = 'value' THEN meta_value ELSE NULL END) as CustomerId
, MAX(CASE WHEN meta_key = 'value' THEN meta_value ELSE NULL END) as DeliveryDate
, MAX(CASE WHEN meta_key = 'value' THEN meta_value ELSE NULL END) as DeliveryTime
, MAX(CASE WHEN meta_key = 'value' THEN meta_value ELSE NULL END) as DeliveryType
, MAX(CASE WHEN meta_key = 'value' THEN meta_value ELSE NULL END) as Zip
, MAX(CASE WHEN meta_key = 'value' THEN meta_value ELSE NULL END) as OrderNote
, MAX(CASE WHEN meta_key = 'value' THEN meta_value ELSE NULL END) as PaymentTotal
, MAX(CASE WHEN meta_key = 'value' THEN meta_value ELSE NULL END) as OrderStatus
FROM table_A t1
INNER
JOIN table_B t2
ON FIND_IN_SET(t1.post_id, t2.payment_ids)
GROUP
BY t1.post_id
, t2.name
) AS derived_table
WHERE OrderStatus RLIKE 'trans|ready'
AND DeliveryDate >= CURRENT_DATE - INTERVAL 7 DAY
AND DeliveryType = 'pickup'

Which SQL statement is faster? (HAVING vs. WHERE...)

In theory (by which I mean the SQL standard), WHERE filters rows before they are grouped, while HAVING filters the result only after all the rows have been read and grouped. So WHERE is generally faster. On DBMSs that follow the standard in this regard, use HAVING only where you cannot put the condition in a WHERE clause (for example, conditions on aggregates, or on computed columns in some RDBMSs).

You can simply look at the execution plan for both and check for yourself; nothing beats measuring your specific query in your specific environment with your data.
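How you get the plan depends on the engine. As one hedged illustration, SQLite exposes it through `EXPLAIN QUERY PLAN`, which the Python sketch below wraps in a small hypothetical helper named `plan`; the `customer` table is an assumed example. On Oracle the usual route is `EXPLAIN PLAN FOR ...` followed by `DBMS_XPLAN.DISPLAY`, and the plans you see there will look quite different.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, status TEXT)")  # assumed schema


def plan(sql):
    """Return the human-readable steps of SQLite's plan for the given query."""
    # The last column of each EXPLAIN QUERY PLAN row is the step description.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


where_plan = plan(
    "SELECT status, COUNT(*) FROM customer "
    "WHERE status != 'Active' GROUP BY status"
)
having_plan = plan(
    "SELECT status, COUNT(*) FROM customer "
    "GROUP BY status HAVING status != 'Active'"
)

# Print both plans side by side and compare them by eye.
for label, steps in (("WHERE", where_plan), ("HAVING", having_plan)):
    print(label)
    for step in steps:
        print("   ", step)
```

The plan text varies between SQLite versions, and an optimizer may rewrite one form into the other, so treat the output as something to inspect rather than a fixed result.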


