Which SQL Statement Is Faster? (HAVING vs. WHERE)

The theory (by theory I mean the SQL Standard) says that WHERE restricts the result set before rows are grouped, while HAVING restricts it after all the rows have been fetched and grouped. So WHERE is faster. On DBMSs that are standard-compliant in this regard, use HAVING only where the condition cannot go in a WHERE clause (such as conditions on computed columns in some RDBMSs).

You can simply look at the execution plan for both and check for yourself; nothing beats measuring your specific query, in your specific environment, with your own data.

HAVING or WHERE: which is faster?

If a condition refers to an aggregate function, put that condition in the HAVING clause. Otherwise, use the WHERE clause.
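As a small illustration of that rule, here is a sketch using Python's built-in sqlite3 module with a toy table of my own invention (not the poster's schema): a condition on an aggregate such as COUNT(*) must live in HAVING, while a condition on a plain column belongs in WHERE, so rows are filtered before any grouping work is done.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [(1, "Active"), (2, "Active"), (3, "Dormant"), (4, "Deceased"), (5, "Deceased")],
)

# Condition on an aggregate: must go in HAVING.
per_status = conn.execute(
    "SELECT status, COUNT(*) FROM customer GROUP BY status HAVING COUNT(*) > 1"
).fetchall()

# Condition on a plain column: belongs in WHERE, so rows are
# discarded before the grouping step ever sees them.
active = conn.execute(
    "SELECT status, COUNT(*) FROM customer WHERE status = 'Active' GROUP BY status"
).fetchall()

print(sorted(per_status))  # [('Active', 2), ('Deceased', 2)]
print(active)              # [('Active', 2)]
```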

You can use HAVING on its own, but it is recommended that you use it together with GROUP BY.

The SQL Standard says that WHERE restricts the result set before rows are grouped, while HAVING restricts it after all the rows have been fetched and grouped. So WHERE is faster.

WHERE vs. HAVING performance with GROUP BY

One of your assumptions is wrong: HAVING is slower than WHERE because it only filters results after accessing and hashing rows.

It's that hashing part that makes HAVING conditions more expensive than WHERE conditions. Hashing requires writing data, which can be more expensive both physically and algorithmically.

Theory

Hashing requires writing as well as reading data. Ideally hashing the data will run in O(n) time. But in practice there will be hash collisions, which slow things down. And in practice not all the data will fit in memory.

Those two problems can be disastrous. In the worst case, with limited memory, the hashing requires multiple passes and the complexity approaches O(n^2). And writing to disk in the temporary tablespace is orders of magnitude slower than writing to memory.

Those are the kind of performance issues you need to worry about with databases. The constant time to run simple conditions and expressions is usually irrelevant compared to the time to read, write, and join the data.

That might be especially true in your environment. The operation TABLE ACCESS STORAGE FULL implies you are using Exadata. Depending on the platform you might be taking advantage of SQL in silicon. Those high-level conditions may translate perfectly to low-level instructions executed on storage devices. Which means your estimate of the cost of executing a clause may be several orders of magnitude too high.
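The argument above can be sketched in miniature. The following Python snippet is my own illustration (not any engine's internals): a HAVING-style plan pushes every row through the hash table and filters groups afterwards, while a WHERE-style plan filters rows before they reach the hash table, so less data gets hashed.

```python
from collections import defaultdict

rows = [("Active", i) for i in range(5)] + [("Dormant", i) for i in range(3)]

def group_then_filter(rows, keep):
    """HAVING-style: every row is hashed; groups are filtered afterwards."""
    counts = defaultdict(int)
    for status, _ in rows:          # all 8 rows pass through the hash table
        counts[status] += 1
    return {k: v for k, v in counts.items() if keep(k)}

def filter_then_group(rows, keep):
    """WHERE-style: rows are filtered first, so less data is hashed."""
    counts = defaultdict(int)
    for status, _ in rows:
        if keep(status):            # only 3 rows reach the hash table
            counts[status] += 1
    return dict(counts)

def keep(status):
    return status != "Active"

print(group_then_filter(rows, keep))   # {'Dormant': 3}
print(filter_then_group(rows, keep))   # {'Dormant': 3}
```

Both plans return the same answer; the difference is only in how many rows pay the hashing cost.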

Practice

Create a sample table with 100,000 rows:

create table customer(id number, status varchar2(100));

insert into customer
select
    level,
    case
        when level <= 15000 then 'Deceased'
        when level between 15001 and 50001 then 'Active'
        else 'Dormant'
    end
from dual
connect by level <= 100000;

begin
    dbms_stats.gather_table_stats(user, 'customer');
end;
/

Running the code in a loop shows that the WHERE version is about twice as fast as the HAVING version.

--Run times (in seconds): 0.765, 0.78, 0.765
declare
    type string_nt is table of varchar2(100);
    type number_nt is table of number;
    v_status string_nt;
    v_count number_nt;
begin
    for i in 1 .. 100 loop
        SELECT status, count(status)
        bulk collect into v_status, v_count
        FROM customer
        GROUP BY status
        HAVING status != 'Active' AND status != 'Dormant';
    end loop;
end;
/

--Run times (in seconds): 0.39, 0.39, 0.39
declare
    type string_nt is table of varchar2(100);
    type number_nt is table of number;
    v_status string_nt;
    v_count number_nt;
begin
    for i in 1 .. 100 loop
        SELECT status, count(status)
        bulk collect into v_status, v_count
        FROM customer
        WHERE status != 'Active' AND status != 'Dormant'
        GROUP BY status;
    end loop;
end;
/

What is faster, JOIN AND or WHERE?

Since the join is inner by default, there is no logical difference between these queries.

Any query optimizer worth its salt would produce identical execution plans for these two queries. Here is the execution plan that I see for both cases:

(execution plan output not reproduced here)

(the plan that you see will be different, because the sample tables I created are not indexed, while your tables would very likely have suitable indexes).

Had the join been outer, there would have been a difference between the two queries: the second one would delete all rows of PersonInfo that have no corresponding rows in the Person table, in addition to deleting the info for all 'Smith's.

SQL 'like' vs '=' performance

See https://web.archive.org/web/20150209022016/http://myitforum.com/cs2/blogs/jnelson/archive/2007/11/16/108354.aspx

Quote from there:

the rules for index usage with LIKE are loosely like this:

  • If your filter criteria uses equals (=) and the field is indexed, then most likely it will use an INDEX/CLUSTERED INDEX SEEK.

  • If your filter criteria uses LIKE, with no wildcards (like if you had a parameter in a web report that COULD have a % but you instead use the full string), it is about as likely as #1 to use the index. The increased cost is almost nothing.

  • If your filter criteria uses LIKE, but with a wildcard at the beginning (as in Name0 LIKE '%UTER'), it's much less likely to use the index, but it still may at least perform an INDEX SCAN on a full or partial range of the index.

  • HOWEVER, if your filter criteria uses LIKE, but starts with a STRING FIRST and has wildcards somewhere AFTER that (as in Name0 LIKE 'COMP%ER'), then SQL may just use an INDEX SEEK to quickly find rows that have the same first starting characters, and then look through those rows for an exact match.

(Also keep in mind, the SQL engine still might not use an index the way you're expecting, depending on what else is going on in your query and what tables you're joining to. The SQL engine reserves the right to rewrite your query a little to get the data in a way that it thinks is most efficient, and that may include an INDEX SCAN instead of an INDEX SEEK.)
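The seek-vs-scan distinction can be modeled in a few lines of Python. Here a sorted list stands in for a B-tree index (a toy model of my own, not real engine internals): a prefix match can jump straight to its starting position with a binary search, while a leading-wildcard match must examine every entry.

```python
import bisect

# Sorted list as a stand-in for an index on Name0 (hypothetical values).
index = sorted(["COMPUTER", "COMMUTER", "PRINTER", "ROUTER", "SCANNER"])

def seek_prefix(index, prefix):
    """Index-seek-style lookup (LIKE 'COMP%'): binary-search to the first
    entry >= prefix, then read forward while entries still match."""
    lo = bisect.bisect_left(index, prefix)
    out = []
    for name in index[lo:]:
        if not name.startswith(prefix):
            break                       # past the prefix range; stop early
        out.append(name)
    return out

def scan_suffix(index, suffix):
    """LIKE '%UTER'-style lookup: the sort order doesn't help,
    so every entry must be examined."""
    return [name for name in index if name.endswith(suffix)]

print(seek_prefix(index, "COMP"))   # ['COMPUTER']
print(scan_suffix(index, "UTER"))   # ['COMMUTER', 'COMPUTER', 'ROUTER']
```

The 'COMP%ER' case in the last bullet is the seek followed by a residual check: find the 'COMP' range with the seek, then test each candidate against the full pattern.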

Which SQL query is faster? Filter on Join criteria or Where clause?

Performance-wise, they are the same (and produce the same plans)

Logically, you should choose the form that would still make sense if you replaced the INNER JOIN with a LEFT JOIN.

In your case, that would look like this:

SELECT *
FROM TableA a
LEFT JOIN TableXRef x
    ON x.TableAID = a.ID
    AND a.ID = 1
LEFT JOIN TableB b
    ON x.TableBID = b.ID

or this:

SELECT *
FROM TableA a
LEFT JOIN TableXRef x
    ON x.TableAID = a.ID
LEFT JOIN TableB b
    ON b.id = x.TableBID
WHERE a.id = 1

With LEFT JOINs the two are no longer equivalent: the former query still returns a row for every a.ID, just with no actual match joined for any a.ID other than 1, while the latter returns only the row with a.id = 1. That makes the latter syntax (with WHERE) logically more consistent with the filtering intent.
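That semantic difference is easy to demonstrate. The sketch below uses Python's sqlite3 module with minimal stand-in tables (the TableB leg is omitted for brevity): with the extra condition in the ON clause, non-matching TableA rows survive with NULLs; with it in the WHERE clause, they are filtered out entirely.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (ID INTEGER);
    CREATE TABLE TableXRef (TableAID INTEGER, TableBID INTEGER);
    INSERT INTO TableA VALUES (1), (2);
    INSERT INTO TableXRef VALUES (1, 10), (2, 20);
""")

# Condition in the ON clause: TableA rows with ID <> 1 are still
# returned, just with NULL for the xref column.
in_on = conn.execute("""
    SELECT a.ID, x.TableBID
    FROM TableA a
    LEFT JOIN TableXRef x
        ON x.TableAID = a.ID AND a.ID = 1
""").fetchall()

# Same condition in WHERE: rows with ID <> 1 are removed entirely.
in_where = conn.execute("""
    SELECT a.ID, x.TableBID
    FROM TableA a
    LEFT JOIN TableXRef x
        ON x.TableAID = a.ID
    WHERE a.ID = 1
""").fetchall()

print(sorted(in_on))   # [(1, 10), (2, None)]
print(in_where)        # [(1, 10)]
```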

Why is UNION faster than an OR statement

The reason is that using OR in a query will often cause the Query Optimizer to abandon use of index seeks and revert to scans. If you look at the execution plans for your two queries, you'll most likely see scans where you are using the OR and seeks where you are using the UNION. Without seeing your query it's not really possible to give you any ideas on how you might be able to restructure the OR condition. But you may find that inserting the rows into a temporary table and joining on to it may yield a positive result.

Also, it is generally best to use UNION ALL rather than UNION if you want all results, as you avoid the cost of row-matching (duplicate elimination).
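A quick sqlite3 sketch of that last point (toy schema of my own, not the asker's): UNION must match rows across the two branches to eliminate duplicates, while UNION ALL simply concatenates the branch results, so a row satisfying both branches appears twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (val INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])

# UNION must sort/hash the combined rows to remove duplicates.
dedup = conn.execute(
    "SELECT val FROM t WHERE val <= 2 UNION SELECT val FROM t WHERE val >= 2"
).fetchall()

# UNION ALL just concatenates the two result sets -- no row-matching
# cost, but the row shared by both branches (val = 2) appears twice.
keep_all = conn.execute(
    "SELECT val FROM t WHERE val <= 2 UNION ALL SELECT val FROM t WHERE val >= 2"
).fetchall()

print(sorted(dedup))     # [(1,), (2,), (3,)]
print(sorted(keep_all))  # [(1,), (2,), (2,), (3,)]
```

This is also why rewriting an OR as a UNION needs care: if the two conditions overlap, UNION ALL will double-count the overlapping rows.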

Are CASE statements or OR statements faster in a WHERE clause? (SQL/BigQuery)

From a code craft viewpoint alone, I would probably always write your CASE expression like this:

WHERE h.isEntrance = TRUE OR h.hitNumber = 1

As to which version would be more optimal, I would expect the one above, since the CASE expression might force BigQuery to materialize an intermediate value and then compare against it.

In any case (no pun intended), this point is most likely not the biggest performance bottleneck in your query.
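To see that the two forms are at least logically equivalent, here is a hedged sketch using sqlite3 as a stand-in for BigQuery (sqlite has no BOOL type, so 1 plays the role of TRUE; the column names come from the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE h (isEntrance INTEGER, hitNumber INTEGER)")
conn.executemany("INSERT INTO h VALUES (?, ?)",
                 [(1, 5), (0, 1), (0, 7)])

# Plain OR version: two direct comparisons.
with_or = conn.execute(
    "SELECT COUNT(*) FROM h WHERE isEntrance = 1 OR hitNumber = 1"
).fetchone()[0]

# CASE version: builds an intermediate value, then compares it.
with_case = conn.execute("""
    SELECT COUNT(*) FROM h
    WHERE CASE WHEN isEntrance = 1 THEN 1
               WHEN hitNumber = 1 THEN 1
               ELSE 0 END = 1
""").fetchone()[0]

print(with_or, with_case)  # 2 2
```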

Is a JOIN faster than a WHERE?

Theoretically, no, it shouldn't be any faster. The query optimizer should be able to generate an identical execution plan. However, some database engines can produce better execution plans for one of them (not likely to happen for such a simple query but for complex enough ones). You should test both and see (on your database engine).

IN vs OR in the SQL WHERE clause

I assume you want to know the performance difference between the following:

WHERE foo IN ('a', 'b', 'c')
WHERE foo = 'a' OR foo = 'b' OR foo = 'c'

According to the MySQL manual, if the values are constant, IN sorts the list and then uses a binary search. I would imagine that OR evaluates them one by one in no particular order. So IN is faster in some circumstances.
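The manual's claim can be illustrated with a small Python sketch (my own model of the idea, not MySQL internals): the IN-style check does one binary search over the sorted constant list per row, while the OR-style check tests the terms one at a time.

```python
import bisect

# The constant IN-list, sorted once up front.
in_list = sorted([1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000])

def matches_in(value):
    """IN-style check: a single binary search over the sorted list."""
    i = bisect.bisect_left(in_list, value)
    return i < len(in_list) and in_list[i] == value

def matches_or(value):
    """OR-style check: evaluate the equality terms one by one."""
    return any(value == c for c in (1000, 2000, 3000, 4000, 5000,
                                    6000, 7000, 8000, 9000))

print(matches_in(3000), matches_or(3000))  # True True
print(matches_in(3001), matches_or(3001))  # False False
```

With 9 constants the difference is tiny per row, but repeated over a million rows the O(log n) check can pull ahead of the O(n) one.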

The best way to know is to profile both on your database with your specific data to see which is faster.

I tried both on a MySQL table with 1,000,000 rows. When the column is indexed, there is no discernible difference in performance - both are nearly instant. When the column is not indexed, I got these results:

SELECT COUNT(*) FROM t_inner WHERE val IN (1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000);
1 row fetched in 0.0032 (1.2679 seconds)

SELECT COUNT(*) FROM t_inner WHERE val = 1000 OR val = 2000 OR val = 3000 OR val = 4000 OR val = 5000 OR val = 6000 OR val = 7000 OR val = 8000 OR val = 9000;
1 row fetched in 0.0026 (1.7385 seconds)

So in this case the method using OR is about 37% slower (1.7385 s vs. 1.2679 s). Adding more terms makes the difference larger. Results may vary on other databases and with other data.


