Oracle Performance: IN vs OR

IN vs OR in Oracle: which is faster?

IN is preferable to OR -- OR is a notoriously bad performer and can cause other issues that would require using parentheses in complex queries.

A better option than either IN or OR is to join to a table containing the values you want (or don't want), as in the sketch below. This comparison table can be derived, temporary, or already existing in your schema.
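For example, a minimal sketch of that approach using a derived table (the table and column names here are invented for illustration):

-- join to an inline list of values instead of using IN / OR
SELECT o.*
  FROM orders o
  JOIN (SELECT 'SHIPPED' AS status FROM dual
        UNION ALL
        SELECT 'DELIVERED' FROM dual) wanted
    ON wanted.status = o.status;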

Oracle OR conditions make the query very slow

In my experience, OR tends to have a negative impact on queries (such as causing indexes to be ignored and triggering full table scans). Sometimes this isn't so bad, but I have had queries that went from lightning fast to taking minutes because of it.

One possible solution is to change the OR into a UNION or even a UNION ALL. I have had success with this in the past for improving query performance, but you will have to compare the variants against one another to see whether it works for you.

You can try out the three options below and see if any one of them offers a significant improvement over the others.

Original query (edited to return rows since you mentioned returning data instead of doing a count):

select * from table where column_a in (list_a) or column_b in (list_b)

Query that avoids the OR:

select * from table where column_a in (list_a)
UNION
select * from table where column_b in (list_b)

And since a UNION implies a DISTINCT (duplicate elimination), this UNION ALL rewrite might be worth trying out as well:

select * from table where column_a in (list_a) and not column_b in (list_b)
UNION ALL
select * from table where column_b in (list_b) and not column_a in (list_a)
UNION ALL
select * from table where column_a in (list_a) and column_b in (list_b)

What are the performance implications of Oracle IN Clause with no joins?

If the statistics on your table are accurate, it should be very unlikely that the optimizer would choose to do a table scan rather than using the primary key index when you only have 1000 hard-coded elements in the WHERE clause. The best approach would be to gather (or set) accurate statistics on your objects since that should cause good things to happen automatically rather than trying to do a lot of gymnastics in order to work around incorrect statistics.
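Gathering statistics is typically a one-liner with DBMS_STATS; a sketch, with the schema and table names as placeholders:

BEGIN
  -- refresh optimizer statistics so the cardinality estimates are realistic
  DBMS_STATS.GATHER_TABLE_STATS(ownname => 'MY_SCHEMA', tabname => 'MY_TABLE');
END;
/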

If we assume that the statistics are inaccurate to the degree that the optimizer would be led to believe that a table scan would be more efficient than using the primary key index, you could potentially add a DYNAMIC_SAMPLING hint that would force the optimizer to gather more accurate statistics before optimizing the statement, or a CARDINALITY hint to override the optimizer's default cardinality estimate. Neither of those would require knowing anything about the available indexes; they would just require knowing the table alias (or name if there is no alias). DYNAMIC_SAMPLING would be the safer, more robust approach, but it would add time to the parsing step.
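As a rough sketch of what those hints look like, assuming t is the table alias (the sampling level and cardinality figure are just illustrative):

-- sample extra data at parse time for better estimates
SELECT /*+ dynamic_sampling(t 4) */ *
  FROM table_name t
 WHERE primary_key IN (1, 2, 3 /* ...and so on... */);

-- or override the optimizer's row-count estimate for t
SELECT /*+ cardinality(t 1000) */ *
  FROM table_name t
 WHERE primary_key IN (1, 2, 3 /* ...and so on... */);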

If you are building up a SQL statement with a variable number of hard-coded parameters in an IN clause, you're likely going to be creating performance problems for yourself by flooding your shared pool with non-sharable SQL and forcing the database to spend a lot of time hard parsing each variant separately. It would be much more efficient if you created a single sharable SQL statement that could be parsed once. Depending on where your IN clause values are coming from, that might look something like

SELECT *
  FROM table_name
 WHERE primary_key IN (SELECT primary_key
                         FROM global_temporary_table);

or

SELECT *
  FROM table_name
 WHERE primary_key IN (SELECT primary_key
                         FROM TABLE( nested_table ));

or

SELECT *
  FROM table_name
 WHERE primary_key IN (SELECT primary_key
                         FROM some_other_source);
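For the collection variant, an end-to-end sketch might look like the following; the type name number_list_t and the :id_list bind variable are assumptions, and binding the collection requires corresponding support in the client code:

-- a SQL-level collection type that can be bound from client code
CREATE TYPE number_list_t AS TABLE OF NUMBER;
/

-- one shareable statement, parsed once no matter how many IDs are bound
SELECT *
  FROM table_name
 WHERE primary_key IN (SELECT COLUMN_VALUE
                         FROM TABLE(:id_list));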

If you got yourself down to a single sharable SQL statement, then in addition to avoiding the cost of constantly re-parsing the statement, you'd have a number of options for forcing a particular plan that don't involve modifying the SQL statement. Different versions of Oracle have different options for plan stability-- there are stored outlines, SQL plan management, and SQL profiles among other technologies depending on your release. You can use these to force particular plans for particular SQL statements. If you keep generating new SQL statements that have to be re-parsed, however, it becomes very difficult to use these technologies.
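As one illustration of plan stability (11g and later), a plan already in the cursor cache can be captured as a SQL plan baseline; the sql_id below is a placeholder:

DECLARE
  loaded PLS_INTEGER;
BEGIN
  -- capture the plan currently in the cursor cache as an accepted SQL plan baseline
  loaded := DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE(sql_id => '7ztv2z24kw0s0');
  DBMS_OUTPUT.PUT_LINE('plans loaded: ' || loaded);
END;
/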

Oracle performance and join with OR

scaisEdge's answer provides a specific solution, but it's also worth understanding the generic reason why OR conditions in joins can cause performance issues -- OR conditions prevent the optimizer from using hash joins.

Hash joins are usually the fastest way to join a large percentage of data. (Your tables may be small but since there are no filters they are processing a large percentage of data.) But hash joins can only work with equality conditions. To process an OR condition, Oracle will have to use a slower join method, like a sort merge or a nested loop.

If you think of the number of rows in the two tables as M and N, a hash join can (theoretically) be done in roughly M+N operations, whereas a sort merge join is more like M*LOG(M) + N*LOG(N) and a nested loop join can be as bad as M*N. If you're curious, this sample chapter contains more details about Oracle's different join types and how their performance compares.

A UNION version has to join the tables twice, but two fast joins can be better than one slow join. Oracle can't always convert an OR to UNION or UNION ALL itself, because those versions may not return the same rows as the original query. For example, the UNION alternatives would treat duplicate rows differently. (But maybe you know something about your data that Oracle doesn't, so the rewritten queries may still work fine for you.)
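As a sketch of the kind of rewrite involved (table and column names invented for illustration), a join whose condition contains an OR, such as

-- OR in the join condition rules out a hash join
SELECT o.order_id, c.name
  FROM orders o
  JOIN customers c
    ON c.customer_id = o.customer_id
    OR c.customer_id = o.alt_customer_id;

can be split so that each branch is a pure equi-join and therefore eligible for a hash join:

-- each branch is a plain equi-join, so both can be hash joins
SELECT o.order_id, c.name
  FROM orders o
  JOIN customers c ON c.customer_id = o.customer_id
UNION
SELECT o.order_id, c.name
  FROM orders o
  JOIN customers c ON c.customer_id = o.alt_customer_id;

Whether the UNION's duplicate elimination is acceptable depends on your data, as noted above.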

This knowledge is useful because OR is not inherently slow and we don't always want to avoid it. If the OR is just part of an index access, or part of a join that won't use a hash join anyway, then there's nothing wrong with it.

Oracle performance issues when using a subquery in an IN operator

As mentioned in my comment two days ago, someone had posted a solution and then removed it, even though it actually worked. After waiting two days, I decided to post that solution myself.

That solution suggested that the performance was being slowed down by the IN operator, and recommended replacing it with an inner join:

Create Table T1 as
Select FV.*
  from FinalView1 FV
  inner join (select distinct CustomerID
                from CriteriaTable) CT
    on CT.CustomerID = FV.CustomerID;
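For reference, the original IN-based form presumably looked something like this (a reconstruction based on the description, not the actual query from the question):

Create Table T1 as
Select FV.*
  from FinalView1 FV
 where FV.CustomerID in (select CustomerID
                           from CriteriaTable);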

The result from the explain plan was worse than before:
Cardinality: 28364465 (from 27921981)
Cost: 15060 (from 14846)

However, it only takes 17 seconds, which is very good!

Performance drop returning cursor with union all

Actual solution

Managed to get the procedure's execution plan from the DBA. The problem was that the optimizer chose a different index for joining the scheme.sales_details table when executing the query inside the procedure. Adding an INDEX hint with the same index that the regular query used fixed everything.
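A sketch of that kind of hint; the alias sd, the index name sales_details_idx1, and the filter column are placeholders rather than the actual names from this system:

-- force the optimizer to use the index the standalone query picked
SELECT /*+ index(sd sales_details_idx1) */ sd.*
  FROM scheme.sales_details sd
 WHERE sd.sale_date >= DATE '2020-01-01';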

Deprecated ideas down below

As far as I understood, the problem is in the Oracle optimizer, which "thought" that doing the UNION ALL first was better than pushing the predicate into the subquery. Separating the union into two single queries makes it push the predicate without any hesitation.

This can probably be fixed by playing with hints; that is still a work in progress.

A temporary workaround is to regroup the query, going from this structure

select *
  from (select row_number() over (order by ...) rn
             , u.*
          from (select *
                  from first_query
                 union all
                select *
                  from second_query) u
          -- some joins
          join first_table ft on ...
          join second_table st on ...
         -- predicate block
         where 1=1
           and a = b)
 where rn between c and d;

to this

select *
  from (select row_number() over (order by ...) rn
             , u.*
          from (select *
                  from first_query) u
          -- some joins
          join first_table ft on ...
          join second_table st on ...
         -- predicate block
         where 1=1
           and a = b
        union all
        select row_number() over (order by ...) rn
             , u.*
          from (select *
                  from second_query) u
          -- some joins
          join first_table ft on ...
          join second_table st on ...
         -- predicate block
         where 1=1
           and a = b)
 where rn between c and d;

That's not a perfect solution, since it doubles the JOIN section, but at least it works.

Which metrics to compare when evaluating SQL query performance?

In general, I would be very wary about comparing the cost between two queries unless you have a very specific reason to believe that makes sense.

In general, people don't look at the 99.9% of queries that the optimizer produces a (nearly) optimal plan for. People look at queries where the optimizer has produced a decidedly sub-optimal plan. The optimizer will produce a sub-optimal plan for one of two basic reasons-- either it can't transform a query into a form it can optimize (in which case a human likely needs to rewrite the query) or the statistics it is using to make its estimates are incorrect so what it thinks is an optimal plan is not. (Of course, there are other reasons queries might be slow-- perhaps the optimizer produced an optimal plan but the optimal plan is doing a table scan because an index is missing for example.)

If I'm looking at a query that is slow and the query seems to be reasonably well-written and a reasonable set of indexes are available, statistics are the most likely source of problems. Since cost is based entirely on statistics, however, that means that the optimizer's cost estimates are incorrect. If they are incorrect, the cost is roughly equally likely to be incorrectly high or incorrectly low.

If I look at the query plan for a query that I know needs to aggregate hundreds of thousands of rows to produce a report and I see that the optimizer has assigned it a single-digit cost, I know that somewhere along the line it is estimating that a step will return far too few rows. In order to tune that query, I'm going to need the cost to go up so that the optimizer's estimates accurately reflect reality. If I look at the query plan for a query I know should only need to scan a handful of rows and I see a cost in the tens of thousands, I know that the optimizer is estimating that some step will return far too many rows. In order to tune that query, I'm going to need the cost to go down so that the optimizer's estimates reflect reality.

If you use the gather_plan_statistics hint, you'll see the estimated and actual row counts in your query plan. If the optimizer's estimates are close to reality, the plan is likely to be pretty good and cost is likely to be reasonably accurate. If the optimizer's estimates are off, the plan is likely to be poor and the cost is likely to be wrong. Trying to use a cost metric to tune a query without first confirming that the cost is reasonably close to reality is seldom very productive.
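A common way to get those estimated vs. actual row counts (the query itself is just a stand-in):

-- run the statement with runtime row-source statistics enabled
SELECT /*+ gather_plan_statistics */ *
  FROM table_name
 WHERE primary_key IN (1, 2, 3);

-- then display the plan with E-Rows (estimated) next to A-Rows (actual)
SELECT *
  FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));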

Personally, I would ignore cost and focus on metrics that are likely to be stable over time and that are actually correlated with performance. My bias would be to focus on logical reads since most systems are I/O bound but you could use CPU time or elapsed time as well (elapsed time, though, tends not to be particularly stable because it depends on what happens to be in cache at the time the query is run). If you're looking at a plan, focus on the estimated vs. actual row counts not on the cost.
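For instance, logical reads and CPU/elapsed time per statement can be pulled from V$SQL; the LIKE filter below is just a placeholder:

SELECT sql_id,
       executions,
       buffer_gets,            -- logical reads
       cpu_time,               -- microseconds
       elapsed_time            -- microseconds
  FROM v$sql
 WHERE sql_text LIKE '%table_name%';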


