Why Does a Query Slow Down Drastically If in the WHERE Clause a Constant Is Replaced by a Parameter (Having the Same Value)?

Why does a query slow down drastically if in the WHERE clause a constant is replaced by a parameter (having the same value)?

As Martin suggested in a comment under the question, the problem is that SQL Server does not properly push the predicate from the WHERE clause down into the view - see the link in his comment.

I ended up creating a user-defined table-valued function and using it with the CROSS APPLY operator to create the view.

Let's see the solution itself.

User Defined Table-valued Function

CREATE FUNCTION [dbo].[TestFunction] (@Id INT)
RETURNS TABLE
AS
RETURN
(
    WITH Hierarchy (Id, ParentId, Data, Depth)
    AS
    (
        SELECT Id, ParentId, NULL AS Data, 0 AS Depth FROM Test WHERE Id = @Id
        UNION ALL
        SELECT h.Id, t.ParentId, COALESCE(h.Data, t.Data), h.Depth + 1 AS Depth
        FROM Hierarchy h
        INNER JOIN Test t ON t.Id = h.ParentId
    )
    SELECT * FROM Hierarchy
)

View

CREATE VIEW [dbo].[TestView]
AS
SELECT t.Id, t.ParentId, f.Data, f.Depth
FROM Test AS t
CROSS APPLY TestFunction(t.Id) AS f

Query with constant

SELECT * FROM TestView WHERE Id = 69

Query with parameter

DECLARE @Id INT
SELECT @Id = 69
SELECT * FROM TestView WHERE Id = @Id

The query with the parameter executes basically as fast as the query with the constant.
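To see this for yourself, you can time both forms side by side (a minimal sketch using standard T-SQL statistics output):

SET STATISTICS TIME ON;
SET STATISTICS IO ON;

SELECT * FROM TestView WHERE Id = 69;      -- constant

DECLARE @Id INT = 69;
SELECT * FROM TestView WHERE Id = @Id;     -- parameter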

Thank you, Martin, and thanks to the others as well!

Slow query caused by parameter variables, but why?

A summary of what led to the solution:

Add a covering index on supporterid, callEnd (sketched after the list below).

The assumption here is that the optimizer can use this index (in contrast to one on callEnd, supporterid) to

  • first join tblSupporterMainDetails and tblCallLogs
  • then use it in the WHERE clause when filtering on callEnd
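A minimal sketch of such an index (the index name is mine; table and column names come from the thread):

CREATE NONCLUSTERED INDEX IX_tblCallLogs_supporterid_callEnd
    ON tblCallLogs (supporterid, callEnd);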

Add OPTION (RECOMPILE) to the query.
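For example (a sketch - the original query was not posted, so the join and column list are illustrative):

SELECT s.supporterid, c.callEnd
FROM tblSupporterMainDetails s
JOIN tblCallLogs c ON c.supporterid = s.supporterid
WHERE c.callEnd >= @from
OPTION (RECOMPILE);   -- compile a plan for the actual value of @from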

All kudos to TiborK and Hunchback for explaining the difference, to the optimizer, between hard-coded constants and variables.

Performance Impact - Constant Value vs. Variable

When you use the constant, the value is known to the optimizer, so it
can determine selectivity (and possible index usage) based on that.
When you use a variable, the value is unknown to the optimizer (so it
has to go by some hardwired value or possibly density info). So,
technically, this isn't parameter sniffing, but whatever article you
find on that subject should also explain the difference between a
constant and a variable. Using OPTION(RECOMPILE) will actually turn
the variable into a parameter sniffing situation.

In essence, there is a big difference between a constant, a variable
and a parameter (which can be sniffed).
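A minimal sketch of the three cases (dbo.Orders and CustomerId are hypothetical names):

-- Constant: the optimizer sees the value and can use column statistics
SELECT * FROM dbo.Orders WHERE CustomerId = 42;

-- Variable: the value is unknown at compile time, so the optimizer
-- falls back to a hardwired guess or density information
DECLARE @c INT = 42;
SELECT * FROM dbo.Orders WHERE CustomerId = @c;

-- Variable + OPTION(RECOMPILE): the current value is sniffed at recompile time
SELECT * FROM dbo.Orders WHERE CustomerId = @c OPTION (RECOMPILE);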

WHERE clause is slower with value from CTE than with constant?

The explanation behind the difference you observed is this:

Postgres has column statistics and can adapt the query plan depending on the value of a provided constant for datetime_threshold. With favorable filter values, this can lead to a much more efficient query plan.

In the other case, when datetime_threshold has to be computed in another SELECT first, Postgres has to default to a generic plan. datetime_threshold could be anything.

The difference will become obvious in EXPLAIN output.
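For example (the literal timestamp is a placeholder; table and column names are from the function below):

EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM   locations
WHERE  user_id = 9087
AND    datetime > '2024-01-01 00:00';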

To make sure Postgres optimizes the second part for the actual datetime_threshold value, you can either run two separate queries (feed the result of query 1 as constant to query 2), or use dynamic SQL to force re-planning of query 2 every time in a PL/pgSQL function.

For example

CREATE OR REPLACE FUNCTION foo(_user_id int, _distance int = 70)
  RETURNS SETOF locations
  LANGUAGE plpgsql AS
$func$
BEGIN
   RETURN QUERY EXECUTE
      'SELECT *
       FROM   locations
       WHERE  user_id = $1
       AND    datetime > $2'
   USING _user_id
       , (SELECT max(datetime)
          FROM   locations
          WHERE  distance > _distance
          AND    user_id = _user_id);
END
$func$;

Call:

SELECT * FROM foo(9087);

Related:

  • Dynamic ORDER BY and ASC / DESC in a plpgsql function
  • Optional argument in PL/pgSQL function

In extreme cases, you might even use another dynamic query to calculate datetime_threshold. But I don't expect that's necessary.

As for "something useful in the docs":

[...] The important difference is that EXECUTE will re-plan the
command on each execution, generating a plan that is specific to the
current parameter values; whereas PL/pgSQL may otherwise create a
generic plan and cache it for re-use. In situations where the best
plan depends strongly on the parameter values, it can be helpful to
use EXECUTE to positively ensure that a generic plan is not selected.

Bold emphasis mine.

Indexes

Perfect indexes would be:

CREATE INDEX ON locations (user_id, distance DESC NULLS LAST, datetime DESC NULLS LAST); -- for query 1
CREATE INDEX ON locations (user_id, datetime); -- for query 2

Fine-tuning depends on undisclosed details. A partial index might be an option.
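For example, if the distance filter is stable, a partial index could serve query 1 (a sketch; the 70 mirrors the function's default _distance):

CREATE INDEX ON locations (user_id, datetime DESC NULLS LAST)
WHERE  distance > 70;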

There may be any number of additional reasons why your query is slow; there are not enough details in the question to say.

Why does this query become drastically slower when wrapped in a TVF?

I isolated the problem to one line in the query. Keeping in mind that the query is 160 lines long, and that I'm including the relevant tables either way, if I remove this line from the SELECT clause:

COALESCE(V.Visits, 0) * COALESCE(ACS.AvgClickCost, GAAC.AvgAdCost, 0.00)

...the run time drops from 63 minutes to five seconds (inlining a CTE has made it slightly faster than the original seven-second query). Including either ACS.AvgClickCost or GAAC.AvgAdCost causes the run time to explode. What makes it especially odd is that these fields come from two subqueries which return, respectively, ten rows and three! They each run in zero seconds when executed independently, and with the row counts being so small I would expect the join time to be trivial, even using nested loops.

Any guesses as to why this seemingly-harmless calculation would throw off a TVF completely, while it runs very quickly as a stand-alone query?

Can someone tell me why this query is very slow?

The query itself is fast - it takes about 200 ms to execute - but the time spent processing the query and retrieving the data is what takes long. I think there's no way to reduce this time.

Does having too many subqueries in the FROM clause slow down the query?

[TL;DR] It depends on the queries you are using in the sub-queries.


In your case:

select id,
       name,
       annual_income * 0.10 AS tax
from   (select id,
               name,
               annual_income
        from   (select id,
                       first_name || ' ' || last_name AS name,
                       income * 12 AS annual_income
                from   table_name));

Will get rewritten by the SQL engine to:

select id,
       first_name || ' ' || last_name AS name,
       income * 1.2 AS tax
from   table_name;

There will be no difference in performance between the two queries, and if it is easier for you to understand and/or maintain the query in its expanded form, then you should use that format and not worry about the nested sub-queries.


However, there are some cases where sub-queries can affect performance. For example, this question was a particularly complex issue where the sub-query factoring clause was being materialized by the inclusion of the ROWNUM pseudo-column; that forced the SQL engine to execute in a particular order, prevented it from rewriting the query into a more optimal form, and prevented it from using an index, which made the query very slow.
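A minimal sketch of that effect (table and column names are hypothetical):

with ordered_rows as (
  select t.*, ROWNUM as rn   -- referencing ROWNUM pins evaluation order,
  from   table_name t        -- so Oracle cannot merge the CTE into the
)                            -- outer query or push predicates into it
select *
from   ordered_rows
where  id = 42;   -- this filter can no longer be pushed down to the scan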

Why does the SELECT command run faster when WHERE is included?

This depends on a lot of factors. As mentioned by Bobek, the issue with elapsed time could simply be the time to return all the records. Let's assume that you are looking just at processing time.

The first question is: did you run these queries multiple times, taking caching effects into account? If you run the first query, the table gets loaded into memory and stays there, so subsequent queries on the table - including a re-run of the first one - will be much, much faster. When doing timings, you have to be quite careful.
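If this is SQL Server (an assumption - the question does not name the engine), one way to compare cold-cache timings on a development server is:

CHECKPOINT;               -- write dirty pages to disk first
DBCC DROPCLEANBUFFERS;    -- empty the buffer pool; never run this in production
SET STATISTICS TIME ON;   -- report compile and elapsed times

SELECT * FROM Person WHERE FirstName = 'John';   -- hypothetical table and value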

Another possibility is the existence of indexes, although I doubt there would be an index on FirstName. An index greatly reduces the time for fetching the records: the engine goes to the index to find the right records, looks them up, and returns the result. In the end, your query still has to fetch the data pages, because of the SELECT *.
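For reference, such an index would be created like this (a sketch; Person is a hypothetical table name):

CREATE INDEX IX_Person_FirstName ON Person (FirstName);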

As for checks taking longer, that is really a non-issue. The amount of time for processing a page is typically going to be much, much larger than a boolean operation on a record. Many other factors have a larger impact on performance.

My guess, in your case, is that you ran the queries as described in the question, and the performance difference is due to caching effects.

Variables make query performance worse

SELECT *
FROM [DIME_WH].[dbo].[FactOrderLines2] FL WITH (NOLOCK)
WHERE DD_OrderDate >= '2018-08-01'
  AND DD_OrderDate <= '2018-08-17'

When a constant is used, the optimizer creates a plan specific to this query. If the same query is executed with the same value, the plan is reused; if the value changes, another plan is created.

So the query with a constant value is fast.

SELECT *
FROM [DIME_WH].[dbo].[FactOrderLines2] FL WITH (NOLOCK)
WHERE DD_OrderDate >= @StartDate
  AND DD_OrderDate <= @EndDate

When a variable is used, the optimizer creates an execution plan for the first parameter value that was passed.

For example, say @StartDate='2018-08-01' and @EndDate='2018-08-07' are passed the first time. The optimizer creates an execution plan for these values, and the plan is good enough for them. The next time, @StartDate='2018-08-01' and @EndDate='2018-08-31' are passed, and the same previous plan is reused, which may not be optimal for these values.

In other words, the same plan that was optimal for the first values can be suboptimal for other values, so the query may perform poorly and slowly. This is known as parameter sniffing.

There are several ways to overcome this problem.

Parameter Sniffing
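A few common mitigations, as a sketch (which option is right depends on the workload):

SELECT *
FROM [DIME_WH].[dbo].[FactOrderLines2] FL
WHERE DD_OrderDate >= @StartDate
  AND DD_OrderDate <= @EndDate
OPTION (RECOMPILE);   -- re-plan for the current variable values

-- Alternatives:
-- OPTION (OPTIMIZE FOR (@StartDate = '2018-08-01', @EndDate = '2018-08-07'))  -- compile for representative values
-- OPTION (OPTIMIZE FOR UNKNOWN)  -- use average density instead of a sniffed value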

Note: in this thread we are only focusing on why performance with variables is slow while all other factors remain constant.


