Why does a query slow down drastically if in the WHERE clause a constant is replaced by a parameter (having the same value)?
As Martin suggested in a comment under the question, the problem is that SQL Server does not properly push the predicate from the WHERE clause down into the recursive CTE; see the link in his comment.
I ended up creating a user-defined table-valued function and using it with the CROSS APPLY operator to build the view.
Here is the solution.
User-Defined Table-Valued Function
CREATE FUNCTION [dbo].[TestFunction] (@Id INT)
RETURNS TABLE
AS
RETURN
(
    WITH Hierarchy (Id, ParentId, Data, Depth) AS
    (
        -- Anchor: the filter Id = @Id is applied here, inside the recursion
        SELECT Id, ParentId, NULL AS Data, 0 AS Depth
        FROM Test
        WHERE Id = @Id
        UNION ALL
        -- Recursive part: walk up to the parent, keeping the nearest non-NULL Data found so far
        SELECT h.Id, t.ParentId, COALESCE(h.Data, t.Data), h.Depth + 1 AS Depth
        FROM Hierarchy AS h
        INNER JOIN Test AS t ON t.Id = h.ParentId
    )
    SELECT * FROM Hierarchy
)
View
CREATE VIEW [dbo].[TestView]
AS
SELECT t.Id, t.ParentId, f.Data, f.Depth
FROM
Test AS t
CROSS APPLY TestFunction(Id) as f
Query with constant
SELECT * FROM TestView WHERE Id = 69
Query with parameter
DECLARE @Id INT
SELECT @Id = 69
SELECT * FROM TestView WHERE Id = @Id
The query with the parameter now executes about as fast as the query with the constant.
Thank you, Martin, and the others as well!
Slow query caused by parameter variables, but why?
A summary of what led to a solution:
- Add a covering index on (supporterid, callEnd). The assumption here is that the optimizer can use this index (in contrast with one on (callEnd, supporterid)) to
  - first join tblSupporterMainDetails and tblCallLogs
  - then use it in the WHERE clause for selecting on callEnd
- Add the option OPTION(RECOMPILE)
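A minimal T-SQL sketch of both steps, for illustration only. Just the two table names and the supporterid and callEnd columns come from the summary; the dbo schema, the index name, the @From/@To variables and the shape of the query are assumptions.

```sql
-- Hypothetical sketch: schema, index name, variables and query shape are assumed.
-- Key order (supporterid, callEnd): the join can seek on supporterid,
-- then read a callEnd range within each supporterid.
CREATE NONCLUSTERED INDEX IX_tblCallLogs_supporterid_callEnd
    ON dbo.tblCallLogs (supporterid, callEnd);

DECLARE @From DATETIME = '2018-01-01',
        @To   DATETIME = '2018-02-01';

-- Only supporterid and callEnd are read from tblCallLogs, so the index covers it.
SELECT s.supporterid, COUNT(*) AS calls
FROM dbo.tblSupporterMainDetails AS s
INNER JOIN dbo.tblCallLogs AS c
        ON c.supporterid = s.supporterid
WHERE c.callEnd >= @From
  AND c.callEnd <  @To
GROUP BY s.supporterid
OPTION (RECOMPILE);  -- compile using the current values of @From and @To
```

If the real query selects more columns from tblCallLogs, they would have to be added to the index (for example with INCLUDE) for it to stay covering.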
All kudos to TiborK and Hunchback for explaining the difference, from the optimizer's point of view, between hard-coded constants and variables.
Performance Impact - Constant value -vs- Variable
When you use the constant, the value is known to the optimizer so it
can determine selectivity (and possible index usage) based on that.
When you use a variable, the value is unknown to the optimizer (so it
has to go by some hardwired value or possibly density info). So,
technically, this isn't parameter sniffing, but whatever article you
find on that subject should also explain the difference between a
constant and a variable. Using OPTION(RECOMPILE) will actually turn
the variable into a parameter sniffing situation. In essence, there is a big difference between a constant, a variable
and a parameter (which can be sniffed).
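To make the cases concrete, here is a T-SQL sketch against a hypothetical table dbo.Orders with an index on OrderDate (the table, column and values are assumptions, not from the thread). Comparing the estimated row counts in the execution plans of the four statements shows how the optimizer treats each one.

```sql
-- Hypothetical table: dbo.Orders(OrderDate DATE, ...) with an index on OrderDate.

-- 1. Constant: the value is visible at compile time, so the histogram is used.
SELECT * FROM dbo.Orders WHERE OrderDate >= '2018-08-01';

-- 2. Local variable: the value is unknown at compile time,
--    so a default estimate (density or a fixed guess) is used.
DECLARE @d DATE = '2018-08-01';
SELECT * FROM dbo.Orders WHERE OrderDate >= @d;

-- 3. Same variable with OPTION (RECOMPILE): the statement is compiled
--    after @d is assigned, so its actual value is used.
SELECT * FROM dbo.Orders WHERE OrderDate >= @d OPTION (RECOMPILE);

-- 4. Real parameter: its value is sniffed on the first compilation
--    and the resulting plan is cached for later calls.
EXEC sp_executesql
     N'SELECT * FROM dbo.Orders WHERE OrderDate >= @d',
     N'@d DATE',
     @d = '2018-08-01';
```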
WHERE clause is slower with value from CTE than with constant?
The explanation behind the difference you observed is this:
Postgres has column statistics and can adapt the query plan depending on the value of a provided constant for datetime_threshold. With favorable filter values, this can lead to a much more efficient query plan.
In the other case, when datetime_threshold has to be computed in another SELECT first, Postgres has to default to a generic plan: datetime_threshold could be anything.
The difference will become obvious in EXPLAIN output.
To make sure Postgres optimizes the second part for the actual datetime_threshold value, you can either run two separate queries (feed the result of query 1 as a constant to query 2), or use dynamic SQL in a PL/pgSQL function to force re-planning of query 2 every time.
For example
CREATE OR REPLACE FUNCTION foo(_user_id int, _distance int = 70)
RETURNS SETOF locations
LANGUAGE plpgsql AS
$func$
BEGIN
RETURN QUERY EXECUTE
'SELECT *
FROM locations
WHERE user_id = $1
AND datetime > $2'
USING _user_id
, (SELECT max(datetime)
FROM locations
WHERE distance > _distance
AND user_id = _user_id);
END
$func$;
Call:
SELECT * FROM foo(9087);
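The first option mentioned above, running two separate queries, can be sketched in psql with its \gset meta-command, which stores a query's result columns in psql variables of the same name. This assumes the queries are run from psql; 9087 and 70 are just the example values from the function and call above.

```sql
-- Query 1: compute the threshold; \gset sends the query and stores
-- the result column in the psql variable datetime_threshold.
SELECT max(datetime) AS datetime_threshold
FROM   locations
WHERE  distance > 70
AND    user_id = 9087
\gset

-- Query 2: :'datetime_threshold' is interpolated as a quoted literal,
-- so the planner sees a constant and can use column statistics for it.
SELECT *
FROM   locations
WHERE  user_id = 9087
AND    datetime > :'datetime_threshold';
```

If max() returns NULL (no matching rows), \gset unsets the variable instead of setting it and query 2 fails, so an application would need to handle that case.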
Related:
- Dynamic ORDER BY and ASC / DESC in a plpgsql function
- Optional argument in PL/pgSQL function
In extreme cases, you might even use another dynamic query to calculate datetime_threshold. But I don't expect that's necessary.
As for "something useful in the docs":
[...] The important difference is that EXECUTE will re-plan the command on each execution, generating a plan that is specific to the current parameter values; whereas PL/pgSQL may otherwise create a generic plan and cache it for re-use. In situations where the best plan depends strongly on the parameter values, it can be helpful to use EXECUTE to positively ensure that a generic plan is not selected.
Bold emphasis mine.
Indexes
Perfect indexes would be:
CREATE INDEX ON locations (user_id, distance DESC NULLS LAST, datetime DESC NULLS LAST); -- for query 1
CREATE INDEX ON locations (user_id, datetime); -- for query 2
Fine tuning depends on undisclosed details. A partial index might be an option.
There may be any number of additional reasons why your query is slow; there are not enough details to tell.
Why does this query become drastically slower when wrapped in a TVF?
I isolated the problem to one line in the query. Keeping in mind that the query is 160 lines long, and I'm including the relevant tables either way, if I disable this line from the SELECT clause:
COALESCE(V.Visits, 0) * COALESCE(ACS.AvgClickCost, GAAC.AvgAdCost, 0.00)
...the run time drops from 63 minutes to five seconds (inlining a CTE has made it slightly faster than the original seven-second query). Including either ACS.AvgClickCost or GAAC.AvgAdCost causes the run time to explode. What makes it especially odd is that these fields come from two subqueries which have, respectively, ten rows and three! They each run in zero seconds when run independently, and with the row counts being so small I would expect the join time to be trivial even using nested loops.
Any guesses as to why this seemingly-harmless calculation would throw off a TVF completely, while it runs very quickly as a stand-alone query?
Can someone tell me why this query is so slow?
The query itself is fast: it takes about 200 ms to execute, but processing the results and retrieving the data take a long time. I think there's no way to reduce this time.
Does having too many subqueries in the FROM clause slow down the query?
[TL;DR] It depends on the queries you are using in the sub-queries.
In your case:
select id,
name,
annual_income * 0.10 AS tax
from (
select id,
name,
annual_income
from (
select id,
first_name || ' ' || last_name AS name,
income * 12 AS annual_income
from table_name
)
);
Will get rewritten by the SQL engine to:
select id,
first_name || ' ' || last_name AS name,
income * 1.2 AS tax
from table_name;
There will be no difference in performance between the two queries and if it is easier for you to understand and/or maintain the query in its expanded form then you should use that format and not worry about the nested sub-queries.
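One way to check this for a particular query is to compare the plans of both forms. A sketch for Oracle, using the placeholder table_name from the example; if the engine merges the inline views, both plans should show a single access of table_name and no VIEW operation.

```sql
-- Plan for the nested form
EXPLAIN PLAN FOR
SELECT id,
       name,
       annual_income * 0.10 AS tax
FROM (
  SELECT id, name, annual_income
  FROM (
    SELECT id,
           first_name || ' ' || last_name AS name,
           income * 12 AS annual_income
    FROM table_name
  )
);

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- Plan for the flattened form
EXPLAIN PLAN FOR
SELECT id,
       first_name || ' ' || last_name AS name,
       income * 1.2 AS tax
FROM table_name;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```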
However, there are some cases when sub-queries can affect performance. For example, this question was a particularly complex issue where the sub-query factoring clause was being materialized because of the ROWNUM pseudo-column; that forced the SQL engine to execute in a particular order, which prevented it from rewriting the query into a more optimal form and from using an index, and so made the query very slow.
Why does the SELECT command run faster when WHERE is included?
This depends on a lot of factors. As mentioned by Bobek, the issue with elapsed time could simply be the time to return all the records. Let's assume that you are looking just at processing time.
The first question is: did you run these queries multiple times, taking caching effects into account? If you run the first query, the table gets loaded into memory and will stay there. Subsequent queries on the table, including a rerun of the first one, will be much, much faster. When doing timings, you have to be quite careful.
Another possibility is the existence of indexes, although I doubt there would be an index on FirstName. An index greatly reduces the time for fetching the records: the engine goes to the index to find the matching records, looks them up, and returns the result. In the end, your query still has to fetch the data from the table's pages, because of the SELECT *.
As for checks taking longer, that is really a non-issue. The amount of time for processing a page is typically going to be much, much larger than a boolean operation on a record. Many other factors have a larger impact on performance.
My guess, in your case, is that you ran the queries as described in the question, and the performance difference is due to caching effects.
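To separate caching effects from real differences, one approach, assuming SQL Server (the thread does not say which DBMS is used), is to time each query from a cold cache and then again warm. The table dbo.People and the value 'John' are hypothetical. DBCC DROPCLEANBUFFERS empties the buffer pool for the whole instance, so it belongs on a test server only.

```sql
-- Report reads and timings for each statement (shown in the Messages output).
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Cold run of each query: write dirty pages, then drop clean pages from memory.
CHECKPOINT;
DBCC DROPCLEANBUFFERS;
SELECT * FROM dbo.People;                          -- without WHERE

CHECKPOINT;
DBCC DROPCLEANBUFFERS;
SELECT * FROM dbo.People WHERE FirstName = 'John'; -- with WHERE

-- Warm runs: the pages are now cached, so physical reads should drop.
SELECT * FROM dbo.People;
SELECT * FROM dbo.People WHERE FirstName = 'John';
```

Comparing physical vs. logical reads and CPU vs. elapsed time across the four runs shows how much of the difference is caching and how much is the time spent returning rows.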
Variables make query performance worse
SELECT * FROM [DIME_WH].[dbo].[FactOrderLines2] FL (nolock)
WHERE DD_OrderDate >= '2018-08-01'
AND DD_OrderDate <= '2018-08-17'
When a constant is used, the optimizer creates a plan specifically for this query and value. If the same query is executed again with the same value, the plan is reused; if the value changes, another plan is created.
So the query with constant values is fast.
SELECT *
FROM [DIME_WH].[dbo].[FactOrderLines2] FL (nolock)
WHERE DD_OrderDate >= @StartDate
AND DD_OrderDate <= @EndDate
When parameters are used, the optimizer creates an execution plan for the first parameter values that are passed.
For example, @StartDate='2018-08-01' and @EndDate='2018-08-07' are passed the first time. The optimizer creates an execution plan that is optimal for these values.
The next time, @StartDate='2018-08-01' and @EndDate='2018-08-31' are passed, and the same plan is reused, although it may not be optimal for these values.
In other words, the plan that was optimal for the first values is suboptimal for others, so the query may perform poorly. This is known as parameter sniffing.
There are several ways to overcome this problem.
Parameter Sniffing
Note: In this thread we are only focusing on why performance with variables is slow, all other factors being equal.
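Two of the ways to overcome parameter sniffing, sketched in T-SQL under the assumption that the query lives in a stored procedure (the procedure name and the DATE parameter type are hypothetical). OPTION (RECOMPILE) builds a fresh plan for the actual values on every call, at the cost of a compilation each time; OPTIMIZE FOR UNKNOWN builds one cached plan as if the values were unknown, instead of for the first sniffed values.

```sql
CREATE PROCEDURE dbo.GetOrderLines   -- hypothetical name
    @StartDate DATE,                 -- assumed type of DD_OrderDate
    @EndDate   DATE
AS
BEGIN
    SET NOCOUNT ON;

    -- Option A: compile a new plan for the actual values on every call.
    SELECT *
    FROM [DIME_WH].[dbo].[FactOrderLines2] AS FL
    WHERE DD_OrderDate >= @StartDate
      AND DD_OrderDate <= @EndDate
    OPTION (RECOMPILE);

    -- Option B (use instead of A): one cached plan built from general
    -- statistics rather than from the first sniffed values:
    -- OPTION (OPTIMIZE FOR UNKNOWN);
END;
```

Usage: EXEC dbo.GetOrderLines @StartDate = '2018-08-01', @EndDate = '2018-08-31';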