Why Does the SQL Server Optimizer Get So Confused with Parameters

SQL Server 2008 Stored proc - Optimizer thinks my parameter is nullable

Re: Nullable variables

There is no concept of nullable for variables in T-SQL the way you can define a nullable variable in C# using the ? suffix.
If you have a parameter in a stored procedure, the end user can pass whatever they want into it, be it a real value or a NULL.

Re: the query plan

The query plan that gets cached is the one generated the first time you call the stored procedure. So if you passed in NULL for @FooValue on that very first run, the plan will be optimized for @FooValue = NULL.

There is an OPTIMIZE FOR hint that you can use to optimize the query for some other value:
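For example (a sketch only; the procedure, table, and column names here are made up for illustration):

CREATE PROCEDURE dbo.GetFoo
    @FooValue int
AS
BEGIN
    SELECT *
    FROM dbo.Foo                 -- hypothetical table
    WHERE FooColumn = @FooValue
    OPTION (OPTIMIZE FOR (@FooValue = 42)); -- plan is always built as if 42 were passed
END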

Or you can use WITH RECOMPILE, which will force the query plan to get regenerated on every run of the stored procedure.
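For example, on the same hypothetical procedure:

CREATE PROCEDURE dbo.GetFoo
    @FooValue int
WITH RECOMPILE                   -- a fresh plan is compiled on every execution
AS
BEGIN
    SELECT *
    FROM dbo.Foo
    WHERE FooColumn = @FooValue;
END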

Obviously there are trade-offs when using these types of hints, so make sure you understand them before using them.

Why does SQL Server query optimizer sometimes overlook obvious clustered primary key?

1) In my opinion, the key point here is that for clustered tables (tables that have a clustered index; the clustered index is the main data structure that stores the table's data, so the clustered index effectively is the table itself), every non-clustered index also includes the key of the clustered index. This means that

CREATE [UNIQUE] NONCLUSTERED INDEX bla 
ON [dbo].[msgr] (uid)

is basically the same thing as

CREATE [UNIQUE] NONCLUSTERED INDEX bla 
ON [dbo].[msgr] (uid)
INCLUDE (id) -- id = key of clustered index

So, for such tables, every leaf-page record of a non-clustered index also includes the key of the clustered index. In other words, within every non-clustered index, SQL Server stores for every leaf record a kind of pointer back to the main data structure.
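For reference, the table being discussed might be defined roughly like this (a sketch; the exact column list is an assumption based on the question):

CREATE TABLE [dbo].[msgr]
(
    id     int IDENTITY(1,1) NOT NULL,
    uid    int NOT NULL,
    msg_id int NOT NULL,
    CONSTRAINT [PK_msgr] PRIMARY KEY CLUSTERED (id) -- clustered index = the table itself
);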

2) This means that SELECT COUNT(id) FROM dbo.msgr can be executed using the CI but also using the NCI, because both indexes include the id column (the key of the clustered index).

As a secondary note on this topic: because the IDENTITY property (on the id column) implies a mandatory (NOT NULL) column, COUNT(id) is the same thing as COUNT(*). Likewise, COUNT(msg_id) (also a mandatory / NOT NULL column) is the same thing as COUNT(*). So it's very likely that the execution plan for SELECT COUNT(msg_id) FROM dbo.msgr will use the same NCI (for example bla).
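A quick illustration of that equivalence:

-- All three return the same number, because id and msg_id are NOT NULL
SELECT COUNT(*)      FROM dbo.msgr;
SELECT COUNT(id)     FROM dbo.msgr;
SELECT COUNT(msg_id) FROM dbo.msgr;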

3) Non-clustered indexes are smaller than the clustered index, which also means less IO. So, from a performance point of view, it's better to use the NCI than the CI.

I would do following simple test:

SET STATISTICS IO ON;
GO

SELECT COUNT(id)
FROM [dbo].[msgr] WITH(INDEX=[bla]) -- It forces usage of the NCI
GO

SELECT COUNT(id)
FROM [dbo].[msgr] WITH(INDEX=[PK_msgr]) -- It forces usage of the CI
GO

SET STATISTICS IO OFF;
GO

If there is a lot of data in the msgr table, then STATISTICS IO will show different LIO (logical IO) for the two queries, with less LIO for the NCI query.

Different execution plan when executing statement directly and from stored procedure

This generally has something to do with parameter sniffing. It can be very frustrating to deal with. Sometimes it can be solved by recompiling the stored procedure, and sometimes you can even use a duplicate variable inside the stored procedure like this:

alter procedure p_myproc (@p1 int) as
declare @p1_copy int;
set @p1_copy = @p1;

And then use @p1_copy in the query. Seems ridiculous but it works.
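Put together, the pattern looks something like this (the query body is a made-up placeholder):

alter procedure p_myproc (@p1 int) as
begin
    declare @p1_copy int;
    set @p1_copy = @p1;

    -- the optimizer cannot sniff @p1_copy, so the plan is built for an unknown value
    select *
    from dbo.SomeTable           -- hypothetical table
    where SomeColumn = @p1_copy;
end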

Check my recent question on the same topic:

Why does the SqlServer optimizer get so confused with parameters?

SQL Parameter Slows Down Query

Use

SELECT * 
FROM Results_CTE
OPTION (RECOMPILE)

SQL Server does not sniff the value of the variable, so it has no idea how selective it will be. It will probably assume that the query will return significantly more rows than is actually the case, and give you a plan optimised for that.

In your case I'm pretty sure that in the good plan you will find it is using a non-covering non-clustered index to evaluate the PostCode predicate, with some lookups to retrieve the missing columns, whereas in the bad plan (as it guesses the query will return a greater number of rows) it avoids this in favour of a full table scan.

SQL - any performance difference using constant values vs parameters?

It is important to distinguish between parameters and variables here. Parameters are passed to procedures and functions; variables are declared.

Addressing variables, which is what the SQL in the question has: when compiling an ad-hoc batch, SQL Server compiles each statement in its own right.
So when compiling the query with a variable, it does not go back to check any assignment, and it will compile an execution plan optimised for an unknown variable value.
On first run, this execution plan is added to the plan cache, and future executions can, and will, reuse this cached plan for all variable values.

When you pass a constant, the query is compiled based on that specific value, so it can create a more optimal plan, but with the added cost of recompilation.
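For instance (a contrived sketch; the table and column names are made up):

-- Variable: compiled for an unknown value, and the cached plan is reused for all values
DECLARE @Value varchar(10);
SET @Value = 'ABC';
SELECT * FROM dbo.SomeTable WHERE SomeColumn = @Value;

-- Constant: compiled for this specific literal, so possibly a better plan,
-- but a different literal means a different compilation
SELECT * FROM dbo.SomeTable WHERE SomeColumn = 'ABC';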

So to specifically answer your question:

However, I seem to recall that if you use constant values in SQL statements that SQL server won't reuse the same query execution plans, or something to that effect that causes worse performance -- but is that actually true?

Yes, it is true that the same plan cannot be reused for different constant values, but that does not necessarily cause worse performance. It is possible that a more appropriate plan can be used for that particular constant (e.g. choosing a bookmark lookup over an index scan for sparse data), and this better plan may outweigh the cost of recompilation. So, as is almost always the case with SQL performance questions, the answer is: it depends.

For parameters, the default behaviour is that the execution plan is compiled based on the parameter value(s) used when the procedure or function is first executed.

I have answered similar questions before in much more detail, with examples that cover a lot of the above, so rather than repeat various aspects of it I will just link the questions:

  • Does assigning stored procedure input parameters to local variables help optimize the query?
  • Ensure cold cache when running query
  • Why is SQL Server using index scan instead of index seek when WHERE clause contains parameterized values

Performance when adding a parameter to where clause

I think you should rethink this query. Try to avoid the NOT EXISTS() for starters, as that's generally quite inefficient (I usually prefer a LEFT JOIN in these instances, with a corresponding WHERE x IS NULL, where x is something from the right-hand side).
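In generic terms the pattern looks like this (a contrived sketch with made-up table names):

-- NOT EXISTS form
select t.*
from dbo.SomeTable t
where not exists (select 1 from dbo.OtherTable o where o.SomeId = t.SomeId);

-- LEFT JOIN form: keep only the rows with no match on the right-hand side
select t.*
from dbo.SomeTable t
left join dbo.OtherTable o on o.SomeId = t.SomeId
where o.SomeId is null;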

The main cause of woe for you, though, is likely to be the CASE-based WHERE, as that is now causing the inner query to be evaluated for EVERY ROW! I think you'd be better off left joining both sets of disqualifying criteria, but including the parameter in the join conditions, and then checking that there is nothing on the right-hand side of either of the two left-joined criteria.

Here's how I think it could be rewritten:

declare @IsGazEnabled tinyint;

set @IsGazEnabled = 1;

select 'CT Ref: ' + accountreference + ' - Not synced due to missing property ref ' + t.PropertyReference
from CTaxAccountTemp t
left join ccaddress a2
    on t.PropertyReference = a2.PropertyReference
    and @IsGazEnabled = 0
left join
(
    ccaddress a
    join w2addresscrossref x
        on x.UPRN = a.UPRN
        and x.appcode in ( -- could make this a join for efficiency....
            select w2source
            from GazSourceConfig
            where GazSource in (
                select GazSource
                from GazSourceConfig
                where W2Source = 'CTAX'
            )
            union all select 'URB'
        )
) on t.PropertyReference = x.PropertyReference
    and @IsGazEnabled = 1
where
    a2.PropertyReference is null
    and x.PropertyReference is null
;

