SQL Server 2008 Stored proc - Optimizer thinks my parameter is nullable
Re: Nullable variables
T-SQL has no concept of a nullable variable in the way that C# lets you declare a nullable type with ?. Every T-SQL variable can hold NULL, and a newly declared variable is NULL until you assign it a value.
If you have a parameter in a stored procedure, the caller can pass whatever he or she wants into it, be it a real value or a NULL.
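A minimal sketch of both points, assuming a hypothetical procedure dbo.GetFoo and a hypothetical table dbo.Foo. A declared variable starts out as NULL. In SQL Server 2008 a stored procedure parameter cannot be declared NOT NULL, so any "must not be NULL" rule has to be enforced in the procedure body.

```sql
-- A declared variable is NULL until something is assigned to it.
DECLARE @x int;
SELECT @x AS x_value;  -- returns NULL
GO

-- Hypothetical procedure: the parameter itself accepts NULL,
-- so the procedure checks for it explicitly.
CREATE PROCEDURE dbo.GetFoo
    @FooValue int
AS
BEGIN
    SET NOCOUNT ON;

    IF @FooValue IS NULL
    BEGIN
        RAISERROR('@FooValue must not be NULL.', 16, 1);
        RETURN;
    END;

    SELECT *
    FROM dbo.Foo                 -- hypothetical table
    WHERE FooValue = @FooValue;  -- hypothetical column
END;
GO

-- Accepted by the parameter declaration, rejected by the check in the body.
EXEC dbo.GetFoo @FooValue = NULL;
```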
Re: the query plan
The query plan that gets cached is the one generated the first time you call this stored procedure (more precisely, the first time it is compiled). So if you passed in NULL for @FooValue the very first time you ran it, the plan will be optimized for @FooValue = NULL.
There is an OPTIMIZE FOR query hint that you can use to have the query optimized for some other value.
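For example, a sketch using the same hypothetical dbo.Foo table, where the value 42 is an arbitrary stand-in for a "typical" value you would pick from your own data:

```sql
-- Hypothetical procedure and table; 42 stands in for a representative value.
CREATE PROCEDURE dbo.GetFooOptimized
    @FooValue int
AS
BEGIN
    SET NOCOUNT ON;

    SELECT *
    FROM dbo.Foo
    WHERE FooValue = @FooValue
    -- Build the plan as if @FooValue were 42, whatever value is actually passed.
    OPTION (OPTIMIZE FOR (@FooValue = 42));

    -- SQL Server 2008 also accepts OPTIMIZE FOR (@FooValue UNKNOWN), which
    -- estimates from average density statistics instead of a specific value.
END;
GO
```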
Or you can use WITH RECOMPILE, which will force the query plan to get regenerated on every run of the stored procedure.
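A sketch of the procedure-level option, again on the hypothetical dbo.Foo table. The second statement shows the call-time variant for an existing procedure, using the hypothetical name dbo.GetFoo.

```sql
-- WITH RECOMPILE: no plan is cached for this procedure, so every execution
-- is compiled for the parameter value actually passed in.
CREATE PROCEDURE dbo.GetFooRecompiled
    @FooValue int
WITH RECOMPILE
AS
BEGIN
    SET NOCOUNT ON;

    SELECT *
    FROM dbo.Foo
    WHERE FooValue = @FooValue;
END;
GO

-- One-off recompile of an existing (hypothetical) procedure at call time.
EXEC dbo.GetFoo @FooValue = 7 WITH RECOMPILE;
```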
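Obviously there are trade-offs when using these hints: OPTIMIZE FOR ties every execution to a plan built for one value, which may suit other values poorly, and WITH RECOMPILE pays the compilation cost on every call. Make sure you understand them before using them.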
Why does SQL Server query optimizer sometimes overlook obvious clustered primary key?
1) In my opinion, the key point here is that for clustered tables (tables that have a clustered index, where the clustered index is the main data structure that stores the table's data, i.e. the clustered index is the table itself), every non-clustered index also includes the key of the clustered index. This means that
CREATE [UNIQUE] NONCLUSTERED INDEX bla
ON [dbo].[msgr] (uid)
is basically the same thing as
CREATE [UNIQUE] NONCLUSTERED INDEX bla
ON [dbo].[msgr] (uid)
INCLUDE (id) -- id = key of clustered index
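So, for such tables, every leaf-level row of a non-clustered index also includes the key of the clustered index. This way, for every leaf row of every non-clustered index, SQL Server also stores a kind of pointer back to the main data structure. (Strictly, for a non-unique non-clustered index the clustered key is added to the index key rather than stored as an included column, but either way it is present at the leaf level.)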
2) This means that SELECT COUNT(id) FROM dbo.msgr can be executed using either the CI or the NCI, because both indexes include the id column (the key of the clustered index).
As a secondary note on this topic: because the IDENTITY property on the id column implies a mandatory (NOT NULL) column, COUNT(id) is the same thing as COUNT(*). Likewise, because msg_id is also a mandatory (NOT NULL) column, COUNT(msg_id) is the same thing as COUNT(*). So it is very likely that the execution plan for SELECT COUNT(msg_id) FROM dbo.msgr will use the same NCI (for example bla).
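To make points 1 and 2 concrete, here is a sketch of the kind of table being discussed. Only the names msgr, id, uid, msg_id, PK_msgr and bla come from the text; the data types and the extra msg column are assumptions for illustration.

```sql
-- Hypothetical shape of dbo.msgr; data types and the msg column are assumptions.
CREATE TABLE dbo.msgr
(
    id     int IDENTITY(1, 1) NOT NULL,  -- IDENTITY columns cannot be nullable
    uid    int           NOT NULL,
    msg_id int           NOT NULL,
    msg    nvarchar(max) NULL,           -- wide column carried by the clustered index, not by bla
    CONSTRAINT PK_msgr PRIMARY KEY CLUSTERED (id)
);
GO

-- The narrow non-clustered index; its leaf rows hold uid plus the clustered key id.
CREATE NONCLUSTERED INDEX bla ON dbo.msgr (uid);
GO

-- id and msg_id can never be NULL, so all three count the same rows and the
-- optimizer is free to answer each of them from the smaller index bla.
SELECT COUNT(*)      FROM dbo.msgr;
SELECT COUNT(id)     FROM dbo.msgr;
SELECT COUNT(msg_id) FROM dbo.msgr;
```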
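3) A non-clustered index is usually smaller than the clustered index, because its leaf level holds only the index key and the clustered key instead of every column of the table. Smaller size means less IO, so from a performance point of view it is better to scan the NCI than the CI.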
I would do the following simple test:
SET STATISTICS IO ON;
GO
SELECT COUNT(id)
FROM [dbo].[msgr] WITH (INDEX([bla]))     -- forces use of the NCI
GO
SELECT COUNT(id)
FROM [dbo].[msgr] WITH (INDEX([PK_msgr])) -- forces use of the CI
GO
SET STATISTICS IO OFF;
GO
If the msgr table holds a lot of data, STATISTICS IO will report different LIO (logical IO, shown as logical reads) for the two queries, with less LIO for the NCI query.
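Another way to see why the NCI is cheaper is to compare index sizes directly. A sketch, assuming the dbo.msgr table above and that you have VIEW DATABASE STATE permission in its database:

```sql
-- Pages and rows per index of dbo.msgr. A smaller used_page_count for bla than
-- for PK_msgr is what makes a full scan of the NCI cheaper than a CI scan.
SELECT
    i.name       AS index_name,
    i.type_desc  AS index_type,
    ps.used_page_count,
    ps.row_count
FROM sys.dm_db_partition_stats AS ps
JOIN sys.indexes AS i
    ON  i.object_id = ps.object_id
    AND i.index_id  = ps.index_id
WHERE ps.object_id = OBJECT_ID(N'dbo.msgr')
ORDER BY ps.used_page_count;
```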
Different execution plan when executing statement directly and from stored procedure
This is usually caused by parameter sniffing, which can be very frustrating to deal with. Sometimes it can be solved by recompiling the stored procedure, and sometimes you can even copy the parameter into a duplicate local variable inside the stored procedure, like this:
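alter procedure p_myproc (@p1 int) as
declare @p1_copy int;
set @p1_copy = @p1;  -- copy the parameter into a local variable
select * from dbo.MyTable where SomeColumn = @p1_copy;  -- hypothetical table and column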
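Then use @p1_copy instead of @p1 in the query. It seems ridiculous, but it works: the optimizer cannot sniff the value of a local variable, so the plan is built from general statistics rather than from whichever value happened to be passed first.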
Check my recent question on the same topic:
Why does the SqlServer optimizer get so confused with parameters?
SQL Parameter Slows Down Query
Use the RECOMPILE query hint:
SELECT *
FROM Results_CTE
OPTION (RECOMPILE)
Without the hint, SQL Server does not sniff the value of the variable, so it has no idea how selective the predicate will be. It will probably assume that the query returns significantly more rows than is actually the case and give you a plan optimised for that. OPTION (RECOMPILE) compiles the statement at execution time, when the variable's value is known.
In your case, I'm pretty sure that the good plan uses a non-covering non-clustered index to evaluate the PostCode predicate, plus some lookups to retrieve the missing columns, whereas the bad plan (because it guesses that the query will return a larger number of rows) avoids this in favour of a full table scan.
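A sketch of the two cases side by side, assuming a hypothetical dbo.Addresses table with a PostCode column that has a non-covering non-clustered index. Comparing the actual execution plans of the two statements shows whether the estimate changes.

```sql
DECLARE @PostCode varchar(10);
SET @PostCode = 'AB1 2CD';  -- hypothetical, highly selective value

-- Compiled without knowing @PostCode: the row estimate comes from average
-- density statistics, which can push the optimizer towards a full scan.
SELECT *
FROM dbo.Addresses
WHERE PostCode = @PostCode;

-- Compiled at execution time with the variable's current value available,
-- so an index seek plus key lookups can be chosen for a selective value.
SELECT *
FROM dbo.Addresses
WHERE PostCode = @PostCode
OPTION (RECOMPILE);
```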
SQL - any performance difference using constant values vs parameters?
It is important to distinguish between parameters and variables here. Parameters are passed to procedures and functions; variables are declared.
Taking variables first, since that is what the SQL in the question uses: when compiling an ad-hoc batch, SQL Server compiles each statement in its own right.
So when it compiles the query that uses a variable, it does not go back to check any earlier assignment, and it compiles an execution plan optimised for an unknown value.
On the first run this execution plan is added to the plan cache, and future executions can, and will, reuse the cached plan for all variable values.
When you pass a constant, the query is compiled for that specific value, so the optimizer can create a better plan for it, at the added cost of a compilation for each distinct value.
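One way to observe this is to look in the plan cache. A sketch, assuming a hypothetical dbo.Orders table with a CustomerID column: after running the two constant queries, you would typically expect a separate cached entry per literal. Simple parameterization can replace the literals with a parameter for some trivial queries, so treat the output as illustrative rather than guaranteed.

```sql
-- Two statements that differ only in the constant (hypothetical table and column).
SELECT * FROM dbo.Orders WHERE CustomerID = 1;
GO
SELECT * FROM dbo.Orders WHERE CustomerID = 2;
GO

-- What ended up in the plan cache for statements mentioning dbo.Orders.
-- usecounts shows how often each cached plan has been reused.
SELECT
    cp.objtype,
    cp.usecounts,
    st.text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.text LIKE N'%dbo.Orders%'
  AND st.text NOT LIKE N'%dm_exec_cached_plans%';  -- exclude this query itself
```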
So to specifically answer your question:
However, I seem to recall that if you use constant values in SQL statements that SQL server won't reuse the same query execution plans, or something to that effect that causes worse performance -- but is that actually true?
Yes, it is true that the same plan cannot be reused for different constant values, but that does not necessarily cause worse performance. A more appropriate plan may be used for that particular constant (e.g. choosing a bookmark lookup over an index scan for sparse data), and the benefit of this plan change may outweigh the cost of recompilation. So, as is almost always the case with SQL performance questions, the answer is: it depends.
For parameters, the default behaviour is that the execution plan is compiled based on the parameter value(s) used when the procedure or function is first executed.
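The same behaviour applies to parameterized ad-hoc SQL. A sketch with sp_executesql, again on the hypothetical dbo.Orders table: the statement text is identical on both calls, so the second call normally reuses the plan compiled for the first value, as long as that plan is still cached and nothing triggers a recompile.

```sql
-- First call: the plan is compiled with @cust sniffed as 1.
EXEC sys.sp_executesql
    N'SELECT * FROM dbo.Orders WHERE CustomerID = @cust',
    N'@cust int',
    @cust = 1;

-- Second call: same text and parameter list, so the cached plan built for 1
-- is normally reused, even if 2 would have been better served by another plan.
EXEC sys.sp_executesql
    N'SELECT * FROM dbo.Orders WHERE CustomerID = @cust',
    N'@cust int',
    @cust = 2;
```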
I have answered similar questions before in much more detail, with examples that cover a lot of the above, so rather than repeat those points I will just link the questions:
- Does assigning stored procedure input parameters to local variables help optimize the query?
- Ensure cold cache when running query
- Why is SQL Server using index scan instead of index seek when WHERE clause contains parameterized values
Performance when adding a parameter to where clause
I think you should rethink this query. For starters, try to avoid the NOT EXISTS(), as that can be quite inefficient. In these instances I usually prefer a LEFT JOIN with a corresponding WHERE x IS NULL, where x is a column from the right-hand side of the join.
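For reference, here are the two anti-join forms being compared, on hypothetical tables dbo.Accounts and dbo.Addresses joined on PropertyReference. Both return accounts whose property reference has no matching address row. In the LEFT JOIN form, x should be a right-hand column that can only be NULL when no match was found, such as the join column itself.

```sql
-- NOT EXISTS form.
SELECT t.AccountReference
FROM dbo.Accounts AS t
WHERE NOT EXISTS
(
    SELECT 1
    FROM dbo.Addresses AS a
    WHERE a.PropertyReference = t.PropertyReference
);

-- LEFT JOIN ... IS NULL form: keep only rows where the outer join found no match.
-- a.PropertyReference is NULL only for unmatched rows, because any matched row
-- satisfied a.PropertyReference = t.PropertyReference.
SELECT t.AccountReference
FROM dbo.Accounts AS t
LEFT JOIN dbo.Addresses AS a
    ON a.PropertyReference = t.PropertyReference
WHERE a.PropertyReference IS NULL;
```

Comparing the actual execution plans of the two statements on your own data is the reliable way to choose between them.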
The main cause of woe for you, though, is likely to be the CASE-based WHERE clause, as that is now causing the inner query to be evaluated for EVERY ROW! I think you'd be better off left joining both sets of disqualifying criteria, including the parameter in each join condition, and then checking that there is nothing on the right-hand side of either of the two left joins.
Here's how I think it could be rewritten:
declare @IsGazEnabled tinyint;
set @IsGazEnabled = 1;
select 'CT Ref: ' + accountreference + ' - Not synced due to missing property ref ' + t.PropertyReference
from CTaxAccountTemp t
left join ccaddress a2 ON t.PropertyReference = a2.PropertyReference and @IsGazEnabled = 0
left join
(
ccaddress a
join w2addresscrossref x on x.UPRN = a.UPRN
and x.appcode in ( -- could make this a join for efficiency....
select w2source
from GazSourceConfig
where GazSource in (
select GazSource
from GazSourceConfig
where W2Source = 'CTAX'
)
union all select 'URB'
)
) ON t.PropertyReference = x.PropertyReference AND @IsGazEnabled = 1  -- this join can only match when @IsGazEnabled = 1
WHERE
a2.PropertyReference IS NULL
AND x.PropertyReference IS NULL
;