What Are the Main Differences Between Option(Optimize for Unknown) and Option(Recompile)

Difference between Option(Optimize For Unknown) and option(Optimize for (@parameter Unknown))

From the docs:

OPTIMIZE FOR UNKNOWN

Instructs the Query Optimizer to use the average selectivity of the predicate across all column values instead of using the runtime parameter value when the query is compiled and optimized.

If you use OPTIMIZE FOR @variable_name = literal_constant and OPTIMIZE FOR UNKNOWN in the same query hint, the Query Optimizer will use the literal_constant specified for a specific value. The Query Optimizer will use UNKNOWN for the rest of the variable values. The values are used only during query optimization, and not during query execution.

So OPTIMIZE FOR UNKNOWN applies OPTIMIZE FOR @variable_name UNKNOWN to each variable that is not already given a specific value with OPTIMIZE FOR.
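
As a minimal sketch of combining the two forms in one hint (the table and variables here are hypothetical):

DECLARE @CustomerID INT = 7, @RegionID INT = 3;

SELECT OrderID
FROM dbo.Orders
WHERE CustomerID = @CustomerID
  AND RegionID = @RegionID
OPTION (OPTIMIZE FOR (@CustomerID = 42, @RegionID UNKNOWN));
-- @CustomerID is optimized as if it were 42; @RegionID falls back to the
-- density-based ("unknown") estimate.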

Rule of thumb on when to use WITH RECOMPILE option

As others have said, you don't want to simply include WITH RECOMPILE in every stored proc as a matter of habit. By doing so, you'd be eliminating one of the primary benefits of stored procedures: the fact that the query plan is cached and reused.

Why is that potentially a big deal? Computing a query plan is a lot more intensive than compiling regular procedural code. Because the syntax of a SQL statement only specifies what you want, and not (generally) how to get it, the database has a wide degree of flexibility when creating the physical plan (that is, the step-by-step instructions to actually gather and modify data). There are lots of "tricks" the database query optimizer can apply and choices it can make - what order to join the tables, which indexes to use, whether to apply WHERE clauses before or after joins, etc.

For a simple SELECT statement, it might not make a difference, but for any non-trivial query, the database is going to spend some serious time (measured in milliseconds, as opposed to the usual microseconds) to come up with an optimal plan. For really complex queries, it can't even guarantee an optimal plan; it has to use heuristics to come up with a pretty good one. So by forcing it to recompile every time, you're telling it that it has to go through that process over and over again, even if the plan it got before was perfectly good.
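
A quick way to see that cost in SQL Server is SET STATISTICS TIME, which reports parse and compile time separately from execution time; for example:

SET STATISTICS TIME ON;

SELECT v1.number
FROM master..spt_values AS v1
INNER JOIN master..spt_values AS v2
    ON v1.number = v2.number
ORDER BY v1.name;

SET STATISTICS TIME OFF;
-- The Messages tab shows "SQL Server parse and compile time" (paid on every
-- compile or recompile) separately from "SQL Server Execution Times".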

Depending on the vendor, there should be automatic triggers for recompiling query plans - for example, if the statistics on a table change significantly (say, the histogram of values in a certain column starts out evenly distributed but over time becomes highly skewed), then the DB should notice that and recompile the plan. But generally speaking, the implementers of a database are going to be smarter about that on the whole than you are.
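
In SQL Server, if you know the data has just shifted dramatically and don't want to wait for the automatic threshold, sp_recompile marks every cached plan that references an object for recompilation on its next use (the table name here is hypothetical):

EXEC sp_recompile N'dbo.Orders';
-- Plans referencing dbo.Orders are invalidated and will be recompiled the
-- next time they run; nothing recompiles immediately.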

As with anything performance related, don't take shots in the dark; figure out where the bottlenecks are that are costing 90% of your performance, and solve them first.

T-SQL parameter sniffing recompile plan

You could try the OPTIMIZE FOR UNKNOWN hint instead of RECOMPILE:

exec sp_executesql N'SELECT TOP (10) *
FROM mytableView
WHERE ([Name]) LIKE (''%'' + (@Value0) + ''%'')
ORDER BY [Id] DESC
option(OPTIMIZE FOR UNKNOWN);',
N'@Value0 varchar(5)',
@Value0 = 'value';

The MSDN page for Query Hints states that OPTIMIZE FOR UNKNOWN:

Instructs the query optimizer to use statistical data instead of the initial values for all local variables when the query is compiled and optimized, including parameters created with forced parameterization.

This hint instructs the optimizer to use the total number of rows for the specified table divided by the number of distinct values for the specified column (i.e. average rows per value) as the row estimate instead of using the statistics of any particular value. As pointed out by @GarethD in a comment below: since this will possibly benefit some queries and possibly hurt others, it needs to be tested to see if the overall gain from this is a net savings over the cost of doing the RECOMPILE. For more details check out: How OPTIMIZE FOR UNKNOWN Works.
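
If you want to see the numbers behind that estimate, the density vector of the relevant statistics object shows them (the table and statistics names here are hypothetical):

DBCC SHOW_STATISTICS ('dbo.Orders', 'IX_Orders_CustomerID') WITH DENSITY_VECTOR;
-- "All density" for the leading column is 1 / (number of distinct values),
-- so the UNKNOWN estimate for an equality predicate is roughly:
--   total rows * all density
-- e.g. 1,000,000 rows * 0.001 density ~= 1,000 estimated rows per value.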

And just to have it stated: depending on the distribution of the data and what values are passed in, if there is a particular value whose distribution is fairly representative of most of the values that could be passed in (even if wildly different from some values that won't ever be passed in), then you can target that value by using OPTIMIZE FOR (@Value0 = 'representative value') rather than OPTIMIZE FOR UNKNOWN.
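
Using the query from above, that might look like the following (the literal is a stand-in for whatever value profiles as typical in your data, and the parameter is widened to varchar(50) so the hint literal fits):

exec sp_executesql N'SELECT TOP (10) *
FROM mytableView
WHERE ([Name]) LIKE (''%'' + (@Value0) + ''%'')
ORDER BY [Id] DESC
option(OPTIMIZE FOR (@Value0 = ''representative value''));',
N'@Value0 varchar(50)',
@Value0 = 'value';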

Please note that this query hint is only needed for queries that have:

  • values supplied by parameters or variables, and
  • predicates on field(s) that do not have a fairly even distribution of values (and hence different values passed in via the variable could generate different plans)

The following scenarios were identified in comments below and do not all require this hint, so here is how to address each situation:

  • select top 80 * from table order by id desc
    There is no variable being passed in here so no query hint needed.

  • select top 80 * from table where id < @lastid order by id desc
    There is a variable being passed in here, but the [id] field, by its very nature, is evenly distributed, even if sparse due to some deletes, hence no query hint needed (or at least should not be needed).

  • SELECT TOP (10) * FROM mytableView WHERE ([Name]) LIKE (''%'' + (@Value0) + ''%'') ORDER BY [Id] DESC
    There is a variable being passed in here, and used in such a way that there could be no indication of consistent numbers of matching rows for different values, especially due to not being able to use an index as a result of the leading %. THIS is a good opportunity for the OPTION (OPTIMIZE FOR UNKNOWN) hint as discussed above.

  • If there is a situation where a variable is passed in that has a greatly varying distribution of matching rows, but not many possible values to get passed in, and the values that are passed in are re-used frequently, then those can be concatenated (after doing a REPLACE(@var, '''', '''''') on them) directly into the Dynamic SQL. This allows each of those values to have its own separate yet reusable query plan. Other variables should be sent in as parameters as usual.

    For example, a lookup value for [StatusID] will only have a few possible values and they will get reused frequently but each particular value can match a vastly different number of rows. In that case, something like the following will allow for separate execution plans that do not need either the RECOMPILE or OPTIMIZE FOR UNKNOWN hints as each execution plan will be optimized for that particular value:

    IF (TRY_CONVERT(INT, @StatusID) IS NULL)
    BEGIN
        ;THROW 50505, '@StatusID was not a valid INT', 55;
    END;

    DECLARE @SQL NVARCHAR(MAX);
    SET @SQL = N'SELECT TOP (10) * FROM myTableView WHERE [StatusID] = '
             + REPLACE(@StatusID, N'''', N'''''') -- really only needed for strings
             + N' AND [OtherField] = @OtherFieldVal;';

    EXEC sp_executesql
        @SQL,
        N'@OtherFieldVal VARCHAR(50)',
        @OtherFieldVal = @OtherField;

    Assuming two different values of @StatusID are passed in (e.g. 1 and 2), there will be two execution plans cached matching the following queries:

    • SELECT TOP (10) * FROM myTableView WHERE [StatusID] = 1 AND [OtherField] = @OtherFieldVal;

    • SELECT TOP (10) * FROM myTableView WHERE [StatusID] = 2 AND [OtherField] = @OtherFieldVal;
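
    To confirm that both per-value plans really are cached, you can check the plan cache DMVs with something like this:

    SELECT st.text, cp.usecounts
    FROM sys.dm_exec_cached_plans AS cp
    CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
    WHERE st.text LIKE N'%myTableView%'
      AND st.text NOT LIKE N'%dm_exec_cached_plans%'; -- exclude this query itself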

OPTION (RECOMPILE) is Always Faster; Why?

There are times that using OPTION(RECOMPILE) makes sense. In my experience, the only time this is a viable option is when you are using dynamic SQL. Before you explore whether this makes sense in your situation, I would recommend updating your statistics. This can be done by running the following:

EXEC sp_updatestats

And then recreating your execution plan. This will ensure that when your execution plan is created, it will be using the latest information.
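
sp_updatestats samples rather than scans by default; if one table is the known trouble spot, you may get a more accurate histogram by forcing a full scan on it (the table name here is hypothetical):

UPDATE STATISTICS dbo.Orders WITH FULLSCAN;
-- Reads every row instead of a sample, at the cost of a longer update.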

Adding OPTION(RECOMPILE) rebuilds the execution plan every time that your query executes. I have never heard that described as "creating a new lookup strategy", but maybe we are just using different terms for the same thing.

When a stored procedure is created (I suspect you are calling ad-hoc SQL from .NET, but a parameterized query ends up being handled much like a stored proc call), SQL Server attempts to determine the most effective execution plan for the query based on the data in your database and the parameters passed in (parameter sniffing), and then caches this plan. This means that if you create the query when there are 10 records in your database and then execute it when there are 100,000,000 records, the cached execution plan may no longer be the most effective.

In summary - I don't see any reason that OPTION(RECOMPILE) would be a benefit here. I suspect you just need to update your statistics and your execution plan. Maintaining statistics can be an essential part of DBA work depending on your situation. If you are still having problems after updating your stats, I would suggest posting both execution plans.

And to answer your question - yes, I would say it is highly unusual for your best option to be recompiling the execution plan every time you execute the query.

OPTION (OPTIMIZE FOR (@AccountID=148))

Assume the extreme case where account 148 alone covers 10% of the table, and every other account covers just 0.001% each. Given that this account represents 10% of your data, it also stands to reason it will be searched for more often than the other accounts. Now imagine that for any other account, a nested loop join over a small number of rows would be really fast, but for account 148, it would be hideously slow and a hash join would be the superior choice.

Further imagine that by some stroke of bad luck, the first query that comes in to your system after a reboot/plan recycle is one for an account other than 148. You are now stuck with a plan that performs extremely poorly 10% of the time, even though it also happens to be really good for other data. In this case, you may well want the optimizer to stick with the plan that isn't a disaster 10% of the time, even if that means it's slightly less than optimal the other times. This is where OPTIMIZE FOR comes in.

Alternatives are OPTIMIZE FOR UNKNOWN (if your distribution is far more even, but you need to protect against an accidental plan lock-in from unrepresentative parameters), RECOMPILE (which can have a considerable performance impact for queries executed frequently), explicit hints (FORCESEEK, OPTION (HASH JOIN), etcetera), adding fine-grained custom statistics (CREATE STATISTICS) and splitting queries (IF @AccountID = 148 ...), possibly in combination with a filtered index. But aside from all these, OPTIMIZE FOR @Param = <specific non-representative value> certainly has a place.
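
As a rough sketch of the query-splitting option combined with a filtered index (all object names here are hypothetical):

-- A filtered index covering just the heavy account:
CREATE NONCLUSTERED INDEX IX_Orders_Account148
    ON dbo.Orders (OrderDate)
    WHERE AccountID = 148;

DECLARE @AccountID INT = 148;

-- Each branch gets its own plan: the first compiles against the literal 148,
-- the second against whatever "normal" account value is sniffed.
IF @AccountID = 148
    SELECT OrderID, OrderDate FROM dbo.Orders WHERE AccountID = 148;
ELSE
    SELECT OrderID, OrderDate FROM dbo.Orders WHERE AccountID = @AccountID;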

Option (recompile) speeds up query execution

This doesn't look like parameter sniffing.

The CompiledValue and RunTimeValue are the same for all parameters, even in the bad.sqlplan:

<ParameterList>
  <ColumnReference Column="@p__linq__16" ParameterCompiledValue="(8)" ParameterRuntimeValue="(8)" />
  <ColumnReference Column="@p__linq__15" ParameterCompiledValue="N'ABCD4'" ParameterRuntimeValue="N'ABCD4'" />
  <ColumnReference Column="@p__linq__14" ParameterCompiledValue="(10776)" ParameterRuntimeValue="(10776)" />
  <ColumnReference Column="@p__linq__13" ParameterCompiledValue="(8)" ParameterRuntimeValue="(8)" />
  <ColumnReference Column="@p__linq__12" ParameterCompiledValue="(0)" ParameterRuntimeValue="(0)" />
  <ColumnReference Column="@p__linq__11" ParameterCompiledValue="N'ABCD4'" ParameterRuntimeValue="N'ABCD4'" />
  <ColumnReference Column="@p__linq__10" ParameterCompiledValue="N'ABCD4'" ParameterRuntimeValue="N'ABCD4'" />
  <ColumnReference Column="@p__linq__9" ParameterCompiledValue="NULL" ParameterRuntimeValue="NULL" />
  <ColumnReference Column="@p__linq__8" ParameterCompiledValue="NULL" ParameterRuntimeValue="NULL" />
  <ColumnReference Column="@p__linq__7" ParameterCompiledValue="NULL" ParameterRuntimeValue="NULL" />
  <ColumnReference Column="@p__linq__6" ParameterCompiledValue="N'ABCD4'" ParameterRuntimeValue="N'ABCD4'" />
  <ColumnReference Column="@p__linq__5" ParameterCompiledValue="(8)" ParameterRuntimeValue="(8)" />
  <ColumnReference Column="@p__linq__4" ParameterCompiledValue="(513)" ParameterRuntimeValue="(513)" />
  <ColumnReference Column="@p__linq__3" ParameterCompiledValue="(8)" ParameterRuntimeValue="(8)" />
  <ColumnReference Column="@p__linq__2" ParameterCompiledValue="(513)" ParameterRuntimeValue="(513)" />
  <ColumnReference Column="@p__linq__1" ParameterCompiledValue="(8)" ParameterRuntimeValue="(8)" />
  <ColumnReference Column="@p__linq__0" ParameterCompiledValue="(513)" ParameterRuntimeValue="(513)" />
</ParameterList>

Rather it looks as though you are benefitting from The Parameter Embedding Optimization.

The query text is cut off in the plan, but I can see places where you are comparing parameters with literals. Perhaps this is a catch-all query:

[Image: the query text from the plan, with the parameter-to-literal comparisons highlighted]

When you use OPTION (RECOMPILE) the compiled plan only has to work for the passed parameter values. SQL Server can look at the values of the parameters passed and effectively replace the highlighted expressions with TRUE or FALSE.

This potentially allows it to simplify out whole branches of the plan if they are not relevant to the parameter values being passed.

A simple example of this would be:

EXEC sys.sp_executesql
    N'SELECT * FROM master..spt_values WHERE @Param <> 0 OPTION (RECOMPILE)',
    N'@Param INT',
    @Param = 0;

The plan is compiled when @Param = 0 and it only has to be correct for this value, so the @Param <> 0 predicate can be evaluated at compile time as false and the plan doesn't access the table at all.

[Image: execution plan for the OPTION (RECOMPILE) version, with no table access at all]

Without the OPTION (RECOMPILE) you see this

EXEC sys.sp_executesql
    N'SELECT * FROM master..spt_values WHERE @Param <> 0',
    N'@Param INT',
    @Param = 0;

[Image: execution plan for the version without OPTION (RECOMPILE), which still includes the table access]

Even though the sniffed parameter value and the runtime parameter value are the same the plan cannot be optimised to the same extent as it will be cached and still need to work if passed a different parameter value.

EF 6 Parameter Sniffing

It's possible to use the interception feature of EF6 to manipulate its internal SQL commands before executing them on the DB, for instance by adding option(recompile) at the end of the command:

using System;
using System.Data;
using System.Data.Common;
using System.Data.Entity.Infrastructure.Interception;
using System.Data.SqlClient;

public class OptionRecompileHintDbCommandInterceptor : IDbCommandInterceptor
{
    public void NonQueryExecuting(DbCommand command, DbCommandInterceptionContext<int> interceptionContext)
    {
    }

    public void NonQueryExecuted(DbCommand command, DbCommandInterceptionContext<int> interceptionContext)
    {
    }

    public void ReaderExecuted(DbCommand command, DbCommandInterceptionContext<DbDataReader> interceptionContext)
    {
    }

    public void ReaderExecuting(DbCommand command, DbCommandInterceptionContext<DbDataReader> interceptionContext)
    {
        addQueryHint(command);
    }

    public void ScalarExecuted(DbCommand command, DbCommandInterceptionContext<object> interceptionContext)
    {
    }

    public void ScalarExecuting(DbCommand command, DbCommandInterceptionContext<object> interceptionContext)
    {
        addQueryHint(command);
    }

    private static void addQueryHint(IDbCommand command)
    {
        // Only touch plain-text commands going to SQL Server.
        if (command.CommandType != CommandType.Text || !(command is SqlCommand))
            return;

        // Append the hint to SELECT statements that don't already carry it.
        if (command.CommandText.StartsWith("select", StringComparison.OrdinalIgnoreCase)
            && !command.CommandText.Contains("option(recompile)"))
        {
            command.CommandText = command.CommandText + " option(recompile)";
        }
    }
}

To use it, add the following line at the beginning of the application:

DbInterception.Add(new OptionRecompileHintDbCommandInterceptor());

Searching with a lateral join extremely slow

Based on just this part of your question, I am confident you have a parameter sniffing problem.

my query goes from running in 1 second or less to running in about 16 seconds

Please add OPTION (RECOMPILE) to the end of the query and see what happens (but please investigate the implications of this and understand the CPU consequences). You can also look at OPTIMIZE FOR UNKNOWN.


Going in blind, here are a few ideas that you could try:

  • Reduce it to one lookup against Children, i.e. WHERE c.datecreated = (select max(c1.datecreated) from children c1 where c1.ParentId = c.ParentId)
  • Clean up the @Search logic by simplifying it:
SELECT p.id,
       p.name,
       c.firstname,
       c.lastname
FROM Parents P
INNER JOIN
(
    select c1.firstname,
           c1.lastname,
           c1.ParentId,
           row_number() OVER (PARTITION BY c1.ParentId ORDER BY c1.datecreated DESC) as RN
    from children c1
) as c
    ON c.ParentId = p.Id
    AND c.RN = 1 /* get the latest firstname/lastname based on datecreated */
INNER JOIN Users s
    ON p.UserId = s.Id
WHERE 1 = 1
AND (
        c.firstname = @search
        OR c.lastname = @search
        OR p.name = @search
        OR NULLIF(@search, '') IS NULL -- if @search is NULL or '' all rows are returned
    )
--OPTION (RECOMPILE) /* uncomment this and see whether it improves things */

Also, just be aware that you could be running into a parameter sniffing problem, since @search is a query parameter and it looks like it can vary a lot.

Can you try adding OPTION (RECOMPILE) to the end of your query and see if that makes a difference?

Also, some things to try if you haven't done so already:

  • update statistics on tables
  • defrag/rebuild indexes
  • look at implementing Ola Hallengren's maintenance solution

And finally, to get some more help can you paste the query plan using https://www.brentozar.com/pastetheplan/


