Why Do SQL Server Scalar-Valued Functions Get Slower

Why do SQL Server Scalar-valued functions get slower?

In most cases, it's best to avoid scalar valued functions that reference tables because (as others said) they are basically black boxes that need to be ran once for every row, and cannot be optimized by the query plan engine. Therefore, they tend to scale linearly even if the associated tables have indexes.

You may want to consider using an inline-table-valued function, since they are evaluated inline with the query, and can be optimized. You get the encapsulation you want, but the performance of pasting the expressions right in the select statement.

As a side effect of being inlined, they can't contain any procedural code (no declare @variable; set @variable = ..; return). However, they can return several rows and columns.

You could re-write your functions something like this:

create function usf_GIS_GET_LAT(
@City varchar (30),
@State char (2)
)
returns table
as return (
select top 1 lat
from GIS_Location with (nolock)
where [State] = @State
and [City] = @City
);

GO

create function usf_GIS_GET_LON (
@City varchar (30),
@State char (2)
)
returns table
as return (
select top 1 LON
from GIS_Location with (nolock)
where [State] = @State
and [City] = @City
);

The syntax to use them is also a little different:

select
Lat.Lat,
Lon.Lon
from
Address_Location with (nolock)
cross apply dbo.usf_GIS_GET_LAT(City,[State]) AS Lat
cross apply dbo.usf_GIS_GET_LON(City,[State]) AS Lon
WHERE
ID IN (SELECT TOP 100 ID FROM Address_Location WITH(NOLOCK) ORDER BY ID DESC)

Set based plan runs slower than scalar valued function with many conditions

The keyword term here is INLINE TABLE VALUED FUNCTIONS. You have two types of T-SQL tabled valued functions: multi-statement and inline. If your T-SQL function starts with a BEGIN statement then it's going to be crap - scalar or otherwise. You can't get a temp table into an inline table valued function so I'm assuming you went from scalar to mutli-statement table valued function which will probably be worse.

Your inline table valued function (iTVF) should look something like this:

CREATE FUNCTION [dbo].[Compute_value]
(
@alpha FLOAT,
@bravo FLOAT,
@charle FLOAT,
@delta FLOAT
)
RETURNS TABLE WITH SCHEMABINDING AS RETURN
SELECT newValue =
CASE WHEN @alpha IS NULL OR @alpha = 0 OR @delta IS NULL OR @delta = 0 THEN 0
WHEN @bravo IS NULL OR @bravo <= 0 THEN 100
ELSE @alpha * POWER((100 / @delta),
(-2 * POWER(@charle * @bravo, DATEDIFF(<unit of measurement>,GETDATE(),'1/1/2000')/365)))
END
GO;

Note that, in the code you posted, your DATEDIFF statement is missing the datepart parameter. If should look something like:

@x int = DATEDIFF(DAY, GETDATE(),'1/1/2000')   

Going a little further - it's important to understand why iTVF's are better than T-SQL scalar valued user-defined functions. It's not because table valued functions are faster than scalar valued functions, it's because Microsoft's implementation of T-SQL inline functions are faster than their implementation of T-SQL functions that are not inline. Note the following three functions that do the same thing:

-- Scalar version
CREATE FUNCTION dbo.Compute_value_scalar
(
@alpha FLOAT,
@bravo FLOAT,
@charle FLOAT,
@delta FLOAT
)
RETURNS FLOAT
AS
BEGIN
IF @alpha IS NULL OR @alpha = 0 OR @delta IS NULL OR @delta = 0
RETURN 0

IF @bravo IS NULL OR @bravo <= 0
RETURN 100

IF (@charle + @delta) / @bravo <= 0
RETURN 100
DECLARE @x int = DATEDIFF(dd, GETDATE(),'1/1/2000')
RETURN @alpha * POWER((100 / @delta), (-2 * POWER(@charle * @bravo, @x/365)))
END
GO

-- multi-statement table valued function
CREATE FUNCTION dbo.Compute_value_mtvf
(
@alpha FLOAT,
@bravo FLOAT,
@charle FLOAT,
@delta FLOAT
)
RETURNS @sometable TABLE (newValue float) AS
BEGIN
INSERT @sometable VALUES
(
CASE WHEN @alpha IS NULL OR @alpha = 0 OR @delta IS NULL OR @delta = 0 THEN 0
WHEN @bravo IS NULL OR @bravo <= 0 THEN 100
ELSE @alpha * POWER((100 / @delta),
(-2 * POWER(@charle * @bravo, DATEDIFF(DAY,GETDATE(),'1/1/2000')/365)))
END
)
RETURN;
END
GO

-- INLINE table valued function
CREATE FUNCTION dbo.Compute_value_itvf
(
@alpha FLOAT,
@bravo FLOAT,
@charle FLOAT,
@delta FLOAT
)
RETURNS TABLE WITH SCHEMABINDING AS RETURN
SELECT newValue =
CASE WHEN @alpha IS NULL OR @alpha = 0 OR @delta IS NULL OR @delta = 0 THEN 0
WHEN @bravo IS NULL OR @bravo <= 0 THEN 100
ELSE @alpha * POWER((100 / @delta),
(-2 * POWER(@charle * @bravo, DATEDIFF(DAY,GETDATE(),'1/1/2000')/365)))
END
GO

Now for some sample data and performance test:

SET NOCOUNT ON;
CREATE TABLE #someTable (alpha FLOAT, bravo FLOAT, charle FLOAT, delta FLOAT);
INSERT #someTable
SELECT TOP (100000)
abs(checksum(newid())%10)+1, abs(checksum(newid())%10)+1,
abs(checksum(newid())%10)+1, abs(checksum(newid())%10)+1
FROM sys.all_columns a, sys.all_columns b;

PRINT char(10)+char(13)+'scalar'+char(10)+char(13)+replicate('-',60);
GO
DECLARE @st datetime = getdate(), @z float;

SELECT @z = dbo.Compute_value_scalar(t.alpha, t.bravo, t.charle, t.delta)
FROM #someTable t;

PRINT DATEDIFF(ms, @st, getdate());
GO

PRINT char(10)+char(13)+'mtvf'+char(10)+char(13)+replicate('-',60);
GO
DECLARE @st datetime = getdate(), @z float;

SELECT @z = f.newValue
FROM #someTable t
CROSS APPLY dbo.Compute_value_mtvf(t.alpha, t.bravo, t.charle, t.delta) f;

PRINT DATEDIFF(ms, @st, getdate());
GO

PRINT char(10)+char(13)+'itvf'+char(10)+char(13)+replicate('-',60);
GO
DECLARE @st datetime = getdate(), @z float;

SELECT @z = f.newValue
FROM #someTable t
CROSS APPLY dbo.Compute_value_itvf(t.alpha, t.bravo, t.charle, t.delta) f;

PRINT DATEDIFF(ms, @st, getdate());
GO

Results:

scalar
------------------------------------------------------------
2786

mTVF
------------------------------------------------------------
41536

iTVF
------------------------------------------------------------
153

The scalar udf ran for 2.7 seconds, 41 seconds for the mtvf and 0.153 seconds for the iTVF. To understand why let's look at the estimated execution plans:

Sample Image

You don't see this when you look at the actual execution plan but, with the scalar udf and mtvf, the optimizer calls some poorly executed subroutine for each row; the iTVF does not. Quoting Paul White's career changing article about APPLY Paul writes:

You might find it useful to think of an iTVF as a view that accepts
parameters. Just as for views, SQL Server expands the definition of an
iTVF directly into the query plan of an enclosing query, before
optimization is performed.

The effect is that SQL Server is able to apply its full range of
optimizations, considering the query as a whole. It is just as if you
had written the expanded query out by hand....

In other words, iTVF's enable to optimizer to optimize the query in ways that just aren't possible when all that other code needs to be executed. One of many other examples of why iTVFs are superior is they are the only one of the three aforementioned function types that allow parallelism. Let's run each function one more time, this time with the Actual Execution plan turned on and with traceflag 8649 (which forces a parallel execution plan):

-- don't need so many rows for this test
TRUNCATE TABLE #sometable;
INSERT #someTable
SELECT TOP (10)
abs(checksum(newid())%10)+1, abs(checksum(newid())%10)+1,
abs(checksum(newid())%10)+1, abs(checksum(newid())%10)+1
FROM sys.all_columns a;

DECLARE @x float;

SELECT TOP (10) @x = dbo.Compute_value_scalar(t.alpha, t.bravo, t.charle, t.delta)
FROM #someTable t
ORDER BY dbo.Compute_value_scalar(t.alpha, t.bravo, t.charle, t.delta)
OPTION (QUERYTRACEON 8649);

SELECT TOP (10) @x = f.newValue
FROM #someTable t
CROSS APPLY dbo.Compute_value_mtvf(t.alpha, t.bravo, t.charle, t.delta) f
ORDER BY f.newValue
OPTION (QUERYTRACEON 8649);

SELECT @x = f.newValue
FROM #someTable t
CROSS APPLY dbo.Compute_value_itvf(t.alpha, t.bravo, t.charle, t.delta) f
ORDER BY f.newValue
OPTION (QUERYTRACEON 8649);

Execution plans:

Sample Image

Those arrows you see for the iTVF's execution plan is parallelism - all your CPU's (or as many as your SQL instance's MAXDOP settings allow) working together. T-SQL scalar and mtvf UDFs can't do that. When Microsoft introduces inline scalar UDFs then I'd suggest those for what you're doing but, until then: if performance is what you're looking for then inline is the only way to go and, for that, iTVFs are the only game in town.

Note that I continuously emphasized T-SQL when talking about functions... CLR Scalar and Table valued functions can be just fine but that's a different topic.

Why do Scalar-valued functions in the where clause get slower?

A CTE is often no more than syntactical sugar, SQL Server combines the whole query into a single execution plan that it thinks best.

You can sometimes work around this by adding the function call in the CTE as a column and then using it in your where clause.

with cte as (
select selectStations.ReceiverUID
, 1907 as id
, dbo.fnGetGkExchangeParticipantQuick(selectStations.ReceiverUID, 1907) as ExchangeParticipant
from WF4_Routes r WITH (NOLOCK)
inner join WF4_Stages selectStages WITH (NOLOCK) on selectStages.RouteID = r.ID and r.SentToCAS = 1 and r.PRUZ <> 1 and selectStages.PRUZ <> 1 and selectStages.StageType = 1
inner join WF4_Stations selectStations WITH (NOLOCK) on selectStations.ApprovalStageID = selectStages.ID and selectStations.PRUZ <> 1
)
select *
from cte
where ExchangeParticipant = 'E477B8FA-7539-4B43-8961-807A29FECFC0';

Notes:

You should never use a scalar function in a where clause as indexes can't be used and frequently a full table scan is required.

And if you need to force the CTE part of the query to execute first then you can materialise the results into a temp table and then select from that. This way SQL Server creates two independent execution plans.

Why do simple function calls in a SELECT statement slow down my query SO much?

scalar valued functions are generally bad practice in terms of performance. Lets say that you have function

CREATE FUNCTION fn_GetName(@CustomerID int)
RETURNS varchar(100)
AS
RETURN (
DECLARE @myResult VARCHAR(100);
SELECT @myResult = Fullname
FROM Sales s
WHERE s.CustomerID = @CustomerID
RETURN @myResult
)

and lets say that we are calling this function like

select 
fn_GetName(id)
from Student;

SQL is interpreting this function row by row which causes performance bottleneck. However Table-valued functions does not do row-by-row operations,

CREATE FUNCTION fn_GetNameTable(@CustomerID int)
RETURNS TABLE
AS
RETURN (
SELECT Fullname
FROM Sales s
WHERE s.CustomerID = @CustomerID
)

Then,

SELECT I.[Customer Name]
,S.CustomerType
FROM Sales s
CROSS APPLY fn_GetNameTable(S.CustomerID) I

is SQL native.

You can read more from This addess

Why does this query become drastically slower when wrapped in a TVF?

I isolated the problem to one line in the query. Keeping in mind that the query is 160 lines long, and I'm including the relevant tables either way, if I disable this line from the SELECT clause:

COALESCE(V.Visits, 0) * COALESCE(ACS.AvgClickCost, GAAC.AvgAdCost, 0.00)

...the run time drops from 63 minutes to five seconds (inlining a CTE has made it slightly faster than the original seven-second query). Including either ACS.AvgClickCost or GAAC.AvgAdCost causes the run time to explode. What makes it especially odd is that these fields come from two subqueries which have, respectively, ten rows and three! They each run in zero seconds when run independently, and with the row counts being so short I would expect the join time to be trivial even using nested loops.

Any guesses as to why this seemingly-harmless calculation would throw off a TVF completely, while it runs very quickly as a stand-alone query?

Scalar-Valued Function Slower than Expected

Expressions on Columns

First, no matter what function you use, putting it in the WHERE clause or in a JOIN condition on a column in the table is suboptimal. Do the math on a constant and compare. Your WHERE clause should look like this:

ff.startDate < DateAdd(day, 1, @curdate) -- if @curdate has time portion removed
ff.startDate < DateAdd(day, 1, dbo.fn_GetDateOnly(@curdate)) -- if @curdate has time

For generically finding items on a given date, use this pattern:

WHERE
DateCol >= '20120901'
AND DateCol < '20120902'

Put any functions on the opposite side of the equal sign as the column, which should be alone. It may help you to look up how to make an expression SARGable. If a column has to be on both sides, put all the expressions on the side that is the "left" input in the execution plan (its data comes first, is the outer loop of a LOOP JOIN or the "table" side of a HASH JOIN). For example, if you're trying to do this:

WHERE dbo.fn_getDateOnly(A.DateCol) = dbo.fn_getDateOnly(B.DateCol)

then assuming A.DateCol comes first in the execution plan, switch it to:

WHERE
B.DateCol >= DateAdd(day, DateDiff(day, 0, A.DateCol), 0)
AND B.DateCol < DateAdd(day, DateDiff(day, 0, A.DateCol), 0)

(Or use the inline version of the function below, but I find it just as awkward so no extra value to being indirect).

If this kind of querying is going to be done frequently on the tables involved, then some redesign is probably in order, either to add a persisted computed column having the time portion removed (that is possibly indexed), to split the datetime into separate date and time fields, or to store it as simply a date to begin with (if you really don't need datetime).

Note: of course, the references to dbo.fn_getDateOnly can be simply replaced with DateAdd(day, DateDiff(day, 0, DateCol), 0). Additionally, if the value has the datetime data type, you can just do DateCol + 1 rather than using DateAdd (though be careful, since this won't work on the date data type in SQL 2008 and up).

The Inline-ability of UDFs

As for the function's performance in the SELECT clause, SQL Server only inlines "inline functions", not scalar functions. Change your function to return a single-row recordset CREATE FUNCTION ... RETURNS TABLE AS RETURN (SELECT ...) and use it like so:

SELECT
(SELECT DateValue FROM dbo.fn_GetDateOnly(Col)),
...

It really is no different than a parameterized view.

Using this inline version in the WHERE clause is going to be clumsy. It is almost simply better to do the DateDiff. In SQL 2008, of course just use Convert(date, DateCol) but still follow the rules about where to put calculation expressions (on the opposite side of the equal sign from the column).

Be sure to vote for inline scalar UDFs on Microsoft Connect! You're far from the only one who thinks this functionality is sorely lacking.



Related Topics



Leave a reply



Submit