When Would You Use a Table-Valued Function

When would you use a table-valued function?

Table-valued functions are "just" parameterized views. This makes them extremely powerful for encapsulating logic that would otherwise be hidden behind an opaque stored procedure. Here's an example:

Inline Table-valued Function:

create function dbo.GetClients (
@clientName nvarchar(max) = null
)
returns table
return (
select *
from dbo.Clients as a
where ((a.ClientName = @clientName) or a.ClientName is null)
);

Stored Procedure:

create procedure dbo.usp_GetClients (
@clientName nvarchar(max) = null
)
as
begin;
select *
from dbo.Clients as a
where ((a.ClientName = @clientName) or a.ClientName is null)
end;

Unlike the stored procedure call, a table-valued function allows me to compose the logic from dbo.GetClients with other objects:

select *
from dbo.GetClients(N'ACME') as a
join ... as b
on a.ClientId = b.ClientId

In such situations I cannot imagine using a stored procedure because of how restrictive it is when compared to the table-valued function. I would be forced to marshal the data around myself using a temp table, table variable, or application layer in order to combine results from multiple objects.

Inline table-valued functions are especially awesome because of the "inline" bit which is probably best explained here. This allows the optimizer to treat such functions no differently than the objects they encapsulate, resulting in near optimal performance plans (assuming that your indexes and statistics are ideal).

Does Table-Valued Function (SQL) create table on each call? [performance]

User defined, table-valued functions that are composed of multiple statements, as the one you found is, will create an object in the tempdb system database, populate it and then dispose of it when the object goes out of scope.

If you want to run this multiple times over the same parameters, you might consider creating a table variable and caching the result in that yourself. If you're going to be calling it on different lists on comma-separated values though, there's not a great way of avoiding the overhead. SQL Server isn't really built for lots of string manipulation.

Generally, for one-off jobs, the performance implications of this tempdb usage is not going to be a major concern for you. It's more concerning when it's a common pattern in the day-to-day of the database life.

I'd suggest trying, if you can, on a suitably sized subset of the data to gauge the performance of your solution.

Since you say you're on SQL Server 2016, you can make use of the new STRING_SPLIT function, something like

SELECT t.Column1, t.Column2, s.value
FROM table t
CROSS APPLY STRING_SPLIT(t.CsvColumn, ',') s

May get you close to where you want, without the need to define a new function. Note, your database needs to be running under the 2016 compatibility level (130) for this to be available, simply running on SQL 2016 isn't enough (they often do this with new features to avoid the risk of backwards-compatibility-breaking changes).

How to use a table valued function in select clause?

You would use a CROSS APPLY:

SELECT a.rawData, b.*
FROM TableA a
CROSS APPLY FunctionA(a.rawdata) b

Use a table-valued function in a SELECT column list?

You can cross apply a TVF to a regular table e.g.

SELECT
x.a
, x.b
, x.d
, y.*
FROM x
cross apply FN(x.c) y;

Table Valued Function with Recursive CTE

The solution is to use a multi-statement Table Valued Function:

DROP FUNCTION IF EXISTS dates;
GO
CREATE FUNCTION dates(@start date, @end date) RETURNS @dates TABLE(date date) AS
BEGIN
WITH cte AS (
SELECT @start AS date
UNION ALL
SELECT dateadd(day,1,date) FROM cte WHERE date<@end
)
INSERT INTO @dates(date)
SELECT date FROM cte OPTION(MAXRECURSION 0)
RETURN;
END;
GO
SELECT * FROM dates('2020-01-01','2021-01-01');

Inline Table Valued Functions are literally inlined, and clauses such as OPTION can only appear at the very end of an SQL statement, which is not necessarily at the end of the inline function.

On the other hand, a Multi Statement Table Valued function is truly self-contained, so the OPTION clause is OK there.

Difference between scalar, table-valued, and aggregate functions in SQL server?

Scalar Functions

Scalar functions (sometimes referred to as User-Defined Functions / UDFs) return a single value as a return value, not as a result set, and can be used in most places within a query or SET statement, except for the FROM clause (and maybe other places?). Also, scalar functions can be called via EXEC, just like Stored Procedures, though there are not many occasions to make use of this ability (for more details on this ability, please see my answer to the following question on DBA.StackExchange: Why scalar valued functions need execute permission rather than select?). These can be created in both T-SQL and SQLCLR.

  • T-SQL (UDF):

    • Prior to SQL Server 2019: these scalar functions are typically a performance issue because they generally run for every row returned (or scanned) and always prohibit parallel execution plans.
    • Starting in SQL Server 2019: certain T-SQL scalar UDFs can be inlined, that is, have their definitions placed directly into the query such that the query does not call the UDF (similar to how iTVFs work (see below)). There are restrictions that can prevent a UDF from being inlineable (if that wasn't a word before, it is now), and UDFs that can be inlined will not always be inlined due to several factors. This feature can be disabled at the database, query, and individual UDF levels. For more information on this really cool new feature, please see: Scalar UDF Inlining (be sure to review the "requirements" section).
  • SQLCLR (UDF): these scalar functions also typically run per each row returned or scanned, but there are two important benefits over T-SQL UDFs:

    • Starting in SQL Server 2012, return values can be constant-folded into the execution plan IF the UDF does not do any data access, and if it is marked IsDeterministic = true. In this case the function wouldn't run per each row.
    • SQLCLR scalar functions can work in parallel plans ( ) if they do not do any database access.

Table-Valued Functions

Table-Valued Functions (TVFs) return result sets, and can be used in a FROM clause, JOIN, or CROSS APPLY / OUTER APPLY of any query, but unlike simple Views, cannot be the target of any DML statements (INSERT / UPDATE / DELETE). These can also be created in both T-SQL and SQLCLR.

  • T-SQL MultiStatement (TVF): these TVFs, as their name implies, can have multiple statements, similar to a Stored Procedure. Whatever results they are going to return are stored in a Table Variable and returned at the very end; meaning, nothing is returned until the function is done processing. The estimated number of rows that they will return, as reported to the Query Optimizer (which impacts the execution plan) depends on the version of SQL Server:

    • Prior to SQL Server 2014: these always report 1 (yes, just 1) row.
    • SQL Server 2014 and 2016: these always report 100 rows.
    • Starting in SQL Server 2017: default is to report 100 rows, BUT under some conditions the row count will be fairly accurate (based on current statistics) thanks to the new Interleaved Execution feature.
  • T-SQL Inline (iTVF): these TVFs can only ever be a single statement, and that statement is a full query, just like a View. And in fact, Inline TVFs are essentially a View that accepts input parameters for use in the query. They also do not cache their own query plan as their definition is placed into the query in which they are used (unlike the other objects described here), hence they can be optimized much better than the other types of TVFs ( ). These TVFs perform quite well and are preferred if the logic can be handled in a single query.

  • SQLCLR (TVF): these TVFs are similar to T-SQL MultiStatement TVFs in that they build up the entire result set in memory (even if it is swap / page file) before releasing all of it at the very end. The estimated number of rows that they will return, as reported to the Query Optimizer (which impacts the execution plan) is always 1000 rows. Given that a fixed row count is far from ideal, please support my request to allow for specifying the row count: Allow TVFs (T-SQL and SQLCLR) to provide user-defined row estimates to query optimizer

  • SQLCLR Streaming (sTVF): these TVFs allow for complex C# / VB.NET code just like regular SQLCLR TVFs, but are special in that they return each row to the calling query as they are generated ( ). This model allows the calling query to start processing the results as soon as the first one is sent so the query doesn't need to wait for the entire process of the function to complete before it sees any results. And it requires less memory since the results aren't being stored in memory until the process completes. The estimated number of rows that they will return, as reported to the Query Optimizer (which impacts the execution plan) is always 1000 rows. Given that a fixed row count is far from ideal, please support my request to allow for specifying the row count: Allow TVFs (T-SQL and SQLCLR) to provide user-defined row estimates to query optimizer

Aggregate Functions

User-Defined Aggregates (UDA) are aggregates similar to SUM(), COUNT(), MIN(), MAX(), etc. and typically require a GROUP BY clause. These can only be created in SQLCLR, and that ability was introduced in SQL Server 2005. Also, starting in SQL Server 2008, UDAs were enhanced to allow for multiple input parameters ( ). One particular deficiency is that there is no knowledge of row ordering within the group, so creating a running total, which would be relatively easy if ordering could be guaranteed, is not possible within a SAFE Assembly.


Please also see:

  • CREATE FUNCTION (MSDN documentation)
  • CREATE AGGREGATE (MSDN documentation)
  • CLR Table-Valued Function Example with Full Streaming (STVF / TVF) (article I wrote)


Related Topics



Leave a reply



Submit