Why Is a UDF So Much Slower Than a Subquery?

Why is a UDF so much slower than a subquery?

The UDF is a black box to the query optimiser, so it is executed once for every row.
In effect you are running a row-by-row cursor: for each row in asset, the function looks up an id in another table, three times over. This happens with scalar and multi-statement UDFs. (Inline table-valued UDFs are different: they are simply macros that expand into the outer query.)

One of many articles on the problem is "Scalar functions, inlining, and performance: An entertaining title for a boring post".

The subqueries, by contrast, are visible to the optimiser, which can rewrite the correlated lookups as joins and avoid the row-by-row operations.
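To make the contrast concrete, here is a minimal sketch of the two query shapes being compared. The function name dbo.GetUserId and the INT types are assumptions for illustration (the function from the original question is not shown here). The table and column names are the ones used in the join query in this answer.

```sql
-- Hypothetical scalar UDF (name and INT types are assumed).
-- The optimiser treats each call as opaque and runs it once per row.
CREATE FUNCTION dbo.GetUserId (@user_pk INT)
RETURNS INT
AS
BEGIN
    RETURN (SELECT u.id FROM dbo.[user] u WHERE u.user_pk = @user_pk);
END;
GO

-- Slow shape: three opaque function calls for every row of asset.
SELECT
    dbo.GetUserId(a.created_by) AS creator,
    dbo.GetUserId(a.updated_by) AS updater,
    dbo.GetUserId(a.owned_by)   AS owner,
    a.[name]
FROM asset a;

-- Faster shape: the same lookups written as correlated subqueries,
-- which the optimiser can see into and plan together with the outer query.
SELECT
    (SELECT u.id FROM dbo.[user] u WHERE u.user_pk = a.created_by) AS creator,
    (SELECT u.id FROM dbo.[user] u WHERE u.user_pk = a.updated_by) AS updater,
    (SELECT u.id FROM dbo.[user] u WHERE u.user_pk = a.owned_by)   AS owner,
    a.[name]
FROM asset a;
```

Both queries return the same rows. Only the second gives the optimiser anything to work with.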

What you really want is this:

-- [user] is bracketed because USER is a reserved word in T-SQL
SELECT
    uc.id AS creator,
    uu.id AS updater,
    uo.id AS owner,
    a.[name]
FROM
    asset a
JOIN
    [user] uc ON uc.user_pk = a.created_by
JOIN
    [user] uu ON uu.user_pk = a.updated_by
JOIN
    [user] uo ON uo.user_pk = a.owned_by

Update Feb 2019

SQL Server 2019 starts to fix this problem: it can automatically inline many scalar UDFs into the calling query.
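As a rough sketch of how to check and control this on SQL Server 2019, the statements below use the catalog column and settings documented for scalar UDF inlining. The function dbo.GetUserId in the last query is a hypothetical scalar UDF used only for illustration.

```sql
-- Which scalar UDFs in this database are eligible for inlining?
SELECT OBJECT_NAME(m.object_id) AS function_name,
       m.is_inlineable
FROM sys.sql_modules m
JOIN sys.objects o ON o.object_id = m.object_id
WHERE o.type = 'FN';   -- FN = SQL scalar function

-- Inlining only applies at compatibility level 150 or higher.
ALTER DATABASE CURRENT SET COMPATIBILITY_LEVEL = 150;

-- Database-wide switch (ON is the default at level 150).
ALTER DATABASE SCOPED CONFIGURATION SET TSQL_SCALAR_UDF_INLINING = ON;

-- Opt a single query out, e.g. to compare plans with and without inlining.
-- dbo.GetUserId is a hypothetical scalar UDF.
SELECT dbo.GetUserId(a.created_by) AS creator, a.[name]
FROM asset a
OPTION (USE HINT('DISABLE_TSQL_SCALAR_UDF_INLINING'));
```

Not every scalar UDF qualifies, so a function that reports is_inlineable = 0 still behaves as described above.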

Why Is My Inline Table UDF so much slower when I use variable parameters rather than constant parameters?

The responses I got were good, and I learned from them, but I think I've found an answer that satisfies me.

I do think it's the use of the PARTITION BY clause that is causing the problem here. I reformulated the UDF using a variant of the self-join idiom:

SELECT t1.A, t1.B, t1.C
FROM T t1
INNER JOIN
(
    SELECT A, MAX(C) AS C
    FROM T
    GROUP BY A
) t2 ON t1.A = t2.A AND t1.C = t2.C

Ironically, this is more performant than the SQL 2008-specific query, and the optimizer has no problem joining this version of the query using variables rather than constants. At this point, I'm concluding that the optimizer just doesn't handle the more recent SQL extensions as well as the older stuff. As a bonus, I can now also use the UDF on my SQL 2000 platforms that haven't been upgraded yet.
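For comparison, the windowed form that the self-join replaces is not shown in this answer. A plausible shape, offered purely as an assumption, is a ranking function with PARTITION BY (available in SQL Server 2005 and later, but not in SQL Server 2000):

```sql
-- Assumed shape of the PARTITION BY query; not taken from the original post.
-- RANK() (rather than ROW_NUMBER()) keeps ties, matching the self-join on MAX(C).
SELECT A, B, C
FROM
(
    SELECT A, B, C,
           RANK() OVER (PARTITION BY A ORDER BY C DESC) AS rnk
    FROM T
) ranked
WHERE rnk = 1;
```

For non-NULL values of C this returns the same rows as the self-join above: every row whose C is the maximum for its A.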

Thanks for your help, everyone!

Why does this query become drastically slower when wrapped in a TVF?

I isolated the problem to one line in the query. Keeping in mind that the query is 160 lines long, and I'm including the relevant tables either way, if I disable this line from the SELECT clause:

COALESCE(V.Visits, 0) * COALESCE(ACS.AvgClickCost, GAAC.AvgAdCost, 0.00)

...the run time drops from 63 minutes to five seconds (inlining a CTE has made it slightly faster than the original seven-second query). Including either ACS.AvgClickCost or GAAC.AvgAdCost causes the run time to explode. What makes it especially odd is that these fields come from two subqueries which have, respectively, ten rows and three! Each runs in zero seconds on its own, and with row counts that small I would expect the join time to be trivial even using nested loops.

Any guesses as to why this seemingly-harmless calculation would throw off a TVF completely, while it runs very quickly as a stand-alone query?

SQL Alternatives to Slow UDF

Using joins instead of an "IN" clause helped a great deal. (I also changed the table variable to a temp table, and that helped significantly too.)
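The original query is not shown, so the following is only a sketch of the two changes described. All names (orders, customer_id, @ids, #ids) are hypothetical.

```sql
-- Before: table variable plus IN. Table variables carry no column
-- statistics, so the optimiser has little to base its estimates on.
DECLARE @ids TABLE (customer_id INT NOT NULL PRIMARY KEY);
INSERT INTO @ids (customer_id) VALUES (1), (2), (3);

SELECT o.*
FROM orders o
WHERE o.customer_id IN (SELECT customer_id FROM @ids);

-- After: temp table plus JOIN. The primary key guarantees one row per
-- customer_id, so the join returns the same rows as the IN version.
CREATE TABLE #ids (customer_id INT NOT NULL PRIMARY KEY);
INSERT INTO #ids (customer_id) VALUES (1), (2), (3);

SELECT o.*
FROM orders o
JOIN #ids i ON i.customer_id = o.customer_id;

DROP TABLE #ids;
```

If the key list could contain duplicates, the join would multiply rows where IN would not, so the uniqueness constraint matters for this rewrite.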

UDF Performance in MySQL

UDFs have known limitations and problems. Please see: Are UDFs Harmful to SQL Server Performance?

There are many articles on this topic. Hopefully this is a non-subscriber access: Beware Row-by-Row Operations in UDF Clothing

SQL UDF and query optimization

A subquery will perform better, but a UDF is much easier to reuse in other queries. You can use UDFs to encapsulate a specific calculation or piece of logic in one place, so that when the logic changes you only have to change the UDF rather than every query that contains the subquery.
In the end you gain flexibility but lose performance when the function is used in queries over a large number of records.
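One way to soften that trade-off in SQL Server, sketched here under assumptions, is an inline table-valued function: as noted in the first answer on this page, inline UDFs are expanded into the outer query like macros. The function name dbo.GetUserIdInline and the INT type are hypothetical, and the asset and user tables are borrowed from that first answer.

```sql
-- Hypothetical inline table-valued function: a single RETURN (SELECT ...)
-- with no BEGIN/END body, so it is expanded into the calling query.
CREATE FUNCTION dbo.GetUserIdInline (@user_pk INT)
RETURNS TABLE
AS
RETURN
(
    SELECT u.id
    FROM dbo.[user] u
    WHERE u.user_pk = @user_pk
);
GO

-- The lookup logic lives in one place, but the optimiser still sees a join.
SELECT c.id AS creator, a.[name]
FROM asset a
CROSS APPLY dbo.GetUserIdInline(a.created_by) c;
```

CROSS APPLY drops asset rows with no matching user, like an inner join. OUTER APPLY would keep them with a NULL creator.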

BigQuery User Defined Function dramatically slows down the query

When a user-defined JavaScript function is present in the query text, BigQuery initializes a JavaScript environment with the function's contents on every shard of execution. There is (at the time of this writing) no optimization to avoid loading the environment if the function is not referenced, since the expectation is that if there is a JavaScript UDF present, the intent is probably to use it. The discrepancy that you are seeing is due to the start-up time of the JavaScript environment.

With SQL UDFs, however, the story is different. While BigQuery still has to parse the SQL UDFs regardless of whether you use them in order to figure out where the actual query starts, there is minimal overhead associated with that.
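The difference can be sketched with two temporary functions in BigQuery standard SQL. The function names are made up for illustration.

```sql
-- JavaScript UDF: per the explanation above, its mere presence in the
-- query text causes a JavaScript environment to be initialised, even
-- though the SELECT below never calls it.
CREATE TEMP FUNCTION multiply_js(x FLOAT64, y FLOAT64)
RETURNS FLOAT64
LANGUAGE js AS r"""
  return x * y;
""";

-- SQL UDF: it is parsed along with the rest of the query text,
-- with minimal overhead.
CREATE TEMP FUNCTION multiply_sql(x FLOAT64, y FLOAT64)
AS (x * y);

SELECT multiply_sql(2.0, 3.0) AS product;
```

Removing the unused multiply_js definition from the query text is the simplest way to avoid the start-up cost described above.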


