Using WITH vs DECLARE a Temporary Table: Performance / Difference

Using WITH vs DECLARE a temporary table: performance / difference?

The @table syntax creates a table variable (an actual table in tempdb) and materialises the results into it.

The WITH syntax defines a common table expression (CTE), which is not materialised and is just an inline view.

Most of the time you would be better off using the second option. You mention that this is inside a function. If it is a TVF then most of the time you want it to be inline rather than multi-statement, so that it can be expanded out by the optimiser - this immediately rules out the use of table variables.
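As a sketch of that distinction (the function, table and column names here are hypothetical), an inline TVF is a single RETURN SELECT that the optimiser can expand into the calling query, while a multi-statement TVF has to materialise its result into a table variable:

CREATE FUNCTION dbo.OrdersForCustomer_Inline (@CustomerId INT)
RETURNS TABLE
AS
RETURN
    -- Inline: effectively a parameterised view, expanded by the optimiser.
    SELECT OrderId, OrderDate, Amount
    FROM dbo.Orders
    WHERE CustomerId = @CustomerId;
GO

CREATE FUNCTION dbo.OrdersForCustomer_Multi (@CustomerId INT)
RETURNS @Result TABLE (OrderId INT, OrderDate DATE, Amount MONEY)
AS
BEGIN
    -- Multi-statement: the result is materialised into @Result,
    -- which the optimiser treats as an opaque table.
    INSERT INTO @Result (OrderId, OrderDate, Amount)
    SELECT OrderId, OrderDate, Amount
    FROM dbo.Orders
    WHERE CustomerId = @CustomerId;

    RETURN;
END;
GO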

Sometimes, however (say the underlying query is expensive and you want to avoid it being executed multiple times), you might determine that materialising the intermediate results improves performance in some specific cases. There is currently no way of forcing this for CTEs (short of a plan guide, at least).

In that eventuality you have, in general, three options: a @table variable, a #local temp table and a ##global temp table. However, only the first of these is permitted inside a function.
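For reference, the three declarations look like this (the names are illustrative):

DECLARE @TableVariable TABLE (Id INT PRIMARY KEY);  -- the only option inside a function
CREATE TABLE #LocalTemp (Id INT PRIMARY KEY);       -- visible to the current session
CREATE TABLE ##GlobalTemp (Id INT PRIMARY KEY);     -- visible to all sessions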

For further information regarding the differences between table variables and #temp tables see here.

SELECT INTO vs WITH AS: which is faster in the temp table approach?

You are confusing two concepts.

SELECT INTO creates a new table. That could be a temporary table or a permanent table. But the table is created.

WITH defines a common table expression (CTE) used within a single query. This is not a "table". It is simply a subquery and it may or may not be materialized as a temporary table (actually, SQL Server does not typically materialize CTEs).

You use SELECT INTO when you want a real table. Some reasons for that are:

  • Sharing data among multiple queries.
  • Collecting correct statistics to help the query optimizer.
  • Adding indexes to improve subsequent query performance.

You use a CTE when you want a named subquery in a query. If you are choosing between the two, you probably want to start with a CTE.
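To make the contrast concrete, here is a minimal sketch (the table and column names are hypothetical):

-- SELECT INTO creates and populates a real table that outlives the statement.
SELECT CustomerId, SUM(Amount) AS Total
INTO #CustomerTotals
FROM dbo.Orders
GROUP BY CustomerId;

SELECT * FROM #CustomerTotals WHERE Total > 1000;   -- reusable in later queries

-- WITH defines a named subquery that exists only for the one statement below.
WITH CustomerTotals AS (
    SELECT CustomerId, SUM(Amount) AS Total
    FROM dbo.Orders
    GROUP BY CustomerId
)
SELECT * FROM CustomerTotals WHERE Total > 1000;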

Temp tables faster than variable tables (BigQuery)

Temporary tables give you the flexibility to build customized tables for data visualization, as the analytics requirements dictate. The main use case for TEMP is local temporary tables, which are visible only to the current session.

The WITH clause, by contrast, only acts like a temporary table: it is really the result of a named subquery, and that result can be referenced elsewhere in the same query.

The time difference you see is because temporary tables cache the query results: the values are materialized once and stored, so reading them back is faster than re-executing the query behind a WITH clause. Relatedly, when you run a duplicate query, BigQuery attempts to reuse cached results.
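As a minimal sketch in BigQuery scripting (the project, dataset and column names are assumptions), the temporary table is materialised once, while the WITH clause is just a named subquery inside a single statement:

-- Materialise the result once, then query the stored rows.
CREATE TEMP TABLE user_totals AS
SELECT user_id, SUM(amount) AS total
FROM `my_project.my_dataset.events`
GROUP BY user_id;

SELECT * FROM user_totals WHERE total > 100;

-- The WITH clause must be re-stated (and re-planned) per statement that uses it.
WITH user_totals AS (
  SELECT user_id, SUM(amount) AS total
  FROM `my_project.my_dataset.events`
  GROUP BY user_id
)
SELECT * FROM user_totals WHERE total > 100;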

When should I use a table variable vs temporary table in SQL Server?

Your question shows you have succumbed to some of the common misconceptions surrounding table variables and temporary tables.

I have written quite an extensive answer on the DBA site looking at the differences between the two object types. This also addresses your question about disk vs memory (I didn't see any significant difference in behaviour between the two).

Regarding the question in the title, though, as to when to use a table variable vs a local temporary table: you don't always have a choice. In functions, for example, it is only possible to use a table variable, and if you need to write to the table in a child scope then only a #temp table will do (table-valued parameters allow read-only access).

Where you do have a choice, some suggestions are below (though the most reliable method is simply to test both with your specific workload).

  1. If you need an index that cannot be created on a table variable then you will of course need a #temporary table. The details of this are version dependent, however. For SQL Server 2012 and below, the only indexes that could be created on table variables were those implicitly created through a UNIQUE or PRIMARY KEY constraint. SQL Server 2014 introduced inline index syntax for a subset of the options available in CREATE INDEX; this has since been extended to allow filtered index conditions. Indexes with INCLUDE-d columns or columnstore indexes still cannot be created on table variables, however (see the sketch after this list).

  2. If you will be repeatedly adding and deleting large numbers of rows from the table then use a #temporary table. That supports TRUNCATE (which is more efficient than DELETE for large tables) and additionally subsequent inserts following a TRUNCATE can have better performance than those following a DELETE as illustrated here.

  3. If you will be deleting or updating a large number of rows then the temp table may well perform much better than a table variable - if it is able to use rowset sharing (see "Effects of rowset sharing" below for an example).
  4. If the optimal plan using the table will vary dependent on data then use a #temporary table. That supports creation of statistics which allows the plan to be dynamically recompiled according to the data (though for cached temporary tables in stored procedures the recompilation behaviour needs to be understood separately).
  5. If the optimal plan for the query using the table is unlikely to ever change then you may consider a table variable to skip the overhead of statistics creation and recompiles (would possibly require hints to fix the plan you want).
  6. If the source for the data inserted to the table is from a potentially expensive SELECT statement then consider that using a table variable will block the possibility of this using a parallel plan.
  7. If you need the data in the table to survive a rollback of an outer user transaction then use a table variable. A possible use case for this might be logging the progress of different steps in a long SQL batch.
  8. When using a #temp table within a user transaction, locks can be held longer than for table variables (potentially until the end of the transaction vs the end of the statement, depending on the type of lock and isolation level) and it can also prevent truncation of the tempdb transaction log until the user transaction ends. So this might favour the use of table variables.
  9. Within stored routines, both table variables and temporary tables can be cached. The metadata maintenance for cached table variables is less than that for #temporary tables. Bob Ward points out in his tempdb presentation that this can cause additional contention on system tables under conditions of high concurrency. Additionally, when dealing with small quantities of data this can make a measurable difference to performance.
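Regarding point 1, here is a sketch of the index options available on table variables by version (treat the exact version boundaries as approximate; the filtered-index extension arrived later still):

-- All versions: indexes only via UNIQUE / PRIMARY KEY constraints.
DECLARE @T1 TABLE
(
    Id   INT PRIMARY KEY,
    Name NVARCHAR(50) UNIQUE
);

-- SQL Server 2014+: inline index syntax on a non-key column.
DECLARE @T2 TABLE
(
    Id   INT PRIMARY KEY,
    Flag BIT,
    INDEX IX_Flag NONCLUSTERED (Flag)
);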

Effects of rowset sharing

DECLARE @T TABLE (id INT PRIMARY KEY, Flag BIT);

CREATE TABLE #T (id INT PRIMARY KEY, Flag BIT);

-- Populate both tables with the same 1,000,000 rows in a single pass:
-- the INSERT targets the table variable and the OUTPUT clause copies
-- each inserted row into the #temp table.
INSERT INTO @T
OUTPUT INSERTED.* INTO #T
SELECT TOP 1000000 ROW_NUMBER() OVER (ORDER BY @@SPID), 0
FROM master..spt_values v1, master..spt_values v2;

SET STATISTICS TIME ON;

/* CPU time = 7016 ms, elapsed time = 7860 ms. */
UPDATE @T SET Flag = 1;

/* CPU time = 6234 ms, elapsed time = 7236 ms. */
DELETE FROM @T;

/* CPU time = 828 ms, elapsed time = 1120 ms. */
UPDATE #T SET Flag = 1;

/* CPU time = 672 ms, elapsed time = 980 ms. */
DELETE FROM #T;

DROP TABLE #T;

Is using table variables faster than temp tables?

Temp tables are generally better in performance. Note that a table variable is not a purely in-memory object: it is backed by tempdb just like a temp table, and if the data in the variable gets too big it spills to disk in the same way.

It depends, like almost every database-related question, on what you are trying to do, so it is hard to answer without more information.

So my answer is: try both and have a look at the execution plans, then use the fastest approach with the lowest cost (a sketch of such a comparison follows the link below).

  • MSDN - Displaying Graphical Execution Plans (SQL Server Management Studio)
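A sketch of such a comparison (dbo.SomeTable is a placeholder for your own source table):

SET STATISTICS TIME ON;
SET STATISTICS IO ON;

-- Variant 1: table variable.
DECLARE @T TABLE (Id INT PRIMARY KEY);
INSERT INTO @T SELECT Id FROM dbo.SomeTable;
SELECT COUNT(*) FROM @T;

-- Variant 2: temp table.
CREATE TABLE #T (Id INT PRIMARY KEY);
INSERT INTO #T SELECT Id FROM dbo.SomeTable;
SELECT COUNT(*) FROM #T;
DROP TABLE #T;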

Why is there a HUGE performance difference between temp table and subselect

Why is it not recommended to use subqueries?

The database optimizer (regardless of which database you are using) cannot always properly optimize a query that contains subqueries. In this case, the optimizer's problem is choosing the right way to join the result sets. There are several algorithms for joining two result sets, and the choice between them depends on how many records each one contains. If you join two physical tables (a subquery is not a physical table), the database can easily estimate the amount of data in each result set from the available statistics. If one of the result sets is a subquery, it is very difficult to know how many records it will return. In that case the database may pick the wrong join plan, which can lead to a dramatic reduction in query performance.

Rewriting the query using temporary tables is intended to simplify the optimizer's job. In the rewritten query, all result sets participating in joins are physical tables, so the database can easily determine the size of each one. This allows the database to reliably choose the fastest of the possible query plans, and to make the right choice whatever the conditions. The rewritten query with temporary tables will work well on any database, which is especially important when developing portable solutions. In addition, the rewritten query is easier to read, understand and debug.

Admittedly, rewriting the query with temporary tables can introduce some slowdown due to the additional expense of creating them. If the database would not have been mistaken in its choice of query plan, it would perform the old query faster than the new one. However, this slowdown is almost always negligible: creating a temporary table typically takes a few milliseconds, a delay that cannot have a significant impact on system performance and can usually be ignored.

Important! Do not forget to create indexes on your temporary tables. The indexed fields should include all columns that are used in join conditions.
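A sketch of such a rewrite (all object names are hypothetical): the subquery is materialised first, the join key is indexed, and the join then runs against a physical table with real statistics:

-- Before: joining directly to a subquery whose size the optimizer must guess.
-- SELECT o.OrderId, t.Total
-- FROM dbo.Orders o
-- JOIN (SELECT CustomerId, SUM(Amount) AS Total
--       FROM dbo.Payments
--       GROUP BY CustomerId) t ON t.CustomerId = o.CustomerId;

-- After: materialise, index the join key, then join the physical table.
SELECT CustomerId, SUM(Amount) AS Total
INTO #Totals
FROM dbo.Payments
GROUP BY CustomerId;

CREATE INDEX IX_Totals_CustomerId ON #Totals (CustomerId);

SELECT o.OrderId, t.Total
FROM dbo.Orders o
JOIN #Totals t ON t.CustomerId = o.CustomerId;

DROP TABLE #Totals;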


