Why Is There a Huge Performance Difference Between Temp Table and Subselect

Why is it not recommended to use subqueries?

A database optimizer (regardless of which database you are using) cannot always properly optimize a query that contains subqueries. In this case, the optimizer's problem is choosing the right way to join the result sets. There are several algorithms for joining two result sets, and the choice of algorithm depends on how many records each of them contains. When you join two physical tables (a subquery is not a physical table), the database can easily estimate the amount of data in each result set from the available statistics. When one of the result sets is a subquery, it is very hard to tell how many records it returns. In that case the database can choose the wrong join plan, which leads to a dramatic reduction in the performance of the query.

Rewriting the query with temporary tables is intended to simplify the optimizer's job. In the rewritten query, every result set participating in a join is a physical table, so the database can easily determine the size of each one. This allows the database to reliably choose the fastest of the possible query plans, and it will make the right choice regardless of the conditions. A query rewritten with temporary tables will work well on any database, which is especially important when developing portable solutions. In addition, the rewritten query is easier to read, easier to understand, and easier to debug.

Admittedly, rewriting the query with temporary tables can introduce some slowdown because of the extra cost of creating them. If the database does not get the query plan wrong, it will run the original query faster than the rewritten one. However, this slowdown is almost always negligible: creating a temporary table typically takes a few milliseconds, so the delay cannot have a significant impact on system performance and can usually be ignored.

Important! Do not forget to create indexes on the temporary tables. The indexed fields should include all columns that are used in the join conditions.
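As a rough sketch (the table and column names here are made up for illustration), the rewrite described above might look like this in SQL Server:

-- Original form: join against a subquery
-- SELECT o.order_id, t.total
-- FROM orders AS o
-- JOIN (SELECT customer_id, SUM(amount) AS total
--       FROM payments
--       GROUP BY customer_id) AS t ON t.customer_id = o.customer_id;

-- Rewritten form: materialize the subquery, index it, then join
SELECT customer_id, SUM(amount) AS total
INTO #customer_totals
FROM payments
GROUP BY customer_id;

CREATE INDEX IX_customer_totals ON #customer_totals (customer_id);

SELECT o.order_id, t.total
FROM orders AS o
JOIN #customer_totals AS t ON t.customer_id = o.customer_id;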

Should all subqueries be replaced with temporary tables?

That is ridiculous. A joke.

Your "subquery" is fine as is. SQL Server just ignores it. You could rewrite it as:

SELECT something
FROM T1 JOIN . . .
WHERE condition1

SQL Server should optimize this correctly.

In my experience with SQL Server, there have been very few cases where creating a temporary table is needed for optimizing a query. A bit more often, I use query hints to avoid nested loop joins.
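For example (a sketch with made-up table and column names), a query-level or join-level hint can steer SQL Server away from a nested loop join:

-- Query-level hint: every join in this statement uses a hash join
SELECT o.order_id, c.name
FROM orders AS o
JOIN customers AS c ON c.customer_id = o.customer_id
OPTION (HASH JOIN);

-- Join-level hint: force a hash join for this one join only
SELECT o.order_id, c.name
FROM orders AS o
INNER HASH JOIN customers AS c ON c.customer_id = o.customer_id;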

If a temporary table is needed, then there will almost always be indexes on it; that is one of the key reasons for using a temporary table. (The other main reasons are that the same query block is reused several times within one query, or across multiple queries.)

Is there a performance difference between CTE, Sub-Query, Temporary Table or Table Variable?

SQL is a declarative language, not a procedural language. That is, you construct a SQL statement to describe the results that you want. You are not telling the SQL engine how to do the work.

As a general rule, it is a good idea to let the SQL engine and SQL optimizer find the best query plan. There are many person-years of effort that go into developing a SQL engine, so let the engineers do what they know how to do.

Of course, there are situations where the query plan is not optimal. Then you want to use query hints, restructure the query, update statistics, use temporary tables, add indexes, and so on to get better performance.

As for your question. The performance of CTEs and subqueries should, in theory, be the same since both provide the same information to the query optimizer. One difference is that a CTE used more than once could be easily identified and calculated once. The results could then be stored and read multiple times. Unfortunately, SQL Server does not seem to take advantage of this basic optimization method (you might call this common subquery elimination).

Temporary tables are a different matter, because you are providing more guidance on how the query should be run. One major difference is that the optimizer can use statistics from the temporary table to establish its query plan. This can result in performance gains. Also, if you have a complicated CTE (subquery) that is used more than once, then storing it in a temporary table will often give a performance boost. The query is executed only once.
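As a sketch (hypothetical table and column names), a CTE referenced twice is typically expanded and evaluated twice, while a temporary table is populated once and then read twice:

-- CTE version: the aggregation may run once per reference
WITH totals AS (
    SELECT customer_id, SUM(amount) AS total
    FROM payments
    GROUP BY customer_id
)
SELECT * FROM totals WHERE total > 1000
UNION ALL
SELECT * FROM totals WHERE total < 10;

-- Temp table version: the aggregation runs exactly once
SELECT customer_id, SUM(amount) AS total
INTO #totals
FROM payments
GROUP BY customer_id;

SELECT * FROM #totals WHERE total > 1000
UNION ALL
SELECT * FROM #totals WHERE total < 10;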

The answer to your question is that you need to play around to get the performance you expect, particularly for complex queries that are run on a regular basis. In an ideal world, the query optimizer would find the perfect execution path. Although it often does, you may be able to find a way to get better performance.

Why would using a temp table be faster than a nested query?

Obviously, SQL Server is choosing the wrong query plan. Yes, that can happen; I've had exactly the same scenario as you a few times.

The problem is that optimizing a query (you mention a "complex subquery") is a non-trivial task: if you have n tables, there are roughly n! possible join orders -- and that's just the beginning. So it's quite plausible that running (a) your inner query first and (b) your outer query afterwards is a good way to go, but SQL Server cannot deduce this in a reasonable amount of time.

What you can do is to help SQL Server. As Dan Tow writes in his great book "SQL Tuning", the key is usually the join order, going from the most selective to the least selective table. Using common sense (or the method described in his book, which is a lot better), you could determine which join order would be most appropriate and then use the FORCE ORDER query hint.

Anyway, every query is unique, there is no "magic button" to make SQL Server faster. If you really want to find out what is going on, you need to look at (or show us) the query plans of your queries. Other interesting data is shown by SET STATISTICS IO, which will tell you how much (costly) HDD access your query produces.
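For reference (the table and column names are placeholders), the two tools mentioned above look like this in SQL Server:

-- Join the tables in exactly the order written in the FROM clause
SELECT s.id, b.payload
FROM small_selective_table AS s
JOIN big_table AS b ON b.id = s.id
OPTION (FORCE ORDER);

-- Report logical and physical reads per table for subsequent statements
SET STATISTICS IO ON;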

Temp tables vs subqueries in inner join

The two formulations are identical except that your explicit temp-table version is 3 SQL statements instead of just 1. That is, the overhead of the extra round trips to the server makes it slower. But...

Since the implicit temp table is in a LEFT JOIN, that subquery may be evaluated in one of two ways...

  • Older versions of MySQL were 'dumb' and re-evaluated it. Hence slow.
  • Newer versions automatically create an index. Hence fast.

Meanwhile, you could speed up the explicit temp table version by adding a suitable index. It would be PRIMARY KEY(collegiate_id). If there is a chance of that INNER JOIN producing dups, then say SELECT DISTINCT.
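A sketch of that suggestion in MySQL (only the table and id column come from the question; the WHERE clause is a placeholder):

CREATE TEMPORARY TABLE tmp_collegiates (
    PRIMARY KEY (collegiate_id)
)
SELECT DISTINCT collegiate_id
FROM collegiates
WHERE active = 1;   -- placeholder filter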

For "a few thousand" rows, you usually don't need to worry about performance.

Oracle has a zillion options for everything. MySQL has very few, with the default being (usually) the best. So ignore the answer that discussed various options that you could use in MySQL.

There are issues with

AND IF(notCollegiate,
       c.collegiate_id NOT IN (notCollegiate),
       '1=1')

I can't tell which table notCollegiate is in. notCollegiate cannot be a list, so why use IN? Instead simply use !=. Finally, '1=1' is a 3-character string; did you really want that?

For performance (of either version)

  • remittances needs INDEX(type_id, name, remittance_id) with remittance_id specifically last.
  • collegiateRemittances needs INDEX(remittance_id) (unless it is the PK).
  • collegiates needs INDEX(typePayment, active, exentFee, approvedBoard) in any order.

Bottom line: Worry more about indexes than how you formulate the query.
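Assuming MySQL, those index suggestions translate to statements like the following (the index names are made up):

ALTER TABLE remittances           ADD INDEX idx_rem_type_name (type_id, name, remittance_id);
ALTER TABLE collegiateRemittances ADD INDEX idx_cr_remittance (remittance_id);
ALTER TABLE collegiates           ADD INDEX idx_col_filters (typePayment, active, exentFee, approvedBoard);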

Ouch. Another wrinkle. What is getFee()? If it is a Stored Function, maybe we need to worry about optimizing it?? And what is dateS?

Temp tables faster than variable tables in BigQuery

Temporary tables give you the flexibility to build customized tables for data visualization, according to your analytics requirements. Moreover, TEMP tables are local temporary tables, visible only to the current session.

Meanwhile, the WITH clause acts like a temporary table, but it is actually the result of a subquery that can be reused elsewhere in the query.

The time difference you see is because temporary tables use cached query results: the query's values are stored in cache memory, which is why they execute faster than queries using the WITH clause. Sometimes, when you run a duplicate query, BigQuery attempts to reuse the cached results.
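In BigQuery scripting, the two forms being compared look roughly like this (the project, dataset, and column names are placeholders):

-- WITH clause: the named subquery is inlined into the statement that uses it
WITH filtered AS (
  SELECT user_id, event_date
  FROM `my_project.my_dataset.events`
  WHERE event_date >= DATE '2023-01-01'
)
SELECT user_id, COUNT(*) AS n
FROM filtered
GROUP BY user_id;

-- TEMP table: materialized once, visible for the rest of the session/script
CREATE TEMP TABLE filtered AS
SELECT user_id, event_date
FROM `my_project.my_dataset.events`
WHERE event_date >= DATE '2023-01-01';

SELECT user_id, COUNT(*) AS n
FROM filtered
GROUP BY user_id;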

Which are more performant, CTE or temporary tables?

I'd say they are different concepts, but not so different that you would call them "chalk and cheese".

  • A temp table is good for re-use or to perform multiple processing passes on a set of data.

  • A CTE can be used either to recurse or simply to improve readability.

    And, like a view or inline table-valued function, it can also be treated like a macro to be expanded in the main query.

  • A temp table is another table with some rules around scope.

I have stored procs where I use both (and table variables too)
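A minimal sketch (the procedure, table, and column names are made up) of how the three can coexist in one SQL Server stored procedure:

CREATE PROCEDURE dbo.usp_TempObjectsDemo
AS
BEGIN
    -- Temp table: reused across several statements and indexable
    SELECT customer_id, SUM(amount) AS total
    INTO #totals
    FROM dbo.payments
    GROUP BY customer_id;

    CREATE INDEX IX_totals_customer ON #totals (customer_id);

    -- Table variable: small, short-lived working set
    DECLARE @top_ids TABLE (customer_id INT PRIMARY KEY);
    INSERT INTO @top_ids (customer_id)
    SELECT TOP (10) customer_id FROM #totals ORDER BY total DESC;

    -- CTE: improves readability within a single statement
    WITH ranked AS (
        SELECT t.customer_id, t.total,
               ROW_NUMBER() OVER (ORDER BY t.total DESC) AS rn
        FROM #totals AS t
    )
    SELECT r.customer_id, r.total, r.rn
    FROM ranked AS r
    JOIN @top_ids AS i ON i.customer_id = r.customer_id;
END;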

Which one has better performance: Derived Tables or Temporary Tables?

A derived table is a logical construct.

It may be stored in tempdb, built at runtime by re-evaluating the underlying statement each time it is accessed, or even optimized out entirely.

A temporary table is a physical construct: a table in tempdb that is created and populated with the values.

Which one is better depends on the query they are used in, the statement that is used to derive a table, and many other factors.

For instance, CTEs (common table expressions) in SQL Server can (and most probably will) be re-evaluated each time they are used. This query:

WITH q (uuid) AS
(
    SELECT NEWID()
)
SELECT *
FROM q
UNION ALL
SELECT *
FROM q

will most probably yield two different NEWID()'s.

In this case, a temporary table should be used since it guarantees that its values persist.
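A temp table version of the same example (a small sketch) returns the same uuid twice, because the value is materialized once:

SELECT NEWID() AS uuid
INTO #q;

SELECT *
FROM #q
UNION ALL
SELECT *
FROM #q;

DROP TABLE #q;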

On the other hand, this query:

SELECT *
FROM (
    SELECT *, ROW_NUMBER() OVER (ORDER BY id) AS rn
    FROM master
) q
WHERE rn BETWEEN 80 AND 100

is better with a derived table, because using a temporary table will require fetching all values from master, while this solution will just scan the first 100 records using the index on id.

SELECT INTO vs WITH AS: Which is faster in the temp table approach?

You are confusing two concepts.

SELECT INTO creates a new table. That could be a temporary table or a permanent table. But the table is created.

WITH defines a common table expression (CTE) used within a single query. This is not a "table". It is simply a subquery and it may or may not be materialized as a temporary table (actually, SQL Server does not typically materialize CTEs).
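For illustration (the table and column names are hypothetical), here is each form in its simplest shape:

-- SELECT INTO: creates the table it names (a temp table here; name a permanent table instead to create one)
SELECT customer_id, amount
INTO #recent_payments
FROM dbo.payments
WHERE paid_on >= '20240101';

-- WITH: a named subquery, visible only to the single statement that follows it
WITH recent AS (
    SELECT customer_id, amount
    FROM dbo.payments
    WHERE paid_on >= '20240101'
)
SELECT customer_id, SUM(amount) AS total
FROM recent
GROUP BY customer_id;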

You use SELECT INTO when you want a real table. Some reasons for that are:

  • Sharing data among multiple queries.
  • Collecting correct statistics to help the query optimizer.
  • Adding indexes to improve subsequent query performance.

You use a CTE when you want a named subquery in a query. If you are choosing between the two, you probably want to start with a CTE.


