In Which Sequence Are Queries and Sub-Queries Executed by the SQL Engine

Option 4 is close.

SQL is declarative: you tell the query optimiser what you want and it works out the best (subject to time/"cost" etc) way of doing it. This may vary for outwardly identical queries and tables depending on statistics, data distribution, row counts, parallelism and god knows what else.

This means there is no fixed order. But it's not quite "on the fly" either.

Even with identical servers, schema, queries, and data, I've seen execution plans differ.

Order of Execution of correlated Sub-queries in select and where clauses and Cross Apply

One novel feature of the relational model was the separation of the logical representation of data (tables, queries) from the physical (disk files, run-time execution). The list you gave is the logical sequence of a query's clauses. It does not represent the physical sequence in which the run-time engine executes.

There is a component of the DBMS called the query optimizer (QO). Its job is to translate the logical definition of the desired output, i.e. the SQL query, into a reasonably efficient physical implementation. It is free to re-arrange the parts of the query into any provably equivalent configuration.

For example, if the query has an ORDER BY the QO may decide to perform a sort at the end of execution. Alternatively at the outset it may read data that is already known to be in the desired order because of an index. Two very different physical implementations which give the same logical outcome.
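The ORDER BY point can be observed on a small scale. Below is a minimal sketch using SQLite through Python's sqlite3 module, chosen only because it runs anywhere; the table t, the column x and the index idx_t_x are made-up names, and other engines print their plans in a different format. The same query is planned before and after an index exists, and the result stays the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, x INTEGER)")
conn.executemany("INSERT INTO t (x) VALUES (?)", [(5,), (3,), (9,), (1,)])

query = "SELECT x FROM t ORDER BY x"

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# No index on x yet: the engine has to sort the rows itself.
plan_without_index = plan(query)
rows_without_index = conn.execute(query).fetchall()

# With an index on x, the rows can be read already in the desired order.
conn.execute("CREATE INDEX idx_t_x ON t (x)")
plan_with_index = plan(query)
rows_with_index = conn.execute(query).fetchall()

print(plan_without_index)
print(plan_with_index)
```

The first plan should mention a temporary B-tree used for the sort, and the second should read from the index instead. The query text never changed.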

The process of choosing the physical implementation is known as query planning. It is a deep and fascinating topic. Nowadays most DBMS use a cost-based optimiser. Lists of alternative plans are generated, the cost of each is evaluated according to some internal, proprietary cost function, and the cheapest is chosen to be executed for the query. SQL Server's optimizer is based on the Cascades framework. There are many resources on the web that explain it.

To answer your actual question, logically sub-queries in general fit into whichever part of the SQL they are written. If they are embedded in the FROM (..from T1 inner join (select x from t2) as y..) they're considered part of the FROM. If in the SELECT (select a, (select b from c where d='e') as f, g, h..) they're part of the SELECT. Physically, however, they are evaluated wherever the optimizer considers it best to do so.

"Is the correlated sub query executed 10k times or 50 times" - it could be either or none of these. It may be executed once and cached within the run-time. It would depend on the precise SQL, the table definitions, the number of rows involved in each table, what options are set at compile time and run-time. If you want a full explanation ask a new question which includes the definitions of all tables, indexes and constraints. Copy the actual execution plan to https://www.brentozar.com/pastetheplan/. There are plenty of regulars here on DBA.SE who can explain what it means.

Subqueries do present additional optimization challenges. There's a paper "Execution Strategies for SQL Subqueries" by Mostafa Elhemali et al which I found interesting and readable.

SQL order of execution for correlated subquery


When the above query executes, how does a generic SQL engine proceed?
From what I have read, SQL execution order is (roughly): From, Where,
Group By, Having, and Select.

This statement is, in general, not correct as a description of execution. The clauses are logically interpreted in the order that you describe. However, the execution is determined by the optimizer and might have little to do with the original query. Remember: SQL is a descriptive language, not a procedural language. It describes the result set, not the specific steps for calculating it.

That said, MySQL's execution plan is much closer to the query than those of most other databases (particularly more advanced databases with better optimizers). And almost any database is going to proceed in the steps you describe for this query. The aggregation in the subquery limits the choices for optimization.

If you want to eliminate the redundancy, then do the select distinct before the filtering:

SELECT dept_nbr
FROM (SELECT DISTINCT dept_nbr FROM Personnel P1) P1
WHERE (SELECT COUNT(P2.dept_nbr)
       FROM Personnel AS P2
       WHERE P1.dept_nbr = P2.dept_nbr AND P2.job_title = 'Programmer'
      ) < 3;

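You can also do this more simply with just an aggregation. Note that summing a comparison relies on MySQL treating a boolean expression as 0 or 1; in databases without that behaviour, use SUM(CASE WHEN job_title = 'Programmer' THEN 1 ELSE 0 END) instead: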

select dept_nbr
from personnel
group by dept_nbr
having sum(job_title = 'Programmer') < 3;
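To check that the correlated-subquery form and the aggregation form describe the same result, here is a small sketch run against SQLite from Python. The sample rows and the emp_id column are invented for illustration, and the aggregation uses the portable CASE form rather than the MySQL boolean shortcut.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Personnel ("
    "emp_id INTEGER PRIMARY KEY, dept_nbr INTEGER, job_title TEXT)"
)
conn.executemany(
    "INSERT INTO Personnel (dept_nbr, job_title) VALUES (?, ?)",
    [
        (1, "Programmer"), (1, "Programmer"), (1, "Programmer"),  # 3 programmers
        (2, "Programmer"), (2, "Manager"),                        # 1 programmer
        (3, "Analyst"),                                           # 0 programmers
    ],
)

correlated_sql = """
    SELECT dept_nbr
    FROM (SELECT DISTINCT dept_nbr FROM Personnel) AS P1
    WHERE (SELECT COUNT(P2.dept_nbr)
           FROM Personnel AS P2
           WHERE P1.dept_nbr = P2.dept_nbr
             AND P2.job_title = 'Programmer') < 3
    ORDER BY dept_nbr
"""

aggregate_sql = """
    SELECT dept_nbr
    FROM Personnel
    GROUP BY dept_nbr
    HAVING SUM(CASE WHEN job_title = 'Programmer' THEN 1 ELSE 0 END) < 3
    ORDER BY dept_nbr
"""

correlated_rows = conn.execute(correlated_sql).fetchall()
aggregate_rows = conn.execute(aggregate_sql).fetchall()
print(correlated_rows, aggregate_rows)
```

Both queries return departments 2 and 3 on this data. How many times the inner SELECT physically runs is still up to the engine.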

SQL newbie: execution order of subqueries?

The answer is that SQL is a descriptive language that describes the result set being produced from a query. It does not specify how the query is going to be run.

In your case the query has several options on how it might run, depending on the database engine, what the tables look like, and indexes. The query itself:

SELECT t.*
FROM someTable t
WHERE t.someFirstValue = t.someSecondValue AND
      EXISTS (SELECT *
              FROM someOtherTable t2
              WHERE t.someFirstValue = t2.someThirdValue
             );

Says: "Get me all columns from someTable where someFirstValue = someSecondValue and there is a corresponding row in someOtherTable where that table's column someThirdValue is the same as someFirstValue".

One possible way to approach this query would be to scan someTable and first check for the first condition. When the two columns match, then look up someFirstValue in an index on someOtherTable(someThirdValue) and keep the row if the values match. As I say, this is one approach, and there are others.
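The approach just described can be sketched by hand and compared with what an engine returns. This is only an illustration under stated assumptions: the id columns and sample rows are invented, a Python set stands in for the index on someOtherTable(someThirdValue), and SQLite is used simply because it is easy to run. A real engine may well pick a different strategy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE someTable (
        id INTEGER PRIMARY KEY, someFirstValue INTEGER, someSecondValue INTEGER);
    CREATE TABLE someOtherTable (
        id INTEGER PRIMARY KEY, someThirdValue INTEGER);
    INSERT INTO someTable (someFirstValue, someSecondValue)
        VALUES (1, 1), (2, 3), (4, 4), (5, 5);
    INSERT INTO someOtherTable (someThirdValue) VALUES (1), (5), (5), (7);
""")

# What the engine returns for the declarative query.
sql_rows = conn.execute("""
    SELECT t.*
    FROM someTable t
    WHERE t.someFirstValue = t.someSecondValue
      AND EXISTS (SELECT *
                  FROM someOtherTable t2
                  WHERE t.someFirstValue = t2.someThirdValue)
    ORDER BY t.id
""").fetchall()

# Hand-rolled version of the plan: a set plays the role of the index.
third_value_index = {
    row[0] for row in conn.execute("SELECT someThirdValue FROM someOtherTable")
}
manual_rows = []
scan = conn.execute(
    "SELECT id, someFirstValue, someSecondValue FROM someTable ORDER BY id"
)
for row in scan:
    _id, first, second = row
    # Cheap column comparison first, then the existence lookup.
    if first == second and first in third_value_index:
        manual_rows.append(row)

print(sql_rows)
print(manual_rows)
```

Note that the row with value 5 comes back once even though someOtherTable contains 5 twice: EXISTS only asks whether a match exists.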

Does the order of JOIN vs WHERE in SQL affect performance?

Postgres has a smart optimizer so the two versions should have similar execution plans, in most cases (I'll return to that in a moment).

MySQL has a tendency to materialize subqueries. Although this has gotten better in more recent versions, I still recommend avoiding it. Materializing subqueries prevents the use of indexes and can have a significant impact on performance.

One caveat: If the subquery is complicated, then it might be better to filter as part of the subquery. For instance, if it is an aggregation, then filtering before aggregating usually results in better performance. That said, Postgres is smart about pushing conditions into the subquery. So, if the outer filtering is on a key used in aggregation, Postgres is smart enough to push the condition into the subquery.
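To make the caveat concrete, the sketch below shows the two spellings whose equivalence lets an optimizer push a filter into an aggregating subquery. The orders table, its columns and the rows are hypothetical, and SQLite is used only to confirm that both forms return the same rows; it says nothing about which plan Postgres or MySQL would actually choose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 10), (1, 20), (2, 5), (2, 7), (3, 1)],
)

# Filter applied outside: every customer is aggregated, then one is kept.
filter_outside = conn.execute("""
    SELECT customer_id, total
    FROM (SELECT customer_id, SUM(amount) AS total
          FROM orders
          GROUP BY customer_id) AS per_customer
    WHERE customer_id = 2
""").fetchall()

# Filter applied inside: only the rows for one customer are aggregated.
filter_inside = conn.execute("""
    SELECT customer_id, SUM(amount) AS total
    FROM orders
    WHERE customer_id = 2
    GROUP BY customer_id
""").fetchall()

print(filter_outside, filter_inside)
```

The rewrite is safe here because the filter is on the grouping key. A filter on the aggregated value (total) could not be moved inside in the same way.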

Is there any specific order of execution in SQL query?

There is a logical order to the evaluation of the query text, but the database engine can choose what order to execute the query components in, based upon what is optimal. The logical processing order is listed below. That is, for example, why you can't use an alias from the SELECT clause in a WHERE clause. As far as logical processing is concerned, the alias doesn't exist yet.

  1. FROM

  2. ON

  3. OUTER

  4. WHERE

  5. GROUP BY

  6. CUBE | ROLLUP (these are not present in MySQL but are in some other SQL dialects)

  7. HAVING

  8. SELECT

  9. DISTINCT

  10. ORDER BY

  11. LIMIT (or, in MSSQL, TOP)

See the Microsoft documentation for more information on this (the section "Logical Processing Order of the SELECT statement").
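One way to picture the logical order is to run the steps by hand, one after another, and compare with what an engine returns. The sketch below does that in Python for a made-up sales table; it models only the logical sequence (joins, CUBE/ROLLUP and DISTINCT are left out), not how any engine physically executes the query.

```python
import sqlite3

sales = [
    ("north", 8), ("north", 5), ("south", 20), ("south", -3),
    ("east", 4), ("west", 10), ("west", 1),
]

# FROM: start with every row of the table.
rows = list(sales)
# WHERE: row-level filter; no SELECT alias exists at this point.
rows = [(region, amount) for region, amount in rows if amount > 0]
# GROUP BY: collect the amounts per region.
groups = {}
for region, amount in rows:
    groups.setdefault(region, []).append(amount)
# HAVING: group-level filter.
groups = {region: amts for region, amts in groups.items() if sum(amts) >= 10}
# SELECT: the output column "total" comes into existence only here.
selected = [(region, sum(amts)) for region, amts in groups.items()]
# ORDER BY: may refer to the alias, since it runs after SELECT.
ordered = sorted(selected, key=lambda pair: pair[1], reverse=True)
# LIMIT: applied last.
manual_result = ordered[:2]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", sales)
sql_result = conn.execute("""
    SELECT region, SUM(amount) AS total
    FROM sales
    WHERE amount > 0
    GROUP BY region
    HAVING SUM(amount) >= 10
    ORDER BY total DESC
    LIMIT 2
""").fetchall()

print(manual_result)
print(sql_result)
```

Both give south and then north on this data. The engine is free to reach that answer in any physical order it likes.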

Querying sub queries does not appear to work

You need to give your COUNT(VwNIMEventFct.NIM_EVENT_TYPE_ID) a name, otherwise SQL will create some name for you. Try this:

SELECT usersAndDlCount.NIM_USER_ID, usersAndDlCount.NIM_EVENT_TYPE_ID
FROM (SELECT NIM_USER_ID, COUNT(NIM_EVENT_TYPE_ID) AS NIM_EVENT_TYPE_ID
      FROM RDMAVWSANDBOX.VwNIMEventFct
      WHERE NIM_EVENT_TYPE_ID = 884
      GROUP BY NIM_USER_ID) usersAndDlCount
WHERE usersAndDlCount.NIM_USER_ID > 100;

