What's the Execute Order of the Different Parts of a SQL Select Statement

What's the execute order of the different parts of a SQL select statement?

Have a look at

SQL SERVER – Logical Query Processing Phases – Order of Statement Execution

  1. FROM
  2. ON
  3. OUTER
  4. WHERE
  5. GROUP BY
  6. CUBE | ROLLUP
  7. HAVING
  8. SELECT
  9. DISTINCT
  10. ORDER BY
  11. TOP

Also, for some good info see Logical Query Processing

Is there any specific order of execution in SQL query?

There is a logical order to evaluation of the query text, but the database engine can choose what order execute the query components based upon what is most optimal. The logical text parsing ordering is listed below. That is, for example, why you can't use an alias from SELECT clause in a WHERE clause. As far as the query parsing process is concerned, the alias doesn't exist yet.

  1. FROM

  2. ON

  3. OUTER

  4. WHERE

  5. GROUP BY

  6. CUBE | ROLLUP (these are not present in MySQL but are in some other SQL dialects)

  7. HAVING

  8. SELECT

  9. DISTINCT

  10. ORDER BY

  11. LIMIT (or, in MSSQL, TOP)

See the Microsoft documentation (see "Logical Processing Order of the SELECT statement") for more information on this.

Order Of Execution of the SQL query

SQL is a declarative language. The result of a query must be what you would get if you evaluated as follows (from Microsoft):

Logical Processing Order of the SELECT statement

The following steps show the logical
processing order, or binding order,
for a SELECT statement. This order
determines when the objects defined in
one step are made available to the
clauses in subsequent steps. For
example, if the query processor can
bind to (access) the tables or views
defined in the FROM clause, these
objects and their columns are made
available to all subsequent steps.
Conversely, because the SELECT clause
is step 8, any column aliases or
derived columns defined in that clause
cannot be referenced by preceding
clauses. However, they can be
referenced by subsequent clauses such
as the ORDER BY clause. Note that the
actual physical execution of the
statement is determined by the query
processor and the order may vary from
this list.

  1. FROM
  2. ON
  3. JOIN
  4. WHERE
  5. GROUP BY
  6. WITH CUBE or WITH ROLLUP
  7. HAVING
  8. SELECT
  9. DISTINCT
  10. ORDER BY
  11. TOP

The optimizer is free to choose any order it feels appropriate to produce the best execution time. Given any SQL query, is basically impossible to anybody to pretend it knows the execution order. If you add detailed information about the schema involved (exact tables and indexes definition) and the estimated cardinalities (size of data and selectivity of keys) then one can take a guess at the probable execution order.

Ultimately, the only correct 'order' is the one described ion the actual execution plan. See Displaying Execution Plans by Using SQL Server Profiler Event Classes and Displaying Graphical Execution Plans (SQL Server Management Studio).

A completely different thing though is how do queries, subqueries and expressions project themselves into 'validity'. For instance if you have an aliased expression in the SELECT projection list, can you use the alias in the WHERE clause? Like this:

SELECT a+b as c
FROM t
WHERE c=...;

Is the use of c alias valid in the where clause? The answer is NO. Queries form a syntax tree, and a lower branch of the tree cannot be reference something defined higher in the tree. This is not necessarily an order of 'execution', is more of a syntax parsing issue. It is equivalent to writing this code in C#:

void Select (int a, int b)
{
if (c = ...) then {...}
int c = a+b;
}

Just as in C# this code won't compile because the variable c is used before is defined, the SELECT above won't compile properly because the alias c is referenced lower in the tree than is actually defined.

Unfortunately, unlike the well known rules of C/C# language parsing, the SQL rules of how the query tree is built are somehow esoteric. There is a brief mention of them in Single SQL Statement Processing but a detailed discussion of how they are created, and what order is valid and what not, I don't know of any source. I'm not saying there aren't good sources, I'm sure some of the good SQL books out there cover this topic.

Note that the syntax tree order does not match the visual order of the SQL text. For example the ORDER BY clause is usually the last in the SQL text, but as a syntax tree it sits above everything else (it sorts the output of the SELECT, so it sits above the SELECTed columns so to speak) and as such is is valid to reference the c alias:

SELECT a+b as c
FROM t
ORDER BY c;

In what order SQL statement execute if Select statement has CONCAT function on some columns?

According to the SQL standard, the GROUP BY is parsed before the SELECT.

However, this is not a hard-and-fast rule among databases. What BigQuery is doing is determining the column aliases from the SELECT. It then allows these aliases in the GROUP BY. Other databases do this as well -- Postgres and all its derived databases for example.

Do not confuse the parsing of the query with the execution, though. The execution is through some very complicated parallel directed acyclic graph. What is happening here is simply that the BigQuery parser is (conveniently) allowing users to use table aliases in GROUP BY.

Does the order of where clauses matter in SQL?

No, that order doesn't matter (or at least: shouldn't matter).

Any decent query optimizer will look at all the parts of the WHERE clause and figure out the most efficient way to satisfy that query.

I know the SQL Server query optimizer will pick a suitable index - no matter which order you have your two conditions in. I assume other RDBMS will have similar strategies.

What does matter is whether or not you have a suitable index for this!

In the case of SQL Server, it will likely use an index if you have:

  • an index on (LastName, FirstName)
  • an index on (FirstName, LastName)
  • an index on just (LastName), or just (FirstName) (or both)

On the other hand - again for SQL Server - if you use SELECT * to grab all columns from a table, and the table is rather small, then there's a good chance the query optimizer will just do a table (or clustered index) scan instead of using an index (because the lookup into the full data page to get all other columns just gets too expensive very quickly).

Confusion about the execution order in SQL query

Well it is the behavior of MySql extension to GROUP BY

MySQL Extensions to GROUP
BY


MySQL extends this behavior to permit the use of an alias in the
HAVING clause for the aggregated column

You can disable that extension with sql_mode ONLY_FULL_GROUP_BY

SET [SESSION | GLOBAL] sql_mode = ONLY_FULL_GROUP_BY;

If you try to execute above-mentioned query in ONLY_FULL_GROUP_BY sql_mode you'll get as the following error message:

Non-grouping field 'media_sucursal' is used in HAVING clause: SELECT
nombre_sucursal, AVG(saldo) AS media_sucursal FROM cuenta GROUP BY
nombre_sucursal HAVING media_sucursal > 800

as expected.

Here is SQLFiddle demo that illustrates that.

What is the execution order of the PARTITION BY clause compared to other SQL clauses?

Window functions are executed/calculated at the same stage as SELECT, stage 5 in your table. In other words, window functions are applied to all rows that are "visible" in the SELECT stage.

In your second example

Select top 1 *, 
row_number() over (Partition by Name order by Date)
from NPtable
Where Date > '2018-01-02 00:00:00'

WHERE is logically applied before Partition by Name of the row_number() function.

Note, that this is logical order of processing the query, not necessarily how the engine physically processes the data.

If query optimiser decides that it is cheaper to scan the whole table and later discard dates according to the WHERE filter, it can do it. But, any kind of these transformations must be performed in such a way that the final result is consistent with the order of the logical steps outlined in the table you showed.



Related Topics



Leave a reply



Submit